Oracle Announces MySQL HeatWave on AWS

>> Oracle continues to enhance MySQL HeatWave at a very rapid pace. The company is now in its fourth major release since the original announcement in December 2020. One of the main criticisms of MySQL HeatWave is that it only runs on OCI, Oracle Cloud Infrastructure, and is a lock-in to Oracle's cloud. Oracle recently announced that HeatWave is now going to be available in the AWS cloud, and it announced its intent to bring MySQL HeatWave to Azure. So MySQL HeatWave on AWS is a significant TAM expansion move for Oracle because of the momentum AWS cloud continues to show. And evidently the HeatWave engineering team has taken the development effort from OCI and is bringing that to AWS with a number of enhancements that we're going to dig into today. Nipun Agarwal, senior vice president, MySQL HeatWave at Oracle, is back with me on a Cube Conversation to discuss the latest HeatWave news, and we're eager to hear any benchmarks relative to AWS or any others. Nipun has been leading the HeatWave engineering team for over 10 years and has over 185 patents in database technology. Welcome back to the show. Good to see you.
>> Thank you. Very happy to be back.
>> Now for those who might not have kept up with the news, to kick things off, give us an overview of MySQL HeatWave and its evolution so far.
>> So MySQL HeatWave is a fully managed MySQL database service offering from Oracle. Traditionally, MySQL has been designed and optimised for transaction processing. So when customers of MySQL had to run analytics or machine learning, they would extract the data out of MySQL into some other database for doing analytics processing or machine learning processing. MySQL HeatWave provides all these capabilities built into a single database service, which is MySQL HeatWave. So customers of MySQL don't need to move the data out. It is all in the same database.
They can run transaction processing, analytics, mixed workloads and machine learning, all with very good performance and very good price performance. Furthermore, one of the design points of HeatWave is a scale-out architecture, so the system continues to scale and perform very well, even when customers have very large data sizes.
>> So we've seen some interesting moves by Oracle lately. The collaboration with Azure, we've covered that pretty extensively. What was the impetus here for bringing MySQL HeatWave onto the AWS cloud? What were the drivers that you considered?
>> So one of the observations is that a very large percentage of users of MySQL HeatWave are AWS users who are migrating off Aurora. So already we see that a good percentage of MySQL HeatWave customers are migrating from AWS. However, there are some AWS customers who are still not able to migrate to OCI, to MySQL HeatWave. And the reason is the exorbitant cost of egress charges. In order to migrate the workload from AWS to OCI, egress charges are very high fees, which become prohibitive for the customer. The second example we have seen is that the latency of accessing a database which is outside of AWS is very high. So there's a class of customers who would like to get the benefits of MySQL HeatWave but were unable to do so. With this support of MySQL HeatWave inside of AWS, these customers can now get all the benefits of MySQL HeatWave without having to pay the high fees and without having to suffer the poor latency, which is because of the AWS architecture.
>> Okay, so you're basically meeting the customers where they are. So was this a straightforward lift and shift from Oracle Cloud Infrastructure to AWS?
>> No, it is not, because one of the design goals we have with MySQL HeatWave is that we want to provide our customers with the best price performance regardless of the cloud.
So when we decided to offer MySQL HeatWave on AWS, we optimised MySQL HeatWave for it as well. So one of the things to point out is that this is a service where the data plane, control plane and the console are natively running on AWS. And the benefit of doing so is that now we can optimise MySQL HeatWave for the AWS architecture. In addition to that, we have also announced a bunch of new capabilities as a part of the service, which will also be available to the MySQL HeatWave customers on OCI, but we just announced them and we're offering them as a part of the MySQL HeatWave offering on AWS.
>> So I just want to make sure I understand that it's not like you just wrapped your stack in a container and stuck it into AWS to be hosted. You're saying you're actually taking advantage of the capabilities of the AWS cloud natively? And I think you've made some other enhancements as well that you're alluding to. Can you maybe elucidate on those?
>> Sure. So for starters, we have taken the MySQL HeatWave code and we have optimised it for the AWS infrastructure with its compute and network. And as a result, customers get very good performance and price performance with MySQL HeatWave in AWS. That's one: performance. The second thing is, we have designed a new interactive console for the service, which means that customers can now provision their instances with the console. But in addition, they can also manage their schemas. They can run queries directly from the console. Autopilot is integrated with the console. We have introduced performance monitoring. So there are a lot of capabilities which we have introduced as a part of the new console. The third thing is that we have added a bunch of new security features. We have exposed some of the security features which were part of the MySQL Enterprise Edition as a part of the service, which now gives customers a choice of using these features to build more secure applications.
And finally, we have extended MySQL Autopilot for a number of OLTP use cases. In the past, MySQL Autopilot had a lot of capabilities for analytics, and now we have augmented MySQL Autopilot to offer capabilities for OLTP workloads as well.
>> But there was something in your press release called auto thread pooling. It says it provides higher and sustained throughput at high concurrency by determining the optimal number of transactions which should be executed. What is that all about? The auto thread pool seems pretty interesting. How does it affect performance? Can you help us understand that?
>> Yes, and this is one of the capabilities I was alluding to, which we have added in MySQL Autopilot for transaction processing. So here is the basic idea. If you have a system where there's a large number of OLTP transactions coming in at a high degree of concurrency, in many of the existing MySQL-based systems it can lead to a state where there are few transactions executing, but a bunch of them can get blocked. With Autopilot thread pooling, what we basically do is workload-aware admission control. What this does is figure out the right scheduling for all of these transactions, so that either the transactions are executing or, as soon as something frees up, they can start executing, so there's no transaction which is blocked. The advantage to the customer of this capability is twofold. First, they get significantly better throughput compared to a service like Aurora at high levels of concurrency. So at high concurrency, for instance, MySQL HeatWave, because of this capability of auto thread pooling, offers up to 10 times higher throughput compared to Aurora. That's the first benefit: better throughput.
The second advantage is that the throughput of the system never drops, even at high levels of concurrency, whereas in the case of Aurora, the throughput goes up, but then at high concurrency, let's say starting at a level of 500 or so, depending upon the underlying shape they're using, the throughput just drops, whereas with MySQL HeatWave the throughput never drops. Now, the ramification for the customer is that if the throughput is not going to drop, the user can start off with a small shape, get the performance and be assured that even as the workload increases, they will never get performance which is worse than what they're getting at lower levels of concurrency. So this leads to customers provisioning a shape which is just right for them. And if they need to, they can go with a larger shape, but they don't, you know, overpay. So those are the two benefits: better performance, and sustained performance regardless of the level of concurrency.
>> So how do we quantify that? I know you've got some benchmarks. Can you share comparisons with other cloud databases? We're especially interested in Amazon's own databases, which are obviously very popular. And are you publishing those again on GitHub, as you have done in the past? Take us through the benchmarks.
>> Sure. So benchmarks are important because they give customers a sense of what performance to expect and what price performance to expect. So we have run a number of benchmarks. And yes, all these benchmarks are available on GitHub for customers to take a look at. So we have performance results on all three classes of workloads: OLTP, analytics and machine learning. So let's start with OLTP. For OLTP, primarily because of the auto thread pooling feature, we show that for TPC-C on a 10-gigabyte dataset at high levels of concurrency, HeatWave offers up to 10 times better throughput, and this performance is sustained, whereas in the case of Aurora, the performance really drops.
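Editor's sketch: the admission-control idea described in the thread pooling answer can be illustrated with a few lines of Python. This is not Oracle's implementation, and nothing here comes from the HeatWave code base. The class `AdmissionController`, the fixed `max_active` limit and `fake_transaction` are hypothetical names chosen for illustration, and a workload-aware system would presumably derive the limit from observed load rather than hard-code it. The sketch only shows the general pattern: cap how many transactions execute at once and let the rest wait for a free slot.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class AdmissionController:
    """Caps how many transactions execute at once; the rest wait for a slot."""

    def __init__(self, max_active: int):
        # Hypothetical fixed limit; a real system would tune this from the workload.
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0
        self.peak_active = 0

    def run(self, transaction, *args):
        with self._slots:  # blocks until an execution slot frees up
            with self._lock:
                self.active += 1
                self.peak_active = max(self.peak_active, self.active)
            try:
                return transaction(*args)
            finally:
                with self._lock:
                    self.active -= 1


def fake_transaction(txn_id: int) -> int:
    time.sleep(0.01)  # stand-in for real transactional work
    return txn_id


# 32 client threads submit 64 transactions, but at most 4 execute concurrently.
controller = AdmissionController(max_active=4)
with ThreadPoolExecutor(max_workers=32) as clients:
    results = list(
        clients.map(lambda i: controller.run(fake_transaction, i), range(64))
    )
```

Every submitted transaction eventually runs, and `controller.peak_active` records that the number of simultaneously executing transactions stayed within the limit regardless of how many clients were connected.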
So that's the first thing: at 10 terabytes, sorry, 10 gigabytes, TPC-C, at high concurrency the throughput is 10 times better than Aurora. For analytics, we have done a comparison of MySQL HeatWave in AWS with Redshift, Snowflake and Google BigQuery. We find that the price performance of MySQL HeatWave compared to Redshift is seven times better. So MySQL HeatWave in AWS provides seven times better price performance than Redshift. That's a very interesting result to us, which means that customers of Redshift are really going to take the service seriously, because they're going to get seven times better price performance. And this is all running in AWS. So compared to...
>> Okay, carry on.
>> And then I was going to say, compared to Snowflake, HeatWave in AWS offers 10 times better price performance. And compared to Google BigQuery, it offers 12 times better price performance. And this is based on a four-terabyte TPC-H workload. Results are available on GitHub. And then the third category is machine learning, and for machine learning training, the performance of MySQL HeatWave is 25 times faster compared to Redshift. So for all three workloads we have benchmark results, and all of these scripts are available on GitHub.
>> Okay, so you're comparing MySQL HeatWave on AWS to Redshift and Snowflake on AWS. And you're comparing MySQL HeatWave on AWS to BigQuery, obviously running on Google. You know, one of the things Oracle has done in the past when you give the price performance, and I've always tried to call fouls: you'll, like, double your price for running the Oracle database, not HeatWave, but Oracle Database on AWS. And then you'll show how it's so much cheaper on Oracle, and I'll be like, okay, come on. But you're not doing that here. You're basically taking MySQL HeatWave on AWS. I presume you're using the same pricing for whatever EC2, whatever else you're using.
Storage, reserved instances. That's apples to apples on AWS. And you have to obviously do some kind of mapping for Google, for BigQuery. Can you just verify that for me?
>> We are being more than fair on two dimensions. The first thing is, when I'm talking about the price performance for analytics with MySQL HeatWave, the cost I'm talking about for MySQL HeatWave is the cost of running transaction processing, analytics and machine learning. So it's a fully loaded cost in the case of MySQL HeatWave. Whereas when I'm talking about Redshift, when I'm talking about Snowflake, I'm just talking about the cost of these databases for running analytics only. It's not including the source database, which may be Aurora or some other database, right? So that's the first aspect: for HeatWave, it's the cost of running all three kinds of workloads, whereas for the competition, it's only for running analytics. The second thing is that for these other services, whether it's Redshift or Snowflake, we're talking about the one-year, fully paid upfront cost, right? So that's what most of the customers would pay. Many of the customers would sign a one-year contract and pay all the costs ahead of time because they get a discount. So we're using that price. And in the case of Snowflake, the cost we're using is their Standard edition price, not the Enterprise edition price. So yes, we are being more than fair in this comparison.
>> Yeah, I think that's an important point. I saw an analysis by Marc Staimer on Wikibon, where he was doing the TCO comparisons. And I mean, if you have to use two separate databases and two separate licences, and you have to do ETL and all the labour associated with that, that's a big deal, and you're not even including that aspect in your comparison. So that's pretty impressive. To what do you attribute that?
You know, given that unlike OCI, within the AWS cloud you don't have as much control over the underlying hardware.
>> So look, hardware is one aspect. Okay, so there are three things which give us this advantage. The first thing is, we have designed HeatWave for a scale-out architecture. So we came up with new algorithms. One of the design points for HeatWave is a massively partitioned architecture, which leads to a very high degree of parallelism. So that's how HeatWave was built. That's the first part. The second thing is that although we don't have control over the hardware, the second design point for HeatWave is that it is optimised for commodity cloud and commodity infrastructure. So we can analyse what compute we get, how much network bandwidth we get and how much object store bandwidth we get in AWS, and we have tuned HeatWave for that. That's the second point. And the third thing is MySQL Autopilot, which provides machine-learning-based automation. So what it does is that as the user's workload is running, it learns from it and improves various parameters in the system. So the system keeps getting better as you run more and more queries. And this is the third thing, as a result of which we get a significant edge over the competition.
>> Interesting. I mean, look, any ISV can go on any cloud and take advantage of it. And I love it. We live in a new world. How about machine learning workloads? What did you see there in terms of performance and benchmarks?
>> Right. So for machine learning, we offer three capabilities: training, which is fully automated, running inference, and explanations. So one of the things which many of our customers coming from the enterprise told us is that explanations are very important to them, because customers want to know why the system chose a certain prediction.
So we offer explanations for all models which have been trained by HeatWave. That's the first thing. Now, one of the interesting things about training is that training is usually the most expensive phase of machine learning. So we have spent a lot of time improving the performance of training. We have a bunch of techniques which we have developed inside of Oracle to improve the training process. For instance, we have meta-learned proxy models, which really give us an advantage. We use adaptive sampling. We have invented techniques for parallelising the hyperparameter search. So as a result of a lot of this work, our training is about 25 times faster than Redshift ML. And all the data is inside the database. All this processing is being done inside the database, so it's much faster. And I want to point out that there is no additional charge for HeatWave customers, because we're using the same cluster. We're not invoking a new service. So all of these machine learning capabilities are being offered at no additional charge inside the database, and at a performance which is significantly faster.
>> Are you taking advantage of, or is there any need, not need, but any advantage that you can get by exploiting things like Graviton? We've talked about that a little bit in the past. Or Trainium. You just mentioned training, so the custom silicon that AWS is doing, are you taking advantage of that? Do you need to? Can you give us some insight there?
>> So there are two things, right? We're always evaluating what choices we have from a hardware perspective for us to leverage, right? And all the things you mentioned, we have considered them. But there are two things to consider. One is that HeatWave is an in-memory system, so for HeatWave, memory is the dominant cost. The processor is a portion of the cost, but memory is the dominant cost.
So what we have evaluated and found is that the current shape which we are using is going to provide our customers with the best price performance. That's the first thing. The second thing is that there are opportunities at times when we can use a specialised processor for accelerating the workload a bit. But then it becomes a matter of the cost to the customer. The advantage of our current architecture is that on the same hardware, customers are getting very good performance, very good analytics performance and very good machine learning performance. If we go with a specialised processor, it may actually accelerate machine learning, but then it's an additional cost which the customers would need to pay. So we are very sensitive to the customer's request, which is usually to provide very good performance at a very low cost. And what we feel is that the current design we have is providing customers very good performance and very good price performance.
>> So part of that is architectural, the memory-intensive nature of HeatWave. The other is AWS pricing. If AWS pricing were to flip, it might make more sense for you to take advantage of something like Trainium. Okay, great. Thank you. And let's come back to the benchmarks. Benchmarks are sometimes artificial, right? A car can go from 0 to 60 in two seconds, but I might not be able to experience that level of performance. Do you have any real-world numbers from customers that have used MySQL HeatWave on AWS and how they look at performance?
>> Yes, absolutely. So the MySQL HeatWave service on AWS has been in beta since November, right? So we have a lot of customers who have tried the service. And what we have actually found is that many of these customers are planning to migrate from Aurora to MySQL HeatWave. And what they find is that the performance difference is actually much more pronounced than what I was talking about.
Because with Aurora, the performance is actually much poorer compared to what I've talked about. So in some of these cases, the customers found improvements of 60 times to 240 times, right? So HeatWave was 100 to 240 times faster, and it was much less expensive. And the third thing, which is noteworthy, is that customers don't need to change their applications. So if you ask for the top three reasons why customers are migrating, it's these: no change to the application, much faster, and cheaper. So in some cases, like Johnny Bytes, what they found is that the performance of their applications for the complex queries was about 60 to 90 times faster. Then we had 6D Technologies. What they found is that the performance of HeatWave compared to Aurora was 139 times faster. So yes, we do have many such examples from real workloads from customers who have tried it. And all across, what we find is that it offers better performance, lower cost and a single database, such that it is compatible with all existing MySQL-based applications and workloads.
>> Really impressive. The analysts I talk to, they're all gaga over HeatWave, and I can see why. Okay, last question, maybe two in one. What's next in terms of new capabilities that customers are going to be able to leverage, and any other clouds that you're thinking about? We talked about that upfront, but...
>> So in terms of the capabilities, you have seen that we have been, you know, nonstop attending to the feedback from customers and reacting to it. And we have also been innovating organically. So that's something which is going to continue. So yes, you can fully expect that we will not rest and will continue to innovate. And with respect to the other clouds, yes, we are planning to support MySQL HeatWave on Azure, and this is something that will be announced in the near future.
>> Great. All right, thank you. Really appreciate the overview.
Congratulations on the work. Really exciting news that you're moving MySQL HeatWave into other clouds. It's something that we've been expecting for some time. So it's great to see you guys making that move, and as always, great to have you on the Cube.
>> Thank you for the opportunity.
>> All right. And thank you for watching this special Cube Conversation. I'm Dave Vellante, and we'll see you next time.
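Editor's sketch: the interview quotes several "times better price performance" figures without spelling out the arithmetic. One common way to express such a ratio is to compare the cost of completing the same workload on two services, which is what the Python below does. The formula is an assumption for illustration, not Oracle's published methodology, and the service names, hourly prices and runtimes are placeholders that do not correspond to any figure from the interview or the published benchmarks.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    name: str
    hourly_cost_usd: float  # placeholder price, not a real list price
    runtime_seconds: float  # placeholder runtime, not a measured result

    @property
    def cost_per_run_usd(self) -> float:
        # Cost of completing the workload once at the given hourly price.
        return self.hourly_cost_usd * self.runtime_seconds / 3600.0


def price_performance_advantage(baseline: BenchmarkRun, candidate: BenchmarkRun) -> float:
    """How many times cheaper the candidate completes the same workload."""
    return baseline.cost_per_run_usd / candidate.cost_per_run_usd


# Hypothetical services: service_a is half the price and twice as fast.
service_a = BenchmarkRun("service_a", hourly_cost_usd=10.0, runtime_seconds=1800.0)
service_b = BenchmarkRun("service_b", hourly_cost_usd=20.0, runtime_seconds=3600.0)
advantage = price_performance_advantage(baseline=service_b, candidate=service_a)
```

With these placeholder inputs, `service_a` costs 5 USD per run and `service_b` costs 20 USD per run, so the ratio is 4. The fairness points raised in the interview (which workloads the cost covers, upfront versus on-demand pricing, which product edition is priced) all change the `hourly_cost_usd` input rather than the formula.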

Published Date : Sep 14 2022


Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business

>> From The Cube Studios in Palo Alto and Boston, bringing you data-driven insights from The Cube and ETR, this is "Breaking Analysis" with Dave Vellante.
>> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up, and public statements by private companies, of course, accentuate the good and kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years. It's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR, to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away at the data. It's really unique and detailed. First of all, welcome. Good to see you again.
>> Great to see you too, Dave, and I'm really happy to be talking about the ETS, or the Emerging Technology Survey. Even our own clients and constituents probably don't spend as much time in here as they should.
>> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology?
>> Yeah, you were just spot on the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and TechCrunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them," and really just kind of figure out what we can do. So this particular survey, as you can see, has 1,000-plus responses and over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting, not only for the ITDMs themselves to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up. But it's also really interesting for the tech vendors themselves to see how they're performing.
>> And you can see two-thirds of the respondents are director level or above. You've got 28% in the C-suite. There is of course a North America bias, 70, 75% is North America. But these smaller companies, you know, that's where they start doing business. So, okay. We're going to do a couple of things here today. First, we're going to give you the big picture across the sectors that ETR covers within the ETS survey. And then we're going to look at the high and low sentiment for the larger private companies. And then we're going to do the same for the smaller private companies, the ones that don't have as much mindshare.
And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions. First, which companies are being evaluated the most. Second, which companies are getting the most usage and adoption of their offerings. And then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing, and then big data analytics is the second part of data. And then machine learning and AI is the third section within data that we're going to look at. Now, one other thing before we get into it: ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation, and many business models are super glued to an open source offering. Take MariaDB, for example. There's the foundation with the open source code, and then there's, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis, and note the open source tools: see Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us.
>> Certainly, so there is a lot here, so we're going to break it down first of all by explaining just what mindshare and net sentiment is. You explained the axes.
We have so many evaluation metrics, but we need to aggregate them into one so that way we can rank against each other. Net sentiment is really the aggregation of all the positive and subtracting out the negative. So the net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that have obviously been around for a very long time. And they're clearly going to be bigger on the axis, on the outside. Kubernetes, for instance, as you mentioned, is open source. It's the de facto standard for all container orchestration, and it should be that far up and to the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the emerging technology survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including this open source tech. So what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust. They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma. This is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that because they're really coming after them. But Databricks, we all know, probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database later.
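Editor's sketch: Erik's description of net sentiment as "the aggregation of all the positive and subtracting out the negative" can be illustrated with a short Python function. This is not ETR's actual formula. The response category names below are hypothetical, as is the choice to divide by the number of respondents who know the vendor, which is one plausible way to make vendors with different response counts comparable.

```python
from collections import Counter
from typing import Iterable

# Hypothetical response categories, chosen for illustration only.
POSITIVE = {"plan_to_evaluate", "plan_to_adopt", "currently_using"}
NEGATIVE = {"evaluated_no_plan", "plan_to_replace"}


def net_sentiment(responses: Iterable[str]) -> float:
    """Share of positive responses minus share of negative responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    positive = sum(counts[category] for category in POSITIVE)
    negative = sum(counts[category] for category in NEGATIVE)
    return (positive - negative) / total


def mindshare(aware_respondents: int, surveyed_respondents: int) -> float:
    """Fraction of surveyed respondents who are aware of the vendor."""
    if surveyed_respondents == 0:
        return 0.0
    return aware_respondents / surveyed_respondents


# Ten respondents know this made-up vendor: 8 positive, 1 neutral, 1 negative.
responses = (
    ["currently_using"] * 6
    + ["plan_to_evaluate"] * 2
    + ["aware_only"]
    + ["plan_to_replace"]
)
score = net_sentiment(responses)
awareness = mindshare(aware_respondents=len(responses), surveyed_respondents=40)
```

Here the made-up vendor has eight positive and one negative response out of ten, giving a net sentiment of 0.7, and a mindshare of 0.25 because ten of forty surveyed respondents know it. Plotting the two values against each other gives the kind of two-axis view discussed in the segment.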
>> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor, a company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow. H2O.ai. Dataiku, an AI company; we saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to pull out SecurityScorecard and we'll get back into it. But this is a newcomer, and I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones of lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much. And that's why Istio is in the smaller breakout. But still, when you talk about net sentiment, it's the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here. And I couldn't believe how well they were doing. And then of course you have AnyScale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool. Also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet.
So these are, again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers. AnyScale does machine learning and you mentioned them. They came out of Berkeley. Collibra, governance. InfluxData is on there. InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess. Sisense does viz, Yellowbrick Data is an MPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first. So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah. Oh, that's a great explanation. Thank you for that. No, nice benchmarking there and yeah, you don't want to be in the red. All right, let's get into the next segment here. Here we're going to look at evaluation rates, adoption and the all important churn.
First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see: awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, with very high evaluation rates: H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here, Aqua, Netskope, which, God, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, real standout in this one. They're in the security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look now at adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure, yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state "Yes, we evaluated and we plan to utilize it" or "It's already in our enterprise and we're actually allocating further resources to it."
Not surprising, again, a lot of open source, the reason why, it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it. So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. 
So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn because this one is where you don't want to be. It's, of course, all red 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here and I don't love to dwell on the negative. So we won't spend as much time. But to your point, there's one thing I want to point out that I think is important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're showing both high utilization and churn, just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here, about 17%, and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see as high of a churn because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. So if you're in both, like MariaDB is for example, I think, yeah, they're in both.
They're both green in the previous one and red here, that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen. So this could be a go to market issue, right? I mean, 'cause Cohesity has got a great product and they've got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean, they had been doing very well on the surveys and had fallen off a bit. The other interesting thing: one in the previous survey, I saw Cvent, which is an event platform. My only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately. We bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic. And we were like, "Wow, that's going to blow up." And so you see Hopin on the churn and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not kind of dwell on the negative, but you really don't. You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data, where data comprises three areas: database and data warehousing, machine learning and AI, and big data analytics. So first let's take a look at the security sector. Now this is interesting because not only is it a sector drill down, but it also gives an indicator of how much money the firm has raised, which is the size of that bubble, to tell us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. >> Yeah.
So again, the axes are still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital a company has raised and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying, relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever and raised a lot of money, that's why you're going to see them leaning more towards red, 'cause it's just been around forever and you kind of would expect it. Versus a name like SecurityScorecard, which has only raised a little bit of money and is actually performing just as well, if not better, than a name like a Netskope. OneTrust doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right. So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about, right? So raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as good as you'd want, which is why you're going to see a little bit of that red, versus a name like Aqua, which is doing container and application security and hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure in this current market environment if people are as willing to do POCs and switch away from their security provider, right. There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting.
Now Snyk has raised over, I think, $800 million, but you can see them, they're high on the vertical and the horizontal, but now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing. I think I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they've got to fine tune things, but I'm still confident they're going to do well, 'cause they're approaching security as a data problem, which people are probably having trouble getting their arms around. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and it's showing up to the right and at a decent level there. And a couple of the other ones that you mentioned, Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right. They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be. Your net sentiment is not where it should be comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well, then it means you made a good investment.
>> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas: database slash data warehousing, big data analytics, and ML AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just say PostgreSQL. I've got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB is a database, but an open source database. But the company, they've raised over $200 million and they filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for profit companies. And then there are others that are obviously open source based, like Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note though, is just how important open source is to data. If you're going to be going to any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, which is a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play.
It's not as big on the mindshare 'cause its use cases aren't as common, but it's the third biggest play on net sentiment, which I found really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which they got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL. And so now it's all MariaDB replacing that MySQL for a large part. And then you see again, the red, you know, you've got to have some concerns there. Aerospike's been around for a long time. SingleStore changed their name a couple years ago, last year. Yellowbrick Data. Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they've got some work to do. >> And Dave, real quick for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that because some of these names are competing with the larger public company names as well. So we can go ahead and cross reference like a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here, I can't see it on my screen, my apologies. I just kind of went to blank on that. So gimme one second to catch up. >> So I could set it up while you're doing that.
You've got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare of course is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company. It's, you know, what, third, fourth position overall in this entire area. Other names like Fivetran and Matillion are doing well. Fivetran, even though it's got a high net sentiment, again, it's raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here, and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on in this stage and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting things I just noticed here, you've got Tamr on here, which is that little small green. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake, not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one, they're sort of slightly green.
They are doing some really interesting things in data and data mesh. So yeah, okay. So I can spend all day on this stuff, Erik, phenomenal data. I've got to get back and really dig in. Let's end with machine learning and AI. Now this chart, it's similar in its dimensions, of course, except for the money raised. We're not showing that size of the bubble, but AI is so hot, we wanted to cover that here. Erik, explain this please. Why TensorFlow is highlighted, and walk us through this chart. >> Yeah, it's funny yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain, we do break out machine learning, AI as its own sector. A lot of this of course really is intertwined with the data side, but it is its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning, AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward, we're going to be moving Databricks out of only the ML/AI into other sectors, so we can kind of value them against their peers a little bit better. But in this instance, you could just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical, organization size. And when I break this down into Fortune 500 and Fortune 1000, both Databricks and TensorFlow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations means large budgets. So this is one area that I just thought was really interesting to point out, that as we break the data down by vertical, these two names still are the outstanding players. >> I just also want to call out H2O.ai.
They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because of all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman. All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, not shown here, but in the data. I'm sorry, go ahead. >> No, I'm sorry. What I meant, I should have interjected. In other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting, before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right. In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data as you know, I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there.
And what I noticed is on the security side, very much so, we're seeing fewer evaluations than we were in November '21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, net sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again in database, we saw it drop a little bit from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that yes, cloud security and security in general is always going to be important. But right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing steady with growing mindshare. And also to your point earlier in machine learning, AI, we're seeing steady net sentiment and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs." I think he said, "That's how I found," I don't know, it was Zscaler or something like that, years before anybody ever knew of them, "because they're going to help me get to the next level." So it's interesting to see, Erik, in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up and you always have to worry about your next round of negotiations. And that's one of the roles these guys play.
You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like, everyone's a software company now. Everyone's a tech company, no matter what you're doing. So these guys have to stay on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope, or it simply might be I'm using them as a negotiation ploy. So when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we've got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave, and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen, who's a data scientist and really helped prepare this data, and the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job, guys. I want to thank Alex Myerson, who's on production and manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. Does some great editing for us. Thank you, all of you guys. Remember, these episodes are all available as podcasts, wherever you listen. All you've got to do is just search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com.
Or you can email me to get in touch at david.vellante@siliconangle.com. You can DM me at dvellante or comment on my LinkedIn posts, and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well. And we'll see you next time on "Breaking Analysis". (upbeat music)

Published Date : Sep 7 2022



Andy Palmer, TAMR | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's the Cube, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to MIT, everybody. You're watching the Cube, the leader in live tech coverage. We're here at day two of the MIT Chief Data Officer and Information Quality conference. I'm Dave Vellante with Paul Gillin. Andy Palmer's here. He's the co-founder and CEO of Tamr. Good to see you again. >> It's great to see you. >> Thanks for coming on. So I didn't ask this of Mike. I could kind of infer it from some of his answers. But why did you guys start Tamr? >> Well, it really started with an academic project that Mike was doing over at MIT, and I was over at Novartis at the time as the chief data officer over there. And what we really found was that there were a lot of companies really suffering from data mastering as the primary bottleneck in their company. They'd used great new tech like the Vertica system that we'd built and, you know, automated a lot of their warehousing and such. But the real bottleneck was getting lots of data integrated and mastered really, really quickly. >> Yeah, he took us through the sort of problems with, obviously, the EDW in terms of scaling, master data management and the scaling problems. Was that really the problem that you were trying to solve? >> Yeah, it really was. And when we started, I mean, it was like seven years ago, eight years ago now that we started the company, and maybe almost 10 when we started working on the academic project. And at that time, people weren't really thinking or worried about that. They were still kind of digesting big data, as it was called. But I think what Mike and I kind of felt was going on was that people were gonna get over the big data and the volume of data, and were going to start worrying about the variety of the data and how to make the data cleaner and more organized. And I think we called that one pretty much right.
Maybe we were a little bit early, but I think now variety is the big problem. >> The other thing about big data: big data's oftentimes associated with Hadoop, which was batch, and then you sort of saw the shift to real time, and Spark was gonna fix all that. And so what are you seeing in terms of the trends in how data is being used to drive almost near-real-time business decisions? >> You know, Mike and I came out really specifically back in 2007 and declared that we thought Hadoop and HDFS were going to be far less impactful than other people thought. >> '07? >> Yeah, yeah. And Mike actually was really aggressive in saying it was gonna be a disaster. And I think we've finally seen that actually play out now that the bloom is off the rose, so to speak. And so there are these fundamental things that big companies struggle with in terms of their data and, you know, cleaning it up and organizing it and making it what they want. Anybody that's worked at one of these big companies can tell you that the data they get from most of their internal systems sucks, plain and simple. And so cleaning up that data, turning it into something that's an asset rather than a liability, is really what Tamr's all about. And it's kind of our mission. We're out there to do this, and it sort of pales in comparison when you think about the amount of money that some of these companies have spent on systems like SAP, and you're like, yeah, but all the data inside of the systems is so bad and so ugly and unuseful. Like, we're gonna fix that problem. >> So your special sauce is machine learning. Where are you applying machine learning most effectively? >> Well, we apply machine learning to probably the least sexy problem on the planet. There are a lot of companies out there that use machine learning and AI to do predictive algorithms and all kinds of cool stuff.
All we do with machine learning is actually use it to clean up data and organize data, to get it ready for people to use AI. I started in the AI industry back in the late 1980s and, you know, really, I learned from this guy, Marvin Minsky, and Marvin taught me two things. First was garbage in, garbage out: there's no algorithm that's worth anything unless you've got great data. And the second one is it's always about the human and the machine working together. And I've really been working on those same two principles most of my career, and Tamr really brings both of those together. Our goal is to prepare data so that it can be used analytically inside of these companies, so that it's actually high quality and useful. And the way we do that involves bringing together the machine, mostly these advanced machine learning algorithms, with humans, subject matter experts inside of these companies that actually know all the ins and outs and all the intricacies of the data inside of their company. >> So, say, garbage in, garbage out. If you don't have good training data, of course you're not going to get a good ML model. How much upfront work is required? GE, I know, is one of your customers. How much time is required to put together an ML model that can deal with 20 million records like that? >> Well, you know, the amazing thing that's happened for us in the last five years especially is that now we've built enough models from scratch inside of these large Global 2000 companies that very rarely do we go into a place where we don't already have a model that's pre-built, that they can use as a starting point. And I think that's the same thing that's happening in modeling in general. If you look at great companies like DataRobot, and even in the Python community, MLlib, the accessibility of these modeling tools and the models themselves are actually, so, they're commoditized.
And so for most of our models and most of the projects we work on, we've already got a model that's a starting point. We don't really have to start from scratch. >> You mentioned getting into AI in the eighties. Is the notion of AI the same as it was in the eighties, and now we've just got the tooling, the horsepower, the data to take advantage of it? Has the concept changed? >> The math is all the same, like, you know, absolutely, full stop. There's really no new math. The two things, I think, that have changed are, first, there's a lot more data that's available now. And, you know, neural nets are a great example, right, one of Marvin's things. You know, when you look at Google Translate and how aggressively they used neural nets, it was the quantity of data that was available that actually made neural nets work. The second thing that's changed is the cheap availability of compute: now the largest supercomputer in the world is available to rent by the minute. And so we've got all this data, you've got all this really cheap compute, and then the third thing is what you alluded to earlier, the accessibility of all the math. Now it's becoming so simple and easy to apply these math techniques, and it's almost to the point where the average data scientist, not the advanced one, the average data scientist, can practice AI techniques that 20 years ago required five PhDs. >> It's not surprising that Google, with its new neural net technology and all the search data that it has, has been so successful. Does it surprise you that Amazon with Alexa was able to compete so effectively? >> Oh, I would never underestimate Amazon and their ability to, you know, build great tech. They've done some amazing work.
One of my favorite, actually one of Mike's and my favorite examples in the last three years: they took their Redshift system, you know, that competed with Vertica, and they reimplemented it, you know, as a compiled system, and it really runs incredibly fast. I mean, that feat of engineering was truly exceptional. >> It's interesting to hear you say that, because wasn't Redshift originally ParAccel? >> Yeah, that's right. >> Larry Ellison craps all over Redshift because it's just open source software that they took and repackaged. But you're saying they did some major engineering too. >> Oh my gosh, yeah. Mike and I both, you know, we always compared ParAccel to Vertica, and we always knew we were better in a whole bunch of ways. But this latest rewrite that they've done, this compiled version, like, it's really good. >> So as a guy who's been doing AI for 30 years now and is really seeing it come into its own: a lot of AI projects right now seem to be sort of low-hanging fruit, small-scale stuff. Where do you see AI in five years? What kind of projects are companies gonna be undertaking, and what kind of new applications are gonna come out of this? >> I think we're at the very beginning of this cycle, and actually there's a lot more potential than has been realized. So I think we are in the pick-the-low-hanging-fruit kind of a thing. But some of the potential applications of AI are so much more impactful, especially as we modernize core infrastructure in the enterprise. So the enterprise is sort of living with this huge legacy burden. And we're always encouraging our customers at Tamr to think of all their existing legacy systems as just data-generating machines, and the faster they can get that data into a state where they can start doing state-of-the-art AI work on top of it, the better.
And so really, you know, you've gotta put the legacy burden aside and kind of draw this line in the sand, so that as you really build your muscles on the AI side, you can take advantage of that with all the data that they're generating every single day. >> Thinking about these data repositories: the enterprise data warehouse, where you guys built better data warehouses with MPP technology; the master data management stuff; the top-down, you know, enterprise data models; Hadoop and big data. None of them really lived up to their promise, you know? Yeah, it's kind of somewhat unfair to, like, the MPP guys, because you said, hey, we're just gonna run faster, and you did. But you didn't say you were gonna change the world and all that stuff, right? Whereas EDW did. Do you feel like this next wave is actually gonna live up to the promise? >> I think the next phase is very logical. Like, you know, I know you're talking to Chris Lynch here in a minute, and, you know, what they're doing at AtScale. AtScale and Tamr, these companies are all in the same general area that's kind of related to how do you take all this data and actually prepare it and turn it into something that's consumable really quickly and easily for all of these new data consumers in the enterprise. And so that's the next logical phase in this process. Now, will this phase be the one that finally sort of meets the high expectations that were set 20, 30 years ago with enterprise data warehousing? I don't know, but we're certainly getting closer to it. >> I kind of hope not, 'cause then we'll have less to do. Any other cool stuff that you see out there, as a technologist? >> I'm huge, I'm fanatical right now about health care. I think that the opportunity for health care to be transformed with technology, you know, almost makes everything else look like chump change. >> What aspect of health care?
>> Well, I think that the most obvious thing is that now, with the consumer sort of in the driver's seat in healthcare, technology companies that come in and provide consumer-driven solutions that meet the needs of patients, regardless of how dysfunctional the health care system is, that's killer stuff. We had a great company here in Boston called PillPack that was a great example of that, where they just built something better for consumers, and it was so popular and so, you know, broadly adopted that eventually Amazon bought it for $1 billion. But those kinds of things in health care, PillPack is just the beginning. There's lots and lots of those kinds of opportunities. >> Well, you're right. Healthcare's ripe for disruption, and it hasn't been hit with the digital disruption. And neither has financial services, really. Certainly defense has not yet, either. They're high-risk industries, so... >> Absolutely, it takes longer. >> Well, Andy, thanks so much for making the time. I know you've gotta run. >> Yeah. Thank you. >> All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching the Cube from MIT CDOIQ. We'll be right back.

Published Date : Aug 1 2019

ENTITIES

Entity	Category	Confidence
Mike	PERSON	0.99+
Andy	PERSON	0.99+
Andy Palmer	PERSON	0.99+
Mark Marvin	PERSON	0.99+
2007	DATE	0.99+
Amazon	ORGANIZATION	0.99+
Paul Dillon	PERSON	0.99+
Boston	LOCATION	0.99+
$1,000,000,000	QUANTITY	0.99+
Chris Lynch	PERSON	0.99+
Marvin Minsky	PERSON	0.99+
Larry Ellison	PERSON	0.99+
First	QUANTITY	0.99+
both	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
second thing	QUANTITY	0.99+
third thing	QUANTITY	0.99+
20,000,000 records	QUANTITY	0.99+
two same principles	QUANTITY	0.99+
seven years ago	DATE	0.99+
eight years ago	DATE	0.99+
Mike Mike	PERSON	0.98+
three years	QUANTITY	0.98+
late 19 eighties	DATE	0.98+
first	QUANTITY	0.98+
five years	QUANTITY	0.98+
2030 years ago	DATE	0.98+
2nd 1	QUANTITY	0.98+
one	QUANTITY	0.98+
One	QUANTITY	0.98+
two things	QUANTITY	0.97+
five PhDs	QUANTITY	0.97+
Day two	QUANTITY	0.97+
Veronica	PERSON	0.97+
M I. T.	PERSON	0.96+
Marvin	PERSON	0.96+
20 years ago	DATE	0.96+
Python	TITLE	0.96+
eighties	DATE	0.94+
2019	DATE	0.94+
2000 companies	QUANTITY	0.94+
Red Shift	TITLE	0.94+
Duke	ORGANIZATION	0.93+
Alexa	TITLE	0.91+
last five years	DATE	0.9+
M I t	EVENT	0.88+
almost 10	QUANTITY	0.87+
TAMR	PERSON	0.86+
Andi	PERSON	0.8+
M. I. T.	ORGANIZATION	0.79+
Tamer	ORGANIZATION	0.78+
Information Quality Symposium	EVENT	0.78+
Quality Conference Day Volonte	EVENT	0.77+
Tamer	PERSON	0.77+
Google translate	TITLE	0.75+
single day	QUANTITY	0.71+
H	PERSON	0.71+
Chief	PERSON	0.66+
Hadoop	PERSON	0.64+
MIT	ORGANIZATION	0.63+
Cube	ORGANIZATION	0.61+
more	QUANTITY	0.6+
M. I. T.	PERSON	0.57+
Pill pack	COMMERCIAL_ITEM	0.56+
Pill Pack	ORGANIZATION	0.53+
D f s	ORGANIZATION	0.48+
Park	TITLE	0.44+
CDOIQ	EVENT	0.32+
Cube	PERSON	0.27+

Michael Stonebraker, TAMR | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's the Cube, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching the Cube, the leader in live tech coverage, and we're covering the MIT CDO conference, MIT CDOIQ. My name is Dave Vellante, and I'm here with my co-host, Paul Gillin. Mike Stonebraker's here. The legend is founder and CTO of Tamr, as well as many other companies. Inventor. Michael, thanks for coming back on the Cube. Good to see you again. >> Nice to be here. >> So this is kind of a repeat pattern for all of us. We kind of gather here in August at the CDO conference. You're always the highlight of the show. You gave a talk this week on the top 10 big data mistakes. You and I are among the few people who still use the term big data. I happen to like it. Sad that it's out of vogue already, but people associate it with Hadoop, and that's kind of waning. But regardless, welcome. How'd the talk go? What were you talking about? >> So I talk to a lot of people who are doing analytics or doing operational data at scale, and most of them make a collection of bad mistakes. And so the talk was a litany of the blunders that I've seen people make, and so the audience could relate to the blunders. Most of the enterprises represented make a bunch of the blunders. So I think the number one blunder is not planning on moving most everything to the cloud. >> So that's interesting, because a lot of people would love to debate that. And I would imagine you probably could have done this 10 years ago and a lot of the blunders would be the same, but that's one that wouldn't have been there. But I tend to agree. I was one of the two hands that went up in this morning's talk when he asked, "Is the cloud cheaper?" For us, it is, anyway. But so what?
Why should everybody move everything to the cloud? Aren't there laws of physics, laws of economics, laws of the land that suggest maybe you shouldn't? >> Well, I guess two things and then a comment. First thing is James Hamilton, who's, you know, a techie's techie, works for Amazon. >> We know James. >> So he claims that he can stand up a server for 25% of your cost. I have no reason to disbelieve him. That number has been pretty constant for a few years, so his cost is a quarter of your cost. Sooner or later, prices are gonna reflect costs as there's a race to the bottom on cloud servers. >> So can I just stop you there for a second? Because there's some other data on that. All you have to do is look at AWS's operating margin and you'll see how profitable they are. They have software-like economics, and they're deploying servers. So sorry to interrupt, but carry on. >> So anyway, sooner or later, they're gonna be wildly cheaper than you are. The second vignette is from Dave DeWitt, who's a database wizard. And here's the current technology that Microsoft Azure is using, as of 18 months ago: it's shipping containers in parking lots, chilled water in, power in, Internet in, otherwise sealed, roof and walls optional. So if you're doing raised flooring in Cambridge versus I'm doing shipping containers in the Columbia River Valley, who's gonna be a lot cheaper? And so, you know, the economies of scale. I mean, the big cloud guys are building data centers as fast as they can, using the cheapest technology around. You put up a data center every 10 years, and you do it on raised flooring in Cambridge. So sooner or later, the cloud guys are gonna be a lot cheaper. And the only thing that will change that equation is, for example: my lab is up the street in the Frank Gehry building, and we have an IT department who runs servers in Cambridge, and they claim they're cheaper than the cloud.
And they don't pay rent for square footage and they don't pay for electricity. So yeah, think externalities: if there are no externalities, the cloud is assuredly going to be cheaper. And then the other thing is that most everybody that I talk to, including me, has very skewed resource demands. So in the cloud, I'm fine with three servers, except for the last day of the month; on the last day of the month, I need 20 servers. I just do it. If I'm doing on-prem, I've got to provision for peak load. And so again, I'm just way more expensive. So I think sooner or later this combination of effects is going to send everybody to the cloud for most everything. >> And my point about the operating margins is the difference in price and cost. I think James Hamilton's right on it. If you look at the actual cost of deploying, it's even lower than the price, which the market allows them to charge. They're growing at 40-plus percent a year, and they're a $35-40 billion run-rate company. >> Sooner or later, it's gonna be a race to the bottom. >> And the only guys that are gonna win are the guys who have the best cost structure. >> A couple other highlights from your talk? >> Sure. I think the second thing I'd like to stress is that machine learning is going to be a game changer for essentially everybody. And not only is it going to be autonomous vehicles; it's gonna be automatic checkout, it's going to be drone delivery of most everything. And it's gonna affect essentially everybody. I'm gonna sort of say, categorically, any job that is easy to understand is going to get automated. And I think that's gonna be majorly impactful to most everybody. So if you're in an enterprise, you have two choices: you can be a disruptor or you can be a disruptee. And so you can either be a taxi company or you can be Uber, and it's gonna be AI and machine learning that's going to determine which side of that equation you're on.
So that's a big blunder that I see: people not taking ML incredibly seriously. >> Do you see that? In fact, everyone I talk to seems to be bought in that, you know, we've got to get on the bandwagon. >> Yeah, I'm just pointing out the obvious. Yeah, I think so. But one that's not quite so obvious is, a lot of people I talk to say, "I'm on top of data science. I've hired a group of 10 data scientists, and they're doing great." And one vignette that's kind of fun is I talked to a data scientist from iRobot, which is the guys that have the vacuum cleaner that runs around your living room. So she said, "I spend 90% of my time locating the data I want to analyze, getting my hands on it and cleaning it, leaving the 10% to do the data science job for which I was hired. Of the 10%, I spend 90% fixing the data cleaning errors in my data so that my models work." So she spends 99% of her time on what you'd call data preparation and 1% of her time doing the job for which she was hired. So data science is not about data science. It's about data integration, data cleaning, data discovery. >> Which is your latest venture. >> So Tamr does that sort of stuff. But that's the real data science problem. And a lot of people don't realize that yet, and, you know, they will. >> I want to ask you, because you've been involved, by my count, in starting up at least a dozen companies. >> Nine. >> Nine, okay. It's a lot. I overstated. >> You estimated high. >> Fine. How do you decide what challenge to move on to? Because you're not solving the same problems. You're moving on to new problems. How do you decide what's the next thing that interests you enough to actually start a company? >> Okay, that's really easy. You know, I'm on the faculty at MIT. My job is to think of new stuff and investigate it, and I come up...
You know, I'm paid to come up with new ideas, some of which have commercial value, some of which don't, and the ones that have commercial value I commercialize. So it's whatever I'm doing at the time. And that's why all the things I've commercialized are different. >> So going back to Tamr, the data integration platform: a lot of companies out there claim to do data integration right now. What did you see? What was the deficit in the market that you could address? >> Okay, great question. So the traditional data integration is extract, transform and load (ETL) systems and so-called master data management (MDM) systems, brought to you by IBM, Informatica, Talend, that class of folks. So a dirty little secret is that that technology does not scale, okay, in the following sense. Well, ETL doesn't scale for a different reason than MDM. ETL doesn't scale because ETL is based on the premise that somebody really smart comes up with a global data model for all the data sources you want to put together. You then send a human out to interview each business unit to figure out exactly what data they've got, and then how to transform it into the global data model and how to load it into your data warehouse. That's very human intensive, and it doesn't scale because it's so human intensive. So I've never talked to a data warehouse operator who integrates many sources; the average one I talk to says they integrate less than 10 data sources. Some people, 20. If you twist my arm hard, I'll give you 50. So here's a real-world problem, which is Toyota Motor Europe. Right now, they have a distributor in Spain, another distributor in France. They have country-by-country distribution, sometimes canton-by-canton distribution. So if you buy a Toyota in Spain and move to France, Toyota develops amnesia. The French guys know nothing about you.
So they've got 250 separate customer databases with 40 million total records in 50 languages. And they're in the process of integrating that into a single customer database so that they can do the customer service we expect when you cross an EU boundary. I've never seen an ETL system capable of dealing with that kind of scale. ETL doesn't scale to this level of problem. >> So how do you solve that problem? >> I'll tell you; they're a Tamr customer, and I'll tell you all about it. Let me first tell you why MDM doesn't scale. >> Okay, great. >> So ETL says, I now have all your data in one place in the same format, but now you've got the following problems. You've got to deduplicate, because if I bought a Toyota in Spain and I bought another Toyota in France, I'm in both databases. So if you want to avoid double counting customers, you've got to dedupe, you know, 30 million records. And so MDM says, okay, you write some rules. It's a rule-based technology. So you write a rule. My favorite example of a rule: I don't know if you guys like downhill skiing. >> All right. >> I love downhill skiing. So ski areas are in all kinds of public databases. Assemble those all together. Now you've got to figure out which ones are the same ski area, and they're called different names, with different addresses and so forth. However, if the vertical drop from bottom to top is the same, chances are they're the same ski area. So that's a rule that says how to put data together in clusters. And so I now have a cluster for Mount Sunapee, and I have a problem, which is one address says something or other, another address says something else. Which one is right, or are both right? So now you have what's called the golden record problem: to basically decide which data elements, among a variety that may all be associated with the same entity, are in fact correct.
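As a rough illustration of the kind of rule described in this answer (same vertical drop, so probably the same ski area), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the sample values, and the function names are hypothetical and are not taken from Tamr or from any MDM product.

```python
from collections import defaultdict

# Hypothetical ski-area records pulled from different public databases.
# Names, addresses, and vertical drops are made up for illustration.
RECORDS = [
    {"source": "db_a", "name": "Mt. Example", "address": "1 Summit Rd", "vertical_drop_ft": 1510},
    {"source": "db_b", "name": "Mount Example Resort", "address": "Route 9", "vertical_drop_ft": 1510},
    {"source": "db_c", "name": "Other Peak", "address": "22 Lodge Ln", "vertical_drop_ft": 2020},
]


def cluster_by_vertical_drop(records):
    """Apply one matching rule: same vertical drop => same ski area."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record["vertical_drop_ft"]].append(record)
    return list(clusters.values())


def conflicting_fields(cluster, fields=("name", "address")):
    """Return the fields whose values disagree inside one cluster.

    These are the values a golden-record step would still have to resolve.
    """
    conflicts = {}
    for field in fields:
        values = sorted({record[field] for record in cluster})
        if len(values) > 1:
            conflicts[field] = values
    return conflicts


clusters = cluster_by_vertical_drop(RECORDS)
for cluster in clusters:
    print(len(cluster), conflicting_fields(cluster))
```

The single rule is enough to group the first two records, but it leaves the name and address disagreements untouched, which is the golden record problem raised in the interview. Each additional attribute or exception would need another hand-written rule.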
So again, MDM is a rule-based system. It's a rule-based technology, and rule systems don't scale. The best example I can give you for why rule systems don't scale is Tamr has another customer, General Electric. You've probably heard of them. And GE wanted to do spend analytics, and so they had 20 million spend transactions from the year before last. A spend transaction is: I paid $12 to take a cab from here to the airport, and I charged it to cost center XYZ. 20 million of those. So GE has a pre-built classification system for spend, so they have parts, and underneath parts are computers, and underneath computers is memory, and so forth. So with a pre-existing classification for spend, they want to simply classify 20 million spend transactions into this pre-existing hierarchy. So the traditional technology is, well, let's write some rules. So GE wrote 500 rules, which is about the most any single human can get their arms around. That classified 2 million of the 20 million transactions. You've now got 18 million to go, and another 500 rules is not going to give you 2 million more. It's gonna give you diminishing returns, right? So you have to write a huge number of rules that no one can possibly understand. So the technology simply doesn't scale, right? So in the case of GE, they had Tamr help solve this classification problem. Tamr used their 2 million rule-tagged records as training data. They built an ML model that works off the training data and classifies the remaining 18 million. So the answer is machine learning. If you don't use machine learning, you're absolutely toast. So the answer to "MDM doesn't scale" is you've got to use ML. The answer to "ETL doesn't scale," where you're putting together disparate records, is ML. So you've got to replace humans by machine learning.
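The idea of reusing rule-tagged records as training data can be sketched in a few lines of Python. This is only a toy under stated assumptions: the transactions and categories are invented, and the classifier is a plain bag-of-words naive Bayes chosen for brevity. The interview does not say which model Tamr actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical spend transactions. The first list stands in for records
# already tagged by hand-written rules; the second is the untagged rest.
RULE_TAGGED = [
    ("taxi to airport", "travel"),
    ("cab fare downtown", "travel"),
    ("airport parking fee", "travel"),
    ("laptop memory upgrade", "computers"),
    ("server memory modules", "computers"),
    ("desktop computer purchase", "computers"),
]
UNTAGGED = ["cab to airport", "memory for server"]


def train(tagged):
    """Count words per category from the rule-tagged records."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in tagged:
        words = text.lower().split()
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Pick the category with the highest naive Bayes log score."""
    word_counts, category_counts, vocabulary = model
    total_records = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(category_counts):
        score = math.log(category_counts[category] / total_records)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a category.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(RULE_TAGGED)
for text in UNTAGGED:
    print(text, "->", classify(text, model))
```

The point of the sketch is the workflow rather than the model: the rules are written once, their output becomes labels, and the learned model generalizes to transactions that no rule matched.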
And so that seems, at least at this conference, to be resonating, which is people are understanding that at scale, traditional data integration technologies just don't work. >> Well, and you got a great shout-out yesterday from the former GSK leader, Mark Ramsay. >> Exactly. >> And how they solved their problem. He basically laid it out: EDW didn't work and MDM didn't work, all right? I mean, kick the can: top-down data modeling didn't work; kick the can to governance, that's not going to solve the problem. But Tamr did, along with some other tooling, obviously. >> Of course. Well, the other thing is, no one technology... there's no silver bullet here. It's going to be a bunch of technologies working together, right? Mark Ramsay is a great example. He used StreamSets and a bunch of other startup technology operating together, and not the traditional guys. >> Okay, we're good? >> One more question. I want to make sure we have time. >> So the traditional vendors, by and large, are 10 years behind the times. And if you want cutting-edge stuff, you've got to go to startups. >> I want to jump to a different topic. I know that you in the past were a critic of the NoSQL movement, and NoSQL isn't going away. It seems to be actually gaining steam right now. What are the flaws in NoSQL? Has your opinion changed at all? >> No. So NoSQL originally meant "no SQL, don't use it." Then the marketing message changed to "not only SQL": so SQL is fine, but NoSQL does other stuff. >> Now it's all SQL, right? >> And my point of view is now NoSQL means "not yet SQL," because high-level data languages are good. Mongo is inventing one, Cassandra's inventing one. Those, unless you squint, look like SQL. And so I think the answer is the NoSQL guys are drifting towards SQL. Meanwhile, JSON: that's a great idea.
If you've got irregular data, the SQL guys are saying, sure, let's have JSON as a data type. And I think the only place where there's a fair amount of argument is schema later versus schema first, and I pretty much think schema later is a bad idea, because schema later really means you're creating a data swamp. >> Exactly, and you have to fix it then. >> So if you've got a field of salary, so you're storing employees and salaries. So Paul's salary is recorded as dollars per month. Dave's salary is in euros per week with a lunch allowance. Mine's... So if you don't deal with irregularities up front on data that you care about, you're gonna create a mess. >> No schema on write was convenient to store a lot of data cheaply. But then what? Hard to get value out of what it created. >> So I'm not opposed to schema later, as long as you realize that you're kicking the can down the road and you're just going to give your successor a big mess. >> Yeah, right. Michael, we gotta jump. But thank you so much. Sure appreciate it. All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching the Cube from MIT CDOIQ. We'll be right back.
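The salary example from the schema-first discussion above can be made concrete with a small Python sketch. It assumes a single agreed representation (US dollars per month) that every record is converted to at load time. The exchange rate, the amounts, and the helper names are all hypothetical and are only there to show the shape of the idea.

```python
from dataclasses import dataclass

# Assumed conversion constants, for illustration only.
WEEKS_PER_MONTH = 52 / 12
EUR_TO_USD = 1.10  # hypothetical exchange rate


@dataclass(frozen=True)
class Salary:
    """One agreed-upon representation: US dollars per month."""
    employee: str
    usd_per_month: float


def from_usd_per_month(employee, amount):
    """Source already matches the schema; just store it."""
    return Salary(employee, float(amount))


def from_eur_per_week(employee, amount, weekly_allowance_eur=0.0):
    """Convert a weekly euro figure (plus any allowance) at load time."""
    weekly_total_eur = amount + weekly_allowance_eur
    return Salary(employee, weekly_total_eur * WEEKS_PER_MONTH * EUR_TO_USD)


salaries = [
    from_usd_per_month("Paul", 5000),
    from_eur_per_week("Dave", 1000, weekly_allowance_eur=50),
]
total = sum(s.usd_per_month for s in salaries)
print(round(total, 2))
```

With the conversion done up front, a later query such as a payroll total is a plain sum. With schema later, every consumer of the data would have to rediscover the units and the allowance on their own.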

Published Date : Aug 1 2019


