Brian Payne, Dell Technologies and Raghu Nambiar, AMD | SuperComputing 22
(upbeat music) >> We're back at the SC22 SuperComputing Conference in Dallas. My name's Paul Gillin, with my co-host, John Furrier, SiliconANGLE founder. It's a huge exhibit floor here. So much activity, so much going on in HPC, and much of it around the chips from AMD, which has been on a roll lately, in partnership with Dell. Our guests are Brian Payne, Dell Technologies, VP of Product Management for ISG mid-range technical solutions, and Raghu Nambiar, corporate vice president of data center ecosystems and application engineering, that's quite a mouthful, at AMD. Gentlemen, welcome. >> Thank you. >> Thanks for having us. >> This has been an evolving relationship between your two companies, obviously a growing one, and Dell was part of the big general rollout of AMD's new chipset last week. Talk about how that relationship has evolved over the last five years. >> Yeah, sure. Well, it goes back to the advent of the EPYC architecture. So we were there from the beginning, partnering well before the launch five years ago, thinking about, "Hey, how can we come up with a way to solve customer problems and address workloads in unique ways?" And that was kind of the origin of the relationship. We came out with some really disruptive and capable platforms, and it's continued since then, all the way to the launch last week, where we've introduced four of the most capable platforms we've ever had in the PowerEdge portfolio. >> Yeah, I'm really excited about the partnership with Dell. As Brian said, we have been partnering very closely for the last five years, since we introduced the first generation of EPYC. So we collaborate on, you know, system design, validation, performance benchmarks, and more importantly on software optimizations and solutions, to offer an out-of-the-box experience to our customers, whether it is HPC, databases, big data analytics, or AI. >> You know, you guys have been on theCUBE, you guys are veterans, 2012, 2014, back in the day. So much has changed over the years. Raghu, you were the founding chair of the TPCx-AI committee. We've talked about the different iterations of PowerEdge servers. So much has changed. Why the focus on these workloads now? What's the inflection point that we're seeing here at SuperComputing? It feels like we've been in this, you know, run the ball, gain a yard, move the chains mode, but I feel like there's a moment where there's going to be an unleashing of innovation around new use cases. Where are the workloads? Why the performance? What are some of those use cases right now that are front and center? >> Yeah, if you look at today, the enterprise ecosystem has become extremely complex, okay? People are running traditional workloads like relational database management systems, and also a new generation of workloads with AI and HPC, and actually HPC augmented with some of the AI technologies. So what customers are looking for is, as I said, an out-of-the-box experience, where time to value is extremely critical. Unlike in the past, you know, customers don't have the time and resources to run months-long POCs, okay? So that's one area we are focusing on, working closely with Dell to give an out-of-the-box experience.
Again, you know, the enterprise application ecosystem is really becoming complex, and as you mentioned, the industry standard benchmarks are designed to give a fair comparison of performance and price performance for our end customers. And you know, Brian's team and my team have been working closely to demonstrate our joint capabilities in the AI space with a set of TPCx-AI benchmark results; that was a major highlight of our launch last week. >> Brian, you're showing the demo in the booth at Dell here. Not a demo, the product, it's available. What are you seeing for the use cases that customers are kind of rallying around now, and what are they doubling down on? >> Yeah, you know, Raghu, I think, teed it up well. Really, data is the currency of business and all organizations today. And that's what's pushing people to figure out both traditional workloads as well as new workloads. So in the traditional workload space, you still have ERP systems like SAP, et cetera, and we've announced world records there, a hundred-plus percent improvement in our single-socket systems, and 70% in dual. We actually posted a 40% advantage over the best Genoa result just this week. So, I mean, we're excited about that in the traditional space. But what's exciting, like why are we here? Why are people thinking about HPC and AI? It's about how do we make use of that data, that data being the currency, and how do we push in that space? So Raghu mentioned the TPCx-AI benchmark. We launched, or we announced in collaboration, you talk about how we work together, nine world records in that space. In one case it's a 3x improvement over prior generations. So the workloads that people care about are, how can I process this data more effectively? How can I store it and secure it more effectively? And ultimately, how do I make decisions about where we're going, whether it's a scientific breakthrough or a commercial application? That's what's really driving the use cases and the demand from our customers today. >> I think one of the interesting trends we've seen over the last couple of years is a resurgence of interest in task-specific hardware around AI. In fact, venture capital firms invested $1.8 billion last year in AI hardware startups. I wonder, and these companies are not doing CPUs necessarily, or GPUs, they're doing accelerators, FPGAs, ASICs. But you have to be looking at that activity and what these companies are doing. What are you taking away from that? How does that affect your own product development plans, both on the chip side and on the system side? >> I think the future of computing is going to be heterogeneous, okay? I mean, a CPU solving certain types of problems, like general purpose computing, databases, big data analytics; a GPU solving, you know, problems in AI and visualization; and DPUs and FPGA accelerators, you know, offloading some of the tasks from the CPU and providing real-time performance. And of course, the software optimizations are going to be critical to stitch everything together, whether it is HPC or AI or other workloads. You know, again, as I said, heterogeneous computing is going to be the future. >> And for us as a platform provider, heterogeneous, you know, solutions mean we have to design systems that are capable of supporting that.
So as you think about the compute power, whether it's a GPU or a CPU, continuing to push the envelope in terms of, you know, the computations, power consumption, things like that, how do we design a system that can be incredibly efficient, and also be able to support the scaling, you know, to solve those complex problems? So that gets into challenges around, you know, both liquid cooling, but also making the most out of air cooling. And so we're seeing that not only are we driving up, you know, the capability of these systems, we're actually improving the energy efficiency. The most recent systems that we launched around the CPU, which is still kind of at the heart of everything today, are seeing 50% improvement, you know, gen to gen, in terms of performance-per-watt capabilities. So it's about, like, how do we package these systems in effective ways and make sure that our customers can get, you know, the advertised benefits, so to speak, of the new chip technologies. >> Yeah, to add to that, you know, performance, scalability, total cost of ownership, these are the key considerations, but now energy efficiency has become more important than ever, given our commitment to sustainability. One of the things we demonstrated last week was that with our new generation of EPYC Genoa-based systems, we can do a five-to-one consolidation, significantly reducing the energy requirement. >> Power's huge, costs are going up. It's a global issue. >> Raghu: Yeah, it is. >> How do you squeeze more performance out of it at the same time? I mean, smaller, faster, cheaper. Paul, you wrote a story this weekend about AI making hardware so much more important. You've got more power requirements, you've got the sustainability issue, but you need more horsepower, more compute. What's different in the architecture, if you guys could share, today versus years ago? What's different as these generations bring step-function value increases? >> So one of the major drivers from the processor perspective is, if you look at the latest generation of processors, the five-nanometer technology, bringing efficiency and density. So we are able to pack 96 processor cores per socket; in a two-socket system, we are talking about 192 processor cores. And of course, you know, other enhancements, like IPC uplift, bringing DDR5 and PCIe Gen 5 to the market, offering an overall performance uplift of more than 2.5x for certain workloads, and of course significantly reducing the power footprint. >> Also, I was just going to cut in, I mean, architecturally speaking, you know, then how do we take the 96 cores and surround them, deliver a balanced ecosystem, to make sure that we can get the IO out of the system and make sure we've got the right data storage? So I mean, you'll see 60% improvements in total storage in the system. I think in 2012 we were talking about 10 gig ethernet. Well, you know, now we're on to 100, with 400 on the forefront. So it's like, how do we keep up with these increased computing capabilities, both offload and core computing, and make sure we've got a system that can deliver the desired (indistinct). >> So the little things like the bus, the PCIe cards, the NICs, the connectors have to be rethought. Is that what you're getting at? >> Yeah, absolutely. >> Paul: And the GPUs, which are huge power consumers. >> Yeah, absolutely.
So, cooling: we introduced what we call smart cooling as part of our latest generation of servers. I mean, the thermal design inside of a server is a complex system, right? And doing that efficiently matters, because of course fans consume power. So those are the kinds of considerations that we have to work through to make sure that you're not throttling performance because you're not, you know, keeping the chips at the right temperature. And, you know, ultimately when you do that, you're hurting the productivity of the investment. So I mean, it's our responsibility to put our thought in and deliver those systems that are (indistinct). >> You mentioned data too. If you bring in the data, one of the big discussions going into the big Amazon show coming up, re:Invent, is egress costs. Right? So now you've got compute, and how you design for data latency, you know, processing. It's not just contained in a machine. You've got to think about, outside that machine, talking to other machines. Is there an intelligent (chuckles) network developing? I mean, what's the future look like? >> Well, I mean, this is an area where, you know, Dell's in a unique position to work on this problem, right? We house 70% of the mission-critical data that exists in the world. How do we bring that closer to compute? How do we deliver system-level solutions? So, server compute: recently we announced innovations around NVMe over Fabrics. So now you've got the NVMe technology in the SAN. How do we connect that more efficiently across the servers? And then guide our customers to make use of that. Those are the kinds of challenges where we're trying to unlock the value of the data by making sure we're (indistinct). >> There are a lot of lessons learned from, you know, classic HPC and some of the, you know, big data analytics, like the Hadoops of the world, you know, distributed processing for crunching a large amount of data. >> With the growth of the cloud, you see, you know, some pundits saying that data centers will become obsolete in five years, and everything's going to move to the cloud. Obviously the data center market is still growing, and is projected to continue to grow. But what's the argument for captive hardware, for owning a data center these days, when the cloud offers such convenience and allegedly cost benefit? >> I would say the reality is, and I think the industry at large has acknowledged this, that we're living in a multicloud world, and multicloud methods are going to be necessary to, you know, solve problems and compete. And so, I mean, in some cases, whether it's security or latency, you know, there's a push to have things in your own data center. And then of course growth at the edge, right? I mean, that's really turning, you know, things on their head, if you will, getting data closer to where it's being generated. So I would say we're going to live in this edge, cloud, you know, and core data center environment, with multiple, you know, different cloud providers providing solutions and services where it makes sense, and it's incumbent on us to figure out how do we stitch together that data platform, that data layer, and help customers, you know, synthesize this data to generate the results they need.
>> You know, one of the things I want to get into on the cloud you mentioned, Paul, is that we see the rise of graph databases. And so is that on the radar for the AI? Because a lot more graph data is being brought in, and the database market's incredibly robust. It's one of the key areas that people want performance out of. And as cloud native becomes the modern application development approach, a lot more infrastructure-as-code is happening, which means that the internet and the networks and the processes should be programmable. So graph databases have been one of those things. Have you guys done any work there? What's some data there you can share on that? >> Yeah, actually, you know, we have worked closely with a company called TigerGraph, they're in the graph database space. And we have done a couple of case studies, one on the healthcare side, and the other one on the financial side, for fraud detection. Yeah, I think this is an emerging area, and we are able to demonstrate industry-leading performance for graph databases. Very excited about it. >> Yeah, it's interesting. It brings up the vertical versus horizontal applications. Where is the AI HPC kind of shining? Is it like horizontal and vertical solutions, or what's your vision there? >> Yeah, well, I mean, so this is a case where I'm also a user. So I own our analytics platform internally. We actually have a chatbot for our product development organization to figure out, hey, what trends are going on with the systems that we sell, whether it's how they're being consumed or what we've sold. And we actually use graph database technology in order to power that chatbot. So I'm actually in a position where I'm like, I want to get these new systems into our environment so we can deliver. >> Paul: Graphs underlie most machine learning models. >> Yeah, yeah. >> So, so much to talk about in this space, so little time. And unfortunately we're out of it. So, fascinating discussion. Brian Payne, Dell Technologies, Raghu Nambiar, AMD. Congratulations on the successful launch of your new chipset and the growth in your relationship over these past years. Thanks so much for being with us here on theCUBE. >> Super. >> Thank you very much. >> It's great to be back. >> We'll be right back from SuperComputing 22 in Dallas. (upbeat music)
Video exclusive: Oracle adds more wood to the MySQL HeatWave fire
(upbeat music) >> When Oracle acquired Sun in 2009, it paid $5.6 billion net of Sun's cash and debt. Now, I argued at the time that Oracle got one of the best deals in the history of enterprise tech, and I got a lot of grief for saying that, because Sun had a declining business, it was losing money, and its revenue was under serious pressure as it tried to hang on for dear life. But Safra Catz understood that Oracle could pare Sun's lower-profit and lagging businesses, like its low-end x86 product lines, and even if Sun's revenue was cut in half, because Oracle has such a high revenue multiple as a software company, it could almost instantly generate $25 to $30 billion in shareholder value on paper. In addition, it was a catalyst for Oracle to initiate its highly differentiated engineered systems business, and was actually the precursor to Oracle's cloud. Oracle saw that it could capture high-margin dollars that used to go to partners like HP, its original Exadata partner, and get paid for the full stack across infrastructure, middleware, database, and application software, when it eventually got really serious about cloud. Now, there was also a major technology angle to this story. Remember Sun's tagline, "the network is the computer"? Well, they should have just called it cloud. Through the Sun acquisition, Oracle also got a couple of key technologies: Java, the number one programming language in the world, and MySQL, a key ingredient of the LAMP stack, that's Linux, Apache, MySQL and PHP, Perl or Python, on which the internet is basically built, and which is used by many cloud services like Facebook, Twitter, WordPress, Flickr, and Amazon Aurora, among many other examples, including, by the way, MariaDB, which is a fork of MySQL created by MySQL's creator, basically in protest of Oracle's acquisition; the drama is Oscar-worthy. It gets even better. In 2020, Oracle began introducing a new version of MySQL called MySQL HeatWave, and since late 2020 it's been in sort of a super cycle, rolling out three new releases in less than a year and a half in an attempt to expand its TAM and compete in new markets. Now, we covered the release of MySQL Autopilot, which uses machine learning to automate management functions. And we also covered the benchmarking that Oracle produced against Snowflake, AWS, Azure, and Google. And Oracle's at it again with HeatWave, adding machine learning into its database capabilities, along with the previously available integration of OLAP and OLTP. This, of course, is in line with Oracle's converged database philosophy, which, as we've reported, is different from other cloud database providers, most notably Amazon, which takes the right-tool-for-the-right-job approach and chooses database specialization over a one-size-fits-all strategy. Now, we've asked Oracle to come on theCUBE and explain these moves, and I'm pleased to welcome back Nipun Agarwal, who's the senior vice president for MySQL Database and HeatWave at Oracle. And today, in this video exclusive, we'll discuss machine learning, other new capabilities around elasticity and compression, and then any benchmark data that Nipun wants to share. Nipun's been a leading advocate of the HeatWave program. He's led engineering in that team for over 10 years, and he has over 185 patents in database technologies. Welcome back to the show, Nipun. Great to see you again. Thanks for coming on. >> Thank you, Dave. Very happy to be back.
>> Yeah, now for those who may not have kept up with the news, maybe to kick things off you could give us an overview of what MySQL HeatWave actually is, so that we're all on the same page. >> Sure, Dave. MySQL HeatWave is a fully managed MySQL database service from Oracle, and it has a built-in query accelerator called HeatWave, and that's the part which is unique. So with MySQL HeatWave, customers of MySQL get a single database which they can use for transactional processing, for analytics, and for mixed workloads, because traditionally MySQL has been designed and optimized for transaction processing. So in the past, when customers had to run analytics with a MySQL-based service, they would need to move the data out of MySQL into some other database for running analytics. So they would end up with two different databases, and it would take some time to move the data out of MySQL into this other system. With MySQL HeatWave, we have solved this problem, and customers now have a single MySQL database for all their applications, and they can get good analytics performance without any changes to their MySQL application. >> Now, it's no secret that a lot of times, you know, queries are not, you know, most efficiently written, and critics of MySQL HeatWave will claim that this product is very memory and cluster intensive; it has a heavy footprint that adds to cost. How do you answer that, Nipun? >> Right, so for offering any database service in the cloud there are two dimensions, performance and cost, and we have been very cognizant of both of them. So it is indeed the case that HeatWave is an in-memory query accelerator, which is why we get very good performance, but it is also the case that we have optimized HeatWave for commodity cloud services. So for instance, we use the least expensive compute. We use the least expensive storage. So what I would suggest, for the customers who would like to know the price-performance advantage of HeatWave: compared to any database we have benchmarked against, Redshift, Snowflake, Google BigQuery, Azure Synapse, HeatWave is significantly faster and significantly lower priced on a multitude of workloads. So not only is it an in-memory database and optimized for that, but we have also optimized it for commodity cloud services, which makes it much lower priced than the competition. >> Well, at the end of the day, it's customers that sort of decide what the truth is. So to date, what's been the customer reaction? Are they moving from other clouds, from on-prem environments? Both? Why, you know, what are you seeing? >> Right, so we are definitely seeing a whole bunch of migrations of customers who are running MySQL on-premise to the cloud, to MySQL HeatWave. That's definitely happening. What is also very interesting is we are seeing that a very large percentage of customers, more than half the customers who are coming to MySQL HeatWave, are migrating from other clouds. We have a lot of migrations coming from AWS Aurora, migrations from Redshift, migrations from RDS MySQL, Teradata, SAP HANA, right. So we are seeing migrations from a whole bunch of other databases and other cloud services to MySQL HeatWave. And the main reasons we are told why customers are migrating from other databases to MySQL HeatWave are lower cost, better performance, and no change to their application, because many of these services, like AWS Aurora, are compatible with MySQL. So when customers try MySQL HeatWave, not only do they get better performance at a lower cost, but they find that they can migrate their application without any changes, and that's a big incentive for them.
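To make that "no application changes" point concrete, here is a minimal sketch, assuming a hypothetical `orders` table and placeholder connection details, of how an existing MySQL table is loaded into HeatWave. The `SECONDARY_ENGINE = RAPID` and `SECONDARY_LOAD` statements are MySQL HeatWave's documented mechanism for offloading a table to the accelerator; everything else is the stock mysql-connector-python API.

```python
import mysql.connector

# Connect to the MySQL HeatWave DB system (hypothetical endpoint and credentials).
cnx = mysql.connector.connect(
    host="heatwave.example.com", user="app", password="...", database="sales")
cur = cnx.cursor()

# Mark the table for HeatWave (the RAPID secondary engine) and load it into
# the cluster; the InnoDB copy stays authoritative for transaction processing.
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# The same SQL the application already issues is transparently offloaded
# to HeatWave when it benefits, with no application changes.
cur.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""")
for customer_id, total in cur:
    print(customer_id, total)

cur.close()
cnx.close()
```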
>> Great, thank you, Nipun. So can you give us some names? Are there some real-world examples of these customers that have migrated to MySQL HeatWave that you can share? >> Oh, absolutely, I'll give you a few names. Stutor.com, this is an educational SaaS provider based out of Brazil. They were using Google BigQuery, and when they migrated to MySQL HeatWave, they found a 300x, right, 300 times improvement in performance, and it lowered their cost by 85%. Another example is Neovera. They offer cybersecurity solutions, and they were running their application on an on-premise version of MySQL. When they migrated to MySQL HeatWave, their application improved in performance by 300 times, and their cost was reduced by 80%, right. So by going from on-premise to MySQL HeatWave, they reduced their cost by 80% and improved performance by 300 times. We are Glass, another customer based out of Brazil. They were running on AWS EC2, and when they migrated, within hours they found that there was a significant improvement, like, you know, over 5x improvement in database performance, and they were able to accommodate a very large virtual event, which had more than a million visitors. Another example, Genius Sonority. They are a game designer in Japan, and when they moved to MySQL HeatWave, they found a 90 times improvement in performance. And there are many, many more, a lot of migrations, again, from, you know, Aurora, Redshift, and many other databases as well. And consistently what we hear is (audio cut out) getting much better performance at a much lower cost without any change to their application. >> Great, thank you. You know, when I ask that question, a lot of times I get, "Well, I can't name the customer name," but I've got to give Oracle credit, a lot of times you guys have them at your fingertips. So you're not the only one, but it's somewhat rare in this industry. So, okay, so you got some good feedback from those customers that did migrate to MySQL HeatWave. What else did they tell you that they wanted? Did they, you know, kind of share a wishlist and some of the white space that you guys should be working on? What'd they tell you? >> Right, so as customers are moving more data into MySQL HeatWave, as they're consolidating more data into MySQL HeatWave, customers want to run other kinds of processing with this data. A very popular one is machine learning. So we have had multiple customers who told us that they wanted to run machine learning with data which is stored in MySQL HeatWave, and for that they have to extract the data out of MySQL (audio cut out). So that was the first feedback we got. Second thing is, MySQL HeatWave is a highly scalable system. What that means is that as you add more nodes to a HeatWave cluster, the performance of the system improves almost linearly. But currently customers need to perform some manual steps to add nodes to a cluster or to reduce the cluster size. So that was the other feedback we got: people wanted this to be automated. Third thing is that we have shown in previous results that HeatWave is significantly faster and significantly lower priced compared to competitive services. So we got feedback from customers asking, can we trade off some performance to get even lower cost? And that's what we have looked at.
And then finally, we have some results on various data sizes with TPC-H. Customers wanted to see if we can offer some more data points as to how HeatWave performs on other kinds of workloads. And that's what we've been working on for the last several months. >> Okay, Nipun, we're going to get into some of that, but, so how did you go about addressing these requirements? >> Right, so the first thing is we are announcing support for in-database machine learning, meaning that customers who have their data inside MySQL HeatWave can now run training, inference, and prediction all inside the database, without the data or the model ever having to leave the database. So that's how we addressed the first one. Second thing is we are offering support for real-time elasticity, meaning that customers can scale up or scale down to any number of nodes. This requires no manual intervention on the part of the user, and for the entire duration of the resize operation, the system is fully available. Third, in terms of cost, we have doubled the amount of data that can be processed per node. So if you look at a HeatWave cluster, the size of the cluster determines the cost. So by doubling the amount of data that can be processed per node, we have effectively halved the cluster size required for running a given workload, which means it reduces the cost to the customer by half. And finally, we have also run the TPC-DS workload on HeatWave and compared it with other vendors. So now customers can have another data point in terms of the performance and cost comparison of HeatWave with other services. >> All right, and I promise I'm going to ask you about the benchmarks, but I want to come back and drill into these a bit. How is HeatWave ML different from competitive offerings? Take, for instance, Redshift ML, for example. >> Sure, okay, so this is a good comparison. Let's start with, let's say, Redshift ML. There are some systems, like, you know, Snowflake, which don't even offer any processing of machine learning inside the database, and they expect customers to write a whole bunch of code, in say Python or Java, to do machine learning. Redshift ML does have integration with SQL. That's a good start. However, when customers of Redshift need to run machine learning and they invoke Redshift ML, it makes a call to another service, SageMaker, right, so the data needs to be exported to a different service. The model is generated, and the model is also outside Redshift. With HeatWave ML, the data always resides inside the MySQL database service. We are able to generate models, we are able to train the models, run inference, run explanations, all inside the MySQL HeatWave service. So the data, or the model, never has to leave the database, which means that both the data and the models can be secured by the same access control mechanisms as the rest of the data. So that's the first part: there is no need for any ETL. The second aspect is the automation. Training is a very important part of machine learning, right, and it impacts the quality of the predictions and such. So traditionally, customers would employ data scientists to influence the training process so that it's done right. And even in the case of Redshift ML, the users are expected to provide a lot of parameters to the training process. So the second thing which we have worked on with HeatWave ML is that it is fully automated. There is absolutely no user intervention required for training.
Third is in terms of performance. So one of the things we are very, very sensitive to is performance, because performance determines the eventual cost to the customer. So again, in some benchmarks which we have published, and these are all available on GitHub, we are showing how HeatWave ML is 25 times faster than Redshift ML, and here's the kicker, at 1% of the cost. So four benefits: the data and models all remain secure inside the database service, it's fully automated, it's much faster, and it's much lower cost than the competition. >> All right, thank you, Nipun. Now, there's a lot of talk these days about explainability and AI. You know, the system can very accurately tell you that it's a cat, you know, or for you Silicon Valley fans, it's a hot dog or not a hot dog, but it can't tell you how it got there. So what is explainability, and why should people care about it? >> Right, so when we were talking to customers about what they would like from a machine-learning-based solution, one of the pieces of feedback we got is that enterprises are a little slow, or averse, to taking up machine learning, because it seems to be, you know, like magic, right? And enterprises have an obligation to be able to explain, or to provide an answer to their customers as to why the system made a certain choice. With a rule-based solution it's simple: it's a rule-based thing, and you know what the logic was. So the reason explanations are important is that customers want to know why the system made a certain prediction. One of the important characteristics of HeatWave ML is that any model which is generated by HeatWave ML can be explained, and we can do both global, or model, explanations, as well as local explanations. So when the system makes a specific prediction using HeatWave ML, the user can find out why the system made such a prediction. So for instance, if someone is being denied a loan, the user can figure out what the attributes, the features, were which led to that decision. So this ensures, like, you know, fairness, and many times there is also a need for regulatory compliance, where users have a right to know. So we feel that explanations are very important for enterprise workloads, and that's why every model which is generated by HeatWave ML can be explained.
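A minimal sketch of that in-database workflow, again through Python. The `sys.ML_TRAIN`, `sys.ML_MODEL_LOAD`, `sys.ML_PREDICT_TABLE`, and `sys.ML_EXPLAIN_TABLE` stored procedures are HeatWave ML's documented interface; the `ml_demo` schema, the loan tables, and the `approved` target column are hypothetical, and the options shown are the minimal ones rather than a tuned configuration.

```python
import mysql.connector

cnx = mysql.connector.connect(
    host="heatwave.example.com", user="app", password="...", database="ml_demo")
cur = cnx.cursor()

# Train a classification model on a labeled table. Training is fully
# automated: only the task type and the target column are specified.
cur.execute("""
    CALL sys.ML_TRAIN('ml_demo.loan_train', 'approved',
                      JSON_OBJECT('task', 'classification'), @model)
""")

# Load the model into HeatWave memory, then score an entire table.
cur.execute("CALL sys.ML_MODEL_LOAD(@model, NULL)")
cur.execute("""
    CALL sys.ML_PREDICT_TABLE('ml_demo.loan_applications', @model,
                              'ml_demo.loan_predictions')
""")

# Local explanations: per-row feature attributions for each prediction,
# e.g. which features drove a given loan denial.
cur.execute("""
    CALL sys.ML_EXPLAIN_TABLE('ml_demo.loan_applications', @model,
                              'ml_demo.loan_explanations')
""")

cur.close()
cnx.close()
```

The data, the model handle, and the explanation tables all stay inside the MySQL service, which is the point Nipun is making about ETL and access control.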
>> Now, I've got to give Snowflake some props, you know, this whole idea of separating compute from storage, but also bringing the database to the cloud and driving elasticity. So that's been a key enabler and has solved a lot of problems, in particular the snake-swallowing-the-basketball problem, as I often say. But what about elasticity, and elasticity in real time? How is your version, and there are a lot of companies chasing this, how is your approach to an elastic cloud database service different from what others are promoting these days? >> Right, so a couple of characteristics. One is that we have now fully automated the process of elasticity, meaning that if a user wants to scale up or scale down, the only thing they need to specify is the eventual size of the cluster, and the system completely takes care of it transparently. But then there are a few characteristics which are very unique. So for instance, we can scale up or scale down to any number of nodes, whereas in the case of Snowflake, the number of nodes someone can scale up or scale down to are powers of two. So if a user needs 70 CPUs, well, their choice is either 64 or 128. So by providing this flexibility with MySQL HeatWave, customers get a custom fit. They can get a cluster which is optimized for their specific workload. So that's the first thing: flexibility of scaling up or down to any number of nodes. The second thing is that after the operation is completed, the system is fully balanced, meaning the data across the various nodes is fully balanced. That is not the case with many solutions. So for instance, in the case of Redshift, after the resize operation is done, the user is expected to manually balance the data, which can be very cumbersome. And the third aspect is that while the resize operation is going on, the HeatWave cluster is completely available for queries, for DMLs, for loading more data. That is, again, not the case with Redshift. With Redshift, suppose the operation takes 10 to 15 minutes; during that window of time, the system is not available for writes, and for a big chunk of that time, the system is not even available for queries, which is very limiting. So the advantages we have are: fully flexible, the system is in a balanced state, and the system is completely available for the entire duration of the operation. >> Yeah, I guess you've got that hypergranularity, which, you know, sometimes they say, "Well, t-shirt sizes are good enough," but then I think of myself, some t-shirts fit me better than others, so. Okay, I saw in the announcement that you have this lower price point for customers. How did you actually achieve this? Could you give us some details around that, please? >> Sure, so there are two things in this announcement which lower the cost for customers. The first thing is that we have doubled the amount of data that can be processed by a HeatWave node. So if we have doubled the amount of data which can be processed by a node, the cluster size required by customers reduces to half, and that's why the cost drops to half. The way we have managed to do this is by two things. One is support for Bloom filters, which reduces the amount of intermediate memory. And second is, we compress the base data. So these are the two techniques we have used to process more data per node. The second way by which we are lowering the cost for customers is by supporting pause and resume for HeatWave. Many times you find that customers of HeatWave and other services want to run some queries or some workloads for some duration of time, but then they don't need the cluster for a few hours. Now, with the support for pause and resume, customers can pause the cluster, and the HeatWave cluster instantaneously stops. And when they resume, not only do we fetch the data at a very quick pace from the object store, but we also preserve all the statistics which are used by Autopilot. So both the data and the metadata are fetched extremely fast from the object store. So with these two capabilities, we feel that it'll drive down the cost to our customers even more.
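For readers unfamiliar with the first of those techniques: a Bloom filter is a compact probabilistic bitmap that can answer "definitely not present" or "possibly present" for a key, which lets a join discard non-matching rows before they pile up in intermediate memory. The sketch below is a generic Python illustration of that idea, not HeatWave's implementation; the table and key names are invented.

```python
import hashlib

class BloomFilter:
    """Compact set membership with false positives but no false negatives."""
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several independent hash positions from one keyed hash.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(key.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Join-side filtering: build a filter on the small (build) side of a join,
# then drop probe-side rows before they ever reach the join operator.
build_keys = ["c17", "c42", "c99"]            # e.g. qualifying customers
bf = BloomFilter()
for k in build_keys:
    bf.add(k)

probe_rows = [("c01", 10.0), ("c42", 99.5), ("c77", 3.2), ("c99", 51.0)]
survivors = [row for row in probe_rows if bf.might_contain(row[0])]
print(survivors)  # only rows that can possibly join are kept in memory
```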
>> Got it, thank you. Okay, I promised I was going to get to the benchmarks. Let's have it. How do you compare with others, specifically cloud databases? I mean, and how do we know these benchmarks are real? My friends at EMC, back in the day, they were brilliant at doing benchmarks. They would produce these beautiful PowerPoint charts, but it was kind of opaque. What do you say to that? >> Right, so there are multiple things I would say. The first thing is that this time we have published two benchmarks, one for machine learning and the other for SQL analytics. All the benchmarks, including the scripts we have used, are available on GitHub. So we have full transparency, and we invite and encourage customers or other service providers to download the scripts, to download the benchmarks, and see if they get any different results, right. So what we are seeing, we have published for other people to try and validate. That's the first part. Now, for machine learning, there hasn't been a precedent for enterprise benchmarks, so we took a set of open data sets and we have published benchmarks for those, right? So both for classification as well as for regression, we have measured the training times, and that's where we find that HeatWave ML is 25 times faster than Redshift ML at one percent of the cost. So fully transparent, available. For SQL analytics, in the past we have shown comparisons with TPC-H. So we would show TPC-H across various databases, across various data sizes. This time we decided to use TPC-DS. The advantage of TPC-DS over TPC-H is that it has a larger number of queries, the queries are more complex, the schema is more complex, and there is a lot more data skew. So it represents a different class of workloads, which is very interesting. So these are queries derived from the TPC-DS benchmark. The numbers we have published this time are for 10-terabyte TPC-DS, and we are comparing with all four major services: Redshift, Snowflake, Google BigQuery, Azure Synapse. And in all the cases, HeatWave is significantly faster and significantly lower priced. Now, one of the things I want to point out is that when we are doing the cost comparison with other vendors, we are being overly fair. For instance, the cost of HeatWave includes the cost of both the MySQL node as well as the HeatWave nodes, and with this setup, customers can run transaction processing, analytics, as well as machine learning. So the price captures all of it. Whereas with the other vendors, the comparison is only for the analytic queries, right? So if customers wanted to run OLTP, you would need to add the cost of that database. Or if customers wanted to run machine learning, you would need to add the cost of that service. Furthermore, in the case of HeatWave, we are quoting the pay-as-you-go price, whereas for other vendors, like, you know, Redshift, where applicable, we are quoting the one-year, fully-paid-upfront rate. So it's a very fair comparison. In terms of the numbers, though, price performance for TPC-DS: we are about 4.8 times better price performance compared to Redshift, we are 14.4 times better price performance compared to Snowflake, 13 times better than Google BigQuery, and 15 times better than Synapse. So across the board, we are significantly faster and significantly lower priced. And as I said, all of these scripts are available on GitHub for people to try for themselves. >> Okay, all right, I get it. So I think what you're saying is, you could have said, this is what it's going to cost for you to do both analytics and transaction processing on a competitive platform versus what it takes to do that on Oracle MySQL HeatWave, but you're not doing that. You're saying, let's take them head on in their sweet spot of analytics, or OLTP separately, and you're saying you still beat them.
Okay, so you've got this one database service in your cloud that supports transactions and analytics and machine learning. How much do you estimate you're saving companies with this integrated approach versus the alternative of, kind of what I called up front, the right tool for the right job, and admittedly having to use ETL tools? How can you quantify that? >> Right, so, okay. At the end of the day, in a cloud service, price performance is the metric which gives a sense as to how much customers are going to save. So for instance, for a TPC-DS workload, if we are 14 times better price performance than Snowflake, it means that our cost is going to be 1/14th of what customers would pay for Snowflake. Now, in addition, the other costs, in terms of migrating the data, having to manage two different databases, having to pay for another service for, like, you know, machine learning, that's all extra, and that depends upon what tools customers are using or what other services they're using for transaction processing or for machine learning. But these numbers themselves, right, are very, very compelling. If we are 1/5th the cost of Redshift, right, or 1/14th of Snowflake, these numbers themselves are very, very compelling. And that's the reason we are seeing so many of these migrations from these databases to MySQL HeatWave. >> Okay, great, thank you. Our last question: in the Q3 earnings call for fiscal '22, Larry Ellison said that "MySQL HeatWave is coming soon on AWS," and that caught a lot of people's attention. That's not like Oracle. I mean, people might say maybe that's an indication that you're not having success moving customers to OCI, so you've got to go to other clouds, which by the way I applaud, but any comments on that? >> Yep, this is very much like Oracle. If you look at it, one of the big reasons for the success of the Oracle database, and why Oracle database is the most popular database, is that Oracle database runs on all the platforms, and that has been the case from day one. So, very akin to that, the idea is that there's a lot of value in MySQL HeatWave, and we want to make sure that we can offer the same value to the customers of MySQL running on any cloud, whether it's OCI, whether it's AWS, or any other cloud. So this shows how confident we are in our offering, and we believe that in other clouds as well, customers will find significant advantage in having a single database which is much faster and much lower priced than the alternatives they currently have. So this shows how confident we are about our products and services. >> Well, that's great. I mean, obviously for you, you're in the MySQL group. You love that, right? The more places you can run, the better it is for you, of course, and your customers. Okay, Nipun, we've got to leave it there. As always, it's great to have you on theCUBE. Really appreciate your time. Thanks for coming on and sharing the new innovations. Congratulations on all the progress you're making here. You're doing a great job. >> Thank you, Dave, and thank you for the opportunity. >> All right, and thank you for watching this CUBE conversation with Dave Vellante for theCUBE, your leader in enterprise tech coverage. We'll see you next time. (upbeat music)
Benoit Dageville, Snowflake | AWS re:Invent 2021
(upbeat music) >> Hi, everyone, welcome back to theCUBE's coverage of AWS re:Invent 2021. We're wrapping up four days of coverage, two sets. Two remote sets, one in Boston, one in Palo Alto. And really, it's a pleasure to introduce Benoit Dageville. He's the co-founder of Snowflake and President of Products. Benoit, thanks for taking some time out and coming to theCUBE. >> Yeah, thank you for having me, Dave. >> You know, it's really a pleasure. We've been watching Snowflake since, maybe not 2012, but mid last decade you hit our radar. We said, "Wow, this company is going to go places." And yeah, we made that call correctly. But it's been a pleasure to sort of follow you. We've talked a little bit remotely. I kind of want to go back to some of the fundamentals. First of all, I wanted to mention your earnings last night. If you guys didn't see it: again, triple-digit growth, $1.8 billion RPO, cash flow actually looking pretty good. So, pretty amazing. Oh, and 173% NRR, you know, wow. And Mike Scarpelli is kind of bummed that you did so well. And I know why, right? Because at some point he dials down the expectations, and Wall Street says, "Oh, he's sandbagging." And then at some point you're actually going to meet expectations, and people are going to go, "Oh, they met expectations." But anyway, he's a smart guy, he knows what he's doing. (Benoit laughing) I loved it, it was so funny listening to him last night. But anyway, I want to go back to, when I talked to practitioners about data warehousing pre-cloud, they would say sound bites like, "It's like a snake swallowing a basketball," they would tell me. And the other thing they said: "We just chase the chips. Every time a new Intel chip comes out, we have to bring in new servers, and we're struggling." The cloud changed all that. Your vision and Terry's vision changed all that. Maybe go back to the fundamentals of what you saw. >> Yeah, we really wanted to address what we call the data challenges. And if you remember, at that time, the data challenge was first the volume of data, machine-generated data. So it was way more than just structured data, right? Machine-generated data is weblogs, and it's at petabyte scale. And there was no good solution for that type of data. Big data was not a great solution; Hadoop was really bad. So we thought we should do something for big data. The other aspect was concurrency, right? Everyone wants to use these data analytic platforms in an enterprise, right? And you have more and more workloads running against the same data, and the systems that were built were not scaling for these workloads. So you had to silo data, right? That's the only way big enterprises could deal with that: create many different silos, Oracle, Teradata, data marts, you would hear data marts. All of it was to offload, right, this data. And then there was the, what do we call, data sharing: how to get access to data which is not born inside the enterprise, right? So with Terry, we wanted to solve all these challenges, and we thought the only way to solve them was the cloud. And the cloud has really three aspects. One is elasticity: all of a sudden, you can run every workload that you want concurrently, in parallel, on different compute resources, and you can run them against the same data. So this is kind of the data lake model, if you want. At the same time, you can, in the cloud, create a service.
So you can remove complexity from users and make it really easy for new workloads to be added to the system, because you can create a managed service where, all of a sudden, our customers don't need to manage infrastructure, they don't need to patch, they don't need to tune. Everything is done by Snowflake, the service, and they can just load their data and run their queries. And the third aspect is really collaboration. It's how to connect data sets together. And that's almost a new product for Snowflake, this data sharing. So Snowflake really was all about combining big data and data warehouse in one system in the cloud, and having only one single system where you can put all your data and all your workloads. >> So you weren't necessarily trying to solve the data warehouse problem; you were trying to solve a data problem. And then it just so happened data warehouse was a logical entry point for you. >> It's really that. Yes, we wanted to solve the data problem, and for us big data was a really important problem to solve. So from day one, Snowflake was all about machine-generated data at petabyte scale, but we wanted to do it right. And for us, right was not compromising on data warehouse principles, which is ACID transactions, which is really fast response time, and which is also simplicity. So as I said, we wanted to solve kind of all the problems of the time: volume of data, concurrency, and the sharing aspects. >> This was 2012. You knew at that time that Hadoop wasn't going to be the answer. >> No, I mean, everyone knew that. Everyone knew Hadoop was really bad. You know, complex to manage, really slow. It had good aspects, right? This was the only system that could manage petabyte-scale data sets. That's the only thing- >> Cheaply. >> Yeah, and cheaply, which was good. And we really wanted to do that, plus have all the good attributes of a data warehouse system. And at the same time, we wanted to build a system where, if you are a data warehouse customer, if you are coming from Teradata, you can migrate to Snowflake and you will get a system which is faster than what you had on-premise, right. That's what's pretty cool. So we wanted to do big data without compromising on data warehouse. >> So several years ago we looked at the hyperscalers and said, "Wow, last year they spent $100 billion in CapEx." And so we started to think about this abstraction layer. And then we saw what you guys announced with the data cloud. We call them superclouds. And we see that as exactly what you're building. So that's clearly not just a data warehouse or database, it's technology that really hides the underlying complexity of all those clouds, and it allows you to have federated governance and data sharing, all those things. Can you talk about sort of how you think about that architecture? >> So for me, what I say is that really Snowflake is the worldwide web of data. And we are indeed a supercloud, or we are superimposed on the infrastructure clouds, which are our friends at Amazon, and of course Azure, I mean Microsoft, and Google. And as in any cloud, we have regions, Snowflake regions, all over the world, located on different cloud providers. At the same time, our platform is global in the sense that every region interconnects with all the other regions; this is our Snowgrid and data mesh, if you want. So that as an organization you can have your presence on several Snowflake regions. It doesn't matter which cloud provider, so you can mix AWS with Azure.
You can use our cloud like that. And indeed, this is a cloud where you can store your data, that's the thing that really matters, and that data is structured, but also machine-generated, as I say, at petabyte scale, and there's also unstructured data, right? We have added support for images, text, videos, where you can process this data in our system, and that's the workload part. And workloads, what is very important is that you can run any number of them. So the number of workloads is effectively unlimited with Snowflake, because each workload can have its own dedicated set of compute resources, all operating on the same data set. And the type of workloads is also very important. It's not only about dashboards and data warehouse, it's data engineering, it's data science, it's building applications. We have many customers who are building full-scale cloud applications on top of Snowflake. >> Yeah, so the other thing, if you're not familiar with Snowflake, I don't know, maybe your head has been in the sand for a while, but separating compute and storage, I don't know if you were the first, but you were certainly the first to popularize it. And that allowed you to solve that chasing-the-chips problem and the swallowing-the-basketball problem, right? Because you have virtually infinite resources now at your disposal. >> Yeah, this is really the concurrency challenge that I was mentioning. Everyone wants to access the data. And of course, if everyone runs on the same set of compute resources, you have a bottleneck. So Snowflake was really about this multi-workload; we call it the Multi-Cluster Shared Data Architecture. But it's not difficult to run multiple clusters if you don't have to maintain consistency of data. So how do you do that while maintaining the transactional properties of data, ACID, right? You can now modify data from different clusters, and when you commit, every other cluster will immediately see the change, right, as if everyone was running on the same cluster. So that was the challenge that we solved when we started Snowflake.
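A small sketch of what that Multi-Cluster Shared Data model looks like in practice: two independently sized virtual warehouses, one for BI and one for data science, operating on the same table, issued here through Snowflake's Python connector. The warehouse names, sizes, account, and `ORDERS` table are hypothetical; the SQL follows Snowflake's standard CREATE WAREHOUSE syntax.

```python
import snowflake.connector

# Hypothetical account and credentials.
conn = snowflake.connector.connect(
    account="myorg-myaccount", user="app", password="...",
    database="SALES", schema="PUBLIC")
cs = conn.cursor()

# Two separate compute clusters; neither copies nor partitions the data.
cs.execute("""
    CREATE WAREHOUSE IF NOT EXISTS BI_WH
      WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
""")
cs.execute("""
    CREATE WAREHOUSE IF NOT EXISTS DS_WH
      WITH WAREHOUSE_SIZE = 'XXLARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
""")

# The dashboard workload and the heavy scan run concurrently against the
# same ORDERS table, each on its own warehouse, with ACID consistency.
cs.execute("USE WAREHOUSE BI_WH")
cs.execute("SELECT region, SUM(amount) FROM ORDERS GROUP BY region")
print(cs.fetchall())

cs.execute("USE WAREHOUSE DS_WH")
cs.execute("SELECT COUNT(*) FROM ORDERS WHERE amount > 1000")
print(cs.fetchone())

cs.close()
conn.close()
```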
>> You used the term data mesh. What is data mesh to Snowflake? Is it a concept, is it a fabric? >> No, it's a very interesting point. As much as we like to centralize data, this becomes a bottleneck, right? When you are a large organization with different independent units, everyone wants to manage their own data, and they have domain-specific expertise about that data. So having it centralized in IT is not practical. At the same time, you really want to be able to connect these different data sets together and join different data together, right? So that's the data mesh architecture. Each data set is managed independently by business owners, and then there is a contract which is exposed to others, and you can combine. And Snowflake's architecture, with data sharing, right, data sharing that can happen within an organization or across organizations, allows you to connect any data with any other data on our platform. >> Yeah, so when I first heard you guys using the term data mesh, I got very excited, because data mesh, in my view anyway, is going to be the fundamental architecture of this decade and beyond. And the principles, if I understand it correctly: you're applying the principles of Zhamak Dehghani's data mesh within Snowflake. So decentralized data doesn't have to be physically in one place. Logically it's in the data cloud. >> It's logically decentralized, right? It's independently managed, and the reason, right, is that the data you need to use is not all produced by you. Even if in your company you want to centralize the data, with only one organization, let's say IT, managing it, you still need to connect with other datasets which are managed by other organizations. So by nature, the data that you use cannot be centralized, right? So now that you have this principle, if you have a platform where you can store all the data, wherever it is, and you can connect these data sets very seamlessly, then you can use that platform for your enterprise, right? To have different business units independently manage their data sets, and connect these together, so that as a company you have a 360 view of your customers, for example. But you can expand that outside of your enterprise and connect with data sets from your vertical, for example a financial data set that you don't have in your company, or any public data set. >> And the other key principles, I think, that you've touched on really are the line of business, increasingly building data products that are creating value, and then also a self-service component. And then there's the fourth principle, governance. You've got to have federated governance. And it seems like you've more than ticked the boxes, you've engineered a solution to solve for those. >> No, it's very true. So Snowflake was really built to be really simple to use. And you're right, our vision was it would be more than IT, right? Who is going to use Snowflake is now going to be the business units, because you do not have to manage infrastructure, you do not have to patch, you do not have to do these things that the business cannot do. You just have to load your data, run your queries, and run your applications. So now the business can directly use Snowflake and create value from that. And yes, you're right, then connect that data with other data sets to get maximum insights. >> Can you please talk about some of the things you do with AWS here at the event? I'm interested in what you're doing with your machine learning initiatives that you've recently announced, the AI piece. >> Yes. So one key aspect is that data is not only about SQL, right? We started with SQL, but we expanded our platform to what we call data programmability, which is really about running programs at scale across a large volume of data. And this was made popular with a programming model which was introduced by Pandas: DataFrames. Later it was taken up by Spark, and now we have DataFrames in Snowflake. Where we are different from other systems is that with these DataFrame programs, which are in Python, or Java, or Scala, you program with data. These DataFrames are compiled to our single execution platform. So we have one single execution platform, which is a data flow execution platform, which can run both SQL very efficiently, as I said, at data warehouse speed, and also these very complex programs running Python and Java against this data. And this is a single platform. You don't need to use two different systems.
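For the DataFrame side, here is a minimal sketch using the Snowpark Python API, which is the productized form of what is being described. The table and column names are invented; the interesting part is that the chained DataFrame operations build a lazy plan that is compiled and executed server-side, on the same engine that serves SQL.

```python
# Sketch: a Python DataFrame program compiled down to Snowflake's single
# execution platform. All names below are illustrative.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "myorg-acct", "user": "admin", "password": "...",
    "warehouse": "ds_wh", "database": "sales", "schema": "public",
}).create()

orders = session.table("orders")
totals = (
    orders.filter(col("status") == "SHIPPED")  # lazily builds an expression tree
          .group_by("region")
          .agg(sum_("amount").alias("total"))
)
totals.show()  # only here is the plan compiled and run server-side
```

An equivalent SQL GROUP BY would land on the same execution plan, which is exactly the semantic-layer point made next.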
>> Now, you kind of really attacked the traditional analytics space. People said, "Wow, Snowflake's really easy." Now you're injecting AI and machine intelligence. I see Databricks coming at it from the other angle: they started with machine learning, now they're sort of going after the analytics. Does there need to be a semantic layer to connect those two worlds, 'cause it's the same raw data? >> Yes, and that's what we are doing in our platform, and that's very novel in Snowflake. As I said, you interact with data with different programs. You pick your program: you are a SQL programmer, use SQL; you are a Python programmer, use DataFrames with Python. It doesn't really matter. And then the semantic layer is our compiler and our processing engine, which is going to translate both your programs, my program in Python and your program in SQL, to the same execution platform and to the same programming language that Snowflake uses internally. We don't expose our programming language, but it's a data flow programming language that our execution platform executes. So at the end, we might execute exactly the same program, potentially. And that's very important, because we spent all our IP and all our engineering time to optimize this platform, to make it the fastest platform. And we want to use that platform for any type of workload, whether it's data programs or SQL. >> Now, you and Terry were at Oracle, so you know a lot about benchmarketing. As Larry would stand up and say, "We killed the competition," you guys were probably behind it, right? So you know all about that. >> We were very much behind it. >> So you know a lot about that. I've had some experience; I'm not a technologist, but I'm an observer and analyst. You have to take benchmarking with a very big grain of salt. So you guys have generally stayed away from that. Databricks came out and they came up with all these benchmarks, so you had to respond, because otherwise it's out there. Now, you reran the benchmarks, you took out the materialized views and all the expensive stuff that they included in your cost, your price performance, but then you wrote, I thought, a very cogent blog. Maybe you could talk about sort of why you did that and your general philosophy around benchmarketing. >> Yeah. From day one, with Terry, we said never again will we participate in this really stupid benchmark war, because it's really not in the interest of customers. And we had been really at the front line of that war with Terry, both of us, really doing special tricks, right? Optimizing these queries to death, queries that no one runs apart from the synthetic benchmark. We optimized them to death to have the best numbers when we were at Oracle. And we decided that this is really not helping customers in the end. So we said, with Snowflake, we will not do that. And actually, we are not the only ones not doing that. If you look at who has published TPC-DS, you will see no one, none of the big vendors. It's not because they cannot run TPC-DS. Oracle can run it, I know that, and all the other big data warehouse vendors can, but it's something of the past. TPC was really important at some point, and it is not really relevant now. So we are not going to compete. And that's basically what we said in our blog: we are not interested in participating in this war. We want to invest our engineering effort and our IP in solving the real-world issues and performance issues that we have. And we want to improve our engine for these real-world customers. And the nice thing with Snowflake, because it's a service, is that we see exactly all the queries that our customers are executing. So we know where we are struggling as a system, and that's where we want to invest and where we want to improve.
And if you look at many announcements that we made, it's all about under-the-cover improvements to Snowflake, and getting the benefit of these improvements to our customers. So that was the message of that blog. And yes, the message was: okay, Mr. Databricks, everyone makes a decision, right? We made the decision not to participate. Databricks made another decision, which is fine, and it's fine that they publish their numbers on their own system. Where it is not fine is that they published numbers using Snowflake, misrepresenting our performance. And that's what we wanted to correct. >> Yeah, well, thank you for going into that. And look, leaders don't necessarily have to get involved in that mudslinging. (crosstalk) Enough said about that, so that's cool. I want to ask you: I interviewed Frank last spring, right after the lockdown, he was kind enough to come on virtually, and I asked him about on-prem. And you know Frank, he doesn't mince words. He said, "We're not getting into a halfway house. That's not going to happen." And of course, you really can't do what you do on-prem. You can't separate compute; some have tried, but it's not the same. But at the same time, you see Andreessen come out with this blog that says a huge portion of your cost of goods sold is going to be the cloud, so you're going to have to repatriate. Help me square that circle. Is it cloud forever? Will you never say never? What can you share on that? >> I will never say never, it's not my style. I always say you can always change your mind, and maybe different factors can change your mind. What was true at some point might not be true at a later point. But as of now, I don't see any reason for us to go on-premise. As you mentioned at the beginning, right, Snowflake is growing like crazy. The world is moving to the cloud. I think maybe it goes both ways, but I would say 90% or 99% of the world is moving to the cloud. Maybe 1% is coming back for some very specific reasons. I don't think the world is going to move back on-premise. So in the end, we might miss a small percentage of the workload that will stay on-premise, and that's okay. >> And as well, if you dig into some of the financial statements, you'll see, read the notes, where you've renegotiated, right? We're talking big numbers: hundreds and hundreds of millions of dollars of cost reduction, actually more, over a 10-year period. Billions off your cloud bills. So the cloud suppliers, they don't want to lose you as a customer, right? You're one of their biggest customers. So it's awesome. Last question: your work now is to really drive the data cloud, get adoption up, build that supercloud, as we call it. Maybe you could talk a little bit about how you see the future. >> The future is really broadening the scope of Snowflake. And really, I would say the marketplace, and data sharing, and services which are built natively on Snowflake and are shared through our platform, which can mix data on the provider side with data on the consumer side, creating this collaboration within the Snowflake data cloud, I think that's really the future. And we are really only scratching the surface of that. And you can see the enthusiasm for the Snowflake data cloud in vertical industries. We have announced the financial services data cloud: a complete vertical industry latching on to that concept and collaborating via Snowflake, which was not possible before.
And I think you talked about machine learning, for example. Machine learning, collaboration through machine learning: the ones who are building these advanced models might not be the same as the ones who are consuming these models, right? It might be this collaboration between expertise and the consumers of that expertise. So we are really at the beginning of this interconnected world. And to me, the worldwide web of data that we are creating is really going to be amazing. And it's all about connecting. >> And I'm glad you mentioned the ecosystem; I didn't give enough attention to that. Because as a cloud provider, which essentially you are, you've got to have a strong ecosystem. That's a hallmark of cloud. And then the other vertical that we didn't touch on is media and entertainment, a lot of direct-to-consumer. I think healthcare is going to be a huge vertical for you guys. All right, we've got to go. Benoit, thanks so much for coming on "theCUBE." I really appreciate you. >> Thanks, Dave. >> And thank you for watching. This is a wrap from AWS re:Invent 2021. "theCUBE," the leader in global tech coverage. We'll see you next time. (upbeat music)
Video Exclusive: Oracle Announces New MySQL HeatWave Capabilities
(bright music) >> Surprising many people, including myself, Oracle last year began investing pretty heavily in the MySQL space, and those investments continue today. Let me give you a brief history. Last December, Oracle made its first HeatWave announcement, where it converged OLTP and OLAP together in a single MySQL database. Now, what wasn't surprising was the approach Oracle took: it leveraged hardware to improve performance and lower cost. You see, when Oracle acquired Sun more than a decade ago, rather than rely on loosely coupled partnerships with hardware vendors to speed up its databases, Oracle set out on a path to tightly integrate hardware and software innovations using its own in-house engineering. So with its first MySQL HeatWave announcement, Oracle leaned heavily on software built on top of in-memory database technology to create an embedded OLAP capability that eliminates the need to ETL data from a transaction system into a separate analytics database. In doing so, Oracle is taking a similar approach with MySQL as it does for its mainstream Oracle database, and today it extends that. What I mean by that is it's converging capabilities in a single platform. So the argument is that this simplifies and accelerates analytics, lowers costs, and allows analytics to be run on fresher data. Now, as many of you know, this is a different strategy from how, for example, AWS approaches database, where it creates purpose-built database services targeted at specific workloads. These are philosophical design decisions made for a variety of reasons, but it's very clear which direction Oracle is headed in. Today, Oracle continues its HeatWave announcement cadence with a focus on increased automation. The company is continuing the trend of using clustering technology to scale out for both performance and capacity. And again, with that theme of marrying hardware with software, Oracle is also making announcements that focus on security. Hello everyone, and welcome to this video exclusive. This is Dave Vellante. We're going to dig into these capabilities with Nipun Agarwal. He's VP of MySQL HeatWave and advanced development at Oracle. Nipun has been leading the MySQL and HeatWave development effort for nearly a decade. He's got 180 patents to his name, about half of which are associated with HeatWave. Nipun, welcome back to the show. Great to have you. >> Thank you, Dave. >> So before we get into the new news, if we could, maybe you could give us all a quick overview of HeatWave again, and what problems you originally set out to solve with it? >> Sure. So HeatWave is an in-memory query accelerator for MySQL. Now, as most people are aware, MySQL was originally designed and optimized for transactional processing. So when customers had the need to run analytics, they would need to extract data from the MySQL database into another database and run analytics. With MySQL HeatWave, customers get a single database which can be used both for transactional processing and for analytics. There's no need to move the data from one database to another database, and all existing tools and applications which are compatible with MySQL continue to work as is. So: an in-memory query accelerator for MySQL, and it is significantly faster than any version of the MySQL database, and also much faster than specialized databases for analytics.
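What "a single database for both" looks like from the client side is ordinary MySQL, with the table additionally loaded into the HeatWave engine. A minimal sketch, assuming a HeatWave-enabled MySQL DB system and an existing `orders` table; the host and schema names are invented.

```python
# Sketch: load an InnoDB table into the RAPID (HeatWave) secondary engine and
# let the optimizer offload analytic queries to it automatically.
import mysql.connector

conn = mysql.connector.connect(
    host="myheatwave.example.com", user="admin", password="...", database="shop"
)
cur = conn.cursor()

# Stage the table into HeatWave's in-memory columnar representation.
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# The same connection serves OLTP writes...
cur.execute("INSERT INTO orders (customer_id, amount) VALUES (42, 99.90)")
conn.commit()

# ...and analytics, which the optimizer transparently routes to HeatWave.
cur.execute(
    "SELECT customer_id, SUM(amount) AS total FROM orders "
    "GROUP BY customer_id ORDER BY total DESC LIMIT 10"
)
print(cur.fetchall())
```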
>> Yeah, we're going to talk about that. And so obviously, when you made the announcement last December, you had, I'm sure, a core group of early customers and beta customers, but then you opened it up to the world. So what was the reaction once you exposed it to customers? >> The reaction has been very positive, Dave. So initially we were thinking that there would be a lot of customers who are on-premise users of MySQL who would migrate to the service, and surely that was the case. But the part which was very interesting and surprising is that we see many customers who are migrating from other cloud vendors or other cloud services to MySQL HeatWave. And most notably, the biggest number of migrations we are seeing are from AWS Aurora and AWS RDS. >> Interesting. Okay. I wonder if you've got other feedback. You're obviously responding in a pretty fast cadence here, you know, a seven-, eight-month cadence. What was the feedback that you got? Were there gaps that customers wanted you to close? >> Sure, yes. So as customers started moving to HeatWave, they found that HeatWave is much faster and much cheaper. And when it's so much faster, they told us that there are some classes of queries which just could not run earlier, which they can now with HeatWave. So it makes the applications richer, because they can write new classes of queries which they could not in the past. But in terms of the feedback or enhancement requests we got, I would say number one was automation. When customers move their database from on-premise to the cloud, they expect more automation. So that was the number one thing. The second thing was people wanted the ability to run analytics on larger sizes of data with MySQL HeatWave, because they liked what they saw and they wanted us to increase the data size limit which can be processed by HeatWave. The third one was they wanted more classes of queries to be accelerated with HeatWave. Initially, when we went out, HeatWave was designed to be an accelerator for analytic queries, but more and more customers started seeing the benefit beyond just analytics, more towards mixed workloads. So that was the third request. And then finally, they wanted us to scale to a larger cluster size. And that's what we have done over the last several months, incorporating this feedback which we've gotten from customers. >> So you're addressing those gaps. And thank you for sharing that with us. I've got the press release here. I wonder if we could kind of go through these. Let's start with AutoPilot. What's that all about? What's different about AutoPilot? >> That's right. So MySQL AutoPilot provides machine learning based automation. So the first difference is that not only is it automating things where, as a cloud provider, as a service provider, we feel there are a lot of opportunities for us to automate, but the big difference about the approach we've taken with MySQL AutoPilot is that it's all driven based on the data and the queries. It's machine learning based automation. That's the first aspect. The second thing is, this is all done natively in the server, right? So we are enhancing the MySQL engine, we're enhancing the HeatWave engine, and that's where all the logic and all the processing resides. In order to do this, we have had to collect new kinds of data. So for instance, in the past, people would collect statistics which are based on just the data.
Now we also collect statistics based on queries. For instance, what is the compilation time? What is the execution time? And we have augmented this with new machine learning models. And finally, we have made a lot of innovations, a lot of inventions, in the process: we collect data in a smart way, we process data in a smart way, and the machine learning models we are talking about also have a lot of innovation. And that's what gives us an edge over what other vendors may try to do. >> Yeah, I mean, I'm just, again, looking at this pretty meaty press release. Auto-provisioning, auto parallel load, auto data placement, auto encoding, auto error recovery, auto scheduling, and, you know, using a lot of computer science techniques that are well known, first in, first out, auto change propagation. So really focusing on driving that automation for customers. The other piece of it that struck me, and I said this in my intro, is, you know, using clustering technology. Clustering technology has been around for a long time, as has in-memory database, but applying it and integrating it. My sense is that's really about scale and performance, taking advantage, of course, of cloud being able to drive that scale instantaneously. But talk about scale a little bit, and your philosophy there, and why so much emphasis on scalability? >> Right. So what we want to do is to provide the fastest engine for running analytics, and that's why we do the processing in memory. Now, one of the issues with in-memory processing is that the amount of data which you're processing has to reside in memory. So when we went out in version one, given the footprint of the MySQL customers we spoke to, we thought 12 terabytes of processing at any given point in time would be adequate. In the very first month, we got feedback that customers wanted us to process larger amounts of data with HeatWave, because they really liked what they saw and they wanted us to increase the limit. So we have increased the deployment from 12 terabytes to 32 terabytes, and in order to do so, we now have a HeatWave cluster which can be up to 64 nodes. That's one aspect, on the query processing side. Now, to answer the question as to why so much of an emphasis: it's because this is something which is extremely difficult to do in query processing. As you scale the size of the cluster, the kind of algorithms, the kind of techniques you have to use so that you achieve very high efficiency with a very large cluster, these are things which are not easy to do. What we want to make sure is that as customers have the need to process larger amounts of data, they can. One of the big benefits customers get by using the cloud, as opposed to on-premise, is that they don't need to worry about provisioning gear ahead of time. So if they have more data, with the cloud they should be able to process more data easily. But when they process more data, they should expect the same kind of performance, the same kind of efficiency, on a larger data size as on a smaller data size. And this is something traditional database vendors have struggled to provide. So this is an important problem, this is a tough engineering problem, and that's why there's a lot of emphasis on this: to make sure that we provide our customers with very high efficiency of processing as they increase the size of the data.
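The interview doesn't expose AutoPilot's internals, but the shape of ML-driven auto-provisioning is easy to sketch: sample statistics from the data, predict the post-encoding in-memory footprint, and derive a node count with headroom. The per-node capacity, the linear model, and the helper names below are all assumptions for illustration, not Oracle's implementation.

```python
# Sketch of the auto-provisioning idea: estimate memory footprint from sampled
# column statistics, then recommend a HeatWave-style cluster size.
NODE_MEMORY_BYTES = 512 * 2**30  # assumed usable in-memory capacity per node

def predict_footprint(rows: int, avg_row_bytes: float, encoding_ratio: float) -> float:
    """Estimated in-memory size after columnar encoding. A real system would
    use a learned model here; this stand-in is linear in the sampled stats."""
    return rows * avg_row_bytes * encoding_ratio

def recommend_nodes(tables: list[dict]) -> int:
    total = sum(
        predict_footprint(t["rows"], t["avg_row_bytes"], t["encoding_ratio"])
        for t in tables
    )
    needed = int(total * 1.25)  # headroom for intermediate query results
    return max(1, -(-needed // NODE_MEMORY_BYTES))  # ceiling division

tables = [
    {"rows": 6_000_000_000, "avg_row_bytes": 120, "encoding_ratio": 0.45},
    {"rows": 1_500_000_000, "avg_row_bytes": 80, "encoding_ratio": 0.50},
]
print(recommend_nodes(tables))  # -> 1 for this toy input
```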
>> You're saying, traditionally, you'll get diminishing returns as you scale. So as the volume grows, you're not able to take as much advantage, or you're less efficient. And you're saying you've largely solved that problem. I mean, people always talk about scaling linearly, and I'm always skeptical, but you're saying, especially in database, that's been a challenge, and you've largely solved it. >> Right. What I would say is that we have a system which is very efficient, more efficient than any of the databases we are aware of. So as you said, perfect scaling is hard, right? I mean, that's a theoretical limit; a scale factor of one is very hard to achieve. We are now close to 90% efficiency for end-to-end queries. This is not for primitives, this is for end-to-end queries, both on industry benchmarks as well as real-world customer workloads. So this 90% efficiency, we believe, is very good, and higher than what many of the vendors provide. >> Yeah, right. So not just primitives, the whole end-to-end cycle. I think 0.89 was the number that I saw, just to be technically correct there, but that's pretty good. Now let's talk about the benchmarks. It wouldn't be an Oracle announcement without some benchmarks. So you laid out today in your announcement some pretty outstanding performance and price-performance numbers. In particular, you called out, and I feel like it's a badge of honor, if Oracle calls me out I feel like I'm doing well, you called out Snowflake and Amazon. So maybe you could go over those benchmark results so we could peel the onion on that a little bit. >> Right. So the first thing to realize is that we want to have benchmarks which are credible, right? So it's not the case that we have taken some specific, unique workloads where HeatWave shines. That's not the case. What we did was we took an industry standard benchmark, TPC-H. And furthermore, we had a third-party, independent firm do this comparison. So let's first compare with Snowflake. On a 10 terabyte TPC-H benchmark, HeatWave is seven times faster and one-fifth the cost. So with this, it is 35 times better price performance compared to Snowflake, right? Seven times faster than Snowflake at one-fifth of the cost: HeatWave is 35 times better price performance compared to Snowflake. Not just that: Snowflake only does analytics, whereas MySQL HeatWave does both transactional processing and analytics. It's not a specialized database; MySQL HeatWave is a general purpose database which can do both OLTP and analytics, whereas Snowflake can only do analytics. So to be 35 times more efficient than a database service which is specialized only for one case, which is analytics, we think is pretty good. So that's the comparison with Snowflake. >> So you've got to be using list prices for that, obviously. >> That is correct. >> So there are discounts; let's put that into context of maybe 35x better. You're not going to get that kind of discount, I wouldn't think. >> That is correct. >> Okay. What about Redshift? AQUA for Redshift has gained a lot of momentum in the marketplace. How do you compare against that? >> Right. So we did a comparison with Redshift AQUA, the same benchmark, 10 terabytes TPC-H, and again, this was done by a third party. Here, HeatWave is six and a half times faster at half the cost. So HeatWave is 13 times better price performance compared to Redshift AQUA. And the same thing goes for Redshift: it's a specialized database, only for analytics. So customers need to have two databases, one for transaction processing and one for analytics, with Redshift, whereas with MySQL HeatWave it's a single database for both. And it is so much faster than Redshift that, again, we feel it's pretty remarkable.
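The price-performance figures being quoted are just the product of the speedup and the cost ratio, so they are easy to sanity-check. A quick worked example using the ratios as stated in this conversation (including the Aurora figure that comes up next), not the underlying list prices:

```python
# price performance advantage = (their_time / our_time) * (their_cost / our_cost)
def price_performance(speedup: float, cost_ratio: float) -> float:
    return speedup * cost_ratio

print(price_performance(7.0, 5.0))     # vs. Snowflake: 7x faster at 1/5 cost -> 35.0
print(price_performance(6.5, 2.0))     # vs. Redshift AQUA: 6.5x at half cost -> 13.0
print(price_performance(1400.0, 2.0))  # vs. Aurora (4 TB TPC-H) -> 2800.0
```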
>> Now, you mentioned earlier, but you're not, you're obviously, I presume, not cheating here. You're not including the cost of the transaction processing data store, right? We're ignoring that for a minute, ignoring that you've got to, you know, move data, ETL. We're just talking about like-for-like, is that correct? >> Right. This is an extremely fair and extremely generous comparison. Not only are we not including the cost of the source OLTP database; the cost in the case of Redshift I'm talking about is the cost for one year, paid in full upfront. So this is the best pricing a customer can get for a one-year subscription with Redshift, whereas when I'm talking about HeatWave, this is the pay-as-you-go price. And the third aspect is, this is Redshift when it is completely, fully optimized. I don't think anyone else can get much better numbers on Redshift than we have, right? So: a fully optimized configuration of Redshift, looking at the one-year prepaid cost of Redshift, and not including the source database. >> Okay. And then, speaking of transaction processing databases, what about Aurora? You mentioned earlier that you're seeing a lot of migration from Aurora. Can you add some color to that? >> Right. And this is a very interesting question; it was a very interesting observation for us. When we did the launch back in December, we had numbers on four terabytes TPC-H with Aurora. So if you look at the same benchmark, four terabytes TPC-H, HeatWave is 1,400 times faster than Aurora at half the cost, which makes it 2,800 times better price performance compared to Aurora. So, a very good number. What we have found is that many customers who are running on Aurora started migrating to HeatWave, and these customers had a mix of transaction processing and analytics, and the data sizes were much smaller. Even those customers found that there was a significant improvement in performance and a reduction in cost when they migrated to HeatWave. In the announcement today, many of the references are that class of customers. So for that, we decided to choose another benchmark, called the CH-benchmark, on a much smaller data size. And even for that, even for mixed workloads, we find that HeatWave is 18 times faster and provides over a hundred times higher throughput than Aurora, at 42% of the cost. So in terms of price-performance gain, it is much, much better than Aurora, even for mixed workloads. And then, if you consider pure OLTP, assume you have an application which has only OLTP, which, by the way, is a very uncommon scenario, even if that were the case, for pure OLTP only, MySQL HeatWave is at par with Aurora with respect to performance, but MySQL HeatWave costs 42% of Aurora. So the point is that across the whole spectrum, pure OLTP, mixed workloads, or analytics, MySQL HeatWave is going to be a fraction of the cost of Aurora, and depending upon your query workload, your acceleration can be anywhere from 18 times to 1,400 times faster. >> That's interesting. I mean, you've been at this for the better part of a decade, because my sense is that HeatWave is all about OLAP, and that's really where you've put the majority, if not all, of the innovation.
But you're saying, coming into December's announcement, you were at par with Aurora in a rare but hypothetical OLTP-only workload. >> That is correct. >> Yeah. >> Well, you know, I've got to push you still on this, because a lot of times these benchmarks are a function of the skills of the individuals performing the tests, right? So can I run them myself? You know, if you publish these benchmarks, what if a customer wants to replicate these tests and try to see if they can tune up, you know, Redshift better than you guys did? >> Sure. So I'll say a couple of things. One is, all the numbers which I'm talking about, both for Redshift and Snowflake, were done by a third-party firm. But for all the numbers we are talking about, TPC-H as well as the CH-benchmark, all the scripts are published on GitHub. So anyone is very welcome, in fact we encourage customers, to go and try it for themselves, and they will find that the numbers are absolutely as advertised. In fact, we had a couple of companies in the last several months who went to GitHub, downloaded our TPC-H scripts, and reported that the performance numbers they were seeing with HeatWave were actually better than we had published back in December. And the reason was that since December, we had new code running. So our numbers were actually better than advertised. So all the benchmarks are published; they are all available on GitHub. You can go to the HeatWave website on oracle.com and get the link for it. And we welcome anyone to come and try these numbers for themselves. >> All right, good, great. Thank you for that. Now, you mentioned earlier that you were somewhat surprised, not surprised that you got customers migrating from on-prem databases, but that you also saw migration from other clouds. How do you expect that trend to go with regard to this new announcement? Do you have any sense as to how that's going to go? >> Right. So one of the big changes from December to now is that we have now focused quite a bit on mixed workloads. So in the past, in December, when we first went out, HeatWave was designed primarily for analytics. Now, what we have found is that there's a very large class of customers who have mixed workloads and who also have smaller data sizes. We have now introduced a lot of technology, including things like auto scheduling and definite improvements in performance, where MySQL HeatWave is a very superior solution compared to Aurora or other databases out there, both in terms of performance and price, for these mixed workloads: better latency, better throughput, lower cost. So we expect this trend of migration to MySQL HeatWave to accelerate. We are seeing customers migrate from Azure, we are seeing customers migrate from GCP, and by far the number one migrations we are seeing are from AWS. So I think, based on the new features and technologies we have announced today, this migration is going to accelerate. >> All right, last question. So I said earlier that it seems like you're applying what are generally well understood and proven technologies, like in-memory, like clustering, to solve these problems. And I think about, you know, the things that you're doing, and I wonder, I mean, these things have been around for a while, why has this type of approach not been introduced by others previously? >> Right. Well, the main thing is, it takes time, right? We designed HeatWave from the ground up for the cloud.
And as a part of that, we had to invent new algorithms for distributed query processing for the cloud. We put in the hooks for machine learning processing right from the ground up. So this has taken us close to a decade. It's been hundreds of person-years of investment, dozens of patents which have gone in. Another aspect is, it takes talent from different areas. So we have, you know, people working in distributed query processing, and we have people who have a lot of background in machine learning. And then, given that we are the custodians of the MySQL database, we have a very rich set of customers we can reach out to, to get feedback from them as to what the pain points are. So it's a culmination of these things: we had the talent, the customer base, and the time. We spent close to a decade to make this thing work. So that's what it takes: time, patience, and talent. >> A lot of software innovation, bringing together, as I said, that hardware and software strategy. Very interesting. Nipun, thanks so much. I appreciate your insights, and coming on this video exclusive. >> Thank you, Dave. Thank you for the opportunity. >> My pleasure. And thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time. (bright music)
Yuanhao Sun, Transwarp | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE. (light music) Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm Peter Burris, and welcome back to Big Data SV, theCUBE's, again, annual broadcast of what's happening in the big data marketplace, here at, or adjacent to, Strata here in San Jose. We've been broadcasting all day, and we're going to be here tomorrow as well, over at the Forager eatery, a place to come meander. So come on over, spend some time with us. Now, we've had a number of great guests; many of the thought leaders visiting here in San Jose today have been on, talking about the big data marketplace. But I don't think any has traveled as far as our next guest. Yuanhao Sun is the CEO of Transwarp, come all the way from Shanghai. Yuanhao, it's once again great to see you on theCUBE. Thank you very much for being here. >> Good to see you again. >> So Yuanhao, Transwarp as a company has become extremely well known for great technology. There's a lot of reasons why that's the case, but you have some interesting updates on how the technology's being applied. Why don't you tell us what's going on? >> Okay. So, recently we announced the first audited TPC-DS benchmark result. Our product, called Inceptor, that is, a SQL engine on top of Hadoop, already adds quite a lot of features, like distributed transactions and full SQL support, so that it can mimic Oracle and other traditional databases and their features, and so we can pass the whole test. This engine is also scalable, because it's distributed. So for a large benchmark like TPC-DS, which starts from 10 terabytes, the SQL engine can pass it without much trouble. >> So I know that there have been other firms that have claimed to pass TPC-DS, but they haven't been audited. What does it mean to say you're audited? I'd presume that, as a result, you've gone through some extremely stringent and specific tests to demonstrate that you can actually pass the entire suite. >> Yes. Actually, there is a third-party auditor. They have audited our test process and the results over the past five months, so it is fully audited. The reason why we can pass the test comes down to two major reasons. Traditional databases are not scalable enough to process such large data sets, so they could not pass the test. And for Hadoop vendors, the SQL engine features are not rich enough to pass all the tests. You know, there are several steps in the benchmark, and for the SQL queries, there are 99 queries, and the syntax is not supported by all Hadoop vendors yet. And also, the benchmark requires you to update the data after the queries, and then run the queries for multiple concurrent users. That means you have to support distributed transactions, and you have to make the updated data consistent. The Hadoop vendors' SQL engines on Hadoop haven't implemented these distributed transaction capabilities. So that's why they failed to pass the benchmark.
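The data-maintenance step Yuanhao refers to is what trips up most SQL-on-Hadoop engines: the benchmark interleaves updates with concurrent query streams, so the engine needs real transactional semantics. A minimal sketch of that shape through a generic Python DB-API connection; `connect_to_engine` and the staging table name are placeholders, not Transwarp's actual client API, though the `store_sales` schema is standard TPC-DS.

```python
# Sketch: the update-then-query pattern TPC-DS data maintenance requires.
# Both statements must commit atomically, and every concurrent query stream
# must afterwards see the committed state, and only the committed state.
def run_maintenance(connect_to_engine):
    conn = connect_to_engine()   # placeholder for a DB-API driver
    conn.autocommit = False
    cur = conn.cursor()
    try:
        cur.execute("DELETE FROM store_sales WHERE ss_sold_date_sk < 2450815")
        cur.execute("INSERT INTO store_sales SELECT * FROM store_sales_staging")
        conn.commit()
    except Exception:
        conn.rollback()  # a partial update would corrupt the benchmark state
        raise
```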
>> So I had the honor of traveling to Shanghai last year, going and speaking at your user conference, and I was quite impressed with the energy in the room as you announced a large number of new products. You've been very focused on taking what open source has to offer but adding significant value to it. As you said, you've done a lot with the SQL interfaces and various capabilities of SQL on top of Hadoop. Where is Transwarp going with its products today? How is it expanding? How is it being organized? How is it being used? >> We group these products into three categories, including big data, cloud, and AI and machine learning. So there are three categories. For big data, we upgraded the SQL engine and the stream engine, and we have a set of tools, called Studio, to help people streamline big data operations. The second product line is the data cloud. We call it Transwarp Data Cloud. This product is going to be released early in May this year. We build this product on top of Kubernetes. We provide Hadoop as a service, data science as a service, and AI as a service to customers. People can create multiple tenants, and the tenants are isolated by network, storage, and CPU. They are free to create clusters, spin them up, or turn them off, and it can scale to hundreds of hosts. I think this is the first implementation of, like, network isolation and multi-tenancy in Kubernetes, so that it can support HDFS and all the Hadoop components. And because it is elastic, just like cloud computing, but we run on bare metal, people can consolidate the data and consolidate the applications in one place. Because all the applications and Hadoop components are containerized, that means they are Docker images, we can spin up a cluster very quickly and scale it to a larger cluster. So this data cloud product is very interesting for large companies, because they usually have a small IT team, but they have to provide big data and machine learning capabilities to larger groups, like a thousand people. So they need a convenient way to manage all these big clusters, and they have to isolate the resources. They even need a billing system. For this product we already have a few big names in China, like China Post, Picture Channel, and Secret of Source Channel. They are already applying this data cloud for their internal customers. >> And China has a few people, so I presume that, you know, China Post, for example, is probably a pretty big implementation. >> Yes. Their IT team is, like, less than 100 people, but they have to support thousands of users. In the past you would usually deploy one cluster for each application, right? But today, large organizations have lots of applications. They hope to leverage big data capabilities, but a very small IT team has to support so many applications. So they need a convenient way; just like when you put Hadoop on the public cloud, we provide a product that allows you to offer Hadoop as a service in a private cloud, on bare-metal machines. So this is the second product category.
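Transwarp hasn't published the internals here, but the tenant-isolation pattern just described, with per-tenant CPU, memory, storage, and network boundaries on Kubernetes, can be sketched with stock Kubernetes objects via the official Python client. The tenant name and quota numbers are invented; this shows the mechanism, not the product's actual code.

```python
# Sketch: one tenant = one namespace with a resource quota and a
# default-deny ingress policy for network isolation.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
net = client.NetworkingV1Api()

tenant = "tenant-a"  # hypothetical tenant

core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=tenant)))

# Cap the tenant's CPU, memory, and storage requests.
core.create_namespaced_resource_quota(
    tenant,
    client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="tenant-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"requests.cpu": "64", "requests.memory": "256Gi",
                  "requests.storage": "10Ti"}
        ),
    ),
)

# Deny all ingress from other namespaces: per-tenant network isolation.
net.create_namespaced_network_policy(
    tenant,
    client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="default-deny-ingress"),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),
            policy_types=["Ingress"],
        ),
    ),
)
```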
And the third is machine learning and artificial intelligence. We provide a data science platform, a machine learning tool, that is, an interactive tool that allows people to create machine learning pipelines and models. We even implemented some automatic modeling capability that allows you to do feature engineering automatically or semi-automatically, and to select the best model for you, so that machine learning can be, so everyone can be, a data scientist. So they can use our tool to quickly create models. And we also have some pre-built models for different industries, like financial services, banks, securities companies, even IoT. So we have different pre-built machine learning models for them. They just need to modify the template, then apply the machine learning models to their applications very quickly. So, for example, a bank customer used it to deploy a model in one week. This is very quick for them. Otherwise, in the past, they would have a company build that application and develop the models, which usually takes several months. Today it is much faster. So today we have three categories, in particular cloud and machine learning. >> Peter Burris: Machine learning and AI. >> And so three products. >> And you've got some very, very big implementations. So you were talking about a couple of banks, but we were talking, before we came on, about some of the smart cities. >> Yuanhao Sun: Right. >> Kinds of things that you guys are doing at enormous scale. >> Yes. So we deployed our streaming product in more than 300 cities in China, and these clusters are connected together. We use the streaming capability to monitor the traffic and send the information from each city to the central government, to a sort of central repository. Whenever illegal behavior on the road is detected, that information will be sent to the policemen, or to the central repository, within two seconds. Whenever a car is seen by a camera anywhere in China, the alert will be sent out within two seconds. >> So the bad behavior is detected, it's identified as to the location, the system also knows where the nearest police person is, and it sends a message and says, this car has performed something bad. >> Yeah, and you should stop that car at the next station or the next crossroad. Today there are tens of thousands of policemen who depend on this system for their daily work. >> Peter Burris: Interesting. >> So, just a question: it sounds like one of your sort of nearest competitors, in terms of, let's take the open source community, at least the APIs, and in their case open source, is Huawei. Have there been customers that tried to do a POC with you and with Huawei, and said, well, it took four months using the pure open source stuff, and it took, say, two weeks with your stack, it being much broader and deeper? Are there any examples like that? >> There are quite a lot. We have more market share; like, in financial services, we have about 100 bank users. So if we take all the banks that already use Hadoop into account, our market share is above 60%. >> George Gilbert: 60. >> Yeah, in financial services. We usually do a POC and, like, run benchmarks. They are real workloads, and usually it takes us three days or one week. They found we can speed up their workloads very quickly. For Bank of China, they migrated their Oracle workload to our platform, and they tested our platform and the Huawei platform too. The first thing is, they could not migrate the whole Oracle workload to open source Hadoop, because of the missing features. We are able to support all these workloads with very minor modifications, and the modifications take only several hours. And we can finish the whole workload within two hours, but originally it usually takes Oracle more than one day, >> George Gilbert: Wow. >> more than ten hours, to finish the workload. So it is very easy to see the benefits quickly.
>> Now, you have a streaming product also with that same SQL interface. Are you going to see a migration of applications that used to be batch to more near-real-time or continuous, or will you see a whole new set of applications that weren't done before, because the latency wasn't appropriate? >> For streaming applications, real-time cases, they are mostly new applications. But if you are using the Storm API or the Spark Streaming API, it is not so easy to develop your applications. And another issue is, once you detect one new rule, you have to add those rules dynamically to your cluster. For the programmers, they do not have much knowledge of writing Scala code. They only know how to configure; probably they are familiar with SQL. They just need to add one SQL statement to add a new rule. So that they can- >> In your system. >> Yeah, in our system. So it is much easier for them to program streaming applications. And for those customers who don't have real-time requirements, they hope to do, like, real-time data warehousing. They collect all this data from websites, from their sensors, like PetroChina, the large oil company. They collect all the sensor information directly into our streaming product. In the past, they just loaded it into Oracle and ran the dashboard, so it took hours to see the results. But today, the application can be moved to our streaming product with only a few modifications, because they are all SQL statements, and the application becomes real time. They can see the real-time dashboard results in several seconds. >> So Yuanhao, you're number one in China. You're moving more aggressively to participate in the US market. Last question: what's the biggest difference between being number one in China, the way that big data is being done in China, versus the way you're encountering big data being done here, certainly in the US, for example? Is there a difference? >> I think there are some differences. In the US, customers usually request a POC. But in China, I think they focus more on the results. They focus on what benefit they can gain from your product. So we have to prove it to them; we have to help them migrate applications to see the benefits. I think in the US, they focus more on technology than Chinese customers do. >> Interesting. So they're more on technology here in the US, more on the outcome in China. Once again, Yuanhao Sun, CEO of Transwarp, thank you very much for being on theCUBE. >> Thank you. >> And I'm Peter Burris, with George Gilbert, my co-host, and we'll be back with more from Big Data SV in San Jose. Come on over to the Forager and spend some time with us. And we'll be back in a second. (light music)