Mohan Rokkam & Greg Gibby | 4th Gen AMD EPYC on Dell PowerEdge: Virtualization

(cheerful music) >> Welcome to theCUBE's continuing coverage of AMD's 4th Generation EPYC launch. I'm Dave Nicholson, and I'm here in our Palo Alto studios talking to Greg Gibby, senior product manager, data center products from AMD, and Mohan Rokkam, technical marketing engineer at Dell. Welcome, gentlemen. >> Mohan: Hello, hello. >> Greg: Thank you. Glad to be here. >> Good to see each of you. Just really quickly, I want to start out. Let us know a little bit about yourselves. Mohan, let's start with you. What do you do at Dell exactly? >> So I'm a technical marketing engineer at Dell. I've been with Dell for around 15 years now, and my goal is to really look at the Dell PowerEdge servers and see how customers take advantage of some of the features we have, especially with the AMD EPYC processors that have just come out. >> Greg, and what do you do at AMD? >> Yeah, so I manage our software-defined infrastructure solutions team, and really it's cradle to grave: we work with the ISVs in the market, so VMware, Nutanix, Microsoft, et cetera, to integrate the features that we're putting into our processors and make sure they're ready to go and enabled. And then we work with our valued partners like Dell on putting those into actual solutions that customers can buy, and then we work with them to sell those solutions into the market. >> Before we get into the details on the 4th Generation EPYC launch, what that means, and why people should care, Mohan, maybe you can tell us a little about the relationship between Dell and AMD, how that works, and then Greg, if you've got commentary on that afterwards, that'd be great. Yeah, Mohan. >> Absolutely. Dell and AMD have a long-standing partnership, right? Especially now with the EPYC series. We have had products since the first generation of EPYC. We have been doing solutions across the whole range of the Dell ecosystem. We have integrated AMD quite thoroughly and effectively, and we really love how performant these systems are.
So, yeah. >> Dave: Greg, what are your thoughts? >> Yeah, I would say the other thing we need to point out is that we both have really strong relationships across the entire ecosystem. So memory vendors, the software providers, et cetera: we have technical relationships. We're working with them to optimize solutions so that ultimately, when the customer buys that, they get a great user experience right out of the box. >> So, Mohan, I know that you and your team do a lot of performance validation testing as time goes by. I suspect that you had early releases of the 4th Gen EPYC processor technology. What have you been seeing so far? What can you tell us? >> AMD has definitely knocked it out of the park. Time and again, in the past four generations, in the past five years alone, we have done some database work where, in five years, we have seen 5X performance. And across the board, AMD is the leader in benchmarks. We have done virtualization where we would consolidate from five systems into one. We have world records in AI, we have world records in databases, we have world records in virtualization. The AMD EPYC solutions have been absolutely performant. I'll leave you with one number here. When we went from top-of-stack Milan to top-of-stack Genoa, we saw a performance bump of 120%. And that number just blew my mind. >> So that prompts a question for Greg. Often we industry insiders think in terms of performance gains over the last generation or the current generation. A lot of customers in the real world, however, are N-2. They're a ways back, so I guess two points on that. First, the kinds of increases the average person is going to see when they move to this architecture, correct me if I'm wrong, are even more significant than a lot of the headline numbers, because they're moving two generations. That's number one. But then the other thing is the question to you, Greg.
I like very long, complicated questions, as you can tell. The question is: is it okay for people to skip generations, or, I guess the problem is, what's the case for upgrades? >> Well, yeah, so a couple of thoughts on that first. Mohan talked about that 5X improvement over the generations that we've seen. The other key point is that we've made significant process improvements along the way, moving from 7 nanometer to now 5 nanometer, and that's really reducing the total amount of power, or improving the performance per watt that customers can realize as well. And when we look at why a customer would want to upgrade, right? I want to rephrase that as: why aren't you? There is a real cost of not upgrading. When you look at infrastructure, the average age of a server in the data center is over five years. And if you look at the most popular processors that were sold in that timeframe, they're 8, 10, 12 cores. So now you've got a bunch of servers that you need in order to deliver the applications and meet your SLAs to your end users, and all those servers pull power. They require maintenance. They have the opportunity to go down, et cetera. You've got to pay licensing and service and support costs, and all those. And when you look at all the costs that roll up, even though the hardware is paid for, just to keep the lights on, and not even talking about the soft costs of unplanned downtime and, "I'm not meeting my SLAs," et cetera, it's very expensive to keep those servers running. Now, if you refresh, and you have processors that have 32, 64, 96 cores, you can consolidate that infrastructure and reduce your total power bill. You can reduce your CapEx, you reduce your ongoing OpEx, you improve your performance, and you improve your security profile. So it really is more cost effective to refresh than not to refresh. >> So, Mohan, what has your experience been, double clicking on this topic of consolidation?
I know that we're going to talk about virtualization and some of the results that you've seen. What have you seen in that regard? Does this favor better consolidation in virtualized environments? And are you both assuring us that the ROI and TCO pencil out on these new big, bad machines? >> Greg definitely hit the nail on the head, right? We are seeing tremendous savings, really, if you're consolidating from two generations old. We went, as I said, five to one. You're going from five full servers, probably paid off, down to one single server. That by itself is significant if you look at licensing costs, which, with things like VMware, do get pretty expensive. If you move to a single system, yes, we are at 32, 64, 96 cores, but if you compare to the licensing costs of 10 cores, two sockets, that's still pretty significant, right? That's one huge thing. Another thing which really drives this is security, and in today's environment, security becomes a major driving factor for upgrades. Dell has its own cyber resilient architecture, as we call it, and that really is integrated from the processor all the way up into the OS. And those are some of the features which customers really can take advantage of to help protect their ecosystems. >> So what kinds of virtualized environments did you test? >> We have done virtualization across primarily VMware, but also Azure Stack; we have looked at Nutanix. PowerFlex is another one within Dell. We have vSAN Ready Nodes. All of these, OpenShift: we have a broad variety of solutions from Dell, and AMD really fits into almost every one of them very well. >> So where does hyper-converged infrastructure fit into this puzzle? We can think of a server as something that contains not only AMD's latest architecture but also the latest PCIe bus technology and all of the faster memory, faster storage cards, faster NICs; all of that comes together.
But how does that play out in Dell's hyper-converged infrastructure, or HCI, strategy? >> Dell is a leader in hyper-converged infrastructure. We have the very popular VxRail line, we have PowerFlex, which is now going into the AWS ecosystem as well, Nutanix, and of course, Azure Stack. With all these, when you look at AMD, we have up to 96 cores coming in. We have PCIe Gen 5, which means you can now connect dual-port 100 and 200 gig NICs and get line rate on those, so you can connect to your ecosystem. And I don't know if you've seen the news: 200 and 400 gig routers and switches are selling out. That's not slowing down. The network infrastructure is booming. If you look at the AI/ML side of things, the VDI side of things, accelerator cards are becoming more and more powerful, more and more popular. And of course they need that higher-end data path that PCIe Gen 5 brings to the table. DDR5 is another huge improvement in terms of performance and latencies. So when we take all this together, and you talk about hyper-converged, all of them add up to making sure that, A, with hyper-converged, you get ease of management, but B, just 'cause you have ease of management doesn't mean you need to compromise on anything. And the AMD servers effectively are a no-compromise offering that we at Dell are able to offer to our customers. >> So Greg, I've got a question a little bit from left field for you. We covered Supercomputing Conference 2022. We were in Dallas a couple of weeks ago, and there was a lot of discussion of the current processor manufacturer battles, and a lot of buzz around 4th Gen EPYC being launched and what's coming over the next year. Do you have any thoughts on what this architecture can deliver for us in terms of things like AI? We talk about virtualization, but if you look out over the next year, do you see this kind of architecture driving significant change in the world? >> Yeah, yeah, yeah, yeah.
It has the real potential to do that, just from the building blocks. So we have what we call our chiplet architecture. You have an I/O die, and then you have your core complexes that go around that. And we integrate it all with our Infinity Fabric. That architecture allows us, if we wanted to, to replace some of those CCDs with specific accelerators. And so when we look two, three, four years down the road, that architecture and that capability are already built into what we're delivering, and accelerators can easily be moved in. We just need to make sure, when we look at doing that, that the power and the software, et cetera, required to do it are in place, and that those accelerators actually deliver better performance as a dedicated engine versus just using standard CPUs. The other thing I would say too is, look at emerging workloads. Data center modernization and cloud native are among the buzzwords, right? And in these container environments, AMD's architecture really just screams support for that type of environment, right? That's where you get into these larger core counts and the consolidation that Mohan talked about. A lot of customers have concerns around blast radius: "Hey, having a single point of failure and having more than X number of cores concerns me." If I'm in containers, that becomes less of a concern. And so when you look at cloud native, containerized applications, data center modernization, AMD's extremely well positioned to take advantage of those use cases as well. >> Yeah, Mohan, and when we talk about virtualization, I think sometimes we have to remind everyone that, yeah, we're talking about not only virtualization that has a full-blown operating system in the bucket, but also virtualization where the containers have microservices and things like that. I think you had something to add, Mohan. >> I did, and I think going back to the accelerator side of the business, right?
When we are looking at the current technology and looking at accelerators, AMD has done a fantastic job of adding in features like AVX-512; we have the bfloat16 and INT8 features. Effectively, these are built-in accelerators for certain workloads, especially in the AI and media spaces. One of the use cases we look at, for example, is inference. Traditionally we have used external accelerator cards, but for some of the entry-level and mid-level use cases, the CPU is going to work just fine, especially with the newer CPUs that we are seeing this fantastic performance from. These built-in accelerators help get us to the point where, if I'm at the edge, or in certain use cases, I don't need to have an accelerator card in there. I can run most of my inference workloads right on the CPU. >> Yeah, yeah. You know the game. It's an endless chase to find the bottleneck. And once we've solved the puzzle, we've created a bottleneck somewhere else. Back to the supercompute conversations we had, specifically about some of the AMD EPYC processor technology and the way that Dell is packaging it up and leveraging things like connectivity: that was one of the things that was also highlighted. This idea that connectivity is increasingly critical, not just for supercomputing, but for high-performance computing that's finding its way out of the realms of Los Alamos and down to the enterprise level. Gentlemen, any more thoughts about the partnership, or maybe a hint at what's coming in the future? I know that the original AMD announcement was announcing and previewing some things that are rolling out over the next several months. So let me just toss it to Greg. What are we going to see in 2023 in terms of rollouts that you can share with us? >> That I can share with you? Yeah, so I think look forward to seeing more advancements in the technology at the core level.
I think we've already announced our product code-named Bergamo, where we'll have up to 128 cores per socket. And then as we look at how we continually address this demand for data, this demand for "I need actionable insights immediately," look for us to continue to drive performance leadership in the products that are coming out, and to address specific workloads, with accelerators where appropriate and where we see a growing market. >> Mohan, final thoughts. >> On the Dell side, of course, we have four very rich and configurable options with AMD EPYC servers. But beyond that, you'll see a lot more solutions. Some of what Greg has been talking about, around the next generation of processors or the next updated processors, you'll start seeing some of those. And you'll definitely see more use cases from us and how customers can implement them and take advantage of those features. It's just exciting stuff. >> Exciting stuff indeed. Gentlemen, we have a great year ahead of us. As we approach the holiday season, I wish both of you well. Thank you for joining us. From here in the Palo Alto studios, again, Dave Nicholson here. Stay tuned for our continuing coverage of AMD's 4th Generation EPYC launch. Thanks for joining us. (cheerful music)

Published Date : Dec 14 2022

ENTITIES

Entity	Category	Confidence
Greg	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
AMD	ORGANIZATION	0.99+
Greg Gibby	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Dave	PERSON	0.99+
8	QUANTITY	0.99+
Mohan	PERSON	0.99+
32	QUANTITY	0.99+
Mohan Rokkam	PERSON	0.99+
100	QUANTITY	0.99+
200	QUANTITY	0.99+
10 cores	QUANTITY	0.99+
10	QUANTITY	0.99+
Dallas	LOCATION	0.99+
120%	QUANTITY	0.99+
two sockets	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
12 cores	QUANTITY	0.99+
two generations	QUANTITY	0.99+
2023	DATE	0.99+
five	QUANTITY	0.99+
64	QUANTITY	0.99+
200 gig	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
one	QUANTITY	0.99+
five full servers	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
two points	QUANTITY	0.99+
400 gig	QUANTITY	0.99+
EPYC	ORGANIZATION	0.99+
two	QUANTITY	0.99+
five years	QUANTITY	0.99+
one system	QUANTITY	0.99+
three	QUANTITY	0.99+
Los Alamos	LOCATION	0.99+
next year	DATE	0.99+
Nutanix	ORGANIZATION	0.99+
four years	QUANTITY	0.98+
both	QUANTITY	0.98+
Azure Stack	TITLE	0.98+
five nanometer	QUANTITY	0.98+

AMD Oracle Partnership Elevates MySQL HeatWave

(upbeat music) >> For those of you who've been following the cloud database space, you know that MySQL HeatWave has been on a technology tear over the last 24 months, with Oracle claiming record-breaking benchmarks relative to other database platforms. So far, those benchmarks remain industry leading, as competitors have chosen not to respond, perhaps because they don't feel the need to, or maybe they don't feel that doing so would serve their interest. Regardless, the HeatWave team at Oracle has been very aggressive about its performance claims, making lots of noise, challenging the competition to respond, and publishing their scripts to GitHub. So far, there are no takers, but customers seem to be picking up on these moves by Oracle, and it's likely the performance numbers resonate with them. Now, the other area we want to explore, which we haven't thus far, is the engine behind HeatWave, and that is AMD. AMD's EPYC processors have been the powerhouse on OCI, running MySQL HeatWave since day one. And today we're going to explore how these two technology companies are working together to deliver these performance gains and some compelling TCO metrics. In fact, a recent Wikibon analysis from senior analyst Marc Staimer made some TCO comparisons in OLAP workloads relative to AWS, Snowflake, GCP, and Azure databases; you can find that research on wikibon.com. And with that, let me introduce today's guests: Nipun Agarwal, senior vice president of MySQL HeatWave, and Kumaran Siva, who's the corporate vice president for strategic business development at AMD. Welcome to theCUBE, gentlemen. >> Welcome. Thank you. >> Thank you, Dave. >> Hey Nipun, you and I have talked a lot about this. You've been on theCUBE a number of times talking about MySQL HeatWave. But for viewers who may not have seen those episodes, maybe you could give us an overview of HeatWave and how it's different from competitive cloud database offerings. >> Sure.
So MySQL HeatWave is a fully managed MySQL database service offering from Oracle. It's a single database, which can be used to run transaction processing, analytics, and machine learning workloads. In the past, MySQL has been designed and optimized for transaction processing. So when customers of MySQL had to run analytics or machine learning, they would need to extract the data out of MySQL into some other database or service. MySQL HeatWave offers a single database for running all kinds of workloads, so customers don't need to extract data into some other database. In addition to being a single database, MySQL HeatWave is also very performant compared to OLAP databases, and it is very price competitive. So the advantages are: a single database, very good performance, and very good price performance. >> Yes. And you've published some pretty impressive price performance numbers against competitors. Maybe you could describe those benchmarks and highlight some of the results, please. >> Sure. So one thing to note is that the performance advantage of any database is going to vary based on the size of the data and the specific workloads, so the mileage varies; that's the first thing to know. So what we have done is publish multiple benchmarks. We have benchmarks on TPC-H and TPC-DS, and we have benchmarks on different data sizes, because based on the customer's workload, the mileage is going to vary, so we want to give customers a broad range of comparisons so that they can decide for themselves. In a specific case, where we are running a 30 terabyte TPC-H workload, HeatWave has about 18 times better price performance compared to Redshift, about 33 times better price performance compared to Snowflake, and 42 times better price performance compared to Google BigQuery. So, this is on 30 terabyte TPC-H.
Now, if the data size is different, or the workload is different, the characteristics may vary slightly, but this is just to give a flavor of the kind of performance advantage MySQL HeatWave offers. >> And then my last question before we bring in Kumaran. We've talked about the secret sauce being the tight integration between hardware and software, but would you add anything to that? What is the secret sauce in HeatWave that enables you to achieve these performance results, and what does it mean for customers? >> So there are three parts to this. One is that HeatWave has been designed with a scale-out architecture in mind. We have invented and implemented new algorithms for scale-out query processing for analytics. The second aspect is that HeatWave has been really optimized for the cloud, commodity cloud, and that's where AMD comes in. For instance, many of the partitioning schemes we have for processing in HeatWave, we optimize them for the L3 cache of the AMD processor. The thing which is very important to our customers is not just the sheer performance but the price performance, and that's where we have had a very good partnership with AMD, because AMD helps us provide not only very good performance, but very good price performance, right? And in all these numbers which I was showing, a big part of it is because we are running on AMD, which provides very good price performance. So that's the second aspect. And the third aspect is MySQL Autopilot, which provides machine learning based automation. So it's really these three things, a combination of new algorithms designed for scale-out query processing, optimization for commodity cloud hardware, specifically AMD processors, and third, MySQL Autopilot, which gives us this performance advantage. >> Great, thank you. So that's a good segue for AMD and Kumaran. So Kumaran, what is AMD bringing to the table?
What are, for instance, the relevant specs of the chips that are used in Oracle Cloud Infrastructure, and what makes them unique? >> Yeah, thanks Dave. That's a good question. So, OCI is a great customer of ours. They use what we call the top-of-stack devices, meaning that they have the highest core count and they also have very, very fast cores. These are currently Zen 3 cores. I think the HeatWave product is right now deployed on Zen 2 but will shortly be on the Zen 3 core as well. But we provide, in the case of OCI, 64 cores. That's the largest device that we build. What actually happens is, because of the large number of CPU cores in a single package, and therefore the increased density of the node, you end up with this fantastic TCO equation, and the cost per performance, the cost for deployed services like HeatWave, actually ends up being extraordinarily competitive, and that's a big part of the contribution that we're bringing in here. >> So Zen is the AMD microarchitecture which you introduced, I think, in 2017, and it's the basis for EPYC, which is sort of the enterprise grade that you really attacked the enterprise with. Maybe you could elaborate a little bit, double click on how your chips contribute specifically to HeatWave's price performance results. >> Yeah, absolutely. So in the case of HeatWave, as Nipun alluded to, we have very large L3 caches, right? In our very, very top-end parts, like the Milan-X devices, we can go all the way up to 768 megabytes of L3 cache. And that gives you just enormous performance and performance gains. And that's part of what we're seeing with HeatWave today, and that's even though they're currently on the second-generation Rome-based product, 'cause it's a 7002-based product line running with the 64 cores. But as time goes on, they'll be adopting the next-generation Milan as well.
And the other part of it too is, as our chiplet architecture has evolved, from the first-generation Naples way back in 2017, we went from having multiple memory domains and a sort of NUMA architecture at the time; today we've really optimized that architecture. We use a common I/O die that has all of the memory channels attached to it. And what that means is that these scale-out applications like HeatWave are able to scale very efficiently as they go from a small domain of CPUs to, for example, the entire chip, all 64 cores. That scaling has been a key focus for AMD, and being able to design and build architectures that can take advantage of it, and then have applications like HeatWave that scale so well on it, has been a key aim of ours. >> And Gen 3 moving up the Italian countryside. Nipun, you've taken the somewhat unusual step of posting the benchmark parameters, making them public on GitHub. Now, HeatWave is relatively new. Some people felt that when Oracle gained ownership of MySQL it would let it wither on the vine in favor of Oracle Database, so you lost some ground, and now you're getting very aggressive with HeatWave. What's the reason for publishing those benchmark parameters on GitHub? >> So, the main reason for us to publish price performance numbers for HeatWave is to communicate to our customers a sense of the benefits they're going to get when they use HeatWave. But we want to be very transparent, because, as I said, the performance advantages for customers may vary based on the data size and on the specific workloads. So one of the reasons for us to publish all these scripts on GitHub is transparency. We want customers to take a look at the scripts, know what we have done, and be confident that we stand by the numbers which we are publishing, and they're very welcome to try these numbers themselves.
In fact, we have had customers who have downloaded the scripts from GitHub and run them on our service to validate. The second aspect is that in some cases there may be some deviations between what we are publishing and what the customer would like to run in their production deployments, so it provides an easy way for customers to take the scripts, modify them in ways which suit their real-world scenario, and run them to see what the performance advantages are. So those are the main reasons: first, transparency, so customers can see what we are doing for the comparison, and second, if they want to modify the scripts to suit their needs and then see what the performance of HeatWave is, they're very welcome to do so. >> So have customers done that? Have they taken the benchmarks? And I mean, if I were a competitor, honestly, I wouldn't get into that food fight, because of the impressive performance, unless I had to. I mean, have customers picked up on that, Nipun? >> Absolutely. In fact, we have had many customers who have benchmarked the performance of MySQL HeatWave against other services. The fact that the scripts are available gives them a very good starting point, and they've also tweaked those queries in some cases to see what the delta would be. And in some cases, customers got back to us saying, hey, the performance advantage of HeatWave is actually slightly higher than what was published; what is the reason? And the reason was that when the customers were trying it, they were trying it on the latest version of the service, and our benchmark results were posted, let's say, two months back. So the service had improved in those two to three months, and customers actually saw better performance. So yes, absolutely. We have seen customers download the scripts, try them, and also modify them to some extent, and then do the comparison of HeatWave with other services. >> Interesting. Maybe a question for both of you: how is the competition responding to this?
They haven't said, "Hey, we're going to come up with our own benchmarks," which is very common; you oftentimes see that. Although, for instance, Snowflake hasn't responded to Databricks, so that's not their game. But if customers are actually putting a lot of faith in the benchmarks and using them for buying decisions, then it's inevitable. But how have you seen the competition respond to the MySQL HeatWave and AMD combo? >> So maybe I can take the first crack from the database service standpoint. When customers have more choice, it is invariably advantageous for the customer, because then the competition is going to react, right? The way we have seen the reaction is that we do believe that the other database services are going to take a closer look at price performance, right? Because if you're offering such good price performance, the vendors are already looking at it. And, you know, there have been instances where they have offered, let's say, discounts to customers to at least close the gap to some extent. And the second thing would be in terms of capability. So one of the things which I should have mentioned early on is that MySQL HeatWave on AMD provides very good price performance not only on, say, a small cluster, but all the way up to a cluster size of 64 nodes, which has about 1,000 cores. So the point is that HeatWave performs very well both on a small system and at huge scale-out. And this is again one of those things which is a differentiation compared to other services, so we expect that other database services will have to improve their offerings to provide the same good scale factor, which customers are now starting to expect with MySQL HeatWave. >> Kumaran, anything you'd add to that? I mean, you guys are an arms dealer, you love all your OEMs, but at the same time, you've got chip competitors, silicon competitors.
How do you see the competitive-- >> I'd say the broader answer, and the big picture for AMD, is that we're very maniacally focused on our customers, right? And OCI and Oracle are huge and important customers for us, and this particular use case is extremely interesting, both in that it takes advantage of our architecture very well and in that it pulls out some of the value that AMD brings. I think from a big picture standpoint, our aim is to execute, to bring out generations of CPUs, kind of, you know, say what we do and do what we say. And from that point of view, we're hitting the schedules that we say, and being able to bring out the latest technology and bring it in a TCO value proposition that generationally keeps OCI and HeatWave ahead. That's the crux of our partnership here. >> Yeah, the execution's been obvious for the last several years. Kumaran, staying with you, how would you characterize the collaboration between the AMD engineers and the HeatWave engineering team? How do you guys work together? >> No, I'd say we're in a very, very deep collaboration. There are a few aspects where we've actually been working together very closely on the code, to be able to optimize for the large L3 cache that AMD has and take advantage of that, and then also to take advantage of the scaling. Our architecture is chiplet based, so we have the CPU cores on what we call CCDs, and in the inter-CCD communication there are opportunities to optimize at the application level, and that's something we've been engaged with. In the broader engagement, we are going back now multiple generations with OCI, and there's a lot of input that now kind of resonates in the product line itself. And so we value this very close collaboration with HeatWave and OCI. >> Yeah, and the cadence, Nipun, you and I have talked about this quite a bit. The cadence has been quite rapid.
It's like this constant cycle: every couple of months I turn around, and there's something new on HeatWave. But a question again for both of you: what new things do you think organizations and customers are going to be able to do with MySQL HeatWave if you look out over the next 12 to 18 months? Is there anything you can share at this time about future collaborations? >> Right, look, 12 to 18 months is a long time. There's going to be a lot of innovation, a lot of new capabilities coming out in MySQL HeatWave. But even based on what we are currently offering, the trend we are seeing is that customers are bringing more classes of workloads. So we started off with OLTP for MySQL, then it went to analytics. Then we increased it to mixed workloads, and now we offer machine learning as well. So one is that we are seeing more and more classes of workloads come to MySQL HeatWave. And the second is scale: the data volumes people are using HeatWave for, to process these mixed workloads of analytics, machine learning, and OLTP, are increasing. Now, along the way we are making it simpler to use, and we are making it more cost-effective to use. So for instance, last time we talked, we had introduced real-time elasticity, and that's a very, very popular feature because customers want the ability to scale out or scale down very efficiently. That's something we provided. We provided support for compression. So all of these capabilities are making it more efficient for customers to run a larger part of their workloads on MySQL HeatWave, and we will continue to make it richer in the next 12 to 18 months. >> Thank you. Kumaran, anything you'd add to that? We'll give you the last word as we've got to wrap. >> No, absolutely. So, you know, in the next 12 to 18 months we will have our Zen 4 CPUs out. So this could potentially go into the next generation of the OCI infrastructure.
This would be with the Genoa and then Bergamo CPUs, taking us to 96 and 128 cores with 12 channels of DDR5. This capability, you know, when applied to an application like HeatWave, you can see that it'll potentially open up another order of magnitude of use cases, right? And we're excited to see what customers can do with that. It certainly will make this service, and the cloud in general, and this cloud migration, I think, even more attractive. So we're pretty excited to see how things evolve in this period of time. >> Yeah, the innovations are coming together. Guys, thanks so much, we've got to leave it there. Really appreciate your time. >> Thank you. >> All right, and thank you for watching this special CUBE Conversation. This is Dave Vellante, and we'll see you next time. (soft calm music)

Published Date : Sep 14 2022

SUMMARY :

and it's likely the performance Thank you. and how it's different from So the advantages are: single and highlight some of the results, please. the first thing to know. We've talked about the secret sauce So for instance, many of the relevant specs of the chips that are used and that's a big part of the contribution and it's the basis for EPYC, So in the case of HeatWave, of posting the benchmark parameters, So one of the reasons for us to publish, So the service had improved how is the competition responding to this? So the way we have seen the but at the same time, and the big picture for AMD, for the last several years. and so to be able to Yeah, and the cadence, and the trend we are seeing is we'll give you the last and the cloud in general, Yeah, the innovations we'll see you next time.

ENTITIES

Entity	Category	Confidence
Marc Staimer	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Nipun	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
2017	DATE	0.99+
Dave	PERSON	0.99+
OCI	ORGANIZATION	0.99+
Zen 3	COMMERCIAL_ITEM	0.99+
7,002	QUANTITY	0.99+
Kumaran	PERSON	0.99+
second aspect	QUANTITY	0.99+
Nipun Agarwal	PERSON	0.99+
AMD	ORGANIZATION	0.99+
12	QUANTITY	0.99+
64 cores	QUANTITY	0.99+
768 megabytes	QUANTITY	0.99+
two	QUANTITY	0.99+
MySQL	TITLE	0.99+
third aspect	QUANTITY	0.99+
12 channels	QUANTITY	0.99+
Kumaran Siva	PERSON	0.99+
HeatWave	ORGANIZATION	0.99+
96	QUANTITY	0.99+
18 times	QUANTITY	0.99+
Bergamo	ORGANIZATION	0.99+
three parts	QUANTITY	0.99+
Delta	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
MySQL HeatWave	TITLE	0.99+
42 times	QUANTITY	0.99+
both	QUANTITY	0.99+
18 months	QUANTITY	0.99+
Zen 2	COMMERCIAL_ITEM	0.99+
one	QUANTITY	0.99+
GitHub	ORGANIZATION	0.99+
One	QUANTITY	0.98+
second generation	QUANTITY	0.98+
single database	QUANTITY	0.98+
128 cores	QUANTITY	0.98+
18 months	QUANTITY	0.98+
three things	QUANTITY	0.98+

Tammy Butow & Alberto Farronato, Gremlin CUBE Conversation, April 2020

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE Conversation. >> Hello everyone, welcome to theCUBE Conversation here in Palo Alto, in the studios of theCUBE. I'm John Furrier, your host. We're here during the COVID-19 crisis doing remote interviews. I come into the studio, we've got a quarantine crew here getting the interviews, getting the stories out there, and of course the story we're going to continue to talk about is the impact of COVID-19 and how we're all getting back to work, either working at home or working remotely and virtually. But as things start to change, we're going to start to see events, mostly digital events, and we're here to talk about an event that's coming up called the Failover Conference from Gremlin, which has now gone digital; it's on April 21st. But what I think is important about this conversation, and what I want to get into, is not only the event that's coming up but also the scale problems being highlighted by this change in work environment, working at home. We've been talking about the at-scale problems we're seeing, whether it's a surge of traffic or the chaos that's ensuing across the world with this pandemic. So I'm excited, I have two great guests: Alberto Farronato, senior vice president of marketing at Gremlin, and Tammy Butow, principal site reliability engineer, or SRE. Guys, thanks for coming on. Appreciate it, thank you. >> Thanks. >> Thanks for having me. >> Alberto, I want to get to you first. We've known each other before. You've been in this industry. We've all been talking about cloud native and cloud scale for some time. It's kind of inside the ropes, it's inside baseball. Tammy, you're a site reliability engineer. Everyone knows Google, knows how cloud works. This is large-scale stuff.
Now with COVID-19, we're starting to see the average person, my brother, my sister, our family members and people around the world go, "Oh my God, this is really high impact." This change of behavior, this surge, whether it's traffic on the internet or work-at-home tools that are inadequate, you start to see (laughs) the statistical things that were planned for not working well, and this actually maps to the things we've been talking about in our industry. Alberto, you've been on this. How are you guys doing? >> Yeah. >> And what's your take on the situation we're in right now? >> Yeah, we're doing pretty well as a company. We were born as a distributed organization to begin with, so for us, working in a distributed environment from all over the world is common day-to-day practice. Personally, I'm originally from Italy; my parents, my family, are in Milan and Bergamo of all places, so I have to follow the news with extra care, and it has become so much clearer nowadays that technology is not just a powerful tool to enable our businesses but is also critical for our day-to-day life. Thanks to video calls, I can easily talk to my family back there every day. So that's really important. So yes, as you mentioned, we've been talking for a long time about complex systems at scale and reliability, often in the context of mission-critical applications, but more and more of these systems need to be reliable also when it comes to back-office systems that enable people to continue to work on a daily basis. >> Yeah, well, our hearts go out to your family and your friends in Italy, and I hope everyone stays safe there (speaks faintly); it's a tough situation that continues to be a challenge. Tammy, I want to get your thoughts. How's life going for you? You're a site reliability engineer. What you deal with on the tech side is now (laughs) happening in the real world. It's mind-blowing to me that we're seeing these things happen; it's a paradigm that needs attention.
How do you look at it as an SRE, dealing mostly with the tech side and now seeing it play out in real life? >> It's been such an interesting situation, obviously really terrible for everybody to have to go through and deal with. One of the things I specialize in as a site reliability engineer is incident management. For example, I previously worked at Dropbox, where I was the incident manager on call for 500 million customers; it's a 24/7 shift. With these large-scale incidents, you really need to be able to act fast. There are two very important metrics that we track and care about as site reliability engineers. The first one is mean time to detection: how fast can you detect that something is happening? Obviously, if you detect an issue faster, then you've got a better chance of lowering the impact, so you can contain the blast radius. I like to explain it to people like this: if you have a fire in your sauce bin in your kitchen and you put it out, that's way better than waiting until your entire house is on fire. And the other metric is mean time to resolution: how long does it take you to recover from the situation? So yeah, this is a large-scale, global incident that we're in right now. >> Yeah, I know you guys talk a lot about chaos theory, and that applies. A lot of math is involved, we all know that, but I think we need to look at the real world. This is now going to be table stakes, and there's now a line in the sand here, pre-pandemic and post-pandemic. I think you guys have an interesting company, Gremlin, in the sense that this is a complex system, and if you think about the world we're going to be living in, whether it's digital events like the one you have coming up, or how to work at home, or the tools that humans are going to be using, it's going to be working with systems, right?
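Tammy's two incident metrics, mean time to detection and mean time to resolution, can be made concrete with a small calculation. The sketch below is illustrative only: the `Incident` record, its field names, and the sample timestamps are hypothetical, and it assumes mean time to resolution is measured from the moment the failure began (some teams measure it from detection instead, which gives a smaller number).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime   # when the failure actually began
    detected: datetime  # when monitoring (or a person) noticed it
    resolved: datetime  # when service was restored


def mean_time_to_detection(incidents: list[Incident]) -> timedelta:
    """Average gap between a failure starting and someone detecting it."""
    seconds = mean((i.detected - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mean_time_to_resolution(incidents: list[Incident]) -> timedelta:
    """Average gap between a failure starting and service being restored.

    Assumption: measured from the start of the failure, not from detection.
    """
    seconds = mean((i.resolved - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


# Hypothetical incident log, for illustration only.
incidents = [
    Incident(datetime(2020, 4, 1, 9, 0), datetime(2020, 4, 1, 9, 5), datetime(2020, 4, 1, 9, 35)),
    Incident(datetime(2020, 4, 2, 14, 0), datetime(2020, 4, 2, 14, 15), datetime(2020, 4, 2, 15, 0)),
]

print("MTTD:", mean_time_to_detection(incidents))   # MTTD: 0:10:00
print("MTTR:", mean_time_to_resolution(incidents))  # MTTR: 0:47:30
```

Detecting the second incident 10 minutes sooner would pull MTTD down to 5 minutes, which is the "put out the fire while it is still in the kitchen" effect she describes.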
So you have this new paradigm that's going to be upon us pretty quickly, and it's not just buying software mechanisms or software; it's a complex system, it's distributed computing, it's an operating system. I mean, this is kind of the world. Can you guys talk about how Gremlin is attacking these new problems and these new opportunities that are emerging? >> Sure, I can talk about that. So yeah, one of the things I've always specialized in over the last ten years is chaos engineering. The idea of chaos engineering is that you're injecting failure on purpose to uncover weaknesses. That's really important in distributed systems, with distributed cloud computing and all these different services that you're putting together. The idea is that if you inject a small failure, you can figure out what happens, and then you can go ahead and fix it. One of the things I like to say to people is: focus on what your top five critical systems are. Let's fix those first. Don't go for low-hanging fruit. Fix the biggest problems first, get rid of the biggest amount of pain that you have as a company, and then you can go ahead and actually... If you think about the Pareto principle, the 80/20 rule, if you fix the biggest 20% of your problems, you'll actually solve 80% of your issues. That always works. It's something I've done while doing chaos engineering at the National Australia Bank, also at Gremlin and at Dropbox, and I help a lot of our customers do that too. >> Alberto, talk about the mindset involved. It's the most counterintuitive thing. Whoa! Whoa! Risk! The biggest system? >> Yeah >> I don't want to touch those. They're working fine right now. And then these problems just gestate; they hang around like the bin in the kitchen fire: this is okay, I don't want to touch it, the house is still working. So this is kind of a new mindset. Could you talk about what your take is on that?
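The core idea Tammy describes, injecting a small failure on purpose and watching what happens, can be sketched in a few lines of Python. Everything here is hypothetical and is not Gremlin's product or API: `inject_failure` wraps a stand-in dependency so that a chosen fraction of calls fail, and the experiment checks the hypothesis that the page still renders. Targeting a single dependency at a low failure rate is one way to keep the blast radius small.

```python
import random
from typing import Callable, List, Tuple


class InjectedFault(Exception):
    """Raised on purpose by the fault injector, never by real code."""


def inject_failure(func: Callable, failure_rate: float, rng: random.Random) -> Callable:
    """Wrap func so that roughly failure_rate of its calls fail deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id: int) -> List[str]:
    # Hypothetical downstream dependency of a web page.
    return [f"item-{user_id}-{n}" for n in range(3)]


def render_homepage(user_id: int, fetch: Callable) -> dict:
    # The behavior under test: the page should degrade gracefully, not crash.
    try:
        recs = fetch(user_id)
    except InjectedFault:
        recs = []  # fallback: show the page without recommendations
    return {"user": user_id, "recommendations": recs}


def run_experiment(calls: int = 1000, failure_rate: float = 0.2, seed: int = 42,
                   render: Callable = render_homepage) -> Tuple[int, int]:
    """Inject failures into one dependency; return (crashed, degraded) page counts."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = inject_failure(fetch_recommendations, failure_rate, rng)
    crashed = degraded = 0
    for user_id in range(calls):
        try:
            page = render(user_id, flaky_fetch)
        except Exception:
            crashed += 1  # the weakness chaos engineering is looking for
            continue
        if not page["recommendations"]:
            degraded += 1
    return crashed, degraded


print(run_experiment())
```

Passing a renderer without the `try`/`except` fallback, for example `run_experiment(render=lambda u, f: {"user": u, "recommendations": f(u)})`, makes `crashed` non-zero, which is exactly the kind of weakness the experiment is meant to surface before a real outage does.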
Is the industry there? I mean, it was kind of a corner case; you had Netflix, you had the Chaos Monkey in those days, and now it's a DevOps practice for a lot of folks, and you guys are involved in that. What's the appetite, and what's the progress of chaos engineering in the mainstream? >> Yeah, it's interesting that you mention DevOps. Recently Gartner came up with a new, revisited DevOps framework that has chaos engineering in the middle of the lifecycle management of your application. And the reality is that systems have become so complex, with so many layers of abstraction in the infrastructure. You have hundreds of services if you're doing microservices, but even if you're not doing microservices, you have so many applications connected to each other, building really complex workflows and automation flows. It's impossible for traditional QA to really understand where the vulnerabilities are in terms of resiliency, in terms of quality. Too often the production environment is also too different from the staging environment, and so you need a fundamentally different approach to find where your weaknesses are, and find them before they happen, before you end up in a situation like the one we're in today and you are not prepared. And so much of what we talk about is giving people a tool and a methodology to go and find these vulnerabilities. It's not so much about creating chaos; it's about managing the chaos that is built into our current systems and exposing those vulnerabilities before they create problems. So that's a very scientific methodology and tooling that we bring to market, and we help customers with it. >> Tammy, I want to get your thoughts on something.
We used to riff a lot with our 10th unit CUBE; we've had a lot of conversations and riffed over the years. But you know, when the surge of Amazon Web Services came out, it was pretty obvious that cloud's amazing; look at the startups that were born. You mentioned Dropbox, you worked there. These companies were all born on the cloud, hyperscale companies built from scratch, a great way to scale up. And we used to joke about Google: people would say, "I would like a cloud like Google," but no one has Google's use cases. And Google really pioneered the SRE concept, and you've got to give them a lot of props for that. But now we're getting to a world where it's becoming Google-like. There's more scale now than ever before. It's not a corner case; this large-scale architecture is becoming more popular and more preferred. What's your assessment of mainstream enterprises? How far along are they, in your mind? Are they there with chaos? Are they close? Are they doing it? How does someone develop an SRE practice to get Google-like scale? Because Google has an amazing network, they've got large-scale cloud, they have SREs, they've been doing it for years. How does a company that's transforming its IT (laughs) get SREs? >> That's a great question. I get asked this a lot as well. One of our goals at Gremlin is to help make the internet more reliable for everybody: everyone using the internet, and all of the engineers who are trying to build reliable services. So I'm often asked by companies all over the world, how do we create an SRE practice, and how do we practice chaos engineering? But you can get started actually rolling out your SRE program; based on my experience, I've done it. When I worked at Dropbox, I worked with a lot of people who had been at Google and at YouTube; they were there when SRE was rolled out across those companies, and then they brought those learnings to Dropbox, and I learned from them.
But the interesting thing is also if you look at enterprise companies, like large banks. For example, I worked at the National Australia Bank for six years, and we actually did a lot of work that I would consider chaos engineering and SRE practices. For example, we would do large-scale disaster recovery, where you fail over an entire data center to a secret data center in an unknown location, and the reason is that you're checking to make sure that everything operates okay if there's a nuclear blast. That's actually what you have to do, and you have to do that practice every quarter. But if you think about it, it's not very good to only do it once a quarter. You really want to be practicing chaos engineering and injecting failure on purpose. Personally, I prefer to do it three times a week, so I do it a lot. But I'm also someone who likes to work out a lot and be fit all the time, so I know that if you do something regularly, you get great results. So that's what I always tell everyone. >> Yeah, get the reps in, as we say: get stronger, get the muscle memory. >> Yep, exactly. >> Guys, talk about the event that's coming up. You had a physical event scheduled, you were right in planning mode, and then the crisis hit. You're going digital, going virtual; really, it's digital. It's on the internet. So how are you guys thinking about this? I know it's out there. It's April 21st. Can you share some specifics around the event? Who should be attending, and how do they get involved online? >> Yeah, the event really came together about a month ago, when we started to see all the cancellations happening across the industry because of COVID-19. We are extremely engaged in the community, we have a lot of talks, and we were seeing a lot of conferences just dropping, so speakers were losing their opportunity to share their knowledge with respect to how you do reliability and the topics that we focus on.
And so we quickly pivoted as a company and created a new online event to give everyone in the community the opportunity to just fail over to a new event, as the conference name says, and give the speakers who lost their speaking slots a new opportunity to share their knowledge. That came together really quickly: we shared the idea with a dozen of our partners, everyone liked it, and all of a sudden this thing took off like crazy. In just a month, we are approaching 4,000 registrations, and we have over 30 partners signed up and supporting the initiative. A lot of past partners are covering the event as well. So it was impressive to see the amount of interest we were able to generate in such a short amount of time. And really, this is a conference for anybody who is interested in resiliency. If you want to learn from the best how to build business continuity across systems, people, and processes, this is a great opportunity at no cost. It's a free conference. >> And the target persona, the audience you want to attend, is what? SREs, or folks doing architectural work? What's the target >> Yeah >> person to attend? >> Architects, SREs, developers, business leaders who care about the quality and reliability of their applications, who need to help create a framework and a mindset for their organizations that speaks to what Tammy was saying a minute ago: having that constant practice, on a daily basis, of going and finding how to improve things. >> You know, Tammy, we've been going to physical events with theCUBE, extracting the signal from the noise and distributing it digitally, for 10 years, and I've got to ask you, because now those events have gone away, and you talk about chaos and injecting failure. Doing these digital events is not as easy as just live streaming. It's hard to carry the value of a physical event, with its years of experience, standards, and roles and responsibilities, over to digital.
It's a different consumption environment: it's asynchronous, and you're trying to create a synchronous environment. It's its own complex system, so I think a lot of people are experimenting and learning (laughs) from these events, because it's pretty chaotic. So I'd love to get your thoughts on how you look at these digital events as a chaos engineer. How should people be looking at these events? How are you guys looking at... I mean, obviously you want to get the program going, get people out there, get the content, but to iterate on this, how do you view it? >> It is really different. I actually like to compare it to fire drills in SRE. Often what you do there is create a fake incident or a fake issue; you just say, "Let's have a fire drill." It's similar to when you're in a building and a fire drill goes off, and you have wardens and everything, and you all have to go outside. So we can do that in this new world we're all in all of a sudden. A lot of people have never run an online event, and now all of a sudden they have to. So what I would say is: do a fire drill. Run a fake one before you do the actual one, to make sure that everything works okay. My other tip is to make sure you have backup plans. Backup plans on backup plans on backup plans. As an SRE, I always have at least three to five backup plans. I'm not just saying plan A and plan B; there's also a C, D, and E, and I think that's very important. Even when you're considering technology, one of the things we say with chaos engineering is: if you're using one service, inject failure and make sure that you can fail over to an alternative service in case something goes wrong. >> Yeah, hence the Failover Conference, which is the name of the conference. (chuckles) >> Exactly! >> Yeah, well, we're certainly going to be sending a digital reporter there, virtually. If you need any backup plans, obviously we have the remote interviews here.
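Tammy's advice to keep plans A through E, and to make sure one service can fail over to an alternative, can be sketched as an ordered list of fallbacks. This is a minimal, hypothetical illustration (the function and service names are invented for the example); running it with plan A deliberately broken is a small version of the fire drill she recommends.

```python
from __future__ import annotations

from typing import Callable, Sequence


def call_with_failover(plans: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each plan in order and return (plan name, result) from the first that works."""
    errors = []
    for name, plan in plans:
        try:
            return name, plan()
        except Exception as exc:  # any failure means: move on to the next plan
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all plans failed: " + "; ".join(errors))


# Hypothetical services for a "fire drill": plan A is made to fail on purpose.
def primary_stream() -> str:
    raise ConnectionError("primary streaming service unavailable")


def backup_stream() -> str:
    return "streaming via backup service"


plans = [
    ("plan A", primary_stream),
    ("plan B", backup_stream),
    ("plan C", lambda: "streaming via pre-recorded talks"),
]

print(call_with_failover(plans))  # ('plan B', 'streaming via backup service')
```

Keeping the plans in one ordered list makes it easy to rehearse each fallback by breaking the ones ahead of it, rather than discovering during the live event that plan B was never tested.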
If you need any help, let us know. Really appreciate it. Great to see you guys, and thanks for sharing. Any final thoughts on the conference? What happens when we get through to the other side of this? I'll give you guys the final word. We'll start with you, Alberto. >> Yeah, I think when we are on the other side of this, we'll understand even more the importance of effective resilience architecting and testing. As a provider of tools and methodologies for that, we think we will be able to help customers make a significant leap forward on that side. And the conference is just super exciting. I think it's going to be a great event. I encourage everyone to participate. We have a tremendous lineup of speakers with incredible reputations in their fields, so I'm really happy and excited about the work the team has been able to do with our partners to put together this type of event. >> Okay, Tammy. >> Yeah, for me, I'm actually going to be doing the opening keynote for the conference, and the topic I'm speaking about is that reliability matters more now than ever. I'll be sharing some bizarre, weird incidents that I have worked on and experienced myself, really critical, strange issues that have come up. I'm really looking forward to sharing that with everybody else, so please come along; it's free. You can join from your own home, and we can all be there together to support each other. >> You've got great community support, and there are a lot of partners, press, media, ecosystem, and customers, so congratulations, Gremlin, on having a conference on April 21st called the Failover Conference. TheCUBE and SiliconANGLE will have a digital reporter there covering the news. Thanks for coming on and sharing. I appreciate the time. I'm John Furrier in the Palo Alto studio with a remote interview with Gremlin around their Failover Conference, April 21st.
It's really demonstrating, in my opinion, the at-scale problems we've been working on in the industry, now more applicable than ever before as we move post-pandemic with COVID-19. Thanks for watching. Be back. (calm music)

Published Date : Apr 8 2020

SUMMARY :

this is theCUBE Conversation. and of course, the story we're going to and people around the world go, and reliability often in the context and your friends in Italy, making the impact lower so you can contain the blast radius. and that if you think about the world and then you can go ahead and actually... Alberto, talk about the mindset involved. in the kitchen fire, this is okay, and the methodology for people to go and look at the startups that were born, and so I'm often asked by companies all over the world, Yeah, get the reps in, as we say, get stronger, and then you were right in the planning mode and all the sudden this thing took off like crazy and the reliability of their applications, and I got to ask you because now and you all have to go outside. Yeah, hence the Failover Conference, Great to see you guys. that the team has been able to do and the topic that I'm speaking about and customers, so congratulations Gremlin,

ENTITIES

Entity	Category	Confidence
Tammy	PERSON	0.99+
Alberto Fernando	PERSON	0.99+
Alberto	PERSON	0.99+
80%	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Italy	LOCATION	0.99+
20%	QUANTITY	0.99+
Milan	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
April 21st	DATE	0.99+
4,000 registrations	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Bergamo	LOCATION	0.99+
six years	QUANTITY	0.99+
Dropbox	ORGANIZATION	0.99+
National Australia Bank	ORGANIZATION	0.99+
Alberto Farronato	PERSON	0.99+
COVID-19	OTHER	0.99+
10 years	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
April 2020	DATE	0.99+
Tammy Butow	PERSON	0.99+
Gremlin	PERSON	0.99+
One	QUANTITY	0.99+
Boston	LOCATION	0.99+
over 30 partners	QUANTITY	0.99+
Gartner	ORGANIZATION	0.99+
10th unit	QUANTITY	0.99+
YouTube	ORGANIZATION	0.99+
theCUBE	ORGANIZATION	0.99+
first	QUANTITY	0.98+
Netflix	ORGANIZATION	0.98+
today	DATE	0.98+
one service	QUANTITY	0.97+
once a quarter	QUANTITY	0.97+
one	QUANTITY	0.97+
Gremlin	ORGANIZATION	0.97+
SiliconANGLE	ORGANIZATION	0.96+
Failover Conference	EVENT	0.96+
500 million customers	QUANTITY	0.96+
TheCUBE	ORGANIZATION	0.96+
hundreds of services	QUANTITY	0.95+
Gremlin	LOCATION	0.95+
first one	QUANTITY	0.95+
three times a week	QUANTITY	0.95+
five backup plans	QUANTITY	0.94+
two very important metrics	QUANTITY	0.94+
a month ago	DATE	0.94+
five critical systems	QUANTITY	0.93+
a month	QUANTITY	0.92+
a dozen	QUANTITY	0.89+
Googles	ORGANIZATION	0.88+
theCUBE Conversation	EVENT	0.88+
SRE	ORGANIZATION	0.83+
DevOps	TITLE	0.83+
two two great guests	QUANTITY	0.82+
CUBE	COMMERCIAL_ITEM	0.82+
pandemic	EVENT	0.81+
