Kim Leyenaar, Broadcom | SuperComputing 22


 

(Intro music) >> Welcome back. We're live here from SuperComputing 22 in Dallas. Paul Gillin for SiliconANGLE and theCUBE, with my guest host Dave. And our guest for this segment is Kim Leyenaar, who is a storage performance architect at Broadcom. And the topic of this conversation is networking, it's connectivity. I guess, how does that relate to the work of a storage performance architect? >> Well, that's a really good question. So yeah, I have been focused on storage performance for about 22 years. But even if we're talking about just storage, all the components have a really big impact on ultimately how quickly you can access your data. So the switches, the memory bandwidth, the expanders, just the different protocols that you're using. And a big part of it is actually ethernet, because as you know, data's not siloed anymore. You have to be able to access it from anywhere in the world. >> Dave: So wait, you're telling me that we're just not living in a CPU-centric world now? >> (laughs) >> Because it is sort of interesting. When we talk about supercomputing and high performance computing, we're always talking about clustering systems. So how do you connect those systems? Isn't that kind of your wheelhouse? >> Kim: It really is. >> Dave: At Broadcom. >> It is Broadcom's wheelhouse. We are all about interconnectivity, and we own the interconnectivity. Years ago it was, 'Hey, buy this new server because we've added more cores or we've got better memory.' But now you've got all this siloed data, and we've got this software-defined kind of environment now, these composable environments where, hey, if you need more networking, just plug this in, or just go here and allocate yourself more. So what we're seeing is these silos of, 'hey, here's our compute, here's your networking, here's your storage.' And so how do you put those all together? The thing is interconnectivity. So that's really what we specialize in. I'm really happy to be here to talk about some of the things that we do to enable high performance computing. >> Paul: Now we're seeing a new breed of AI computers being built, with multiple GPUs and very large amounts of data being transferred between them. And the interconnect really has become a bottleneck. Is that something that Broadcom is working on alleviating? >> Kim: Absolutely. So there's a lot of different standards that we work with to define, so that we can make sure that we work everywhere. So even if you're just a dentist's office that's deploying one server, or we're talking about these hyperscalers that have thousands or tens of thousands of servers, we're working on making sure that the next generation is able to outperform the previous generation. Not only that, but we found that with these siloed things, if you add more storage but that means we're going to eat up six cores using it, it's not really as useful. So Broadcom's really been focused on trying to offload the CPU. So we're offloading data security, data protection; we do packet sniffing ourselves, and things like that.
So no longer do we rely on the CPU to do that kind of processing for us; we become very smart devices all on our own, so that they work very well in these kinds of environments. >> Dave: So give us an example. I know a lot of the discussion here has been around using ethernet as the connectivity layer. >> Yes. >> You know, in the past, people would think about supercomputing as exclusively being InfiniBand based. >> (laughs) >> But give us an idea of what Broadcom is doing in the ethernet space. What are the advantages of using ethernet? >> Kim: So we've made two really big announcements. The first one is our Tomahawk 5 ethernet switch. So it's a 400 gig ethernet switch. And the other thing we announced was our Thor. These are our network controllers that also support up to 400 gig each as well. So those two alone, it's amazing to me how much data we're able to transfer with those. But not only that, they're super intelligent controllers too. And then we realized, hey, we're managing all this data, let's go ahead and offload the CPU. So we actually adopted the RoCE standard. That's one of the things that puts us above InfiniBand, is that ethernet is ubiquitous, it's everywhere, whereas InfiniBand is primarily just owned by one or two companies. And it's also a lot more expensive. So ethernet is just everywhere, and now with the RoCE standard we're working along with, it does what you're talking about much better than its predecessors. >> Tell us about the RoCE standard. I'm not familiar with it, and I'm sure some of our listeners are not. What is the RoCE standard? >> Kim: (laughs) So it's RDMA over Converged Ethernet. I'm not a RoCE expert myself, but I am an expert on how to offload the CPU. And one of the things it does is, instead of using the CPU to transfer the data from the user space over to the next server, we actually will do it ourselves. We will take it, we will move it across the wire, and we will put it in that remote computer. And we don't have to ask the CPU to do anything or get involved in that. So it's a big savings. >> Yeah, I mean, in a nutshell, because there are parts of the InfiniBand protocol that are essentially embedded in RDMA over Converged Ethernet. So... >> Right. >> So if you can leverage kind of the best of both worlds, but have it in an ethernet environment which is already ubiquitous, it seems like it's kind of democratizing supercomputing and HPC. And I know you guys are big partners with Dell as an example; you guys work with all sorts of other people. >> Kim: Yeah. >> But let's say somebody is going to be doing ethernet for connectivity, you also offer switches? >> Kim: We do, actually. >> So is that, I mean, that's another piece of the puzzle. >> That's a big piece of the puzzle. So we just released our Atlas 2 switch. It is a PCIe Gen 5 switch. And... >> Dave: What does that mean? What does Gen 5 mean? >> Oh, Gen 5 PCIe, it's kind of the magic connectivity right now. So we talk about the Sapphire Rapids release as well as the Genoa release. I know those have been talked about a lot here; I've been walking around and everybody's talking about it. Well, those enable the Gen 5 PCIe interfaces.
So we've been able to double the bandwidth from Gen 4 up to Gen 5. So in order to support that, we do now have our Atlas 2 PCIe Gen 5 switch. And it allows you to connect, especially around here where we're talking about artificial intelligence and machine learning, a lot of these workloads that are relying on the GPUs and DPUs you see a lot of people talking about enabling. So by putting these switches in the servers, you can connect multitudes of not only NVMe devices but also these GPUs and these CPUs. Besides that, we also have the storage component of it too. So to support that, we just recently released our 9500 series HBAs, which support 24 gig SAS. And this is kind of a big deal for some of our hyperscalers that say, 'Hey, look, in our next generation we're putting a hundred hard drives in.' A lot of it is maybe for cold storage, but giving them that 24 gig bandwidth, and having these 24 gig SAS expanders, allows these hyperscalers to build up their systems. >> Paul: And how are you supporting the HPC community at large? And what are you doing that's exclusively for supercomputing? >> Kim: Exclusively for? So we're doing the interconnectivity really for them. You can have as much compute power as you want, but these are very data hungry applications, and a lot of that data is not sitting right in the box. A lot of that data is sitting in some other country or in some other city, or just the box next door. So you have to be able to move that data around. There's a concept where they say, do the compute where the data is, and then the other way is to move the data around, which is a lot easier sometimes; so we're letting them move that data around. For that, we do have our Tomahawk switches, we've got our Thor NICs, and of course we've got the really wide pipe. Our new 9500 series HBA and RAID controllers are doing 28 gigabytes a second through one controller. So we can actually have the high availability protection of RAID 5 or RAID 6 or RAID 10 in the box and still give you 27 gigabytes a second. And the latency we're seeing off of this is unheard of too: we have a write cache latency that is sub 8 microseconds, which is lower than most of the NVMe drives that you see available today. So we're able to support these applications that require really low latency as well as data protection. >> Dave: So often when we talk about the underlying hardware, it's a game of whack-a-mole, chase the bottleneck. And you've mentioned PCIe Gen 5; a lot of folks who will be implementing Gen 5 PCIe are coming off of three, not even four. >> Kim: I know. >> So they're not just getting a last-generation-to-this-generation bump, they're getting a two-generation bump. >> Kim: They are. >> Is it the case that it would never make sense to use a next-gen or current-gen card in an older generation bus, because of the mismatch in performance? Are these things all designed to work together? >> Uh... that's a really tough question. I want to say no, it doesn't make sense; it really makes sense just to move things forward and buy a card that's made for the bus it's in.
However, that's not always the case. So for instance, our 9500 controller is Gen 4 PCIe, but what we did is double the PCIe width, so it's a by 16. Even though it's Gen 4, it's a by 16, so we're getting really, really good bandwidth out of it. As I said before, we're getting 27.8, almost 28 gigabytes a second of bandwidth out of that by doubling the PCIe bus. >> Dave: But they work together, it all works together? >> All works together. You can mix our Gen 4 and Gen 5 all day long and they work beautifully. Yeah, we do work to validate that. >> We're almost out of time, but I want to ask you a more nuts-and-bolts question about storage. We've heard for years that the areal density of the hard disk has been reached and there's really no way to go further, no way to make the disk any denser. What does the future of the hard disk look like as a storage medium? >> Kim: Multi-actuator, actually; we're seeing a lot of multi-actuator. I was surprised to see it come across my desk, because our 9500 actually does support multi-actuator. And it was really neat, after I've been working with hard drives for 22 years. I remember when they could do 30 megabytes a second, and that was amazing. That was like, wow, 30 megabytes a second. And then about 15 years ago they hit around 200 to 250 megabytes a second, and they stayed there. They haven't gone anywhere. What they have done is increase the density so that you can have more storage. So you can easily go out and buy a 15 to 30 terabyte drive, but you're not going to get any more performance. So what they've done is add multiple actuators. Each one of these can do its own streaming, and each one of these can actually do its own seeking. So you can get two and four, and I've even seen talk about eight actuators per disk. I think that's still theory, but they could implement those. So that's one of the things that we're seeing. >> Paul: Old technology somehow finds a way to remain current. >> It does. >> It does, even in the face of new alternatives. Kim Leyenaar, Storage Performance Architect at Broadcom, thanks so much for being here with us today. >> Thank you so much for having me. >> This is Paul Gillin with Dave Nicholson here at SuperComputing 22. We'll be right back. (Outro music)
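As a back-of-the-envelope check on the PCIe numbers quoted in this segment (roughly 28 gigabytes a second through a Gen 4 x16 controller, and the doubling from Gen 4 to Gen 5), here is a small Python sketch of theoretical per-direction PCIe bandwidth. It only accounts for line rate and 128b/130b encoding; real controllers lose a bit more to packet and protocol overhead, which is why delivered figures land just below these ceilings.

```python
# Theoretical per-direction PCIe bandwidth: transfer rate (GT/s) x lanes x encoding
# efficiency. Gen 3 and later links use 128b/130b encoding; figures ignore
# packet/protocol overhead, so delivered throughput (like the ~28 GB/s quoted
# for a Gen 4 x16 controller) lands a little below these ceilings.
GEN_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

def pcie_gb_per_s(gen: int, lanes: int) -> float:
    encoding = 128 / 130                                # 128b/130b line encoding
    return GEN_RATE_GT_S[gen] * lanes * encoding / 8    # bits per lane -> bytes

for gen in (3, 4, 5):
    print(f"Gen {gen}:  x8 = {pcie_gb_per_s(gen, 8):5.1f} GB/s   "
          f"x16 = {pcie_gb_per_s(gen, 16):5.1f} GB/s")
```

Run as-is, this prints about 15.8 / 31.5 / 63.0 GB/s for x16 links at Gen 3, 4, and 5; doubling the lane count on a Gen 4 part buys roughly the same headroom as stepping an x8 link up a generation, which is the trade-off described above for the 9500 controller.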

Published Date : Nov 16 2022


Andy Brown, Broadcom


 

(upbeat music) >> Hello and welcome to theCUBE. I'm Dave Nicholson, Chief Technology Officer at theCUBE, and we are here for a very special Cube Conversation with Andy Brown from Broadcom. Andy, welcome to theCUBE, tell us a little about yourself. >> Well, a little bit about myself: my name is Andy Brown, and I'm currently the Senior Director of Software Architecture and Performance Analysis here within the Data Center Solutions Group at Broadcom. I've been doing that for about seven years. Prior to that, I held various positions within the system architecture, systems engineering, and IC development organizations, and I also spent some time in our support organization managing our support team. But ultimately I have landed in the architecture organization as well as performance analysis. >> Great, so a lot of what you do is around improving storage performance, tell us more about that. >> So let me give you a brief history of storage from my perspective. As I mentioned, I go back about 30 years in my career, and that would've started back in the NCR Microelectronics days, originally with parallel SCSI. That would be, if anyone remembers, the 5380 controller, which was one of the original parallel SCSI controllers, built by NCR Microelectronics at the time. I've seen the advent of parallel SCSI, a stint of Fibre Channel, ultimately leading into the serialization of the SCSI standard into SAS, as well as SATA, and then ultimately leading to NVMe protocols and the advent of flash, moving from hard drives into flash based media. That's on the storage side; on the host side, we moved from parallel interfaces, ISA if everybody can remember that, to PCI and PCI Express, and that's where we land today. >> So Andy, we are square in the middle of the era of both NVMe and SAS. What kinds of challenges does that overlap represent? >> Well, I think obviously we've seen SAS around for a while. It was the conversion from parallel into serial attached SCSI, and SAS brings with it the ability to connect a really high number of devices; it was kind of the original scaling of devices. It was also one of the things that enabled flash based media, given the speed and performance that it brought to the table. Of course, NVMe came in as well with the promise of even higher speeds, and as we saw flash media take a strong role in storage, NVMe came around and really was focused on trying to address that. Whereas SAS originated with hard drive technology, NVMe was really born out of how do we most efficiently deal with flash based media. But SAS still carries a benefit on scalability, and NVMe maybe has, I don't want to say challenges there, but it definitely was not designed as much to scale broadly across many, many, say high hundreds or thousands of devices. But it definitely addressed some of the performance issues that were coming up as flash media was raising the overall storage performance that we could experience, if you will. >> Let's talk about host interfaces, PCIe. What's the significance there? >> Really, all the storage in the world, all the performance in the world on the storage side, is not of much use to you unless you can really feed it into the beast, if you will, into the CPU and into the rest of the server subsystem. And that's really where PCI comes into play.
PCI originally was in parallel form and then moved to serial with PCI Express as we know it today, and it has really created a pathway to enable not only storage performance but any other adapter, networking, or other types of technologies to open up that pathway and feed the processor. And as we've moved from PCI to PCI Express, PCIe 2.0, 3.0, 4.0, just opening up those pipes has enabled a tremendous flow of data into the compute engine, allowing it to be analyzed and sorted for big data and AI type applications. Those pipes are critical in those types of applications. >> We know we've seen dramatic increases in performance going from one generation of PCIe to the next. But how does that translate into the worlds of SAS, SATA and NVMe? >> So from a performance perspective, when we look at these different types of media, whether it be SATA, SAS or NVMe, of course there are performance differences inherent in the media: SATA is probably the lowest performing, with NVMe topping out as the highest performing, although SAS can perform quite well as a protocol connected to flash based media. And of course, NVMe as an individual device scales from a by one up to a by four interface; that is really where NVMe has enabled a bigger pipe directly to the storage media, being able to scale up to by four, whereas SAS is limited to by one, maybe by two in some cases, although most servers only connect a SAS device by one. So from that perspective, you really want to create a solution, or enable the infrastructure, to be able to consume the performance that NVMe is going to give you. And I think that is something where our solutions have really shined in the recent generations: their ability to keep up with storage performance on NVMe, as well as provide that connectivity back down into the SAS and SATA world as well. >> Let's talk about your perspective on RAID today. >> So there have been a lot of views and opinions on RAID over the years, and those have been changing over time. RAID has been around for a very, very long time; going back over my 30 year career, it's been around for almost the entire time. Obviously RAID originally was viewed as something that was very, very necessary: devices fail, they don't last forever, but the data that's on them is very, very important and people care about that. So RAID was brought about knowing that the individual devices storing that data are going to fail, and it really took hold as a primary mechanism of protection. But as time went on and as performance moved up, both in the server and in the media itself, once we start talking about flash, people started to look at traditional server storage RAID with maybe more of a negative connotation. I think that's because, to be quite honest, it fell behind a little bit. If you look at things like parity RAID, RAID 5 and RAID 6, they're very effective and efficient means of protecting your data, very storage efficient, but they ultimately had some penalties, primarily around write performance; random writes to RAID 5 volumes were not keeping up with what really needed to be there. And I think that really shifted opinions of RAID: "Hey, it's just not going to keep up and we need to move on to other avenues."
And we've seen that; we've seen disaggregated storage and other solutions pop up to protect your data, obviously in cloud environments and things like that, and they have been successful. >> So one of the drawbacks with RAID has always been the performance tax associated with generating parity for parity RAID. What has Broadcom done to address those potential bottlenecks? >> We've really solved the RAID performance issue, the write performance issue. In our latest generation of controllers we're exceeding a million RAID 5 write IOPS, which is enough to satisfy many, many applications, even multiple applications in virtual environments and aggregated solutions. And then as well, in the rebuild arena, through our architecture and our hardware automation we have been able to move the bar, to where not only have the rebuild times been brought down dramatically in flash based solutions, but the performance impact you observe while those rebuilds are going on is almost immeasurable. So in most applications you would observe almost no performance deficiency during a rebuild operation, which is really night and day compared to where things were just a few short years ago. >> So the fact that you've been able to dramatically decrease the time necessary for a RAID rebuild is obviously extremely important. But give us your overall performance philosophy from Broadcom's point of view. >> Over the years we have recognized that performance is obviously critically important for our products, and the ability to analyze performance from many, many angles is critically important. There are literally infinite ways you can look at performance in a storage subsystem. What we have done in our labs and in our solutions, through not only hardware scaling in our labs but also through automation scripts and things like that, has allowed us to collect a substantial amount of data and look at the performance of our solutions from every angle: IOPS, bandwidth, application level performance, small topologies, large topologies, just many, many aspects. It honestly still only scratches the surface of all the possible performance points you could gather, but we have moved the bar dramatically in that regard. And it's something that our customers really demanded of us. Storage technology has gotten more complex, and you have to look at it from a lot of different angles, especially on the performance front, to make sure that there are no holes there that somebody's going to run into. >> So based on specific customer needs and requests, you look at performance from a variety of different angles. What are some of the trends that you're seeing specifically in storage performance today and moving into the future? >> Yeah, emerging trends within the storage industry. I think that to look at the emerging trends, you really need to go back and look at where we started. We started in compute where you would have basically your server under the desk in a small business operation; individual businesses would have their own set of servers, and the storage would really be localized to those. Obviously the industry has recognized, to some extent, the disaggregation of that; we see it in what's happening in cloud, in hyper-converged storage and things like that. Those afford a tremendous amount of flexibility and are obviously great players in the storage world today.
But with that flexibility has come some sacrifice in performance, and actually quite substantial sacrifice. And what we're observing is that it almost comes back full circle: the need for in-box, high performing server storage that is well protected, so that people have confidence that their data is protected and that they can extract the performance they need for the demanding database applications that still exist today, that still operate in offices around the country and around the world, and that really need to protect their data on a local basis in the server. And I think from a trend perspective, that's what we're seeing. Also, NVMe itself really started out with, "Hey, we'll just software RAID that. We'll just wrap software around it and we can protect the data." We had so many customers come back to us saying, you know what? We really need hardware RAID on NVMe. And when they came to us, we were ready. We had a solution ready to go and we were able to provide it, and now we're seeing ongoing demand. We are complementary to other storage solutions out there. Server storage is not necessarily going to rule the world, but it surely has a place in the broader storage spectrum, and we think we have the right solution for that. >> Speaking of servers and server-based storage, why would, for example, a Dell customer care about the Broadcom components in that Dell server? >> So let's say you're configuring a Dell server and you're asking, why does hardware RAID matter? What's important about that? Well, I think when you look at today's hardware RAID, first of all, you're going to see dramatically better performance. It's going to enable you to use RAID 5 volumes, a very effective, efficient, and storage-efficient mechanism for protecting your data, where you weren't able to do that before, because when you're in the millions of IOPS range you really can satisfy a lot of application needs out there. And then you're also going to have rebuild times that are lightning fast. Your performance is not going to degrade when you're running those applications, especially database applications. And not only database but streaming applications: bandwidth to protected RAID volumes is almost imperceptibly different from raw bandwidth to the media. So the RAID configurations in today's Dell servers really afford you the opportunity to make use of that storage where you may have already written it off as, 'well, RAID just isn't going to get me there.' Quite frankly, in the storage servers that Dell is providing with RAID technology, there are huge windows open in what you can do today with applications. >> Well, all of this is obviously good news for Dell and Dell customers. Thanks again, Andy, for joining us for this Cube Conversation. I'm Dave Nicholson for theCUBE. (upbeat music)
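To make the parity discussion in this segment concrete, here is a minimal Python sketch of the XOR arithmetic behind RAID 5. It is a teaching illustration under simplified assumptions (one stripe, byte-for-byte XOR), not a description of Broadcom's hardware data path, but it shows why parity RAID spends only one strip of capacity per stripe on protection and where the classic small-write penalty comes from.

```python
# Minimal sketch of RAID 5 parity math (illustration only, not a controller
# implementation). Parity is the XOR of the data strips in a stripe, so any
# single lost strip can be rebuilt from the survivors, and only one strip of
# capacity per stripe goes to protection, versus doubling everything for RAID 1.
from functools import reduce

def xor_strips(strips):
    """XOR equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

data = [b"AAAA", b"BBBB", b"CCCC"]          # three data strips in one stripe
parity = xor_strips(data)                   # strip written to the parity drive

# Rebuild: pretend the drive holding strip 1 failed.
survivors = [s for i, s in enumerate(data) if i != 1] + [parity]
assert xor_strips(survivors) == data[1]

# The classic small-write penalty: updating one strip means reading the old
# strip and the old parity, XOR-ing in the change, and writing both back --
# the read-modify-write sequence that controller caching and XOR offload hide.
old_strip, new_strip = data[2], b"DDDD"
new_parity = xor_strips([parity, old_strip, new_strip])
data[2] = new_strip
assert xor_strips(data) == new_parity
```

The read-modify-write sequence in the last few lines is the "performance tax" raised in the question above; hardware offload and write caching are what hide it well enough for the latest controllers to exceed a million RAID 5 write IOPS.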

Published Date : Apr 28 2022


Does Hardware Matter?


 

[Music] does hardware still matter the attractiveness of software-defined models and services that are running in the cloud really make you wonder don't they but the reality is that software has to run on something and that something is hardware and history in the it business shows that the hardware that you purchase today is going to be up against the price performance of new systems in short order and these new systems will be far superior from a price performance standpoint within a couple of years so when it's time to purchase a new system look at whether it's a laptop a mainframe or a server configuring a leading edge product is going to give you the longest useful life of that new system now when i say a system what makes up a system well there's a lot of underlying technology components of course you have the processor you got memories you got storage devices there's networking like network interface cards there's interconnects and the bus architecture like pcie gen4 or whatever these components are constantly in a leapfrog mode like clock speeds and more cores and faster memories and ssds versus spinning disks and faster network cards the whole gamut so you see a constant advancement of the system components it's like it's a perpetual and sometimes chaotic improvement of the piece parts now i say chaotic because balancing these different components such that you're not wasting resources and that you're ensuring consistent application performance is a critical aspect of architecting systems so it becomes a game of like whack-a-mole meaning you're going to find the bottlenecks and you got to stamp them out it's a constant chase for locating the constraints designing systems that address these constraints without breaking the bank and optimizing all these components in a harmonious way hello everyone this is dave vellante of the cube and while these issues may not capture all the headlines except for maybe tom's hardware blog they're part of an important topic that we want to explore more deeply and to do so we're going to go inside some new benchmarking tests with our good friend kim lenar who's principal performance architect at broadcom kim always great to see you thanks so much for coming back on the cube hi there dave good to see you too thanks for having me on you bet hey so last time we met we talked about the importance of designing these balance systems i talked about that in my open and how solid state threw everything out of whack because the system was designed around spinning disk and we talked about nvme and we're here today with some new data an independent performance lab prowess consulting conducted some initial tests i've seen their their white papers on this stuff it compared the current generation of dell servers with previous models to quantify the performance impact of these new technologies and so before we get into that kim tell us a little about your background and your performance chops sure sure so i started my career about 22 years ago back when the ultra 160 scuzzy was out and just could only do about 20 megabytes a second um but i felt my experience really studying that relationship between the file systems and the application the os and storage layers as well as the hardware interaction i was absolutely just amazed with how you know touching one really affects the other and you have to understand that in order to be a good performance architect so i've authored dozens of performance white papers and i've worked with thousands of customers over the years 
designing and optimizing and debugging storage and trying to build mathematical models like project that next generation product where we really need to land but honestly i've just been blessed to work with really brilliant um and some of the most talented minds in the industry yeah well that's why i love having you on you you can go go really deep and so like i said we've got these these new white papers uh new test results on these dell servers what's the role people might be wondering what's the role broadcom plays inside these systems well we've been working alongside dell for for decades trying to design some of the industry's best uh storage and it's been a team effort in fact i've been working with some of these people for for you know multiple decades i know their their birthdays and their travel plans and where they vacation so it's been a really great relationship between broadcom and dell over the years we've been with them through the sata to the sas to the ssd kind of revolution now we're working from all the way back at that series five to their latest series 11 products that support nvme so it's been it's been really great but it's not just about you know gluing together the latest host or the latest disk interface you know we work with them to try and understand and characterize their customers and our customers applications the way that they're deployed security features management optimizing the i o path and making sure that when a failure happens we can get those raid volumes back optimal so it's been a really really great um you know role between between broadcom and dell got it okay let's get into the tested framework let's keep it at high level and then we're going to get into some of the data but but what did prowess test what was the workload what can you tell us about you know what they were trying to measure well the first thing is you have to kind of have an objective so what we had done was um we had them benchmark on one of the previous dell poweredge our 740xd servers and then we had them compare that to the rs750 and not just one r 750 there was two different configurations of the rs750 so we get to see kind of you know what gen 3 to gen 4 looks like um and upgrading the processor so we kind of got from like a gold system to maybe a platinum system we've added more controllers we add more drives um and then we said you know let's go ahead and let's do some sql transactional benchmarking on it and i'd like to go into why we chose that but you know microsoft sql server is one of the most popular database management platforms in the world and you know there are two kinds ones at oltp which processes records and business transactions and then there's kind of a an oltp which does analytical analytical processing and does a lot of complex queries and you know together these two things they drive the business operations and help kind of improve productivity it's a real critical part for the decision makers in a uh you know for for all of our companies so before we get in share the actual test results what specifically did prowess measure what were some of the metrics that we're going to see here we focused on the transactional workloads so we did something called a tpcc like and let me be really clear we did not execute a tpcc benchmark but it was a tpcc like benchmark and tpcc is one of the most mature standardized industry database benchmarks in the world and what it does is it simulates a sales model of a wholesale supplier so we can all kind of agree that you 
know handling payments and orders and status and deliveries and things like that those are those are really critical parts to running a business and ultimately what this results in is something called a new order so somebody might go on they'll log on they'll say hey is this available let me pay you um and then once that transaction is done it's called a new to order so they come up with something called a tpmc which is the new order transactions per minute now the neat thing is it's not just a one-size-fits-all kind of benchmark so you get to scale that in the way you scale the database you scale the size and the capacity of the database by adding more warehouses in our case we actually decided to choose 1400 warehouses which is a pretty standardized size and then you can also test the concurrency so you could start from one thread which kind of simulates a user all the way up to however many threads you want we decided to settle on 100 threads now this is very different from the generic benchmarking we're actually doing real work we're not just doing random reads and random rights which those are great they're critical they tell us how well we're performing but this is more like a paced workload it really executes sql i o transactions uh and you know those in order operations um are very different you do a read and then a write and then another read and those have to be executed in order it's very different from just setting up a q depth and a workers and it also provides very realistic and objective measurements that exercises not just the storage but the entire server all right let's get into some of the results so the first graphic we're going to show you is that what you were just talking about new orders per minute how should we interpret uh this graphic kim well i mean it looks like we won the waccamo game didn't we so we started out with with the baseline here the r740xd and we measured the new order transactions per minute on that we then set up the r 750 in the very first rs 750 and we have the very all the details are laid out in the paper that you just referenced there um but we started out with a single raid controller with eight drives and we measured that we got a 7x increase and then in the second test we actually added another rig controller and another eight drives and then we we kind of upgraded the the processor a little bit we were able to even double that over the initial one so and you know how do we get there that's really the more important thing and you know the the critical part of this understanding and characterizing the workload so we have to know what kind of components to balance you know where are your bottlenecks at so typically an oltp online transaction processing is a mix of transactions that are generally two reads to every one and they're very random and the way this benchmark works is it randomly accesses different warehouses it executes these queries when it executes a read query it pulls that data into memory well once the data is into memory any kind of transactions are acted on it in memory so the actual database engine does in memory transactions then you have something called a transaction log that has to record all those modifications down to non-volatile media and that's based on something um you know just to make sure that you have um all the data in case somebody pulls the plug or something you know catastrophic happens you want to make sure that those are recorded um and then every once in a while um all those in-memory changes are 
written down to the disk in something called a checkpoint and then we can go ahead and clear that transaction log so there's a bunch of sequence of of different kinds of i o um that happen during the course of an oltp kind of transaction so your bottlenecks are found in the processor and the memory and the amount of memory you know the latency of your disks i mean it really the whole gamut everything could be a bottleneck within there so the trick is to figure out where your bottlenecks are and trying to release those so you can get the the best performance that you possibly can yeah the sequence of events that has to take place to do a right we often we take it for granted okay the the next uh set of data we're going to look at is like you said you're doing reads you're doing right we're going to we're going to bring up now the the data around log rights and and log reads so explain what we're looking at here so as i mentioned earlier the even though the transactions happen in memory um those recorded transactions get committed down to down to the disk but eventually they get committed onto disk what we do first is we do something called a log right it's a transaction log right and that way it allows the it allows the transaction to go ahead and process so the trick here is to have the lowest latency fast disks for that log and it's very critical for your consistency and also for rollbacks and something called asset transactions and operations the log reads are really important also for the recovery efforts so we try to manage our log performance um we want low latency we want very high iops for both reads and for rights but it's not just the logs there's also the database disks and what we see is initially during these benchmarks there's a bunch of reads that are going into the database data um and then ultimately after some period of time we see something called a checkpoint and we'll see this big flurry of rights come down so you have to be able to handle all those flurry of rights as they come down and they're committed down to the disk so some of our important design considerations here are is can our processor handle this workload do we have enough memory and then finally we have three storage considerations we have a database disk we have log disk and then of course there's a temp db as well so because we have the industry leading raid 5 performance we were able to use a raid 5 for the database and that's something that you know just years ago was like whoa oh don't ever use raid 5 on your database that is no longer true our raid 5 is is fast enough and has low enough latency to handle database and it also helps save money um and then for the raid 10 we use that for a log that's pretty standardized so the faster your processor the more cores you know when you double the disk um and we get more performance so yeah you know we just figured out where the bottlenecks were we cleared them out we were able to double that that's interesting go back in history a little bit when raid 5 was all the rage uh emc at the time now of course dell when they announced symmetrics they announced it with with raid 1 which was mirroring and they did that because it was heavily into mainframe and transaction processing and while there was you know additional overhead of you do you need two disk drives to do that the performance really outweighed that and so now we're seeing with the advent of new technologies that you you're solving that problem um i i guess the other thing of course is is rebuild times 
>> Dave: That's interesting. Go back in history a little bit: when RAID 5 was all the rage, EMC at the time, now of course Dell, announced Symmetrix with RAID 1, which was mirroring. They did that because it was heavily into mainframe and transaction processing, and while there was additional overhead, you need two disk drives to do that, the performance really outweighed it. And now we're seeing, with the advent of new technologies, that you're solving that problem. I guess the other thing, of course, is rebuild times, and we've kind of rethought that. So the next set of data we're going to look at is how long it takes to rebuild, the RAID rebuild time. We'll bring that up now and you can give us the insights here.

>> Kim: Yeah, so you can see that we've been able to reduce the rebuild times, and how do we do that? I can tell you, my fellow architects and I have spent probably the last two years focused on improving rebuilds. It's not just rebuilding faster, it's also how to elegantly handle all the host operations. You can't just tell the host, "Sorry, I'm busy doing rebuilds." You've got to be able to handle that, because business continuity is a very critical component. We do that through mirroring and parity data layouts, and by striking a really good balance: we keep supplying sufficient host I/O, and as soon as we have a moment, we start running those rebuilds in the background during the lull periods. Making sure we do aggressive rebuilds while allowing business operations to continue has always been a critical part, and we've been working on that a lot over the last couple of generations. That said, we always tell our customers: always have a backup. That's a critical part of a business continuity plan.
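(To put rough numbers on the balancing act Kim describes, here is a back-of-envelope sketch, not Broadcom's actual rebuild algorithm: the time to reconstruct a failed drive stretches as more of the array's bandwidth is reserved for ongoing host I/O. The drive size and rebuild rate below are assumptions for illustration.)

```python
# Back-of-envelope rebuild-time estimate. Illustrates the trade-off between
# aggressive rebuilds and leaving bandwidth for host I/O; all numbers are assumed.

def rebuild_hours(drive_tb: float, rebuild_mb_s: float, host_load: float) -> float:
    """
    drive_tb     : capacity of the failed drive to reconstruct (TB)
    rebuild_mb_s : rebuild rate the array sustains with no host activity (MB/s)
    host_load    : fraction of array bandwidth consumed by host I/O (0..1)
    """
    effective = rebuild_mb_s * (1.0 - host_load)   # rebuild only gets the leftover bandwidth
    seconds = (drive_tb * 1_000_000) / effective   # TB -> MB
    return seconds / 3600

if __name__ == "__main__":
    for load in (0.0, 0.5, 0.9):
        print(f"host load {load:.0%}: ~{rebuild_hours(3.2, 300, load):.1f} hours to rebuild")
```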
>> Dave: Great. I wonder if we can come back to the components inside the system. How does what Broadcom is supplying to Dell in these servers contribute to these performance results, specifically, Kim?

>> Kim: Okay, so specifically we provide the PERC storage controller. The Dell R740xd has the Series 10 H740P controller, whereas the R750 has the generation-11 PERC 11 H755N. We own those, in terms of making sure they're integrated properly into the system and provide the highest possible performance. But it's not just the storage controller. I want to make sure everybody knows we also have our Broadcom NetXtreme E-Series: these are Gen 4 PCIe, 25-gig, dual-ported Ethernet controllers, and in a true deployment they're a really important part of the e-commerce business solution. So we own the storage for these as well as the networking.

>> Dave: Excellent. Okay, so we went deep inside the system, but let's up-level it. Why does this matter to an organization? What's the business impact of all this tech coming to fruition?

>> Kim: As everybody always references, there's massive growth of data, and data is required for success. It doesn't matter if you're a Fortune 500 company or a small-to-medium business, that critical data needs to be protected, and protected without the complexity, the overhead, or the cost of hyper-converged infrastructure or SAN deployments. We're able to do this on bare metal, and it really helps with the TCO. The other thing is NVMe: right now NVMe is the fastest-growing storage, and it is so fast from a performance perspective as well. That Dell R750 with the two PERC 11 controllers in it had over 51 terabytes of storage in a single server, and that's pretty impressive. But there are so many other performance advantages the R750 provides for SQL Server: the Gen 3 Intel Xeon Scalable processors, DDR4-3200 memory, and faster memory is very critical for those in-memory transactions, plus Gen 4 PCIe. It really does justify an upgrade. And I can tell you, Dave, that a little over a year ago I had one of these Dell R750 servers sitting in my own house, I was testing it, and I was just amazed at the performance. I was running different TPC-C, TPC-H, and TPC-E tests on it, and I was telling Dell, wow, this is amazing, this server is doing so well. I was so excited I could not wait to see it in print, so thank you to the Prowess team for actually showing the world what these servers can do combined with the Broadcom storage.

>> Dave: Now, speaking of the Prowess team, when you read the white papers, they really are focused on this small and medium-sized business market. So people might be wondering, well, wait a minute, why wouldn't folks just spin up this compute in the cloud? Why am I buying servers?

>> Kim: That's a really good question. The studies have shown that the majority of workloads are still on-prem, and there's also a challenge with skill sets: there's a lack of cloud developers and cloud architects. So keeping these on-prem, where you actually own it, really does help keep costs down, and the management of these R750s is fantastic, as is the support that Dell provides.

>> Dave: Great. Kim, I love having you on, and we'd like to have you back. We're going to leave it there for now, but thanks so much, I really appreciate your time.

>> Kim: Thanks, Dave.

>> Dave: So look, this is really helpful in understanding that at the end of the day you still need microprocessors, memories, storage devices, controllers, and interconnects. We just saw Pat Gelsinger at the State of the Union address nudging the federal government to support semiconductor manufacturing, and Intel is going to potentially match TSMC's 100 billion dollar capex commitment. That's going to be a tailwind for the surrounding components, including core infrastructure and semiconductor component designers like Broadcom. This is a topic we care about, and like I said, Kim, we're going to have you back, and we plan to continue our coverage under the hood in the future. So thank you for watching this CUBE Conversation. This is Dave Vellante, and we'll see you next time.
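(Circling back to the log-write and checkpoint sequence Kim walked through earlier in the segment, the toy model below shows the I/O shape she describes: a steady stream of small, latency-sensitive transaction log writes, punctuated by periodic bursts of data-disk writes at each checkpoint. It is a simplified sketch, not SQL Server's actual engine behavior, and the transaction and page counts are arbitrary.)

```python
# Toy model of the OLTP I/O pattern discussed above: commit each transaction with a
# log write, accumulate dirty pages in memory, then flush them in a burst at checkpoints.

import random

def run(transactions: int = 10_000, checkpoint_every: int = 2_000) -> None:
    dirty_pages: set[int] = set()
    log_writes = data_writes = checkpoints = 0

    for txn in range(1, transactions + 1):
        log_writes += 1                              # sequential, latency-critical log write
        dirty_pages.add(random.randrange(50_000))    # page modified in memory only

        if txn % checkpoint_every == 0:              # checkpoint: flurry of data-disk writes
            data_writes += len(dirty_pages)
            dirty_pages.clear()
            checkpoints += 1

    print(f"{log_writes} log writes, {data_writes} data-disk writes "
          f"across {checkpoints} checkpoints")

if __name__ == "__main__":
    run()
```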

Published Date : Mar 3 2022

How The Trade Desk Reports Against Two 320-node Clusters Packed with Raw Data


 

hi everybody thank you for joining us today for the virtual Vertica BBC 2020 today's breakout session is entitled Vertica and en mode at the trade desk my name is su LeClair director of marketing at Vertica and I'll be your host for this webinar joining me is Ron Cormier senior Vertica database engineer at the trade desk before we begin I encourage you to submit questions or comments during the virtual session you don't have to wait just type your question or comment in the question box below the slides and click submit there will be a Q&A session at the end of the presentation we'll answer as many questions as we're able to during that time any questions that we don't address we'll do our best to answer them offline alternatively you can visit vertical forums to post your questions there after the session our engineering team is planning to join the forums to keep the conversation going also a quick reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slide and yes this virtual session is being recorded and will be available to view on demand this week we'll send you a notification as soon as it's ready so let's get started over to you run thanks - before I get started I'll just mention that my slide template was created before social distancing was a thing so hopefully some of the images will harken us back to a time when we could actually all be in the same room but with that I want to get started uh the date before I get started in thinking about the technology I just wanted to cover my background real quick because I think it's peach to where we're coming from with vertically on at the trade desk and I'll start out just by pointing out that prior to my time in the trade desk I was a tech consultant at HP HP America and so I traveled the world working with Vertica customers helping them configure install tune set up their verdict and databases and get them working properly so I've seen the biggest and the smallest implementations and everything in between and and so now I'm actually principal database engineer straight desk and and the reason I mentioned this is to let you know that I'm a practitioner I'm working with with the product every day or most days this is a marketing material so hopefully the the technical details in this presentation are are helpful I work with Vertica of course and that is most relative or relevant to our ETL and reporting stack and so what we're doing is we're taking about the data in the Vertica and running reports for our customers and we're an ad tech so I did want to just briefly describe what what that means and how it affects our implementation so I'm not going to cover the all the details of this slide but basically I want to point out that the trade desk is a DSP it's a demand-side provider and so we place ads on behalf of our customers or agencies and ad agencies and their customers that are advertised as brands themselves and the ads get placed on to websites and mobile applications and anywhere anywhere digital advertising happens so publishers are what we think ocean like we see here espn.com msn.com and so on and so every time a user goes to one of these sites or one of these digital places and an auction takes place and what people are bidding on is the privilege of showing and add one or more ads to users and so this is this is really important because it helps fund the internet ads can be annoying sometimes but they actually help help are incredibly helpful in how we get much much 
of our content and this is happening in real time at very high volumes so on the open Internet there is anywhere from seven to thirteen million auctions happening every second of those seven to thirteen million auctions happening every second the trade desk bids on hundreds of thousands per second um so that gives it and anytime we did we have an event that ends up in Vertica that's that's one of the main drivers of our data volume and certainly other events make their way into Vertica as well but that wanted to give you a sense of the scale of the data and sort of how it's impacting or how it is impacted by sort of real real people in the world so um the uh let's let's take a little bit more into the workload and and we have the three B's in spades late like many many people listening to a massive volume velocity and variety in terms of the data sizes I've got some information here some stats on on the raw data sizes that we deal with on a daily basis per day so we ingest 85 terabytes of raw data per day and then once we get it into Vertica we do some transformations we do matching which is like joins basically and we do some aggregation group buys to reduce the data and make it clean it up make it so it's more efficient to consume buy our reporting layer so that matching in aggregation produces about ten new terabytes of raw data per day it all comes from the it all comes from the data that was ingested but it's new data and so that's so it is reduced quite a bit but it's still pretty pretty high high volume and so we have this aggregated data that we then run reports on on behalf of our customers so we have about 40,000 reports per day oh that's probably that's actually a little bit old and older number it's probably closer to 50 or 55,000 reports per day at this point so it's I think probably a pretty common use case for for Vertica customers it's maybe a little different in the sense that most of the reports themselves are >> reports so they're not it's not a user sitting at a keyboard waiting for the result basically we have we we have a workflow where we do the ingest we do this transform and then and then once once all the data is available for a day we run reports on behalf of our customer to let me have our customers on that that daily data and then we send the reports out you via email or we drop them in a shared location and then they they look at the reports at some later point of time so it's up until yawn we did all this work on on enterprise Vertica at our peak we had four production enterprise clusters each which held two petabytes of raw data and I'll give you some details on on how those enterprise clusters were configured in the hardware but before I do that I want to talk about the reporting workload specifically so the the reporting workload is particularly lumpy and what I mean by that is there's a bunch of work that becomes available bunch of queries that we need to run in a short period of time after after the days just an aggregation is completed and then the clusters are relatively quiet for the remaining portion of the day that's not to say they are they're not doing anything as far as read workload but they certainly are but it's much less reactivity after that big spike so what I'm showing here is our reporting queue and the spike is is when all those reports become a bit sort of ailable to be processed we can't we can't process we can't run the report until we've done the full ingest and matching and aggregation for the day and so right around 1:00 or 2:00 
a.m. UTC time every day that's when we get this spike and the spike we affectionately called the UTC hump but basically it's a huge number of queries that need to be processed sort of as soon as possible and we have service levels that dictate what as soon as possible means but I think the spike illustrates our use case pretty pretty accurately and um it really as we'll see it's really well suited for pervert icky on and we'll see what that means so we've got our we had our enterprise clusters that I mentioned earlier and just to give you some details on what they look like there they were independent and mirrored and so what that means is all four clusters held the same data and we did this intentionally because we wanted to be able to run our report anywhere we so so we've got this big queue over port is big a number of reports that need to be run and we've got these we started we started with one cluster and then we got we found that it couldn't keep up so we added a second and we found the number of reports went up that we needed to run that short period of time and and so on so we eventually ended up with four Enterprise clusters basically with this with the and we'd say they were mirrored they all had the same data they weren't however synchronized they were independent and so basically we would run the the tailpipe line so to speak we would run ingest and the matching and the aggregation on all the clusters in parallel so they it wasn't as if each cluster proceeded to the next step in sync with which dump the other clusters they were run independently so it was sort of like each each cluster would eventually get get consistent and so this this worked pretty well for for us but it created some imbalances and there was some cost concerns that will dig into but just to tell you about each of these each of these clusters they each had 50 nodes they had 72 logical CPU cores a half half a terabyte of RAM a bunch of raid rated disk drives and 2 petabytes of raw data as I stated before so pretty big beefy nodes that are physical physical nodes that we held we had in our data centers we actually reached these nodes so so it was on our data center providers data centers and the these were these these were what we built our business on basically but there was a number of challenges that we ran into as we as we continue to build our business and add data and add workload and and the first one is is some in ceremony can relate to his capacity planning so we had to prove think about the future and try to predict the amount of work that was going to need to be done and how much hardware we were going to need to satisfy that work to meet that demand and that's that's just generally a hard thing to do it's very difficult to verdict the future as we can probably all attest to and how much the world has changed and even in the last month so it's a it's a very difficult thing to do to look six twelve eighteen eighteen months into the future and sort of get it right and and and what people what we tended to do is we reach or we tried to our art plans our estimates were very conservative so we overbought in a lot of cases and not only that we had to plan for the peak so we're planning for that that that point in time that those number of hours in the early morning when we had to we had all those reports to run and so that so so we ended up buying a lot of hardware and we actually sort of overbought at times and then and then as the hardware were days it would kind of come into it would come into maturity 
and we have our our our workload would sort of come approach matching the demand so that was one of the big challenges the next challenge is that we were running on disk you can we wanted to add data in sort of two dimensions the only dimensions that everybody can think about we wanted to add more columns to our big aggregates and we wanted to keep our big aggregates for for longer periods of time so both horizontally and vertically we wanted to expand the datasets but we basically were running out of disk there was no more disk in and it's hard to add a disc to Vertica in enterprise mode not not impossible but certainly hard and and one cannot add discs without adding compute because enterprise mode the disk is all local to each of the nodes for most most people you can do not exchange with sands and other external rays but that's there are a number of other challenges with that so um adding in order to add disk we had to add compute and that basically meant kept us out of balance we're adding more compute than we needed for the amount of disk so that was the problem certainly physical nodes getting them the order delivered racked cables even before we even start such Vertica there's lead times there and and so it's also long commitment since we like I mentioned me Lisa hardware so we were committing to these nodes these physical servers for two or three years at a time and I mentioned that can be a hard thing to do but we wanted to least to keep our capex down so we wanted to keep our aggregates for a long period of time we could have done crazy things or more exotic things to to help us with this if we had to in enterprise mode we could have started to like daisy chain clusters together and that would have been sort of a non-trivial engineering effort because we would need to then figure out how to migrate data source first to recharge the data across all the clusters and we had to migrate data from one cluster to another cluster hesitation and we would have to think about how to aggregate run queries across clusters so if you assured data set spans two clusters it would have had to sort of aggregated within each cluster maybe and then build something on top the aggregated the data from each of those clusters so not impossible things but certainly not easy things and luckily for us we started talking about two Vertica about separation of compute and storage and I know other customers were talking to Vertica as we were people had had these problems and so Vertica inyeon mode came to the rescue and what I want to do is just talk about nyan mode really briefly for for those in the audience who aren't familiar but it's basically Vertigo's answered to the separation of computing storage it allows one to scale compute and or storage separately and and this there's a number of advantages to doing that whereas in the old enterprise days when you add a compute you added stores and vice-versa now we can now we can add one or the other or both according to how we want to and so really briefly how this works this slide this figure was taken directly from the verdict and documentation and so just just to talk really briefly about how it works the taking advantage of the cloud and so in this case Amazon Web Services the elasticity in the cloud and basically we've got you seen two instances so elastic cloud compute servers that access data that's in an s3 bucket and so three three ec2 nodes and in a bucket or the the blue objects in this diagram and the difference is a couple of a couple of big 
differences one the data no longer the persistent storage of the data the data where the data lives is no longer on each of the notes the persistent stores of the data is in s3 bucket and so what that does is it basically solves one of our first big problems which is we were running out of disk the s3 has for all intensive purposes infinite storage so we can keep much more data there and that mostly solved one of our big problems so the persistent data lives on s3 now what happens is when a query runs it runs on one of the three nodes that you see here and assuming we'll talk about depo in a second but what happens in a brand new cluster where it's just just spun up the hardware is the query will will run on those ec2 nodes but there will be no data so those nodes will reach out to s3 and run the query on remote storage so that so the query that the nodes are literally reaching out to the communal storage for the data and processing it entirely without using any data on on the nodes themselves and so that that that works pretty well it's not as fast as if the data was local to the nodes but um what Vertica did is they built a caching layer on on each of the node and that's what the depot represents so the depot is some amount of disk that is relatively local to the ec2 node and so when the query runs on remote stores on the on the s3 data it then queues up the data for download to the nodes and so the data will get will reside in the Depot so that the next query or the subsequent subsequent queries can run on local storage instead of remote stores and that speeds things up quite a bit so that that's that's what the role of the Depot is the depot is basically a caching layer and we'll talk about the details of how we can see your in our Depot the other thing that I want to point out is that since this is the cloud another problem that helps us solve is the concurrency problem so you can imagine that these three nodes are one sort of cluster and what we can do is we can spit up another three nodes and have it point to the same s3 communal storage bucket so now we've got six nodes pointing to the same data but we've you isolated each of the three nodes so that they act as if they are their own cluster and so vertical calls them sub-clusters so we've got two sub clusters each of which has three nodes and what this has essentially done it is it doubled the concurrency doubled the number of queries that can run at any given time because we've now got this new place which new this new chunk of compute which which can answer queries and so that has given us the ability to add concurrency much faster and I'll point out that for since it's cloud and and there are on-demand pricing models we can have significant savings because when a sub cluster is not needed we can stop it and we pay almost nothing for it so that's that's really really important really helpful especially for our workload which I pointed out before was so lumpy so those hours of the day when it's relatively quiet I can go and stop a bunch of sub clusters and and I will pay for them so that that yields nice cost savings let's be on in a nutshell obviously engineers and the documentation can use a lot more information and I'm happy to field questions later on as well but I want to talk about how how we implemented beyond at the trade desk and so I'll start on the left hand side at the top the the what we're representing here is some clusters so there's some cluster 0 r e t l sub cluster and it is a our primary sub cluster so when you 
get into the world of eon there's primary Club questions and secondary sub classes and it has to do with quorum so primary sub clusters are the sub clusters that we always expect to be up and running and they they contribute to quorum they decide whether there's enough instances number a number of enough nodes to have the database start up and so these this is where we run our ETL workload which is the ingest the match in the aggregate part of the work that I talked about earlier so these nodes are always up and running because our ETL pipeline is always on we're internet ad tech company like I mentioned and so we're constantly getting costly running ad and there's always data flowing into the system and the matching is happening in the aggregation so that part happens 24/7 and we wanted so that those nodes will always be up and running and we need this we need that those process needs to be super efficient and so what that is reflected in our instance type so each of our sub clusters is sixty four nodes we'll talk about how we came at that number but the infant type for the ETL sub cluster the primary subclusters is I 3x large so that is one of the instance types that has quite a bit of nvme stores attached and we'll talk about that but on 32 cores 240 four gigs of ram on each node and and that what that allows us to do I should have put the amount of nvme but I think it's seven terabytes for anything me storage what that allows us to do is to basically ensure that our ETL everything that this sub cluster does is always in Depot and so that that makes sure that it's always fast now when we get to the secondary subclusters these are as mentioned secondary so they can stop and start and it won't affect the cluster going up or down so they're they're sort of independent and we've got four what we call Rhian subclusters and and they're not read by definition or technically they're not read only any any sub cluster can ingest and create your data within the database and that'll all get that'll all get pushed to the s3 bucket but logically for us they're read only like these we just most of these the work that they happen to do is read only which it is which is nice because if it's read only it doesn't need to worry about commits and we let we let the primary subclusters or ETL so close to worry about committing data and we don't have to we don't have to have the all nodes in the database participating in transaction commits so we've got a for read subclusters and we've got one EP also cluster so a total of five sub clusters each so plus they're running sixty-four nodes so that gives us a 320 node database all things counted and not all those nodes are up at the same time as I mentioned but often often for big chunks of the days most of the read nodes are down but they do all spin up during our during our busy time so for the reading so clusters we've got I three for Excel so again the I three incidents family type which has nvme stores these notes have I think three and a half terabytes of nvme per node we just rate it to nvme drives we raid zero them together and 16 cores 122 gigs of ram so these are smaller you'll notice but it works out well for us because the the read workload is is typically dealing with much smaller data sets than then the ingest or the aggregation workbook so we can we can run these workloads on on smaller instances and leave a little bit of money and get more granularity with how many sub clusters are stopped and started at any given time the nvme doesn't persist the 
data on it isn't persisted remember you stop and start this is an important detail but it's okay because the depot does a pretty good job in that in that algorithm where it pulls data in that's recently used and the that gets pushed out a victim is the data that's least reasons use so it was used a long time ago so it's probably not going to be used to get so we've got um five sub-clusters and we have actually got to two of those so we've got a 320 node cluster in u.s. East and a 320 node cluster in u.s. West so we've got a high availability region diversity so and their peers like I talked about before they're they're independent but but yours they are each run 128 shards and and so with that what that which shards are is basically the it's similar to segmentation when you take those dataset you divide it into chunks and though and each sub cluster can concede want the data set in its entirety and so each sub cluster is dealing with 128 shards it shows 128 because it'll give us even distribution of the data on 64 node subclusters 60 120 might evenly by 64 and so there's so there's no data skew and and we chose 128 because the sort of ginger proof in case we wanted to double the size of any of the questions we can double the number of notes and we still have no excuse the data would be distributed evenly the disk what we've done is so we've got a couple of raid arrays we've got an EBS based array that they're catalog uses so the catalog storage location and I think we take for for EBS volumes and raid 0 them together and come up with 128 gigabyte Drive and we wanted an EPS for the catalog because it we can stop and start nodes and that data will persist it will come back when the node comes up so we don't have to run a bunch of configuration when the node starts up basically the node starts it automatically joins the cluster and and very strongly there after it starts processing work let's catalog and EBS now the nvme is another raid zero as I mess with this data and is ephemeral so let me stop and start it goes away but basically we take 512 gigabytes of the nvme and we give it to the data temp storage location and then we take whatever is remaining and give it to the depot and since the ETL and the reading clusters are different instance types they the depot is is side differently but otherwise it's the same across small clusters also it all adds up what what we have is now we we stopped the purging data for some of our big a grits we added bunch more columns and what basically we at this point we have 8 petabytes of raw data in each Jian cluster and it is obviously about 4 times what we can hold in our enterprise classes and we can continue to add to this maybe we need to add compute maybe we don't but the the amount of data that can can be held there against can obviously grow much more we've also built in auto scaling tool or service that basically monitors the queue that I showed you earlier monitors for those spikes I want to see as low spikes it then goes and starts up instances one sub-collector any of the sub clusters so that's that's how that's how we we have compute match the capacity match that's the demand also point out that we actually have one sub cluster is a specialized nodes it doesn't actually it's not strictly a customer reports sub clusters so we had this this tool called planner which basically optimizes ad campaigns for for our customers and we built it it runs on Vertica uses data and Vertica runs vertical queries and it was it was wildly successful um so we 
wanted to have some dedicated compute and beyond witty on it made it really easy to basically spin up one of these sub clusters or new sub cluster and say here you go planner team do what you want you can you can completely maximize the resources on these nodes and it won't affect any of the other operations that were doing the ingest the matching the aggregation or the reports up so it gave us a great deal of flexibility and agility which is super helpful so the question is has it been worth it and without a doubt the answer is yes we're doing things that we never could have done before sort of with reasonable cost we have lots more data specialized nodes and more agility but how do you quantify that because I don't want to try to quantify it for you guys but it's difficult because each eon we still have some enterprise nodes by the way cost as you have two of them but we also have these Eon clusters and so they're there they're running different workloads the aggregation is different the ingest is running more on eon does the number of nodes is different the hardware is different so there are significant differences between enterprise and and beyond and when we combine them together to do the entire workload but eon is definitely doing the majority of the workload it has most of the data it has data that goes is much older so it handles the the heavy heavy lifting now the query performance is more anecdotal still but basically when the data is in the Depot the query performance is very similar to enterprise quite close when the data is not in Depot and it needs to run our remote storage the the query performance is is is not as good it can be multiples it's not an order not orders of magnitude worse but certainly multiple the amount of time that it takes to run on enterprise but the good news is after the data downloads those young clusters quickly catch up as the cache populates there of cost I'd love to be able to tell you that we're running to X the number of reports or things are finishing 8x faster but it's not that simple as you Iran is that you it is me I seem to have gotten to thank you you hear me okay I can hear you now yeah we're still recording but that's fine we can edit this so if I'm just talking to the person the support person he will extend our recording time so if you want to maybe pick back up from the beginning of the slide and then we'll just edit out this this quiet period that we have sir okay great I'm going to go back on mute and why don't you just go back to the previous slide and then come into this one again and I'll make sure that I tell the person who yep perfect and then we'll continue from there is that okay yeah sound good all right all right I'm going back on yet so the question is has it been worth it and for us the answer has been a resounding yes we're doing things that we never could have done at reasonable cost before and we got more data we've got this Y note this law has nodes and in work we're much more agile so how to quantify that um well it's not quite as simple and straightforward as you might hope I mean we still have enterprise clusters we've got to update the the four that we had at peak so we've still got two of those around and we got our two yawn clusters but they're running different workloads and they're comprised of entirely different hardware the dependence has I've covered the number of nodes is different for sub-clusters so 64 versus 50 is going to have different performance the the workload itself the aggregation is aggregating 
more columns on yon because that's where we have disk available the queries themselves are different they're running more more queries on more intensive data intensive queries on yon because that's where the data is available so in a sense it is Jian is doing the heavy lifting for the cluster for our workload in terms of query performance still a little anecdotal but like when the queries that run on the enterprise cluster the performance matches that of the enterprise cluster quite closely when the data is in the Depot when the data is not in a Depot and Vertica has to go out to the f32 to get the data performance degrades as you might expect it can but it depends on the curious all things like counts counts are is really fast but if you need lots of the data from the material others to realize lots of columns that can run slower I'm not orders of magnitude slower but certainly multiple of the amount of time in terms of costs anecdotal will give a little bit more quantifying here so what I try to do is I try to figure out multiply it out if I wanted to run the entire workload on enterprise and I wanted to run the entire workload on e on with all the data we have today all the queries everything and to try to get it to the Apple tab so for enterprise the the and estimate that we do need approximately 18,000 cores CPU cores all together and that's a big number but that's doesn't even cover all the non-trivial engineering work that would need to be required that I kind of referenced earlier things like starting the data among multiple clusters migrating the data from one culture to another the daisy chain type stuff so that's that's the data point now for eon is to run the entire workload estimate we need about twenty thousand four hundred and eighty CPU cores so more CPU cores uh then then enterprise however about half of those and partly ten thousand of both CPU cores would only run for about six hours per day and so with the on demand and elasticity of the cloud that that is a huge advantage and so we are definitely moving as fast as we can to being on all Aeon we have we have time left on our contract with the enterprise clusters or not we're not able to get rid of them quite yet but Eon is certainly the way of the future for us I also want to point out that uh I mean yawn is we found to be the most efficient MPP database on the market and what that refers to is for a given dollar of spend of cost we get the most from that zone we get the most out of Vertica for that dollar compared to other cloud and MPP database platforms so our business is really happy with what we've been able to deliver with Yan Yan has also given us the ability to begin a new use case which is probably this case is probably pretty familiar to folks on the call where it's UI based so we'll have a website that our customers can log into and on that website they'll be able to run reports on queries through the website and have that run directly on a separate row to get beyond cluster and so much more latent latency sensitive and concurrency sensitive so the workflow that I've described up until this point has been pretty steady throughout the day and then we get our spike and then and then it goes back to normal for the rest of the day this workload it will be potentially more variable we don't know exactly when our engineers are going to deliver some huge feature that is going to make a 1-1 make a lot of people want to log into the website and check how their campaigns are doing so we but Yohn really helps us with 
this because we can add a capacity so easily we cannot compute and we can add so we can scale that up and down as needed and it allows us to match the concurrency so beyond the concurrency is much more variable we don't need a big long lead time so we're really excited about about this so last slide here I just want to leave you with some things to think about if you're about to embark or getting started on your journey with vertically on one of the things that you'll have to think about is the no account in the shard count so they're kind of tightly coupled the node count we determined by figuring like spinning up some instances in a single sub cluster and getting performance smaller to finding an acceptable performance considering current workload future workload for the queries that we had when we started and so we went with 64 we wanted to you want to certainly want to increase over 50 but we didn't want to have them be too big because of course it costs money and so what you like to do things in power to so 64 nodes and then the shard count for the shards again is like the data segmentation is a new type of segmentation on the data and the start out we went with 128 it began the reason is so that we could have no skew but you know could process the same same amount of data and we wanted to future-proof it so that's probably it's probably a nice general recommendation doubleness account for the nodes the instance type and and how much people space those are certainly things you're going to consider like I was talking about we went for they I three for Excel I 3/8 Excel because they offer good good Depot stores which gives us a really consistent good performance and it is all in Depot the pretty good mud presentation and some information on on I think we're going to use our r5 or the are for instance types for for our UI cluster so much less the data smaller so much less enter this on Depot so we don't need on that nvm you stores the reader we're going to want to have a reserved a mix of reserved and on-demand instances if you're if you're 24/7 shop like we are like so our ETL subclusters those are reserved instances because we know we're going to run those 24 hours a day 365 days a year so there's no advantage of having them be on-demand on demand cost more than reserve so we get cost savings on on figuring out what we're going to run and have keep running and it's the read subclusters that are for the most part on on demand we have one of our each sub Buster's is actually on 24/7 because we keep it up for ad-hoc queries your analyst queries that we don't know when exactly they're going to hit and they want to be able to continue working whenever they want to in terms of the initial data load the initial data ingest what we had to do and now how it works till today is you've got to basically load all your data from scratch there isn't a great tooling just yet for data populate or moving from enterprise to Aeon so what we did is we exported all the data in our enterprise cluster into park' files and put those out on s3 and then we ingested them into into our first Eon cluster so it's kind of a pain we script it out a bunch of stuff obviously but they worked and the good news is that once you do that like the second yon cluster is just a bucket copy in it and so there's tools missions that can help help with that you're going to want to manage your fetches and addiction so this is the data that's in the cache is what I'm referring to here the data that's in the default and so like I 
talked about we have our ETL cluster which has the most recent data that's just an injected and the most difficult data that's been aggregated so this really recent data so we wouldn't want anybody logging into that ETL cluster and running queries on big aggregates to go back one three years because that would invalidate the cache the depot would start pulling in that historical data and it was our assessing that historical data and evicting the recent data which would slow things out flow down that ETL pipelines so we didn't want that so we need to make sure that users whether their service accounts or human users are connecting to the right phone cluster and I mean we just created the adventure users with IPS and target groups to palm those pretty-pretty it was definitely something to think about lastly if you're like us and you're going to want to stop and start nodes you're going to have to have a service that does that for you we're where we built this very simple tool that basically monitors the queue and stops and starts subclusters accordingly we're hoping that that we can work with Vertica to have it be a little bit more driven by the cloud configuration itself so for us all amazon and we love it if we could have it have a scale with the with the with the eight of us can take through points do things to watch out for when when you're working with Eon is the first is system table queries on storage layer or metadata and the thing to be careful of is that the storage layer metadata is replicated it's caught as a copy for each of the sub clusters that are out there so we have the ETL sub cluster and our resources so for each of the five sub clusters there is a copy of all the data in storage containers system table all the data and partitions system table so when you want to use this new system tables for analyzing how much data you have or any other analysis make sure that you filter your query with a node name and so for us the node name is less than or equal to 64 because each of our sub clusters at 64 so we limit we limit the nodes to the to the 64 et 64 node ETL collector otherwise if we didn't have this filter we would get 5x the values for counts and some sort of stuff and lastly there is a problem that we're kind of working on and thinking about is a DC table data for sub clusters that are our stops when when the instances stopped literally the operating system is down and there's no way to access it so it takes the DC table DC table data with it and so I cannot after after my so close to scale up in the morning and then they scale down I can't run DC table queries on how what performed well and where and that sort of stuff because it's local to those nodes so we're working on something so something to be aware of and we're working on a solution or an implementation to try to suck that data out of all the notes you can those read only knows that stop and start all the time and bring it in to some other kind of repository perhaps another vertical cluster so that we can run analysis and monitoring even you want those those are down that's it um thanks for taking the time to look into my presentation really do it thank you Ron that was a tremendous amount of information thank you for sharing that with everyone um we have some questions come in that I would like to present to you Ron if you have a couple min it your first let's jump right in the first one a loading 85 terabytes per day of data is pretty significant amount what format does that data come in and what does that load 
process look like yeah a great question so the format is a tab separated files that are Jesus compressed and the reason for that could basically historical we don't have much tabs in our data and this is how how the data gets compressed and moved off of our our bidders the things that generate most of this data so it's a PSD gzip compressed and how you kind of we kind of have how we load it I would say we have actually kind of a Cadillac loader in a couple of different perspectives one is um we've got this autist raishin layer that's homegrown managing the logs is the data that gets loaded into Vertica and so we accumulate data and then we take we take some some files and we push them to redistribute them along the ETL nodes in the cluster and so we're literally pushing the file to through the nodes and we then run a copy statement to to ingest data in the database and then we remove the file from from the nodes themselves and so it's a little bit extra data movement which you may think about changing in the future assisting we move more and more to be on well the really nice thing about this especially for for the enterprise clusters is that the copy' statements are really fast and so we the coffee statements use memory but let's pick any other query but the performance of the cautery statement is really sensitive to the amount of available memory and so since the data is local to the nodes literally in the data directory that I referenced earlier it can access that data from the nvme stores and the kabhi statement runs very fast and then that memory is available to do something else and so we pay a little bit of cost in terms of latency and in terms of downloading the data to the nose we might as we move more and more PC on we might start ingesting it directly from s3 not copying the nodes first we'll see about that what's there that's how that's how we read the data interesting works great thanks Ron um another question what was the biggest challenge you found when migrating from on-prem to AWS uh yeah so um a couple of things that come to mind the first was the baculum the data load it was kind of a pain I mean like I referenced in that last slide only because I mean we didn't have tools built to do this so I mean we had to script some stuff out and it wasn't overly complex but yes it's just a lot of data to move I mean even with starting with with two petabytes so making sure that there there is no missed data no gaps making and moving it from the enterprise cluster so what we did is we exported it to the local disk on the enterprise buses and we then we push this history and then we ingested it in ze on again Allspark X oh so it's a lot of days to move around and I mean we have to you have to take an outage at some point stop loading data while we do that final kiss-up phase and so that was that was a challenge a sort of a one-time challenge the other saying that I mean we've been dealing with a week not that we're dealing with but with his challenge was is I mean it's relatively you can still throw totally new product for vertical and so we are big advantages of beyond is allow us to stop and start nodes and recently Vertica has gotten quite good at stopping in part starting nodes for a while there it was it was it took a really long time to start to Noah back up and it could be invasive but we worked with with the engineering team with Yan Zi and others to really really reduce that and now it's not really an issue that we think that we think too much about hey thanks towards the 
end of the presentation you had said that you've got 128 shards but you have your some clusters are usually around 64 nodes and you had talked about a ratio of two to one why is that and if you were to do it again would you use 128 shards ah good question so that is a reference the reason why is because we wanted to future professionals so basically we wanted to make sure that the number of stars was evenly divisible by the number of nodes and you could I could have done that was 64 I could have done that with 128 or any other multiple entities for but we went with 128 is to try to protect ourselves in the future so that if we wanted to double the number of nodes in the ECL phone cluster specifically we could have done that so that was double from 64 to 128 and then each node would have happened just one chart that it had would have to deal with so so no skew um the second part of question if I had to do it if I had to do it over again I think I would have done I think I would have stuck with 128 we still have I mean so we either running this cluster for more than 18 months now I think especially in USC and we haven't needed to increase the number of nodes so in that sense like it's been a little bit extra overhead having more shards but it gives us the peace of mind that we can easily double that and not have to worry about it so I think I think everyone is a nice place to start and you may even consider a three to one or four to one if if you're if you're expecting really rapid growth that you were just getting started with you on and your business and your gates that's a small now but what you expect to have them grow up significantly less powerful green thank you Ron that's with all the questions that we have out there for today if you do have others please feel free to send them in and we will get back to you and we'll respond directly via email and again our engineers will be available on the vertical forums where you can continue the discussion with them there I want to thank Ron for the great presentation and also the audience for your participation in questions please note that a replay of today's event and a copy of the slides will be available on demand shortly and of course we invite you to share this information with your colleagues as well again thank you and this concludes this webinar and have a great day you
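(The last question above is about why 128 shards were chosen for 64-node subclusters. The small helper below, plain arithmetic rather than any Vertica API, makes the reasoning concrete: the shard count should divide evenly by the node count to avoid skew, and a 2:1 shard-to-node ratio leaves headroom to double a subcluster later without re-sharding.)

```python
# Sanity check for an Eon-style shard/node layout: even distribution today, and
# room to double the node count later. Pure arithmetic; the numbers are from the talk.

def shard_plan(shards: int, nodes: int) -> str:
    if shards % nodes != 0:
        return f"{shards} shards on {nodes} nodes: uneven, {shards % nodes} leftover shards cause skew"
    per_node = shards // nodes
    growth = "can double nodes without skew" if shards % (nodes * 2) == 0 else "cannot double nodes evenly"
    return f"{shards} shards on {nodes} nodes: {per_node} per node; {growth}"

if __name__ == "__main__":
    print(shard_plan(128, 64))    # the layout described: 2 shards per node
    print(shard_plan(128, 128))   # after doubling a subcluster: 1 shard per node
    print(shard_plan(64, 64))     # works today, but doubling to 128 nodes would not
```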

Published Date : Mar 30 2020


Eric Herzog, IBM | Cisco Live EU Barcelona 2020


 

>> Announcer: Live from Barcelona, Spain, it's theCUBE, covering Cisco Live 2020, brought to you by Cisco and its ecosystem partners. >> Welcome back to Barcelona, everybody, we're here at Cisco Live, and you're watching theCUBE, the leader in live tech coverage. We go to the events and extract the signal from the noise. This is day one, really, we started day zero yesterday. Eric Herzog is here, he's the CMO and Vice President of Storage Channels. Probably been on theCUBE more than anybody, with the possible exception of Pat Gelsinger, but you might surpass him this week, Eric. Great to see you. >> Great to see you guys, love being on theCUBE, and really appreciate the coverage you do of the entire industry. >> This is a big show for you guys. I was coming down the escalator, I saw up next Eric Herzog, so I sat down and caught the beginning of your presentation yesterday. You were talking about multicloud, which we're going to get into, you talked about cybersecurity, well let's sort of recap what you told the audience there and really let's dig in. >> Sure, well, first thing is, IBM is a strong partner of Cisco, I mean they're a strong partner of ours both ways. We do all kinds of joint activities with them on the storage side, but in other divisions as well. The security guys do stuff with Cisco, the services guys do a ton of stuff with Cisco. So Cisco's one of our valued partners, which is why we're here at the show, and obviously, as you guys know, with a lot of the coverage you do to the storage industry, that is considered one of the big storage shows, you know, in the industry, and has been a very strong show for IBM Storage and what we do. >> Yeah, and I feel like, you know, it brings together storage folks, whether it's data protection, or primary storage, and sort of is a collection point, because Cisco is a very partner-friendly organization. So talk a little bit about how you go to market, how you guys see the multicloud world, and what each of you brings to the table. >> Well, so we see it in a couple of different facts. So first of all, the day of public cloud only or on-prem only is long gone. There are a few companies that use public cloud only, but yeah, when you're talking mid-size enterprise, and certainly into let's say the global 2500, that just doesn't work. So certain workloads reside well in the cloud, and certain workloads reside well on-prem, and there's certain that can back and forth, right, developed in a cloud but then move it back on, for example, highly transactional workload, once you get going on that, you're not going to run that on any cloud provider, but that doesn't mean you can't develop the app, test the app, out in the cloud and then bring it back on. So we also see that the days of a cloud provider for big enterprise and again up to the 2500 of the global fortunes, that's not true either, because just as with other infrastructure and other technologies, they often have multiple vendors, and in fact, you know, what I've seen from talking to CIOs is, if they have three cloud providers, that's low. Many of 'em talk about five or six, whether that be for legal reasons, whether that be for security reasons, or of course the easy one, which is, we need to get a good price, and if we just use one vendor, we're not going to get a good price. 
And cloud is mature, cloud's not new anymore, the cloud is pretty old, it's basically, sort of, version three of the internet, (laughs) and so, you know, I think some of the procurement guys are a little savvy about why would you only use Amazon or only use Azure or only use Google or only use IBM Cloud. Why not use a couple to keep them, you know, which is kind of normal when procurement gets involved, and say, cloud is not new anymore, so that means procurement gets involved. >> Well, and it's kind of, comes down to the workload. You got certain clouds that are better, you have Microsoft if you want collaboration, you have Amazon if you want infrastructure for devs, on-prem if you want, you know, family jewels. So I got a question for you. So if you look at, you know, it's early 2020, entering a new decade, if you look at the last decade, some of the big themes. You had the consumerization of IT, you had, you know, Web 2.0, you obviously had the big data meme, which came and went and now it's got an AI. And of course you had cloud. So those are the things that brought us here over the last 10 years of innovation. How do you see the next 10 years? What are going to be those innovation drivers? >> Well I think one of the big innovations from a cloud perspective is like, truly deploying cloud. Not playing with the cloud, but really deploying the cloud. Obviously when I say cloud, I would include private cloud utilization. Basically, when you think on-prem in my world, on-prem is really a private cloud talking to a public cloud. That's how you get a multicloud, or, if you will, a hybrid cloud. Some people still think when you talk hybrid, like literally, bare metal servers talking to the cloud, and that just isn't true, because when you look at certainly the global 2500, I can't think any of them what isn't essentially running a private cloud inside their own walls, and then, whether they're going out or not, most do, but the few that don't, they mimic a public cloud inside because of the value they see in moving workloads around, easy deployment, and scale up and scale down, whether that be storage or servers or whatever the infrastructure is, let alone the app. So I think what you're going to see now is a recognization that it's not just private cloud, it's not just public cloud, things are going to go back and forth, and basically, it's going to be a true hybrid cloud world, and I also think with the cloud maturity, this idea of a multicloud, 'cause some people think multicloud is basically private cloud talking to public cloud, and I see multicloud as not just that, but literally, I'm a big company, I'm going to use eight or nine cloud providers to keep everybody honest, or, as you just said, Dave, and put it out, certain clouds are better for certain workloads, so just as certain storage or certain servers are better when it's on-prem, that doesn't surprise us, certain cloud vendors specialize in the apps. >> Right, so Eric, we know IBM and Cisco have had a very successful partnership with the VersaStack. If you talk about in your data center, in IBM Storage, Cisco networking in servers. When I hear both IBM and Cisco talking about the message for hybrid and multicloud, they talk the software solutions you have, the management in various pieces and integration that Cisco's doing. Help me understand where VersaStack fits into that broader message that you were just talking about. 
>> So we have VersaStack solutions built primarily around our FlashSystems, which use our Spectrum Virtualize software. Spectrum Virtualize not only supports IBM arrays, but over 500 other arrays that are not ours. But we also have a version of Spectrum Virtualize that will work with AWS and IBM Cloud and sits in a virtual machine at the cloud providers. So whether it be test and dev, whether it be migration, whether it be business continuity and disaster recovery, or whether it be what I'll call logical cloud air gapping, we can do that for ourselves, even when it's not a VersaStack, out to the cloud and back. And then we also have solutions in the VersaStack world that are built around our Spectrum Scale product for big data and AI. So Spectrum Scale goes out and back to the cloud, as does Spectrum Virtualize, and those are embedded on the arrays that come in a VersaStack solution. >> I want to bring it back to cloud a little bit. We were talking about workloads and sort of what Furrier calls horses for courses. IBM has a public cloud, and I would put forth that your wheelhouse, IBM's wheelhouse for cloud workload is the hybrid mission-critical work that's being done on-prem today in the large IBM customer base, and to the extent that some of that work's going to move into the cloud, the logical place to put that is the IBM Cloud. Here's why. You could argue speeds and feeds and features and function all day long. The migration costs of moving data and workloads from wherever, on-prem into a cloud or from on-prem into another platform, are onerous. Any CIO will tell you that. So to the extent that you can minimize those migration costs, the business case for, in IBM's case, for staying within that blue blanket, is going to be overwhelmingly positive relative to having to migrate. That's my premise. So I wonder if you could comment on that, and talk about, you know, what's happening in that hybrid world specifically with your cloud? >> Well, yeah, the key thing from our perspective is we are basically running block data or file data, and we just see ourselves sitting in IBM Cloud. So when you've got a FlashSystem product or you've got our Elastic Storage System 3000, when you're talking to the IBM Cloud, you think you're talking to another one of our boxes sitting on-prem. So what we do is make that transition completely seamless, and moving data back and forth is seamless, and that's because we take a version of our software and stick it in a virtual machine running at the cloud provider, in this case IBM Cloud. So the movement of data back and forth, whether it be our FlashSystem product, and even our DS8000 can do the same thing, is very easy for an IBM customer to move to an IBM Cloud. That said, just to make sure that we're covering, and in the year of multicloud, remember the IBM Cloud division just released the Multicloud Manager, you know, second half of last year, recognizing that while they want people to focus on the IBM Cloud, they're being realistic that they're going to have multiple cloud vendors. So we've followed that mantra too, and made sure that we've followed what they're doing. As they were going to multicloud, we made sure we were supporting other clouds besides them. But from IBM to IBM Cloud it's easy to do, it's easy to traverse, and basically, our software sits on the other side, and it basically is as if we're talking to an array on prem but we're really not, we're out in the cloud. We make it seamless.
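To make the virtualization layer Herzog describes above a bit more concrete, here is a minimal Python sketch under a deliberately simplified model: one volume abstraction spans an on-prem back end and a software instance running in a cloud VM, and mirrors writes so data can move out to the cloud and back for test/dev, migration, or DR. The class and method names are illustrative assumptions, not the Spectrum Virtualize API.

    # Illustrative sketch only, not the Spectrum Virtualize API.
    # One volume abstraction spanning an on-prem array and a cloud-resident
    # instance, mirroring writes so data can move out to the cloud and back.

    class Backend:
        """Anything that can store block-addressed data."""
        def __init__(self, name):
            self.name = name
            self.blocks = {}                    # lba -> bytes

        def write(self, lba, data):
            self.blocks[lba] = data

        def read(self, lba):
            return self.blocks.get(lba)


    class VirtualVolume:
        """Presents one volume regardless of which back ends sit underneath."""
        def __init__(self, primary, dr_copy=None):
            self.primary = primary
            self.dr_copy = dr_copy              # e.g. an instance running in a cloud VM

        def write(self, lba, data):
            self.primary.write(lba, data)
            if self.dr_copy is not None:
                self.dr_copy.write(lba, data)   # asynchronous in a real system

        def read(self, lba):
            return self.primary.read(lba)

        def fail_over(self):
            """Swap roles: the cloud copy becomes primary."""
            self.primary, self.dr_copy = self.dr_copy, self.primary


    on_prem = Backend("flash-array-on-prem")
    cloud = Backend("virtualize-instance-in-cloud-vm")
    vol = VirtualVolume(on_prem, dr_copy=cloud)

    vol.write(0, b"transaction-log-page")
    vol.fail_over()                             # e.g. a DR test
    print(vol.read(0))                          # the data is already in the cloud copy

A production layer adds asynchronous replication, consistency groups, and snapshots, but the shape of the abstraction, one volume front end over interchangeable back ends, is the point of the sketch.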
>> So testing my premise, I mean again, my argument is that the complexity of that migration is going to determine in part what cloud you should go to. If it's a simple migration, and it's better, and the customer decides okay it's better off on AWS, you as a storage supplier don't care. >> That is true. >> It's agnostic to you. IBM, as a supplier of multicloud management doesn't care. I'm sure you'd rather have it run on the IBM Cloud, but if the customer says, "No, we're going to run it "over here on Azure", you say, "Great. "We're going to help you manage that experience across clouds". >> Absolutely. So, as an IBM shareholder, we wanted to go to IBM Cloud. As a realist, with what CIOs say, which is I'm probably going to use multiple clouds, we want to make sure whatever cloud they pick, hopefully IBM first, but they're going to have a secondary cloud, we want to make sure we capture that footprint regardless, and that's what we've done. As I've said for years and years, a partial PO is better than no PO. So if they use our storage and go to a competitor of IBM Cloud, while I don't like that as a shareholder, it's still good for IBM, 'cause we're still getting money from the storage division, even though we're not working with IBM Cloud. So we make it as flexible as possible for the customer, The Multicloud Manager is about customer choice, which is leading with IBM Cloud, but if they want to use a, and again, I think it's a realization at IBM Corporate that no one's going to use just one cloud provider, and so we want to make sure we empower that. Leading with IBM Cloud first, always leading with IBM Cloud first, but we want to get all of their business, and that means, other areas, for example, the Red Hat team. Red Hat works with every cloud, right? And they don't really necessarily lead with IBM Cloud, but they work with IBM Cloud all right, but guess what, IBM gets the revenue no matter what. So I don't see it's like the old traditional component guy with an OEM deal, but it kind of sort of is. 'Cause we can make money no matter what, and that's good for the IBM Corporation, but we do always lead with IBM Cloud first but we work with everybody. >> Right, so Eric, we'd agree with your point that data is not just going to live one place. One area that there's huge opportunity that I'd love to get your comment here on is edge. So we talked about, you know, the data center, we talked about public cloud. Cisco's talking a lot about their edge strategy, and one of our questions is how will they enable their partners and help grow that ecosystem? So love to hear your thoughts on edge, and any synergies between what Cisco's doing and IBM in that standpoint. >> So the thing from an edge perspective for us, is built around our new Elastic Storage System 3000, which we announced in Q4. And while it's ideal for the typical big data and AI workloads, runs Spectrum Scale, we have many a customers with Scale that are exabytes in production, so we can go big, but we also go small. It's a compact 2U all-flash array, up to 400 terabytes, that can easily be deployed at a remote location, an oil well, right, or I should say, a platform, oil platform, could be deployed obviously if you think about what's going on in the building space or I should say the skyscraper space, they're all computerized now. 
So you'd have that as an edge processing box, whether that be for the heating systems, the security systems, we can do that at the edge, but because of Spectrum Scale you could also send it back to whatever their core is, whether that be their core data center or whether they're working with a cloud provider. So for us, the ideal solution for us, is built around the Elastic Storage System 3000. Self-contained, two rack U, all-flash, but with Spectrum Scale on it, versus what we normally sell with our all-flash arrays, which tends to be our Spectrum Virtualize for block. This is file-based, can do the analytics at the edge, and then move the data to whatever target they want. So the source would be the ESS 3000 at the edge box, doing processing at the edge, such as an oil platform or in, I don't know what really you call it, but, you know, the guys that own all the buildings, right, who have all this stuff computerized. So that's at the edge, and then wherever their core data center is, or their cloud partner they can go that way. So it's an ideal solution because you can go back and forth to the cloud or back to their core data center, but do it with a super-compact, very high performance analytics engine that can sit at the edge. >> You know, I want to talk a little bit about business. I remember seven years ago, we covered, theCUBE, the z13 announcement, and I was talking to a practitioner at a very large bank, and I said, "You going to buy this thing?", this is the z13, you know, a couple of generations ago. He says, "Yeah, absolutely, I'll buy it sight unseen". I said, "Really, sight unseen?" He goes, "Yeah, no question. "By going to the upgrade, I'm able to drive "more transactions through my system "in a certain amount of time. "That's dropping revenue right to my bottom line. "It's a no-brainer for me." So fast forward to the z15 announcement in September in my breaking analysis, I said, "Look, IBM's going to have a great Q4 in systems", and the thing you did in storage is you synchronized, I don't know if it was by design or what, you synchronized the DS8000, new 8000 announcement with the z15, and I predicted at the time you're going to see an uptick in both the systems business, which we saw, huge, 63%, and the storage business grew I think three points as well. So I wonder if you can talk about that. Was that again by design, was it a little bit of luck involved, and you know, give us an update. >> So that was by design. When the z14 came out, which is right when I first come over from EMC, one of the things I said to my guys is, "Let's see, we have "the number one storage platform on the mainframe "in revenue, according to the analysts that check revenue. "When they launch a box, why are we not launching with them?" So for example, we were in that original press release on the z14, and then they ran a series of roadshows all over the world, probably 60. I said, "Well don't you guys do the roadshows?", and my team said, "No, we didn't do that on z12 and 13". I said, "Well were are now, because we're the number one "mainframe storage company". Why would we not go out there, get 20 minutes to speak, the bulk of it would be on the Zs. So A, we did that of course with this launch, but we also made sure that on day one launch, we were part of the launch and truly integrated. Why IBM hadn't been doing for a while is kind of beyond me, especially with our market position. 
So it helped us with a great quarter, helped us in the field, now by the way, we did talk about other areas that grew publicly, so there were other areas, particularly all-flash. Now we do have an all-flash 8900 of course, and the high-end tape grew as well, but our overall all-flash, both at the high end, mid range and entry, all grew. So all-flash for us was a home run. Yeah, I would argue that, you know, on the Z side, it was grand slam home run, but it was a home run even for the entry flash, which did very, very well as well. So, you know, we're hitting the right wheelhouse on flash, we led with the DS8900 attached to the Z, but some of that also pulls through, you get the magic fairy dust stuff, well they have an all-flash array on the Z, 'cause last time we didn't have an all, we had all-flash or hybrids, before that was hybrid and hard drive. This time we just said, "Forget that hybrid stuff. "We're going all-flash." So this helps, if you will, the magic fairy dust across the entire portfolio, because of our power with the mainframe, and you know, even in fact the quarter before, our entry products, we announced six nines of availability on an array that could be as low cost as $US16,000 for RAID 5 all-flash array, and most guys don't offer six nines of availability at the system level, let alone we have 100% availability guaranteed. We do charge extra for that, but most people won't even offer that on entry product, we do. So that's helped overall, and then the Z was a great launch for us. >> Now you guys, you obviously can't give guidance, you have to be very careful about that, but I, as I say, predicted in September that you'd have a good quarter in systems and storage both. I'm on the record now I'm going to say that you're going to continue to see growth, particularly in the storage side, I would say systems as well. So I would look for that. The other thing I want to point out is, you guys, you sell a lot of storage, you sell a lot of storage that sometimes the analysts don't track. When you sell into cloud, for example, IBM Storage Cloud, I don't think you get credit for that, or maybe the services, the global services division. So there's a big chunk of revenue that you don't get credited for, that I just want to highlight. Is that accurate? >> Yeah, so think about it, IBM is a very diverse company, all kinds of acquisitions, tons of different divisions, which we document publicly, and, you know, we do it differently than if it was Zoggan Store. So if I were Zoggan Store, a standalone storage company, I'd get all credit for supporting services, there's all kinds of things I'd get credit for, but because of IBM's history of how the company grew and how company acquired, stuff that is storage that Ed Walsh, or GM, does own, it's somewhat dispersed, and so we don't always get credit on it publicly, but the number we do in storage is substantially larger than what we report, 'cause all we really report is our storage systems business. Even our storage software, which one of the analysts that does numbers has us as the number two storage software company, when we do our public stuff, we don't take credit for that. 
Now, luckily that analyst publishes a report on the numbers side, and we are shown to be the number two storage software company in the world, but when we do our financial reporting, that, because just the history of IBM, is spread out over other parts of the company, even though our guys do the work on the sales side, the marketing side, the development side, all under Ed Walsh, but you know, part of that's just the history of the company, and all the acquisitions over years and years, remember it's a 100-year-old company. So, you know, just we don't always get all the credit, but we do own it internally, and our teams take and manage most of what is storage in the minds of storage analysts like you guys, you know what storage is, most of that is us. >> I wanted to point that out because a lot of times, practitioners will look at the data, and they'll say, oh wow, the sales person of the competitor will come in and say, "Look at this, we're number one!" But you really got to dig in, ask the questions, and obviously make the decisions for yourself. Eric, great to see you. We're going to see you later on this week as well we're going to dig into cyber. Thanks so much for coming back. >> Great, well thank you, you guys do a great job and theCUBE is literally the best at getting IT information out, particularly all the shows you do all over the world, you guys are top notch. >> Thank you. All right, and thank you for watching everybody, we'll be back with our next guest right after this break. We're here at Cisco Live in Barcelona, Dave Vellante, Stu Miniman, John Furrier. We'll be right back.
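The "six nines of availability" figure Herzog cites for the entry all-flash arrays translates into a concrete downtime budget, which is easy to compute. The arithmetic below is generic availability math, not an IBM specification.

    # Downtime budget implied by an availability percentage.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    def downtime_per_year(availability):
        return SECONDS_PER_YEAR * (1 - availability)

    for label, a in [("five nines (99.999%)", 0.99999),
                     ("six nines (99.9999%)", 0.999999)]:
        print(f"{label}: about {downtime_per_year(a):.0f} seconds of downtime per year")

    # five nines -> roughly 316 seconds (about 5 minutes) per year
    # six nines  -> roughly 32 seconds per year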
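The edge story earlier in the conversation, an ESS 3000 doing analytics next to the data and shipping only results back to the core or a cloud, reduces to a simple pattern. The field names, threshold, and target below are made-up placeholders; in the product, Spectrum Scale handles the data movement at the filesystem layer rather than in application code.

    # Toy edge-to-core pipeline: summarize raw readings locally, forward only
    # the small aggregate.  Names and thresholds are illustrative assumptions.

    import json
    import statistics

    def analyze_at_edge(readings):
        """Run the heavy analytics next to the data (e.g. on an oil platform)."""
        return {
            "count": len(readings),
            "mean": round(statistics.mean(readings), 2),
            "max": max(readings),
            "alerts": sum(1 for r in readings if r > 90.0),   # made-up threshold
        }

    def ship_to_core(summary, target="core-datacenter-or-cloud-bucket"):
        """Stand-in for replicating results back to the core or a cloud tier."""
        payload = json.dumps(summary)
        print(f"sending {len(payload)} bytes to {target}: {payload}")

    raw_sensor_readings = [71.2, 69.8, 93.5, 70.1, 95.2, 68.9]   # raw data stays at the edge
    ship_to_core(analyze_at_edge(raw_sensor_readings))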

Published Date : Jan 28 2020


Craig Hibbert, Infinidat | CUBEConversation, April 2019


 

>> Announcer: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hi everybody, this is Dave Vellante, and this is theCUBE, the leader in live tech coverage. For this CUBE Conversation I'm really excited, Craig Hibbert is here, he's a vice president at Infinidat, and he focuses on strategic accounts. He's been in the storage business for a long time, he's got great perspectives. Craig, good to see you again, thanks for coming on. >> Good to see you, good to be back. >> So there's a saying, don't fight fashion, well, you guys fight fashion all the time. You've got these patents, you've got this thing called neural cache, your founder and chairman Moshe has always been cutting against the grain and doing things his own way, but I'd love for you to talk about some of those things, the patents that you have, some of the architecture, the neural cache, fill us in on all that. >> Sure, so when we go in and we talk to customers and we say we have a hundred and thirty-eight patents, a lot of them say, well, that's great, but you know, how does that relate to me? A lot of these are AND and OR gates and certain things where they don't know how it fits into their day-to-day life, so I think this is a good opportunity to talk about several of those that do. So obviously the neural cache is something that is dynamic. Instead of having a key and a hash, which all the other vendors have, just our position in that table allows us to determine all the values and things we need from it, but it also monitors, and this is an astounding statement, from the moment that array is powered on, every I/O that flows through it. We track data for the life of the array, and for some of these customers that's five and six years. So you know, those blocks of data, are they random, are they sequential, are they hot, are they cold, when was the last time they were accessed? This is key information, because we bring intelligence to the lower-level block layer where everybody else has just, they just ship things, things come in and keep moving, they have no idea what they are. We do, and the value around that is that we can then predict when workloads are aging out. Today you have people manually writing things in, in things like Easy Tier or FAST or competing products for tiered storage, right, and all these things that manage these problems through human intervention, we do dynamically, and that feeds information back into the array and helps to determine which virtual RAID group it should reside on, and where on the disk spindles, based upon the age of the application and how it's trending. These are very powerful things in a day where we need immediate information sent to a consumer in a store. It's all this dynamic processing and the ability to bring that in, so that's one of the things we do. Another one is the catalyst for our fast rebuilds. We can rebuild two failed, full 12 terabyte drives in under 80 minutes. If those drives are half full, then it's nine minutes, and this is by understanding where all the data is and sharing the rebuild process across the drives. That's another one of our patents. Perhaps one of the most challenging that we have is that storage vendors tend to do error correction at the fibre channel layer. Once that data enters into the storage array, there is no mechanism to check the integrity of that data, and a couple of vendors have an option to do this, but they can only do it for the first write, and they also recommend you turn that feature off because it slows down the box. So where Infinidat is unique, and I think this is for me one of the most important patents that we have, is that every time we write a 64K slice in the system we assign some metadata to that, and obviously it has a CRC checksum, but more importantly it has the locality of reference. So if we subsequently go back and do a reread and the CRC matches but the location has changed, we know that corruption has happened, sometimes a bit flipped on a write, all of these things that constitute silent data corruption. And that's not even the most impressive part. What we do at that point is we dynamically deduce that the data has been corrupted, and using the parity, because it's a RAID 6-like dual-parity configuration, we rebuild that data on the fly without the application or the end user knowing that there was a problem, and that way we serve back the data that was actually written. We guarantee that, and we're the only array that does that today. >> That's massive for our customers. I mean, the time to rebuild, you said a 12 terabyte drive, I mean, I would have thought, I mean, they always joke, how long do you think it takes to rebuild a 30 terabyte drive, because eventually, you know... >> Sure. >> You know, it's like a month. >> With us it's the same. So if you look at our three terabyte drives, it was 18 minutes, the four terabyte drives, 18 minutes, the six, 18 minutes, the eight and the twelve, and it will be good all the way up to 20 terabyte drives in that configuration. >> Now, I want to come back to a conversation we've had many, many times. We were early on in the flash storage trend, and we saw the prices coming down, we thought high-speed spinning disks, their days were numbered, and sure enough we were correct in that prediction, but then, you know, disk drives have kept that distance. You guys have eschewed going all-flash because of the economics, but help us understand this, because you've got this mechanical device, and yet you guys are able to claim performance that's equal to or oftentimes much, much better than a lot of your all-flash competitors, and I want to understand that a little bit. It suggests to me that there's so much other overhead going on and other bottlenecks in the system that you guys are dealing with, both architecturally and through your intelligent software. Can you talk about that? >> Absolutely, absolutely. The software is the key, right, we are a software company, and we have some phenomenal guys that do the software piece. So as far as the performance goes, the backend spinning disks are really obfuscated by two layers of virtualization, and we ensure, because we have massive amounts of DRAM, that all of that data flows into DRAM. It will sit in DRAM for an astonishing five minutes. I say astonishing because most other vendors try to evict cache straight away so they've got room for the next one, and that does not facilitate a mechanism by which you can inspect those dumb pieces of data, and if you get enough dumb data you can start to make it intelligent, right? You can go get discarded data from cell phone towers and find out, we know where people go to work and what time they work, and because of that, what demographic, and now you're predicting the election based upon discarded cell phone data. So if you can take dumb data and put patterns around it and make it sequential, which we do, we write out a log-structured write, so we're really, really fast at the front end. And some customers say, well, how do you manage that on the backend? Here's something that our designers and architects did very, very well. The speed of DDR3 is about 15 gig per
second which is what Cindy REM right now we have 480 spindles on the backend if you say each one of them can do a hundred 100 mics per second which they can do more than that 200 that gives us a forty eight gigabit gigabyte sorry per second backplane D stage ability which is three times faster than the DRAM so when you look at it the box has been designed all the way so there is no bottleneck through flowing through the DRAM anything that still been access that comes out of that five minute window once it's D stays to all the spindles incidentally analog structured right so right now it over 480 spindles all the time and then you've got the random still on the SSD which will help to keep that response time around about 2 milliseconds and just one last point on there I have a customer that has 1.2 petabytes written on a 1.3 a petabyte box and is still achieving a 2 millisecond response time and that's unheard of because most block arrays as you fill them up to 60 70 % that the performance starts going in the tank so I go down memory lane here so the most successful you know storage array in the history of the industry my opinion probably fact it was symmetric sand mosha a designed that he eschewed raid5 everybody was on the crazy about raid 5 is dead no no just mirror it yeah and that's gonna give us the performance that we need and he would write they would write 2d ran and then then of course you'd think that the D stage bandwidth was the bottleneck because they had such a back high a large number of back-end spindles the bandwidth coming out of that DRAM was enormous you just described something actually quite similar so that I was going to ask you is it the D stage bandwidth the bottleneck and you're saying no because your D stage being what there's actually three tighter than the D rate up it is so with the symmetric some typical platforms you would have a certain amount of disk in a disk group and you would assign a phase and Fiber Channel ports to that and there'd be certain segments in cash that would dedicated those discs we have done away with that we have so many well with two layers of the virtualization at the front as we talked about but because nothing is a bottleneck and because we've optimized each component the DRAM and I talked about the SSDs we don't write heavily over those we write in a sequential pattern to the SSD so that the wear rate is elongated and so because of that and we have all the virtualized raid groups configured in cache so what happens is as we get to that five-minute window we're about 2 D state all of the raid groups the al telling the cash how to lay out the virtual raid structure based on how busy or the raid groups are at the time so if you were to pause it and ask us where it's going we can tell you it's the Machine line it's the artificial intelligence of saying this raid group just took a D stage you know or there's a lot of data in the cache that's heading for these but based upon the the prediction of the heart the cold that I talked about a few months ago and so it will make a determination to use a different virtual rater and that's all done in memory as opposed to to rely on the disk so we're not we don't have the concept of spare disk we have the concept of spare capacity it's all shared and because it's all shared it's this very powerful pool that just doesn't get bogged down and continues to operate all the way up to the full capacity so I'm struggling with this there is no bottleneck because there's always a problem that can assure them 
so where is the bottleneck the ball net for us is when the erase fault so if you overwrite the maximum bandwidth and that historically you know in in 2016-2017 was a roughly 12 cube per second we got that in the fall 2018 to roundabout 15 and we're about to make the announcement that we've made tectonic increases in that where will now have right bandwidth approach in 16 gig per second and also read bandwidth about 25 K per second that 16 is going to move up to 20 remember what I said we release a number and we gradually grow into it and and and maximize and tweak that software when you think that most or flash arrays can do maybe one and a half gig per second sustained writes that gives us a massive leg up over our competition instead of buying an all flash array for this and another mid-tier array for this and coal social this you can just buy one platform that services at all all the protocols and they're all access the same way so you write an API one way mark should almost as big fan of this about writing code obviously was spinnaker and some of those other things that he's been involved in and we do the same thing so our API is the same for the block as it is for the NAS as it is for the ice cozy so it's it's very consistent you write it once and you can adapt multiple products well I think you bring about customers for short bit everybody talks about digital transformation and it's this big buzzword but when you talk to customers they're all going through some kind of digital transformation oh they want to get digital right let's put it that way yeah I don't want to get disrupted they see Amazon buying grocers and while getting into the financial services and content and it's all about the data so there's a real disruption scenario going on for every business and and the innovation engine seems to be data okay but data just sitting there and a data swamp is no good so you got to apply machine intelligence for that to that data and you got to have scale mm-hmm do you guys make a big deal about about petabyte scale yeah what are your customers telling you about the importance of that and how does it fit into that innovation sandwich that I just laid out sure no it's great question so we have some very because we're so have 70 petabytes of production over those 70 yep we have a couple of those both financial institutions very very good at what they do we worked with them previously with a with another product that really kind of introduced another one of most Shea's products that was XIV that introduced the concepts of self-healing and no tuning and things like we don't even talked about that there's no tuning knobs on the infinite I probably should mention that but our customers said have said to us we couldn't scale you know we had a couple hundred terabyte boxes before there were okay you know you've brought you've raised the game by bringing in a much higher level of availability and much higher capacity we can take one of our but I'm in this process right now the customer we can take one of our boxes and collapse three vmax 20 of VMAX 40s on it we have numerous occassions gone into establishments that have 11 12 23 inch cabinets two and a half thousand spindles of the old DMC VMO station we've replaced it with one 19-inch rack of arts right that's a phenomenal state when you think about it and that was paid for you think some of these v-max 47 it's 192 ports on them Fiber Channel ports we have 24 so the fibre channel port reduction the power heating and cooling over an entire row 
down to one eight kilowatt consumption by the way our power is the same whether it's three four terabytes six eight twelve they all use the same power plan so as we increase the geometry capacity of the drives we decrease the cost per usable well we're actually far more efficient than all fly sharing with the most environmentally friendly hybrids been in this planet on the array so asking about cloud so miss gray on the planet that would be yeah so when cloud first sort of came out of the division Financial Services guys are like no clouds that's a bad word they're definitely you know leaning into that adopting it more but still there's a lot of workloads that they're gonna leave on Prem they want to that cloud experience to the data what are you hearing from the financial services customers in particular and I and I've single them out because they're they're very advanced they're very demanding they are they a lot of dough and so what do you see in terms of them building cloud hybrid cloud and and what it means for for them and specifically the storage industry yeah so I'm actually surprised that they've adopted it as much as they have to be honest with you and I think the the economics are driving that but having said that whenever they want to get the data back or they want to bring it back home prime for various reasons that's when they're running into problems right it's it's like how do I get my own data back well you've got to open up the checkbook and write big checks so I think infini debt has a nice strategy there where we have the same capabilities that you have on prime you having the cloud don't forget nobody else has that one of the encumbrances to people move into the cloud has been that it lacks the enterprise functionality that people are used to in the data center but because our cost point is so affordable we become not only very attractive or four on Prem but for cloud solutions as well of course we have our own new tricks cloud offering which allows people to use as dr or replications and so however you want to do it where you can use the same api's and code that your own dis and extrapolate that out to the cloud I was there which is which is very helpful and so we have the ability if you take a snapshot on Amazon it may take four hours and it's been copied over to an s3 device that's the only way they can make it affordable to do it and then if you need that data back it's it's not it's not imminent you've got to rehydrate from s3 and then copy it back over your snapshot with infinite data its instantaneous we do not stop i/o when we do snapshots and another one the patterns we use the time synchronous mechanism every every AO the rise has a timestamp and we when we take a snapshot we just do a point in time and in a timestamp that's greater than that instantiation point is for the volume and previous is for the snapshot we can do that in the cloud we can instantly recover hundreds of terabytes worth of databases and make them instantly available so our story again with the innovation our innovation wasn't just for for on pram it was to be facilitated anyway you are and that same price point carries forward from here into the cloud when Amazon and Microsoft wake up and realized that we have this phenomenal story here I think they'll be buying from us in leaps and bounds it's it's the only way to make the cloud affordable for storage vendors so these are the things you talk about you know bringing bringing data back and bringing workloads back and and there are tool 
chains that are now on Prem the kubernetes is a great example that our cloud like and so when you bring data back you want to have that cloud experience so automated operations plays into that you know automation used to be something that people are afraid of and they want to do do manual tearing member they wanted their own knobs to turn those days are gone because people want to drive digital transformations they don't want to spend time doing all this heavy lifting I'm talk about that a little bit and where you guys fit yeah I mean you know I say to my customers to not to knock our competition but you can't have a service processor as the inter communication point between what the customer wants and it deciding where it's going to talk to the Iranian configure it's going to be instantaneous and so we all we have we don't have any Java we don't have any flash we don't have any hosts we don't have massive servers around the data center collecting information we just have an html5 interface and so our time to deployment is very very quick when we land on the customer's dark the box goes in we hook up the power we put the drives in we're Haiti's the word V talk because it brings back memories for a lot of course I am now we're going back in time right knowing that main here and so we're very dynamic both in how we forward face the customers but also on the backend for ourselves we eat our own dog food in the sense that we are we have an automation team we've automated our migration from non infinite out platforms towards that uses some level of artificial intelligence we've also built a lot of parameters around things like going with ServiceNow and custom sites because well you can do with our API what other people take you know page and page of code I'll give you an example one of our customers said I need OC i the the let-up management product we called met up and they said hey listen you know it usually takes six months to get an appointment and that it takes at least six months to do the comb we said no no we're not like any other storage render we don't have all these silly raid groups and spare disk capacity you know this weave three commands we can show in the API and we showed them the light Wow can you send us an array we said no we can do something better we were designed SDS right when when infinite out was coded there was no hardware and the reason we did that is because software developers will always code to the level of resilience of the hardware so if you take away that Hardware the software developers have to code to make something to withstand any type of hardware that comes in and at the end of the coding process that's when we started bringing in the hardware pieces so we were written STS we can send vendors and customers a an OVA a virtual appliance of our box they were able to the in a week they told the custom we have to go through full QA no reason why it wouldn't work and they did it for us and got it was a massive customer of theirs and ours that's a powerful story the time to deployment for your homegrown apps as well as things like ServiceNow an MCI incredible infinite out three API calls we were done so you guys had a little share our partnership with met up in the field we did yeah I mean was great they had a massive license with this particular customer they wanted our storage on the platform and we worked very very quickly with them they were very accommodating and we'd love to get our storage qualified behind their behind their heads right now for another 
customer as well so yeah there's definitely some sooner people realize what we have a Splunk massive for us what we're able to do was plunk in one box where people the competitors can't do in a row so it so it's very compelling what we actually bring in how we do it and that API level is incredibly powerful and we're utilizing that ourselves I would like to see some integration with canonical Marshall what these guys have done a great job with SDS plays we'd like to bring that here do spinnaker do collect if I could do some of those things as well that we're working on the automation we just added another employee another FTE to the automation team and infinite out so we do these and we engage with customers and we help you get out of that trench that is antiquity and move forward into the you know into the vision of how you do one thing well and it permeates the cloud on primary and hybrid all those guys well that API philosophy that you have in the infrastructure is code model that you just described allows you to build out your ecosystem in a really fast way so Greg thanks so much for coming on thank you and doing that double click with this really I'd love to have you back great thanks a lot Dave all right thank you welcome thank you for watching you're watching the cube and this is Dave Volante we'll see you next time
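Two of the mechanisms described above are worth sketching. First, the integrity check: every 64K slice carries a checksum plus its expected location, so a re-read can catch not only flipped bits but a write that landed in the wrong place, after which the slice is rebuilt from the dual-parity protection. The Python below illustrates only the detection idea; the metadata layout and function names are assumptions, not Infinidat's implementation, and the parity rebuild is stubbed out.

    # Illustrative check of a stored slice: the CRC catches corrupted bytes,
    # the recorded locality catches a write that landed in the wrong place.
    # Not Infinidat's on-disk format, just the idea.

    import zlib

    def make_slice_metadata(data, drive, offset):
        return {
            "crc": zlib.crc32(data),
            "drive": drive,            # where this slice is supposed to live
            "offset": offset,
        }

    def verify_on_read(meta, data_read, drive, offset):
        """True only if the slice is intact AND where we expected it."""
        if zlib.crc32(data_read) != meta["crc"]:
            return False               # bits changed
        if (drive, offset) != (meta["drive"], meta["offset"]):
            return False               # right bits, wrong place
        return True

    def rebuild_from_parity(meta):
        # Stub: in a dual-parity layout the slice would be reconstructed from
        # the surviving members and rewritten, transparently to the host.
        print(f"rebuilding slice at drive {meta['drive']}, offset {meta['offset']}")

    meta = make_slice_metadata(b"64k-slice-payload", drive=7, offset=1048576)
    if not verify_on_read(meta, b"64k-slice-payload", drive=7, offset=2097152):
        rebuild_from_parity(meta)      # locality mismatch detected on a re-read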
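Second, the timestamp-based snapshot mechanism: every write carries a timestamp, and a snapshot is just a recorded point in time, which is why it can be taken without pausing I/O and recovered from almost instantly. Again, the structures below are illustrative assumptions rather than the array's actual metadata.

    # Toy timestamp-based snapshot: every write is stamped; a snapshot is just
    # a cut-off time, and reads "as of" that time ignore anything written later.

    import itertools

    _clock = itertools.count(1)

    class Volume:
        def __init__(self):
            self.versions = {}                   # key -> list of (stamp, value)

        def write(self, key, value):
            self.versions.setdefault(key, []).append((next(_clock), value))

        def snapshot(self):
            return next(_clock)                  # O(1): just record a point in time

        def read(self, key, as_of=None):
            for stamp, value in reversed(self.versions.get(key, [])):
                if as_of is None or stamp <= as_of:
                    return value
            return None

    vol = Volume()
    vol.write("block7", "v1")
    snap = vol.snapshot()                        # no pause of I/O is needed in principle
    vol.write("block7", "v2")

    print(vol.read("block7"))                    # 'v2', the live volume
    print(vol.read("block7", as_of=snap))        # 'v1', the view as of the snapshot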

Published Date : Apr 19 2019


Mike McNamara, NetApp | DataWorks Summit 2018


 

>> Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We are joined by Mike McNamara, he is the Senior Product and Solutions Marketing at NetApp. Thanks so much for coming on theCUBE. >> Thanks for having me. >> You're a first timer, >> Yes, >> So this is very exciting! >> Happy to be here. >> Welcome. >> Thanks. >> So, before the cameras were rolling, we were talking about how NetApp has been in this space for a while, but is really just starting to be recognized as a player. So, talk a little bit about your company's evolution. >> Sure. So, in the whole analytic space, is something NetApp was in a long time ago, and then sort got out of it, and then over the last several years, we've gotten back in, and we recognize it's a huge opportunity for data storage, data management, if you look at IDC Data, massive, massive market, but, the opportunity for us, is like you know what, they're mainly using a direct attached storage model where compute and storage is tied together. And now, with data just exploding, and growing like crazy, it's always been growing, but now it seems like it's just growing like crazy now, that, and customers wanting to have data on-prem, but also being able to move it off to the cloud, we're like, hey this is a great opportunity for us to come in with a solution that's, external storage solution that can come in and show them the benefits of have a more reliable, have an opportunity to move their data off to the cloud, we've got great solutions with that, so it's gone well, but it's been a little bit different, like at this show, a lot of the people, the data scientists, data engineers, some who know us, some still don't like, so, NetApp, what do you guys do, and so it's a little bit of an education, 'cause it's not a traditional buyer, if you will, we look at them as influencers, but it's only one influence than we traditionally have sold to say Vice President of Infrastructure, as an example, or maybe a Director of Storage Admin, but most of those folks are not here, so we're, this is just kind of a new market for us that we're making inroads. >> How do data scientists, or do they influence the purchase of storage solutions, or data management solutions? >> Sure, so they want to have access to the data, they want to be able analyze it quickly and effectively, they want to make sure it's always available, you know, at their fingertips so to speak. We can help them by giving them very fast, very reliable solutions, and specially with our software, they want to do for example, do some virtual clone of that data, and just do some testing on that without impacting their production data, we can do that in a snap, so we can make their lives a lot easier, so we can show them how, hey, mister data scientist, we can make your life a little easier-- >> Or miss data scientist. >> Or miss, we were talking about that, >> There are a lot of women in this field. >> Yeah, yeah. >> More than we realize, and they're great. >> So we can help you do your job better, and then, that, him or her can then influence who's making the purchase decisions. >> Yeah, training sets, test sets, validation sets of data for the machine learning and analytics development pipeline, yes, you need a solid storage infrastructure to do it right. >> Absolutely. 
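The "virtual clone in a snap" point is easiest to see as copy-on-write: a clone shares the parent's blocks and only consumes new space when something is written, which is why it is effectively instant and nearly free. The sketch below shows the general principle in a few lines of Python; it is an illustration of the technique, not NetApp's implementation.

    # Copy-on-write clone in miniature: the clone references the parent's data
    # and only diverges on write.  Purely illustrative.

    class Volume:
        def __init__(self, blocks=None, parent=None):
            self.blocks = blocks if blocks is not None else {}
            self.parent = parent

        def read(self, key):
            if key in self.blocks:
                return self.blocks[key]
            return self.parent.read(key) if self.parent else None

        def write(self, key, value):
            self.blocks[key] = value          # divergence only happens here

        def clone(self):
            return Volume(parent=self)        # O(1): no data is copied

    prod = Volume({"row1": "original", "row2": "original"})
    test = prod.clone()                       # a data scientist's sandbox, created instantly
    test.write("row1", "mutated for an experiment")

    print(prod.read("row1"))                  # 'original', production untouched
    print(test.read("row1"))                  # 'mutated for an experiment'
    print(test.read("row2"))                  # 'original', still shared with the parent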
>> So, when you're getting inside the head of your potential buyer here, the VP of Infrastructure, or data admin, what is it that you're hearing from those people most, what are their concerns, what keeps them up at night, and where do you come in? >> Yeah, so one of the concerns is, often times, you're, hey, how do I, do you have a cloud storage, connected to the cloud, you know, I'm doing things on-prem now, but is there a path, so that's a big one. And we, NetApp, pride ourselves on being the most cloud-connected, all flash storage in the industry. So, that's a big focus, big push for us. If you saw our marketing, it shows data authority for the hybrid cloud, so we really honestly do, whether it's with Google, or Azure, or AWS, we know our software runs in those environments, it also runs on-premises, but because it's the same on-tap software, we can move data between those environments. So, we get a real good storage, so we can you know, boom, check the box, we got you covered if you want to utilize the cloud, and I think the next piece of that is just from a protecting, protecting the data, you know, again I said data is just growing so much, I want to make sure it's always available, and we can back it up and all that, and that's been a core, core strength, versus like a lot of these traditional solutions they've been using, these direct attached models, they just don't have anywhere near the enterprise-grade data protection that NetApp has always prided itself on, over many decades now. And so, we can help them do that, and quite honestly, a lot of people think, well you know, you guys are external storage, how do you compare versus direct attached storage from our total cost, that's another one. I can tell you definitively, and we've got data to back it up from a total cost of ownership point of view, because of the fact that, of the advantages we bring from, up-time, and you know from RAID, but you know, in a Hadoop environment, often times there's three copies of data. With our solution, a good piece of software, there's only one copy of your data, so have three versus one is a big saving, but even what we do with the data, compressing it, and compacting it, a lot of benefits. So, we do have honest to goodness, outwards to 50% better total cost of ownership, versus a DAS model. >> Do you use machine learning within your portfolio? I'm hearing of more stories, >> Great question, yeah. >> Incorporating machine learning to automate or facilitate more of the functions in the data protection or data management life-cycle. >> Yeah, that's a great question, and we do use, so we've got a piece of software which we call Active IQ, it was referred to as Ace Update, you may have, it may ring a bell, but to answer your question, so we've got thousands of thousands of NetApp systems out there, and those customers that allow us, we have, think of it as kind of a call home feature, where we're getting data back from all our installed customers, and then we will go and do predictive analytics, and do some machine learning on that data, so then we can go back to those customers and say, hey you know what, you've got this volume that's unprotected, you should protect this, or we can show them, if you were to move that data off into our cloud environment, here's maybe performance you would see, so we do do a lot of that predictive-- >> Predictive performance assessment, it sounds like there's anomaly detection in there as well. 
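Before McNamara picks up the anomaly-detection thread below, the capacity arithmetic behind that total-cost claim is worth spelling out. The reduction ratio and protection overhead in this sketch are illustrative assumptions, not NetApp's published figures, and raw capacity is only one input to total cost of ownership.

    # Raw capacity needed to hold 100 TB of data: triple-replicated DAS
    # (a common Hadoop default) versus one protected, reduced copy.
    # The 2:1 reduction and 25% protection overhead are assumptions.

    usable_tb = 100

    das_raw = usable_tb * 3                      # three full copies of the data
    shared_raw = (usable_tb / 2.0) * 1.25        # one copy, 2:1 reduction, parity overhead

    print(f"triple-replicated DAS : {das_raw:.0f} TB raw")
    print(f"single protected copy : {shared_raw:.0f} TB raw")
    print(f"raw-capacity saving   : {100 * (1 - shared_raw / das_raw):.0f}%")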
>> Anomaly as well, letting them know, hey, you know, it's time for this drive, it may fail on you, let's ship you out a new drive now before it happens, so yeah, a lot of, from an analytics, predictive analysis going on. And you know, it's a huge benefit to our customers. Huge benefit. >> I know you're also doing a push toward artificial intelligence, so I'd like to hear more about that, and then also, if there's any best practices that have emerged. >> Sure, sure, so yes. That is another big area, so it's kind of a logical progression from where we were, if you will, in the analytics space, data lakes, but now moving into artificial intelligence, which has always been around, but it's really taking more of a more prominent role, I mean just a quick fun fact, I read that, you know that at the royal wedding that recently happened, did you know that Amazon used artificial intelligence to help us, the TV viewer, identify who the guests were. >> Ooh. >> So, you know it's like, it's everywhere, right? And so for us, we see that trend, a ton of data that needs to be managed, and so we kind of look at it from the edge to the core, to the cloud, those three, not pillars, but directional ways, taking data from IOT centers at the edge, bring it into the core, doing training, and then if the customer so chooses, out to the cloud. So, yeah it is a big push for us now, and we're going a lot with Nvidia, is a key partner with us. >> Really? This is a bit futuristic, but I can see a role going forward for AI to look into large data volumes, like video objects, to find things like faces, and poses and gestures and so forth, and see, to use that intelligence to be able to reduce the data sets down to where it's reduced, to de-duplicate, so that you can use less storage and then you can re-construct the original video objects or whatever going forward, I mean as a potential use of AI within the storage efficiency. >> Yep, yeah you're right, and that again, like in the analytic space, how we roll our in-line efficiency capabilities and data protection, is you know, very important, and then being able to move the data off into the cloud, if the customer so chooses, or just wants to use the cloud. So yeah, some of the same benefits from cloud connectivity, performance and efficiency that analytics apply certainly to AI. You know, another fun fact too about AI, which might help us, you and I living in the Boston area, is that I've read IBM has a patent out to use AI in traffic signaling, so in conjunction with cameras, to get AI, so hopefully that, you know, that works well it could alleviate-- >> Lead them out of the Tip O'Neill tunnel easy. (laughing) >> You got it maybe worse in D.C. (laughing) >> I'd like to hear though, if you have any best practices that with this moving into AI, how are you experimenting with it, and how are you finding it used most efficiently and effectively. >> Yeah, so I think one way we are eating our own dog food, so to speak, in that we're using it internally, we're using it on our customers' data, as I was explaining to help look at trends, and do analysis. So that's one, and then it's other things, just you know, partnering with companies like Nvidia as well and coming out with a joint solution, so we're doing work with them on different solution areas. >> Great, great. Well, Mike thanks so much for coming on theCUBE, >> Thanks for having me! >> It was fun having you. >> You survived! >> Yes! (laughs) >> We'll look forward to many more CUBE conversations. 
>> Great to hear from NetApp, you're very much in the game. >> Indeed, indeed. >> Alright, thank you very much. >> I'm Rebecca Knight for James Kobielus, we will have more from theCUBE's coverage of DataWorks coming up in just a little bit. (electronic music)
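The call-home telemetry McNamara describes, including the "this drive may fail on you, let's ship a spare now" example, can be illustrated with a toy trend check over per-drive counters. The field names, history format, and thresholds below are invented for illustration and are not how Active IQ actually works.

    # Toy predictive check over call-home telemetry: flag drives whose error
    # counter is trending upward before they actually fail.  Illustrative only.

    def flag_at_risk(drive_history, slope_threshold=5.0, absolute_threshold=100):
        """drive_history: {drive_id: [weekly reallocated-sector counts]}"""
        at_risk = []
        for drive_id, counts in drive_history.items():
            if len(counts) < 2:
                continue
            slope = (counts[-1] - counts[0]) / (len(counts) - 1)   # sectors per week
            if slope > slope_threshold or counts[-1] > absolute_threshold:
                at_risk.append(drive_id)
        return at_risk

    telemetry = {
        "drive-a": [0, 1, 1, 2],        # stable
        "drive-b": [2, 10, 25, 60],     # degrading fast: ship a spare proactively
    }
    print(flag_at_risk(telemetry))      # ['drive-b']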

Published Date : Jun 20 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Mike McNamara | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Nvidia | ORGANIZATION | 0.99+
Mike | PERSON | 0.99+
50% | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Silicon Valley | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
D.C. | LOCATION | 0.99+
Boston | LOCATION | 0.99+
one copy | QUANTITY | 0.99+
three | QUANTITY | 0.99+
one | QUANTITY | 0.98+
theCUBE | ORGANIZATION | 0.98+
DataWorks Summit 2018 | EVENT | 0.98+
three copies | QUANTITY | 0.98+
NetApp | ORGANIZATION | 0.97+
Hortonworks | ORGANIZATION | 0.94+
Ace Update | TITLE | 0.91+
IDC | ORGANIZATION | 0.88+
Azure | ORGANIZATION | 0.86+
thousands of thousands | QUANTITY | 0.86+
NetApp | TITLE | 0.82+
RAID | TITLE | 0.8+
DataWorks | EVENT | 0.76+
Vice President of Infrastructure | PERSON | 0.71+
Active IQ | TITLE | 0.69+
one influence | QUANTITY | 0.69+
a lot of the people | QUANTITY | 0.66+
of women | QUANTITY | 0.66+
last | DATE | 0.65+
years | DATE | 0.63+
CUBE | ORGANIZATION | 0.56+
ton | QUANTITY | 0.54+
first | QUANTITY | 0.53+
DataWorks | TITLE | 0.51+
NetApp | QUANTITY | 0.4+

Keegan Riley, HPE | VMworld 2017


 

>> Announcer: Live from Las Vegas it's theCUBE covering VMworld 2017. Brought to you by VMware and its ecosystem partners. >> Okay, welcome back everyone. Live CUBE coverage here at VMworld 2017. Three days, we're on our third day of VMworld, always a great tradition, our eighth year. I'm John Furrier with theCUBE, co-hosted by Dave Vellante of Wikibon, and our next guest is Keegan Riley, vice president and general manager of North American storage at HP Enterprise. Welcome to theCUBE. >> Thank you, thanks for having me. >> Thanks for coming on, love the pin, as always wearin' that with flair. Love the logo, always comment on that when I, first I was skeptical on it, but now I love it, but, HP doing great in storage with acquisitions of SimpliVity and Nimble where you had a good run there. >> Keegan: Absolutely. >> We just had a former HPE entrepreneur now on doing a storage startup, so we're familiar with the HPE storage. Good story. What's the update now, you got Discover in the books, now you got the Madrid event coming up. Software-defined storage, that pony's going to run for a while. What's the update? >> Yeah, so appreciate the time, appreciate you having me on. You know, the way that we're thinking about HPE's storage, it's interesting, the company is so different, and I mentioned to you guys when we were talking before that I actually left HP to come to Nimble, so in some ways I'm approaching the gold pin for a 10 year anniversary at HP. But the-- >> And they retro that so you get that grandfathered in. >> Oh, absolutely, absolutely, vacation time carries over, it's beautiful. But the HPE storage that I'm now leading is in some ways very different from the HP storage that I left six years ago, and the vision behind HPE's storage is well aligned with the overall vision of Hewlett-Packard Enterprise, which is we make hybrid IT simple, we power the intelligent edge, and we deliver the services to empower organizations to do this. And the things that we were thinking about at Nimble and the things that we're thinking about as kind of a part of HPE are well aligned with this. So, our belief is everyone at this conference cares about whether it's software defined, whether it's hyperconverged, whether it's all flash, so on and so forth, but in the real world what clients tend to care about is kind of their experience, and we've seen this really fundamental shift in how consumers think about interacting with IT in general. The example I always give is, you know, I've been in sales my whole career, I've traveled a lot, and historically 15 years ago when I would go to a new city, you know, I would land and I would jump on an airport shuttle to go rent a car, and then I would pull out a Thomas Guide and I would go to cell C3 and map out my route to the client and things like that. And so I just expected that if I had a meeting at 2:00 p.m., I needed to land at 10:00 a.m. to make my way there, that was just my experience. Cut to today, you know, I land and I immediately pull out my iPhone and hail an Uber and, you know, reserve an Airbnb when I get there, and for a 2:00 p.m. meeting I can land at 1:15 and I know Waze is going to route me around traffic to get there. So, my experience as a consumer has fundamentally changed and that's true of IT organizations and consumers within those organizations. So, IT departments have to adapt to that, right? And so kind of powering this hybrid IT experience and servicing clients that expect immediacy is what we're all about.
In fact when we were at HP Discover we kind of had this conversation, so as you hailed that Uber, IT wants self driving storage. >> Keegan: Absolutely. >> So, bring that, tie that back, things that we talk a lot about in kind of a colorful joking way, but that is the automation goal of storage, to be available. We talk about edge, unstructured data, moving compute to the edge, it's nuanced now, storage and compute all this where they go through software. Self driving storage means something, and it's kind of a joke on one hand, but what does it actually mean for an IT guy? >> No, that's a great question and this is exactly the way that we think about it. And the self driving car analogy is a really powerful one, right? And so the way we think about this, we're delivering a predictive cloud platform overall, and notice that's not a predictive cloud storage conversation, and it's a big part of why it made a ton of sense for Nimble Storage to become a part of HPE. We brought to bear a product called InfoSight that you might be familiar with. The idea behind InfoSight is in a cloud connected world the client should never know more about what's going on in their infrastructure than we do. So, we view every system as being at the edge of our network and for about seven years now we've been collecting a massive amount of information about infrastructure, about 70 million telemetry points per day per system that's coming back to us. So, we have a massive anonymized dataset about infrastructure. So, we've been collecting all of the sensor data in the same way that, say, Uber or Tesla has been collecting sensor data from cars, right, and the next step, kind of the next wave of innovation, if you will, is, okay it's great that you've collected this sensor data, now what do you do with it? Right? And so we're starting to think about how do you put actuators in place so that you can have an actual self managing data center. How can you apply machine learning and global kind of correlation in a way that actually applies artificial intelligence to the data center and makes it truly touchless and self managing and self healing and so on and so forth. >> So, that vision alone is when, well, I'm sure when you pitched that to Meg, she was like, "Okay, that sounds good, let's buy the company." But as well, there was another factor, which was the success that Nimble was having. A major shift in the storage market, and you can see it walking around here, is that over the last five, seven years there's been a shift from the storage specialist expert at managing LUNs and deploying and tuning, to the sort of generalist, because people realize, look, there's no competitive advantage. So, talk about that and how the person to whom you've sold and your career has changed. >> Yeah, no, absolutely, it's a great point. And I think in a lot of ways it goes to, you're right, obviously Meg and Antonio saw a lot of value in Nimble Storage. The value that we saw as Nimble Storage was as a standalone storage company with kind of one product to sell. You know there's a saying in sales that if you're a hammer everything looks like a nail, right. And so, it's really cool that we could go get on a whiteboard and explain why the Castle file system is revolutionary and delivers superior IOPs and so on and so forth, but the conversation is shifting to more of a solutions conversation.
It moves to how do I deliver actual value and how do I help organizations drive revenue and help them distinguish themselves from their competitors leveraging digital transformation. So, being a part of a company that has a wide portfolio and applying a solutions sales approach, it's game changing, right. Our ability to go in and say, "I don't want to tell you about the Nimble OS, I want to hear from you what your challenges are, and then I'm going to come back to you with a proposal to help you solve those challenges." It's exciting for our sales teams, frankly, because it changes our conversations and makes us more consultative. >> Alright, talk about some of the-- >> Value conversations. >> Talk about the sales engagement dynamic with the buyer of storage, especially you mentioned in the old days, now new days. A new dynamic's emerging that we've identified on theCUBE the past couple days, and I'll just kind of lay it out for you and I want you to get a reaction. I'm the storage buyer of old, now I'm the modern guy, I got to know all the ins and outs of speeds and feeds against all the competitors, but now there's a new overlay on top, which is a broader picture across the organization that has compute, that has edge, so I feel more, not diluted from storage, but more holistic around other things, so I have to balance both worlds. I got to balance the, I got to know and nail the storage equation. >> Yeah. >> Okay, as well as know the connection points with how it all works, kind of almost as an OS. How do you engage in that conversation? 'Cause it's hard, right? 'Cause storage you go right into the weeds, speeds and feeds under the hood, see our numbers, we're great, we do all this stuff. But now you got to say wait a minute, but in a VM environment it's this, in a cloud it's like this, and there's a little bit of a bigger picture, HCI or whatever that is. How do you deal with that? >> No, absolutely, and I think that's well said. I mean, I think the storage market historically has always been sort of, alright, do you want Granny Smith apples or Red Delicious apples? It always sort of looked the same and it was just about I can deliver x number of IOPs, and it became a speeds and feeds conversation. Today, it's not just apples to apples, it's like do you prefer apples, pineapples, or vacuum cleaners. Like, there's so many different ways to solve these challenges, and so you have to take the conversation to a higher level, right. It has to be a conversation about how do you deliver value to businesses? And I think, I hear-- >> It gets confusing to the buyers, too, because they're being bombarded with a lot of FUD and they still got to check the boxes on all the under-the-hood stuff, the engine's got to work. >> And they come to VMworld and every year there's 92 new companies that they haven't heard of before that are pitching them on, hey, I solve your problems. I think what I'm hearing from clients a lot is they don't necessarily want to think about the storage, they don't want to think about do I provision RAID 10 or RAID five, and do I manage this aggregate in this way or that way, they don't want to think about it, right.
So, I think this is why you're seeing the success of these next generation platforms that are radically simple to implement, right, and in some ways at Nimble, when we were talking to some of these clients that have sort of a legacy approach to storage where you got like a primary LUN administrator, there's nothing wrong with that job, it's a great job and I have friends who do that job, but a lot of companies are now shifting to more of a generalist, I manage applications and I manage you know-- >> John: You manage a dashboard console. >> Exactly, yeah, so you have to make it simple and you have to make it so you don't have to think about those things anymore. >> So, in thinking about your relationship over the years with VMware, as HP, you are part of the cartel I call it, the inner circle, you got all the APIs early, all the, you know, the CDKs or SDKs early. You know, you were one of the few. You, of course EMC, NetApp, all the big storage players, couple of IBM, couple others. Okay, and then you go to Nimble, you're a little guy, and it's like c'mon hey let's partner! Okay and so much has changed now that you're back at HPE, how has that, how has VMware evolved from an ecosystem partner standpoint and then specifically where you are today with HPE? >> That's a great question and I remember the early days at Nimble when, you know, we were knocking on the door and they were like, "Who are you again? Nimble who?" And we're really proud of sort of the reputation that we've earned inside of VMware, they're a great partner and they've built such a massive ecosystem, and I mean this show is incredible, right. They're such a core part of our business. At Nimble I feel like we earned sort of a seat at that table in some ways through technology differentiation and just grit and hustle, right. We kind of edged our way into those conversations. >> Dave: Performance. >> And performance. And we started to get interesting to them from a strategic perspective as just Nimble Storage. Now, as a part of HPE, HPE was, and in some ways as a part of HPE you're like, "Oh, that was cute." We thought we were strategic to VMware, now we actually are very strategic to VMware and the things that we're doing with them. From an innovation perspective it's like just throwing fuel on the fire, right. So, we're doubling down on some of the things we're doing around like VM Vision and InfoSight, our partnership with Visa and on ProLiant servers, things like that, it's a great partnership. And I think the things that VMware's announced this week are really exciting. >> Thank you, great to see you, and great to have you on theCUBE. >> Thank you so much. >> I'll give you the last word. What's coming up for you guys and HP storage as the vice president general manager, you're out there pounding the pavement, what should customers look for from you guys? >> No, I appreciate that. There's a couple things. So, first and foremost, our R&D budget just got a lot bigger, specifically around InfoSight. So, you'll see InfoSight come to other HPE products, 3PAR, ProLiant servers, so on and so forth, and InfoSight will become a much more interesting cloud based management tool for proactive wellness in the infrastructure. Second, you'll see us double down on our channel, right. So, the channel: Nimble was always 100% channel, SimpliVity was 100% channel, HPE Storage is going to get very serious about embracing the channel. And third, we're going to ensure that the client experience remains top notch.
The NPS score of 85 that Nimble delivered, we're really proud of that, and we're going to make sure we don't mess that up for our clients. >> You know it's so funny, just an observation, but I worked at HP for nine years in the late '80s, early '90s, and then I've watched and been covering theCUBE for over seven years now; storage is always like the power engine of HPE, and no matter what's happening it comes back down to storage, I mean, the earnings, the results, the client engagements, storage has moved from this kind of corner function to really strategic. And it continues that way. Congratulations. >> Thank you so much. Appreciate the time. >> Alright, it's theCUBE. Coming up, Pat Gelsinger on theCUBE at one o'clock. Stay with us. Got all the great guests and alumni and also executives from VMware coming on theCUBE. I'm John Furrier, Dave Vellante. We'll be right back with more live coverage after this short break.
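The "self-driving storage" idea Keegan describes earlier in this segment, tens of millions of telemetry points per system per day correlated in the cloud so the vendor spots trouble before the administrator does, reduces at its simplest to anomaly detection over streaming time series. The sketch below is a minimal, generic illustration of that pattern, not InfoSight's actual pipeline; the rolling-window z-score approach, the metric (latency in milliseconds), and the thresholds are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev

class RollingAnomalyDetector:
    """Flag telemetry samples that deviate sharply from recent history.
    A stand-in for the fleet-scale correlation described in the interview."""

    def __init__(self, window: int = 60, z_threshold: float = 4.0):
        self.window = deque(maxlen=window)  # recent samples, e.g. per-minute latency in ms
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it looks anomalous versus the rolling window."""
        anomalous = False
        if len(self.window) >= 10:  # wait for some history before judging
            mu = mean(self.window)
            sigma = pstdev(self.window) or 1e-9
            anomalous = abs(value - mu) / sigma > self.z_threshold
        self.window.append(value)
        return anomalous

if __name__ == "__main__":
    detector = RollingAnomalyDetector()
    # Steady ~2 ms latency, then a spike the detector should flag.
    for sample in [2.0, 2.1, 1.9, 2.0, 2.2, 2.0, 1.8, 2.1, 2.0, 1.9, 2.1, 25.0]:
        if detector.observe(sample):
            print(f"anomaly: {sample} ms")
```

A production system would learn baselines per workload and correlate across an entire installed base rather than a single window, but the basic loop, collect telemetry, model what normal looks like, flag and act on deviations, is the same one described in the conversation.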

Published Date : Aug 30 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Uber | ORGANIZATION | 0.99+
Nimble | ORGANIZATION | 0.99+
Tesla | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
HPE | ORGANIZATION | 0.99+
HP | ORGANIZATION | 0.99+
Meg | PERSON | 0.99+
Pat Gelsinger | PERSON | 0.99+
Keegan | PERSON | 0.99+
2:00 p.m. | DATE | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Keegan Riley | PERSON | 0.99+
Nimble Storage | ORGANIZATION | 0.99+
John | PERSON | 0.99+
10:00 a.m. | DATE | 0.99+
Visa | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Las Vegas | LOCATION | 0.99+
92 new companies | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
InfoSight | ORGANIZATION | 0.99+
Hewlett-Packard Enterprise | ORGANIZATION | 0.99+
Second | QUANTITY | 0.99+
1:15 | DATE | 0.99+
eighth year | QUANTITY | 0.99+
today | DATE | 0.99+
nine years | QUANTITY | 0.99+
VMWorld 2017 | EVENT | 0.99+
Today | DATE | 0.99+
one o'clock | DATE | 0.99+
third | QUANTITY | 0.98+
VM Vision | ORGANIZATION | 0.98+
VMWorld | ORGANIZATION | 0.98+
both worlds | QUANTITY | 0.98+
15 years ago | DATE | 0.98+
early '90s | DATE | 0.98+
first | QUANTITY | 0.98+
third day | QUANTITY | 0.98+
about seven years | QUANTITY | 0.98+
this week | DATE | 0.98+
Three days | QUANTITY | 0.97+
SimpliVity | ORGANIZATION | 0.97+
late '80s | DATE | 0.97+
VMWorld | EVENT | 0.97+
HP Discover | ORGANIZATION | 0.97+
85 | QUANTITY | 0.97+
Antonio | PERSON | 0.96+
over seven years | QUANTITY | 0.96+
10 year anniversary | QUANTITY | 0.96+
Wikibon | ORGANIZATION | 0.96+
EMC | ORGANIZATION | 0.95+
HP Enterprise | ORGANIZATION | 0.95+
Madrid | LOCATION | 0.94+
couple | QUANTITY | 0.94+
LUN | ORGANIZATION | 0.93+
one product | QUANTITY | 0.92+
seven years | QUANTITY | 0.92+