Renen Hallak & David Floyer | CUBE Conversation 2021
(upbeat music) >> In 2010, Wikibon predicted that the all-flash data center was coming. The forecast at the time was that flash memory consumer volumes would drive prices of enterprise flash down faster than those of high-spin-speed hard disks, and that by mid-decade buyers would opt for flash over 15K HDD for virtually all active data. That call was pretty much dead on, and the percentage of flash in the data center continues to accelerate faster than that of spinning disk. Now, the analyst who made this forecast was David Floyer, and he's with me today, along with Renen Hallak, who is the founder and CEO of Vast Data. They're going to discuss these trends and what they mean for the future of data and the data center. Gentlemen, welcome to the program. Thanks for coming on. >> Great to be here. >> Thank you for having me. >> You're very welcome. Now David, let's start with you. You've been looking at this for over a decade and, you know, frankly, your predictions have caused some friction in the marketplace. But where do you see things today? >> Well, what I was forecasting was based on the fact that the key driver in any technology is volume. Volume reduces the cost over time, and the volume comes from the consumers. So flash has been driven over the years initially by the iPod in 2006, the Nano, where Steve Jobs did a great job with Samsung in introducing large volumes of flash, and then the iPhone in 2008. Since then, all of mobile has been flash, and mobile has been taking a greater and greater percentage share. To begin with, the PC dropped, but now over 90% of PCs are using flash when they're delivered. So flash has taken over the consumer market very aggressively, and that has driven down the cost of flash much, much faster than the declining market of HDD. >> Okay. So Renen, I wonder if we could come to you. I want you to talk about the innovations that you're doing, but before we get there, talk about why you started Vast.
>> Sure. So it was five years ago, and it was basically to kill the hard drive. I think what David is saying resonates very, very well. In fact, if you look at our original presentation for Vast Data, it showed flash and tape. There was no hard drive in the middle. And we said that 10 years from now, and this was five years ago, so even the dates match up pretty well, we're not going to have hard drives anymore. Any piece of information that needs to be accessible at all will be on flash, and anything that is dormant and never gets read will be on tape. >> So, okay. So we're entering this kind of new phase now, which is being driven by QLC. David, maybe you could give us a quick "what is QLC?" Just give us a bumper sticker there. >> There's 3D NAND, which is the thing that's growing very, very fast, and it's growing on several dimensions. One dimension is the number of layers. Another dimension is the size of each of those pieces. And the third dimension is the number of bits per cell, which for QLC is four bits per cell. So those three dimensions have all been improving. And the result of that is that on a wafer that you create, more and more data can be stored on the whole wafer and on the chip that comes from that wafer. And so QLC is the latest set of 3D NAND flash that's coming off the lines at the moment. >> Okay, so my understanding is that there are new architectures entering the data center space that could take advantage of QLC. Enter Vast. So Renen, that's a nice setup for you. And maybe before we get into the architecture, can you talk a little bit more about the company? I mean, maybe not everybody's familiar with Vast. You shared why you started it, but what can you tell us about the business performance? Any metrics you can share would be great. >> Sure. So the company, as I said, is five years old, about 170, 180 people today. We started selling product just around two years ago and have just hit $150 million in run rate.
That's with eight salespeople. And so, as you can imagine, there's a lot of demand for flash all the way down the stack, in the way that David predicted. >> Wow, okay. So you've got to be pretty comfortable. I think you've got product-market fit, right? And now you're going to scale. I would imagine you're going to go after escape velocity and you're going to build your moat. Now part of that, I mean a lot of that, is product, right? Product and sales. Those are the two golden pillars. But David, when you think back to your early forecast last decade, it was really about block storage. That was really what was under attack. You know, Fusion-io kind of got it started with Facebook. They were trying to solve their SQL database performance problems. And then we saw Pure Storage. They hit escape velocity. They drove a truck through EMC's Symmetrix HDD-based install base, which precipitated the acquisition of XtremIO by EMC, something Renen knows a little bit about, having led development of the product. But flash was late to the NAS party, guys. Renen, let me start with you. Why is that? And what is the relevance of QLC in that regard? >> The way storage has always been, it looks like a pyramid. You have your block devices up at the top and then your NAS underneath, and today you have object down at the bottom of that pyramid. The pyramid basically represents capacity, and the Y axis is price performance. And so if you could only serve a small subset of the capacity, you would go for block, and that is the subset that needed high performance. But as you go to QLC, and PLC will soon follow, the price of all-flash systems goes down to a point where it can compete on the lower ends of that pyramid. And the capacity grows to a point where there's enough flash to support those workloads. And so now, with QLC and a lot of innovation that goes with it, it makes sense to build an all-flash NAS and object store. >> Yeah, okay.
And David, you and I have talked about the volumes, and Renen sort of just alluded to that, the higher volumes of NAS, not to mention the fact that NAS is hard. You know, file's difficult. But that's another piece of the equation here, isn't it? >> Absolutely, NAS is difficult. It's very large scale. We're talking about petabytes of data. You're talking about very important data. And you're talking about data which is at the moment very difficult to manage. It takes a lot of people to manage it, it takes a lot of resources, and it takes up a lot of space as well. So with all of those issues with NAS, complexity is probably the biggest single problem. >> So maybe we could geek out a little bit here. You guys go at it, but Renen, talk about the Vast architecture. I presume it was built from the ground up for flash, since you were trying to kill HDD. What else do we need to know? >> It was built for flash. It was also built for 3D XPoint, which is a new technology that came out from Intel and Micron about three years ago. XPoint is basically another level of persistent media, above flash and below RAM. But what we really set out to do is, as I said, to kill the hard drive, and for that what you need is to get to price parity. And of course, flash and hard drives are not at price parity today. As David said, they probably will be a few years from now. And so we wanted to jumpstart that, to accelerate that. And so we spent a lot of time building a new type of architecture, with a lot of new metadata structures and algorithms on top, to bring that effective price down to a point where it's competitive today. And in fact, two years ago, the way we did it was by going out to talk to these vendors: Intel, with 3D XPoint and QLC flash, and Mellanox, with NVMe over Fabrics and very fast Ethernet networks.
And we took those building blocks and we thought, how can we use this to build a completely different type of architecture, one that doesn't just take flash one level down the stack but actually allows us to break that pyramid, to collapse it down, and to build a single system that is as fast as your fastest all-flash block device, or faster, but as affordable as your hard-drive-based archives? And once that happens, you don't need to think about storage anymore. You have a single system that's big enough and cheap enough to throw everything at it, and it's fast enough that everything is accessible at sub-millisecond latencies. The way the architecture is built is pretty much the opposite of the way scale-out storage has been done. It's not based on shared nothing, the way XtremIO was, the way Isilon is, the way Hadoop and the Google File System are. We're basing it on a concept called Disaggregated Shared Everything. What that means is that we have the media on one set of devices and the logic running in containers, just software, and you can scale each of those independently. So you can scale capacity independently from performance, and you have this shared metadata space that all of the containers can see. So the containers don't actually have to talk to each other in the synchronous path. That means that it's much more scalable. You can go up to hundreds of thousands of nodes rather than just a few dozen. It's much more resilient. You can have all of them fail and you still don't lose any data. And it's much easier to use, to David's point about complexity. >> Thank you for that. And you mentioned up front that you not only built for flash but built for XPoint. So you're using XPoint today. It's interesting. There has always been this sort of debate about XPoint. It's less expensive than RAM, or maybe I got that wrong, but it's persistent. >> It is. >> Okay, but it's more expensive than flash.
And it was sort of thought of as a fence-sitter, 'cause it didn't have the volume, but you're using it today successfully. That's interesting. >> We're using it to offset the deficiencies of the low-cost flash. The nice thing about QLC and PLC is that you get the same levels of read performance as you would from high-end flash. The only difference between high-cost and low-cost flash today is in write cycles and in write performance. And so XPoint helps us offset both of those. We use it as a large write buffer and we use it as a large metadata store. And that allows us not just to arrange the information in a very large persistent write buffer before we need to place it on the low-cost flash, but also to develop new types of metadata structures and algorithms that let us make better use of the low-cost flash and reduce the effective price down even lower than the price of the raw capacity. >> Very cool. David, what are your thoughts on the architecture? Give us kind of the independent perspective. >> I think it's a brilliant architecture. I'd like to just go one step down on the network side of things. The whole use of NVMe over Fabrics allows the users, all of the servers, to get any data across this whole network directly. So you've got great performance right away across the stack. And then the other thing is that by using RDMA for NAS, you're able, if you need to, to get down to the data in microseconds. So overall that's a thousand times faster than any HDD system could manage. So this architecture really allows an any-to-any, simple, single level of storage, which is so much easier to think about, architect, use, or manage. It's just so much simpler. >> I don't know if there's an answer to this question, but if you had to pick one thing, Renen, that you really were dogmatic about and that you bet on from an architectural standpoint, what would that be?
>> I think what we bet on in the early days is the fact that the pyramid doesn't work anymore and that tiering doesn't work anymore. In fact, we stole Johnson & Johnson's tagline, "No More Tears." Only it's not spelled the same way. The reason for that is not because of storage. It's because of the applications. As we move more and more to applications that are machine-based, and machines are now not just generating the data, they're also reading the data and analyzing it and providing insights for humans to consume, the workloads change dramatically. And the one thing that we saw is that you can't choose which pieces of information need to be accessible anymore. These new algorithms, especially around AI and machine learning and deep learning, need fast access to the entirety of the dataset, and they want to read it over and over and over again in order to generate those insights. And so that was the driving force behind us building this new type of architecture. And we're seeing every single day, when we talk to customers, how the old architectures simply break down in the face of these new applications. >> Very cool. Speaking of customers, I wonder if you could talk about use cases and customers in this NAS arena. Maybe you could add some color there. >> Sure. Our customers are large in data. We start at half a petabyte and we grow into the exabyte range. The system likes to be big. As it grows, it grows super-linearly. If you go from 100 nodes to 1,000 nodes, you get more than 10X in performance, in capacity efficiency, in resilience, et cetera. And so that's where we thrive. And those workloads are today mainly analytics workloads, although not entirely. If you look at it geographically, we have a lot of life science in Boston: research institutes, medical imaging, genomics, universities, pharmaceutical companies. Here in New York, we have a lot of financials, hedge funds, analyzing everything from satellite imagery to trade data to Twitter feeds. Out in California, a lot of AI, autonomous driving vehicles, as well as media and entertainment. Both generation of films, like animation, and content distribution are being done on top of Vast.
>> Great, thank you. And David, when you look at the forecasts that you've made over the years, I imagine that they match nicely with your assumptions. And so, okay, I get that, but not everybody agrees, David. I mean, certainly the HDD guys don't agree, but they're obviously fighting to hang on to their awesome run of 50 years. But as well, there are others doing hybrids and the like, and they kind of challenge your assumptions. And you don't have a dog in this fight. We just want the truth and try to do our best to report it. But let me start with this. One of the things I've seen is the claim that you're comparing deduped and compressed flash with raw HDD. Is that true or false? >> In terms of the fundamentals of the forecast, et cetera, it's false. What I'm taking is the Newegg price. I did it this morning, and I looked up a two-terabyte NAS disk drive. I think it was $54. And if you look at the cost of NAND for two terabytes, it's about $200. So it's a four-to-one ratio. >> So... >> And that's coming down from what people saw last year, which was five or six, and every year that ratio has been coming down. >> So there's still a cost delta; HDD is still cheaper. So Renen, one of the other things that Floyer has said is that because of the advantages of flash, not only performance but also data sharing, et cetera, which really drive other factors like TCO, it doesn't have to be at parity in order for customers to consume it. I certainly saw that on my laptop. I could have got more storage, and it could have been cheaper per bit, for my laptop. I took the flash.
I mean, no problem. That was an intelligence test. But what are you seeing from customers? And by the way, Floyer, I think, is forecasting that by, what, 2026 there will actually be a raw-to-raw crossover. So then it's game over. But what are you seeing in terms of what customers are telling you, or any evidence you have that it doesn't have to be even, that customers actually get more value from flash even if it's more expensive? >> Yeah, in the enterprise space customers aren't buying raw flash. They're buying storage systems. And so even if the raw numbers, flash versus hard drive, are still not there, there are a lot of things that can be done at the system level to equalize those two. In fact, a lot of our IP is based on that. Flash today is, as David said, more expensive than hard drives, but at the system level it doesn't remain more expensive. And the reason for that is that storage systems waste space. They waste it on metadata, they waste it on redundancy. We built our new metadata structures such that everything lives in XPoint and is so much smaller, because of the way XPoint is accessible at byte-level granularity. We built our erasure codes in a way where you can sustain 10, 20, 30 drive failures but you only pay two or 1% in overhead. We built our data reduction mechanisms such that they can reduce data down even if the application has already compressed it and already de-duplicated it. And so there's a lot of innovation that can happen at the software level, as part of this new disaggregated shared everything architecture, that allows us to bridge that cost gap today without having customers do fancy TCO calculations. And of course, as prices of flash continue declining over the next few years, all of those advantages remain, and it will just widen the gap between hard drives and flash. And there really is no advantage to hard drives once the price thing is solved. >> So thank you.
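[Editor's sketch] The system-level argument above can be made concrete with a little arithmetic. The Python below is only an illustration, not Vast's actual cost model: the raw prices ($54 and about $200 for two terabytes) and the roughly 2% protection overhead come from the conversation, while the 20% hard-drive protection overhead, the 3:1 flash data-reduction ratio, and the names `Medium` and `effective_price_per_tb` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    """One storage medium with hypothetical system-level parameters."""
    name: str
    raw_price_per_tb: float     # dollars per raw terabyte
    protection_overhead: float  # fraction of raw capacity spent on redundancy
    reduction_ratio: float      # logical bytes stored per physical byte

    def effective_price_per_tb(self) -> float:
        # Usable capacity shrinks by the redundancy overhead and is
        # stretched by data reduction (compression, deduplication).
        usable_fraction = 1.0 - self.protection_overhead
        return self.raw_price_per_tb / (usable_fraction * self.reduction_ratio)


# Raw prices quoted in the conversation: $54 and about $200 for 2 TB.
# The 20% HDD overhead and the 3:1 flash reduction ratio are assumed values.
hdd = Medium("hdd", raw_price_per_tb=54 / 2, protection_overhead=0.20, reduction_ratio=1.0)
flash = Medium("flash", raw_price_per_tb=200 / 2, protection_overhead=0.02, reduction_ratio=3.0)

raw_ratio = flash.raw_price_per_tb / hdd.raw_price_per_tb
effective_ratio = flash.effective_price_per_tb() / hdd.effective_price_per_tb()

print(f"raw flash/HDD price ratio:       {raw_ratio:.2f}")
print(f"effective flash/HDD price ratio: {effective_ratio:.2f}")
```

With these assumed inputs, a raw gap of roughly 3.7 to 1 shrinks to about parity per usable terabyte. Different overhead or reduction figures would move the result, so the numbers matter less than the shape of the calculation.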
So David, the other thing I've seen around these forecasts is the comment that you can't really data-reduce hard disk effectively. And I understand why, the overhead, and of course in flash you can use all kinds of data reduction techniques and not affect performance, or it's not even noticeable. But the cloud guys do it upstream. Others do it upstream. What's your comment on that? >> Yes, if you take sequential data and you do a lot of work up front, you can write it out in very large blocks, and that's a perfectly good, sequential way of doing it. The challenge for the HDD people is that if they go for that sort of sequential type of application, the cheapest way of doing that is to use tape, which comes back to the discussion that the two things that are going to remain are tape and flash. So that part of the HDD market, in my assertion, will go towards tape and tape libraries. And those are serving very well at the moment. >> Yeah, I mean, the economics of tape are really attractive. I've said this many times: I just feel like the marketing of tape is lacking. I'd like to see better thinking around how it could play, 'cause I think customers have this perception of tape, but there's actually a lot of value there. I want to carry on, >> Small point there. Yeah, I mean, there's an opportunity, in the same way that Vast has created an architecture for flash. There's an opportunity out there for the tape people, with flash, to make an architecture that allows you to take that workload and really lower the price enormously. >> You've called it Flape. >> Flape, yes. >> There are some interesting metadata opportunities there, but we won't go into that. And then David, I want to ask you about NAND shortages. We saw this in 2016 and 2017. A lot of people are saying there's a NAND shortage again.
So is that a flaw in your forecast? You're assuming prices of flash continue to come down faster than those of HDD, but the shortages of NAND could be problematic. What do you say to that? >> Well, I've looked at that in some detail, and one of the big, important things is what's happening in the flash market. YMTC, a Chinese company, has introduced a lot more volume into the market. They're making 100,000 wafers a month this year. That's around six to 8% of the NAND market this year. As a result, Samsung, Micron, Intel, Hynix, they're all increasing their volumes of NAND, so they're all investing. So I don't see that NAND itself is going to be a problem. There is certainly a shortage of processor chips, which drive the intelligence in the NAND itself. But that's a problem for everybody. That's a problem for cars. It's a problem for disk drives. >> You could argue that's going to create an oversupply, potentially. Let's not go there. But you know what, at the end of the day it comes back to the customer, all this stuff. It's interesting. I love talking about the architecture, but it's really all about customer value. And so, Renen, I want you to sort of close there. What should customers be paying attention to? And what should observers of Vast Data really watch as indicators of progress for you guys, milestones and things in the market that we should be paying attention to? But start with the customers. What's your advice to them? >> Sure. For any customer that I talk to, I always ask the same thing. Imagine where you'll be five years from now, because you're making an investment now that is at least five years long. In our case, we guarantee the lifespan of the devices for a decade, such that you know that it's going to be there for you. And imagine what is going to happen over those next five years.
What we're seeing in most customers is that they have a lot of dormant data, and with the advances in analytics and AI they want to make use of that data. They want to turn it from a cost center into a profit center, to gain insight from that data, and to improve their business based on the information that they have, the same way the hyperscalers are doing. In order to do that, you need one thing: you need fast access to all of that information. Once you have that, you have the foundation to step into this next-generation type of world where you can actually make money off of your information. And the best way to get very, very fast access to all of your information is to put it on fast media like flash and XPoint. If I can give one example: hedge funds. Hedge funds do a lot of back-testing on Vast. What makes sense for them is to test as much information back as they possibly can, but because of storage limitations, they can't do that. The other thing that's important to them is to have a real-time experience, to be able to run those simulations in a few minutes and not as a batch process overnight, but because of storage limitations, they can't do that either. The third thing is that if you have many different applications and many different users on the same system, they usually step on each other's toes. And so the Vast architecture solves those three problems. It allows you a lot of information, very fast access and fast processing, and an amazing quality of service, where different users of the system don't even notice that somebody else is accessing the same piece of information. And so hedge funds are one example. Any one of these verticals that make use of a lot of information will benefit from this architecture and this system. And if it doesn't cost any more, there's really no reason to delay this transition into all flash. >> Excellent, very clear thinking. Thanks for laying that out. And what about, you know, how should we judge you?
What are the things that we should watch? >> I think the most important way to judge us is to look at customer adoption, and what we're seeing, and what we're showing investors, is a very high net dollar retention number. What that means is basically: a customer buys a piece of kit today; how much more will they buy over the next year, over the next two years? And we're seeing them buy more than three times more within a year of the initial purchase. And we see more than 90% of them buying more within that first year. And that to me indicates that we're solving a real problem and that they're making strategic decisions to stop buying any other type of storage system and to just put everything on Vast. Over the next few years, we're going to expand beyond just storage services and provide a full stack for these AI applications. We'll expand into other areas of infrastructure and develop the best possible vertically integrated system to allow those new applications to thrive. >> Nice, yeah. I think investors love that lifetime value story. If you can get above 3X of the customer acquisition cost, then the IPO is on the way. Guys, hey, thanks so much for coming to theCUBE. We had a great conversation and really appreciate your time. >> Thank you. >> Thank you. >> All right, thanks for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time. (gentle music)