Neil Vachharajani, Pure Storage | CUBEConversation, Sept 2018
(upbeat music) >> Hi, I'm Peter Burris. Welcome to another CUBE Conversation from our wonderful studios in beautiful Palo Alto, CA. Today we are going to be talking about new architectures, new disciplines required to really make possible the opportunities associated with digital business. And to do that, we've got Neil Vachharajani, who is the Technical Director at Pure Storage. Neil, welcome to theCUBE. >> Thank you for having me, Peter. >> So Neil, we have spent a fair amount of time within Wikibon and within the CUBE community, talking a lot about what is digital business. So, give me a second, run something by ya, tell me if you agree. So we think that there is a difference between business and digital business. And specifically, we think that difference is, a digital business uses data assets differently than a business does. Walmart beat Sears 'cause it used data differently. AWS is putting the pressure on Walmart, because it uses data differently. Or Amazon is putting the pressure on Walmart, because it uses data differently. So, that is at the centerpiece of a lot of these digital transformations. How are you using data to re-institutionalize your work, realign your resources, reestablish a new engagement model with your marketplace? Would you agree with that? >> Yeah, absolutely agree with that and I think a lot of it has to do with the volume of data, where the data is coming from. If you look at traditional business, it really was about just putting into computers what we used to do on paper. And digital business today I think is about generating huge volumes of data by really looking at every interaction we have, no matter how small or how big. >> So, putting telemetry on as many things. So, IoT for machines, mobile for human beings, but it used to be, as you said, a known-process, unknown-technology world for a long time. And now, these are data-driven processes. 
We're actually using data to describe what their next best action should be, what the recommendation should be. >> That's right. >> So, as we think about this, you know, business has been around for a long time. There's this notion of evidence-based management, which is the idea that we use data differently, from the boardroom all the way down to the drivers. How does a business start to bring forward the discipline required to really make possible this data-driven world? >> Well, you know, I think the first thing is to really recognize why this new paradigm shift changes things. And I think in the old world, if you looked at a piece of data, you actually could articulate all the way from the boardroom down to the stockroom every use of the data. And that meant that you could build a lot of siloed applications and that wasn't a big deal. You got your money's worth out of the data. >> So for example, recording transactions in store number 17. >> That's right. But in the new world, you actually don't know what the value of the data is ahead of time. Right. You're, in some sense, you're trying to capture a lot of data and then use technology to correlate it with things, mix and match, mash it up, and then drive business decisions that you didn't even know you were making a few weeks ago. And that means that you can't really lock up your data, you can't constrain it, because that's going to limit your possibilities. It's going to limit your ROI on that data. >> Yeah, we like to say that data as an asset is different from all other assets, because it is inherently sharable, reusable, it doesn't follow the laws of scarcity. And so, in many respects what the IT organization has had to do is find new ways to privatize that data through things like security, but as you're saying, they don't want to introduce technologies that artificially constrain derivative and future uses of that data. 
>> And I think that's where, really, the big architectural shift is happening in the data center. Because if you look traditionally, we have siloed the data and it wasn't like this intentional thing that we want to put it into a silo. But that's how we packaged our applications and that's how we deployed our applications. And now, we need a new discipline inside the data center, that makes the data available, lets people put policies on it. Like security policies. But then also makes it available for the innovators all throughout the company to get access to that data. You know, we're trying to crystallize this whole philosophy into something we refer to as the data-centric architecture. Where data is at the center, people have access to the data, and then there's just applications all around it that are all hitting this common pool of data and doing different things, driving new business processes. >> Now, you're talking not about a physical pool of data, but rather a logical pool of data. Data is still going to be very distributed, right? >> Well you know, data gets generated in a distributed way, data is very large. I think it would be a bit naive to be able to point to one rack and one data center and say all your data is going to be right here in this one rack. >> Or in one cloud. >> Or in one cloud for that matter. But just from a philosophical perspective, you do want to pull your data out of anything that is, like you said a minute ago, constraining it. So, I think, one really good example of this is when we went, quote unquote, web scale, we saw a lot of applications move into direct attached storage, to dive deep into a technology. And that was great if you wanted to only come in the front door and access the data through the application that was managing that DAS. But, if you wanted to do anything else, you were kind of stuck. >> So as to summarize this point, we're moving from a world in which data is a place to one in which data is a service. 
>> That's right. >> Have I got that right? >> That's absolutely right. I mean, the way I like to think about it is that data and storage need to really be different things and storage's job is to give you access to the data. Storage in its own right, you know, doesn't solve a business problem. It's the data that solves the business problem. Storage is the vehicle that gets you there. And so I think it's pretty exciting that there's new technologies that are coming out, or that honestly are here, that are enabling that. Things like Flash and NVMe, and you know, its futures. >> Well, let's talk about that, because the observation that I've made to clients for quite some time is that, if you go back, disk was a great technology for persisting data. So again, store number 17, transaction at a certain time. It's already occurred, we have to record it. So, we record it, we persist it on disk. Now what we are trying to do is we're utilizing technologies that are inherently structured to deliver data so that we can have the data be very distributed, but still look at it from a logical standpoint. And have that data be delivered to a lot of applications, whether that is local or, as long as we don't undermine basic physics, perhaps further away. But even more importantly, deliver it to different roles: the same data being delivered to developers, the same data being delivered to a new application. What are some of those core technologies that are going to be necessary to do this? You mentioned NVMe, let's start there. >> Yeah, if I just back up a little bit, right, in some sense, even in that recording-the-data workflow that you talked about, we made disk work. But it was actually a pretty challenging media and so we put in a lot of optimizations and things in place, because we said, we know the usage pattern. And if we know the usage pattern, we know how to organize our data. 
And so as a step one, the transformation that I think is in pretty full swing these days is moving from disk to flash. And that was a huge transformation, because it meant that random access to the data was just as performant as this carefully crafted sequential access. That meant you could start accepting unknown workloads into your applications, but you were still stuck behind this very serial, very antiquated SCSI protocol. And NVMe is now bringing a lot more parallelism to play. And that's going to help us to drive things like just simple, plain old data center stuff, like density, and performance density, and power, and that kind of thing. So, that's sort of step one in terms of the technology, that you can package all of this stuff in a pretty dense package and put petabytes of storage with enough I/O to actually access that data. That's the key: if you can have petabytes, but you can only have one I/O for each gig, well, you're not going to get a lot out of that data. >> So, just to stop right there, that leads to a world in which, as long as you're disciplined and architected, you do not have to know what workloads are going to access that data near term. >> Well, you know, that's only step one, right. >> Right. >> Because the other challenge is that very few people access storage directly, right. We hide this behind databases, and we hide this behind a whole bunch of other technologies. Now, those technologies might have their own limitations in place. But we have a lot of really rich things we can do at the storage level to present the same data out through multiple front ends. And so the simplest idea is, we don't have one copy of a database, we often will have the transactional database that's recording those transactions, but then we'll have an analytics copy of the database and now we need to keep the two of those things in sync. And this is where the discipline and the architecture really come into play. 
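A quick worked example of the I/O density point, as a rough sketch in Python. The function name, the 4 KiB I/O size, and the IOPS-per-GiB figures are illustrative assumptions, not numbers from the conversation or from any Pure Storage product. It assumes uniform small reads with no caching, and shows that the time to touch every byte once depends on I/O per unit of capacity, not on how many petabytes you have.

```python
GIB = 2**30  # bytes in one gibibyte


def seconds_to_touch_all(iops_per_gib: float, io_size_bytes: int = 4096) -> float:
    """Seconds needed to read every byte once, given an I/O density.

    Each GiB needs GIB / io_size_bytes operations, and each GiB is
    allowed iops_per_gib operations per second, so the total capacity
    cancels out of the result.
    """
    ios_per_gib = GIB / io_size_bytes
    return ios_per_gib / iops_per_gib


if __name__ == "__main__":
    for density in (1, 10, 100):  # hypothetical IOPS per GiB
        hours = seconds_to_touch_all(density) / 3600
        print(f"{density:>3} IOPS/GiB -> {hours:.1f} hours per full pass")
```

Under these assumptions, one I/O per second per GiB means roughly three days to make a single pass over the data, whether the system holds one terabyte or several petabytes.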
And we kind of have a lot of that figured out for things like relational databases and best practices there. But in the meantime, the world also moved over to the new world of NoSQL databases, queues, Kafka. Things of that nature. And those brought direct attached storage as the best practice. And so I think where the discipline comes in, and where some of the new technologies that we're talking about right now come in, is: how do you bring those old disciplines that we figured out, in, let's say, the relational world, how do you bring that to bear on the new technologies that are meeting the scale requirements that we have today? >> Well, one of the more important workloads that are going to require scale is, for example, AI. So, how are we going to organize some of these technologies, add them to these new disciplines, to be able to make some of these AI workloads run really, really fast? >> You know, I think a lot of this really comes down to pulling the storage out and putting it into its own tier. And so, Pure Storage has an offering which is called AIRI, which is packaging NVIDIA DGX boxes with FlashBlades. And we say, hey, you don't need a whole bunch of direct attached storage which is siloing your data, you can go put it into this common shared pool. And I think that on, you know, the other side of the house, our FlashArray business is doing something really similar with NVMe. The FlashArray/X is essentially commoditizing NVMe. It's saying, everybody has access to this high performance density. And looking into the future with technologies like NVMe over Fabric, what we're really saying is your apps that used to use direct attached storage, there's no reason why they can't go to a SAN-based architecture that offers rich data services and not compromise one iota on latency. >> Or access or any other number of activities as well. So we've got NVMe, NVMe over Fabric, Flash, new approaches for thinking about packaging some of these things. 
Are there any other technologies that you envision on the horizon that are going to be really important to customers and that Pure is going to take advantage of? >> Yeah, you know, I really think that the other thing is once you collect all this stuff, you need a way to tame the beast. You need a way to deploy your applications. You need a way to catalog everything. And honestly, things like Kubernetes and container orchestration are becoming this platform where you deploy all of this stuff. And some of the assumptions that are baked into that really go back and tie in nicely with those other technologies. In particular, they assume that I can schedule this compute wherever I want and I have access to the data. So in that way, having a fabric, if you will, between your compute and your data is essential. And it's just another reason why siloing things off into particular units of compute is just really the architecture of the past. And the architecture going forward is going to be to logically centralize. And maybe put some smarts at that other layer, saying, hey, if this data is in the public cloud, let me schedule up there. But if this data is in my data center, let me schedule the compute down there. But then not having to worry about the micro decisions about, does it have to be in this rack or, you know, on this particular physical node. All your data is accessible. >> But increasingly, we're going to do things that move the compute both physically as well as logically closer to the data. >> You know, 100%. Right. But it's at what scale? You really want to get the data center right. Your compute should be running in the correct data center. >> Or the center of data, right? >> Or the center of data, right, you know. Get it in the right spot, but then you don't want to have to worry about all the other micro constraints. 
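The placement idea described above can be sketched in a few lines of Python. This is a toy illustration only, not the Kubernetes scheduler or its API; the `Node` type, the `place` function, and the site labels are hypothetical names chosen for the example. The coarse decision (which site holds the data) is honored, and the fine-grained one (which rack or host) is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    site: str       # hypothetical labels such as "public-cloud" or "on-prem"
    free_cpus: int


def place(job_cpus: int, data_site: str, nodes: list[Node]) -> Optional[Node]:
    """Pick a node in the same site as the data; rack and host do not matter."""
    candidates = [
        n for n in nodes if n.site == data_site and n.free_cpus >= job_cpus
    ]
    # Within the right site any node will do, so take the one with most headroom.
    return max(candidates, key=lambda n: n.free_cpus, default=None)


NODES = [
    Node("cloud-a", "public-cloud", 8),
    Node("dc-rack1-n1", "on-prem", 4),
    Node("dc-rack7-n3", "on-prem", 16),
]

if __name__ == "__main__":
    print(place(4, "on-prem", NODES))
    print(place(2, "public-cloud", NODES))
```

The sketch only works as a model if storage really is uniformly reachable inside a site, which is the assumption the speaker is making about a shared storage fabric.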
You don't want, you know, if you look on the networking side of the world, leaf-spine networks are all about saying, hey look, there really is a uniform fabric for networking. We're trying to do the same thing in storage and just say, look, the storage is so performant, there's no reason to silo. You can run your compute wherever you want. If you've got a good networking fabric and you've got a good storage fabric, at the end of the day, all your data is accessible to whatever new application you envision. And you just, there's no reason why you have to lock it up. You mentioned security before. You know, you should absolutely be able to orchestrate things like taking a snapshot of your data, putting it through masking, or whatever anonymization you need to make it safely accessible to new applications and innovators inside of your company to drive that digital business. >> Yes, and we like to talk about moving from a world that is focused on infrastructure, taking cost out, making it static, by removing all uncertainty, to a world where we've known workloads and elastic capacity, or elastic scale, to a plastic world. Where plastic, you know, in the physics sense, is unknown workload, unknown scale. And just making sure that we have the option to use data any way we want as much as possible in the future. >> And I think that that's why you see the rise of service catalogs and self service coming up in IT. It's that plasticity, that you have the brightest minds in your company trying to figure out what to do, and you don't want to have infrastructure be this bottleneck that's causing everything to go slower. Or for people to say no. You just always want to say yes. And that's where I think it's always exciting to see these technologies, NVMe, come out and say, we've now got the performance to say yes. NVMe over Fabric to say there's no compromise over latency. 
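The snapshot-then-mask workflow mentioned above might look something like the following Python sketch. Everything here is an assumption made for illustration: the in-memory list stands in for a storage snapshot, and `mask_snapshot`, the field names, and the salted SHA-256 scheme are hypothetical, not a Pure Storage feature. Note also that a salted hash is pseudonymization rather than full anonymization, so a real deployment would need to pick a technique that matches its own compliance requirements.

```python
import copy
import hashlib


def mask_snapshot(records, sensitive_fields, salt):
    """Return a masked copy of records; the live data is never modified."""
    snapshot = copy.deepcopy(records)  # stand-in for a storage-level snapshot
    for row in snapshot:
        for field in sensitive_fields:
            if field in row:
                raw = (salt + str(row[field])).encode("utf-8")
                # Same input always maps to the same token, so joins still work.
                row[field] = hashlib.sha256(raw).hexdigest()[:12]
    return snapshot


if __name__ == "__main__":
    live = [{"customer": "Ada", "store": 17, "total": 42.5}]
    masked = mask_snapshot(live, {"customer"}, salt="example-salt")
    print(live)
    print(masked)
```

Masking the copy rather than the source is the point: innovators get a safe, realistic dataset while the system of record stays untouched.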
And then honestly, having this stuff packaged in things like FlashArray/X, where the CIO or the CFO doesn't complain about breaking the bank as well. Because now these technologies are the status quo. They're the standard. There's no premium for them. And if anyone is trying to charge you that premium, you should really, you know, ask them why. This is the new architecture, this should be the only thing you offer >> Right. >> in some sense. >> Yeah, we're bringing all these new technologies into the economic envelope that IT has to be in for business today. >> That's right, and you know, you look at something like flash memory, right. It's not a new technology. I remember in college having a flash card to put into, like, a digital camera in the early days of digital cameras. But for it to make it into the data center, the thing that was critical was that economic aspect of it. So it's not just about being on the bleeding edge of technology, but it's packaging that in a way that's actually palatable for the entire C-Suite to consume inside your organization. >> And I remember my disk pack that I carried around in college from the PDP system that we had to use. (laughter) Alright, Neil Vachharajani, Technical Director of Pure Storage, talking about the relationship between new technologies, data-centric architectures, and digital business. Thanks very much for being on theCUBE. >> Thanks so much, Peter. >> And once again, I'm Peter Burris, you've been participating in another CUBE Conversation. 'Til we talk again. (upbeat music)