Next Gen Analytics & Data Services for the Cloud that Comes to You | An HPE GreenLake Announcement
(upbeat music) >> Welcome back to theCUBE's coverage of the HPE GreenLake announcements. We're seeing the transition of Hewlett Packard Enterprise as a company. Yes, they're going all in for as a service, but we're also seeing a transition from a hardware company to what I look at increasingly as a data management company. We're going to talk today to Vishal Lall, who leads GreenLake cloud services solutions at HPE, and Matt Maccaux, who's a global field CTO, Ezmeral Software at HPE. Gents, welcome back to theCUBE. Good to see you again. >> Thank you for having us here. >> Thanks, Dave. >> So Vishal, let's start with you. What are the big mega trends that you're seeing in data? When you talk to customers, when you talk to partners, what are they telling you? What do your optics say? >> Yeah, I would say the first thing is data is getting even more important. It's not that data hasn't been important for enterprises, but over the last, I would say, 24 to 36 months it has become really important, right? And it's become important because customers are trying to stitch data together across different sources, whether it's marketing data, supply chain data, or financial data, and they're looking at that as a source of competitive advantage. So enterprises that are able to make sense out of that data really do have a competitive advantage, right? And they actually get better business outcomes. So that's really important, right? If you look at where we are from an analytics perspective, I would argue we are in maybe the third generation of data analytics. The first one was in the '80s and '90s with data warehousing, kind of EDW. A lot of companies still have that; think of Teradata, right? The second generation, more in the 2000s, was around data lakes, right?
And that was all about Hadoop and others, and really the difference between the first and the second generation was that the first generation was more around structured data, right? The second became more about unstructured data, but you really couldn't run transactions on that data. And I would say now we are entering this third generation, which is about data lakehouses, right? What enterprises really want is structured data and unstructured data all together. They want to run transactions on them, right? They want to mine the data for machine learning purposes, use it for SQL as well as non-SQL, right? And that's kind of where we are today. So that's really what we are hearing from our customers in terms of at least the top trends, and that's how we are thinking about our strategy in the context of those trends. >> So "lakehouse," you used that term. It's an increasingly popular term. It connotes, "Okay, I've got the best of the data warehouse and I've got the best of the data lake. I'm going to try to simplify the data warehouse, and I'm going to try to clean up the data swamp, if you will." Matt, talk a little bit more about what you guys are doing specifically and what that means for your customers. >> Well, what we think is important is that there has to be a hybrid solution, that organizations are going to build their analytics and deploy algorithms where the data either is being produced or where it's going to be stored. And that could be anywhere. That could be in the trunk of a vehicle. It could be in a public cloud, or in many cases it's on-premises in the data center. And where organizations struggle is they feel like they have to make a choice and a trade-off going from one to the other.
And so what HPE is offering is a way to unify the experiences of these different applications, workloads, and algorithms, while connecting them together through a fabric, so that the experience is tied together with consistent security policies, without having to refactor your applications, and deploying tools like Delta Lake to ensure that the organization that needs to build a data product in one cloud, or deploy another data product in the trunk of an automobile, can do so. >> So, Vishal, I wonder if we could talk about some of the patterns that you're seeing with customers as you go to deploy solutions. Are there industry patterns? Is there anything you can share that you're discerning? >> Yeah, absolutely. As we hear back from our customers across industries, the problem sets are very similar, right? Whether you look at healthcare customers, telco customers, consumer goods, or financial services, they're all quite similar. I mean, what are they looking for? They're looking for business value from the data, breaking down the silos that Matt spoke about just now, right? How do I stitch intelligence across my data silos to get more business intelligence out of it? They're looking for openness. I think the problem is that over time people have realized that they are locked in with certain vendors or certain technologies. So they're looking for openness and choice. That's an important one that we've heard back from our customers. The other one is just being able to run machine learning algorithms on the data. I think that's another important one for them as well. And the last one I would say is TCO. Customers over the last few years have realized that going to the public cloud is starting to become quite expensive, to run really large workloads on the public cloud, especially as they want to egress data.
So cost-performance trade-offs are starting to become really important and starting to enter into the conversation now. So I would say those are some of the key things and themes that we are hearing from customers, cutting across industries. >> And you talked, Matt, about basically being able to leave the data where it belongs, bring the compute to the data. We talk about that all the time. And so that has to include on-prem, it's got to include the cloud. And I'm kind of curious on the edge, where you see that. Is that an eventual piece? Is that something that's actually moving in parallel? There's a lot of fuzziness, as an observer, in the edge. >> I think the edge is driving the most interesting use cases. The challenge up until recently has been, well, I think it's always been connectivity, right? Whether we have a poor connection, little connection, or no connection, being able to asynchronously deploy machine learning jobs into some sort of remote location, whether it's a very tiny edge or a very large edge, like a factory floor. The challenge, as Vishal mentioned, is that if we're going to deploy machine learning, we need some sort of consistency of runtime to be able to execute those machine learning models. Yes, we need consistent access to data, but consistency in terms of runtime is so important. And I think Hadoop got us started down this path, the ability to very efficiently and cost-effectively run large data jobs against large data sets. And it attempted to work within the open source ecosystem, but because of the monolithic deployment, the tight coupling of the compute and the data, it never achieved that cloud native vision.
And so what Ezmeral at HPE, through GreenLake cloud services, is delivering, with open source-based Kubernetes, open source Apache Spark, and open source Delta Lake libraries, is those same cloud native services that you can develop on your workstation and deploy in your data center in the same way you deploy, through automation, out at the edge. And I think that is what's so critical about what we're going to see over the next couple of years. The edge is driving these use cases, but it's the consistency to build and deploy those machine learning models, and connect them consistently with data, that's going to drive organizations to success. >> So you're saying you're able to decouple the compute from the storage. >> Absolutely. You wouldn't have a cloud if you didn't decouple compute from storage. And I think this was sort of the demise of Hadoop, forcing that coupling. We have high-speed networks now. Whether I'm in a cloud or in my data center, or even at the edge, I have high-performance networks. I can now do distributed computing and separate compute from storage. And so if I want to, I can have high-performance compute for my really data-intensive applications, and I can have cost-effective storage where I need to. And by separating that off, I can now innovate at the pace of those individual tools in that open source ecosystem. >> So, can I stay on this for a second? 'Cause you certainly saw Snowflake popularize that; they were kind of early on. I don't know if they were the first, but they're certainly one of the most successful. And you saw Amazon Redshift copy it, and Redshift was kind of a bolt-on. What essentially they did is they tiered it off; you could never turn off the compute, you still had to pay for a little bit of compute. That's kind of interesting. Snowflake has the t-shirt sizes, so there are trade-offs there. There's a lot of ways to skin the cat. How did you guys skin the cat? >> What we believe we're doing is we're taking the best of those worlds.
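The pattern described above, open source Spark with Delta Lake libraries orchestrated by Kubernetes and reading from decoupled storage, is commonly expressed as a declarative manifest. A minimal sketch using the open source Spark operator for Kubernetes follows; the image, namespace, bucket path, and endpoint are illustrative assumptions, not HPE-specific values:

```yaml
# Hypothetical SparkApplication for the open source Spark operator
# (sparkoperator.k8s.io). Compute (driver/executor pods) is sized
# independently of storage, which is reached over an S3-compatible API.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: delta-etl-example        # illustrative name
  namespace: analytics           # illustrative namespace
spec:
  type: Python
  mode: cluster
  image: example.registry/spark:3.1.1   # assumed image bundling Delta Lake jars
  mainApplicationFile: s3a://example-bucket/jobs/etl.py
  sparkVersion: "3.1.1"
  hadoopConf:
    # Point s3a at any S3-compatible endpoint (public cloud or on-prem object store)
    "fs.s3a.endpoint": "https://s3.example.internal"
  driver:
    cores: 1
    memory: "2g"
    serviceAccount: spark
  executor:
    instances: 4    # elastic: scale compute without touching the storage layer
    cores: 2
    memory: "4g"
```

Because the manifest is declarative, the same file can be applied to a workstation cluster, a data center cluster, or an edge cluster, which is the consistency argument being made here.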
Through GreenLake cloud services, the ability to pay for and provision on demand the computational services you need. So if someone needs to spin up a Delta Lake job to execute a machine learning model, you spin that up. We're of course spinning that up behind the scenes. The job executes, it spins down, and you only pay for what you need. And we've got reserve capacity there, of course, just like you would in the public cloud. But more importantly, being able to then extend that through a fabric across clouds and edge locations, so that if a customer wants to deploy in some public cloud service, like we know they're going to, again, we're giving that consistency across that, and exposing it through an S3 API. >> So, Vishal, at the end of the day, I mean, I love to talk about the plumbing and the tech, but the customer doesn't care, right? They want the lowest cost. They want the fastest outcome. They want the greatest value. My question is, how are you seeing data organizations evolve to accommodate this third era, this next generation? >> Yeah. I mean, the way to look at it, from a customer perspective, what they're trying to do, first of all, and I think Matt addressed it somewhat, is they're looking for a consistent experience across the different groups of people within the company that do something with data, right? It could be SQL users, people who are just writing SQL code. It could be people who are writing machine learning models and running them. It could be people who are writing code in Spark. Right now, the experience is completely disjointed across them, across those three types of users or more. And so that's one thing that they're trying to do: just get that consistency. We spoke about performance. I mean, the disaggregation of compute and storage does provide the agility, because customers are looking for elasticity. How can I have an elastic environment?
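The spin-up, execute, spin-down lifecycle described above maps naturally onto an ephemeral Kubernetes Job. A generic sketch, using plain Kubernetes rather than anything GreenLake-specific, with the names and image as illustrative assumptions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-scoring-run        # illustrative name
spec:
  ttlSecondsAfterFinished: 120   # garbage-collect the pods shortly after the
                                 # run, so the capacity (and metered cost) is released
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: scorer
          image: example.registry/scoring-job:latest  # assumed image
          args: ["--input", "s3a://example-bucket/features/"]
          resources:
            requests:            # only this capacity is held while the job runs
              cpu: "2"
              memory: 4Gi
```

Nothing persists after the TTL expires except the job's output in the storage layer, which is the pay-for-what-you-use behavior being described.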
So that's kind of the other thing they're looking at. And performance and TCO are, I think, a big deal now. So that's definitely on customers' minds. So, as enterprises are looking at their data journey, those are at least the attributes that they are trying to hit as they organize themselves to make the most out of the data. >> Matt, you and I have talked about this sort of trend to the decentralized future. We're sort of hitting on that. And whether it's a first-gen data warehouse, second-gen data lake, data hub, bucket, whatever, that data essentially should ideally stay where it is, wherever it should be from a performance standpoint, from a governance standpoint, and from a cost perspective, and just be a node (I like the term data mesh) on that, and essentially allow the business owners, those with domain context (you've mentioned data products before), to actually build data products. Maybe air quotes, but a data product is something that can be monetized. Maybe it cuts costs. Maybe it adds value in other ways. How do you see HPE fitting into that long-term vision, which we know is going to take some time to play out? >> I think what's important for organizations to realize is that they don't have to go to the public cloud to get the experience they're looking for. Many organizations are still reluctant to push all of their data, their critical data, the data that is going to be the next way to monetize the business, into the public cloud. And so what HPE is doing is bringing the cloud to them. Bringing that cloud from the infrastructure, the virtualization, the containerization, and most importantly, those cloud native services, so they can do that development rapidly and test it, using those open source tools and frameworks we spoke about. And if that model ends up being deployed on a factory floor, on some common x86 infrastructure, that's okay, because the lingua franca is Kubernetes.
And as Vishal mentioned, Apache Spark, these are the common tools and frameworks. And so I want organizations to think about this unified analytics experience, where they don't have to trade off security for cost, or efficiency for reliability. HPE, through GreenLake cloud services, is delivering all of that where they need to do it. >> And what about the speed-to-quality trade-off? Have you seen that pop up in customer conversations, and how are organizations dealing with that? >> Like I said, it depends on what you mean by speed. Do you mean computational speed? >> No, accelerating the time to insights, if you will. We've got to go faster, faster, agile to the data. And it's like, "Whoa, move fast, break things. Whoa, whoa. What about data quality and governance?" Right? They seem to be at odds. >> Yeah, well, because the processes are fundamentally broken. You've got a developer who maybe is able to spin up an instance in the public cloud to do their development, but then to actually do model training, they bring it back on-premises, but they're waiting for a data engineer to make the data available to them. And then the tools have to be provisioned, which is some esoteric stack. And then runtime is somewhere else. The entire process is broken. So again, by using consistent frameworks and tools, bringing that computation to where the data is, and sort of blowing this construct of pipelines out of the water, I think, is what is going to drive that success in the future. A lot of organizations are not there yet, but that's, I think, aspirationally where they want to be. >> Yeah, I think you're right. I think that is potentially an answer as to how you, not incrementally, but revolutionize, sort of, the data business. Last question: talking about GreenLake, how this all fits in. Why GreenLake? Why do you guys feel it's differentiated in the marketplace? >> So, I mean, something that you asked earlier as well: time to value, right?
I think that's a very important attribute and kind of a design factor as we look at GreenLake. If you look at GreenLake overall, what does it stand for? It stands for experience. How do we make sure that we have the right experience for the users, right? We spoke about it in the context of data. How do we have a similar experience for different users of data, but also broadly across an enterprise? So it's all about experience. How do you automate it, right? How do you automate the workloads? How do you provision fast? How do you give folks a cloud experience, the kind they have been used to in the public cloud, or using an Apple iPhone? So it's all about experience; I think that's number one. Number two is about choice and openness. As we look at GreenLake, it is not a proprietary platform. We are very, very clear that one of the important design principles is choice and openness. And that's the reason you hear us talk about Kubernetes, about Apache Spark, about Delta Lake, et cetera, right? We're using those open source models where customers have a choice. If they don't want to be on GreenLake, they can go to the public cloud tomorrow. Or they can run in our colos, if they want to do it that way, or in their colos. So they should have the choice. Third is about performance. What we've done is, it's not just about the software; we as a company know how to configure infrastructure for the workload, and that's an important part of it. I mean, if you think about machine learning workloads, we have the right Nvidia chips that accelerate those transactions. So that's the third one. And the last one, as I spoke about earlier, is cost. We are very focused on TCO. From a customer perspective, we want to make sure that we are giving a value proposition which is not just about experience, performance, and openness, but also about cost.
So if you think about GreenLake, that's kind of the value proposition that we bring to our customers across those four dimensions. >> Guys, great conversation. Thanks so much, really appreciate your time and insights. >> Matt: Thanks for having us here, David. >> All right, you're welcome. And thank you for watching everybody. Keep it right there for more great content from HPE GreenLake announcements. You're watching theCUBE. (upbeat music)
Matt Maccaux, HPE | HPE Discover 2021
(bright music) >> Data by its very nature is distributed and siloed, but most data architectures today are highly centralized. Organizations are increasingly challenged to organize and manage data, and to turn that data into insights. This idea of a single monolithic platform for data is giving way to new thinking, where a decentralized approach, with open cloud native principles and federated governance, will become an underpinning of digital transformations. Hi everybody. This is Dave Vellante. Welcome back to HPE Discover 2021, the virtual version. You're watching theCUBE's continuous coverage of the event, and we're here with Matt Maccaux, who's a field CTO for Ezmeral Software at HPE. We're going to talk about HPE's software strategy, and Ezmeral, and specifically how to take AI analytics to scale and ensure the productivity of data teams. Matt, welcome to theCUBE. Good to see you. >> Good to see you again, Dave. Thanks for having me today. >> You're welcome. So talk a little bit about your role as a CTO. Where do you spend your time? >> I spend about half of my time talking to customers and partners about where they are on their digital transformation journeys, and where they struggle with this sort of last phase, where we start talking about bringing those cloud principles and practices into the data world. How do I take those data warehouses, those data lakes, those distributed data systems into the enterprise and deploy them in a cloud-like manner? Then the other half of my time is working with our product teams to feed that information back, so that we can continually innovate to the next generation of our software platform. >> I've been following HP and HPE for a long, long time, as theCUBE has documented. We go back to when the company was breaking into two parts, and at the time a lot of people were saying, "Oh, HP is getting rid of their software business, they're getting out of software." I said, "No, no, no, hold on.
They're really focusing," and the whole focus around hybrid cloud, and now as a service; you're really retooling that business and have sharpened your focus. So tell us more about Ezmeral. It's a cool name, but what exactly is Ezmeral software? >> I get this question all the time. So what is Ezmeral? Ezmeral is a software platform for modern data and analytics workloads, using open source software components. We came from some inorganic growth. We acquired a company called Scytale, which brought us a zero trust approach to doing security with containers. We bought BlueData, who came to us with an orchestrator before Kubernetes even existed in the mainstream; they were orchestrating workloads using containers for some of the more difficult workloads: clustered applications, distributed applications like Hadoop. Then finally we acquired MapR, which gave us this scale-out distributed file system and additional analytical capabilities. What we've done is we've taken those components, and we've also gone out into the marketplace to see what open source projects exist to allow us to bring those cloud principles and practices to these types of workloads, so that we can take things like Hadoop, Spark, and Presto, and deploy and orchestrate them using open source Kubernetes, leveraging GPUs, while providing that zero trust approach to security. That's what Ezmeral is all about: taking those cloud practices and principles, but without locking you in. Again, using those open source components where they exist, and then committing and contributing back to the open source community where those projects don't exist. >> You know, it's interesting, thank you for that history. I have been there since the early days of big data and Hadoop and so forth, and MapR always had the best product, but they couldn't get it out. Back then it was all kumbaya, open source, and they had this kind of proprietary system, but it worked, and that's why it was the best product.
So at the same time, they participated in open source projects, because everybody did; that's where the innovation was going. So you're making that really hard-to-use stuff easier to use with Kubernetes orchestration, and then obviously, I'm presuming, with the open source chops, sort of leaning into the big trends that you're seeing in the marketplace. So my question is, what are those big trends that you're seeing when you speak to technology executives, which is a big part of what you do? >> The trends are, I think, a couple-fold. And it's funny about Hadoop, but I think the final nails in the coffin have now been hammered into the Hadoop space. So that leading trend, of where organizations are going: we're seeing organizations wanting to go cloud first, but they really struggle with these data-intensive workloads. Do I have to store my data in every cloud? Am I going to pay egress in every cloud? Well, what if my data scientists are most comfortable in AWS, but my data analysts are more comfortable in Azure? How do I provide that multi-cloud experience for these data workloads? That's the number one question I get asked, and that's probably the biggest struggle for these chief data officers and chief digital officers: how do I allow that innovation while maintaining control over my data compliance, especially when we talk about international standards, like GDPR, that restrict access to data, the ability to be forgotten. In these multinational organizations, how do I square all of those components? Then how do I do that in a way that doesn't just lock me into another appliance or software vendor's stack? I want to be able to work within the confines of the ecosystem, use the tools that are out there, but allow my organization to innovate in a very structured, compliant way. >> I mean, I love this conversation, and you just, to me, hit on the key word, which is organization. I want to talk about what some of the barriers are.
And again, you heard my wrap up front. I really do think we've created these barriers, not only from a technology standpoint, and yes, the tooling is important, but so is the organization. And as you said, an analyst might want to work in one environment, a data scientist might want to work in another environment. The data may be very distributed. You might have situations where they're supporting the line of business. The line of business is trying to build new products, and if I have to go through this monolithic, centralized organization, that's a barrier for me. And so we're seeing that change, as I kind of alluded to up front. But what do you see as the big barriers that are blocking this vision from becoming a reality? >> It very much is organization, Dave. The technology is actually no longer the inhibitor here. We have enough technology, enough choices out there, that technology is no longer the issue. It's the organization's willingness to embrace some of those technologies and to put just the right level of control around accessing that data. Because if you don't allow your data scientists and data analysts to innovate, they're going to do one of two things. They're either going to leave, and then you have a huge problem keeping up with your competitors, or they're going to do it anyway, and they're going to do it in a way that probably doesn't comply with the organizational standards. So the more progressive enterprises that I speak with have realized that they need to allow these various analytical users to choose the tools they want, to self-provision those as they need to, and to get access to data in a secure and compliant way. And that means we need to bring the cloud to, generally, where the data is, because it's a heck of a lot easier than trying to bring the data to where the cloud is, while conforming to those data principles. And that's HPE's strategy. You've heard it from our CEO for years now: everything needs to be delivered as a service.
It's Ezmeral software that enables that capability: self-service, secure data provisioning, et cetera. >> Again, I love this conversation, because if you go back to the early days of Hadoop, that was what was profound about Hadoop: bring five megabytes of code to a petabyte of data. And it didn't happen. We shoved it all into a data lake and it became a data swamp. And that's okay; that was a one-dot-oh. You know, maybe in data, as with data warehouses, data hubs, data lakes, maybe this is now a four-dot-oh, but we're getting there. But open source, one thing's for sure, continues to gain momentum; it's where the innovation is. I wonder if you could comment on your thoughts on the role that open source software plays for large enterprises, and maybe some of the hurdles that are there, whether they're legal or licensing, or just fears. How important is open source software today? >> I think the cloud native developments, following the 12-factor applications, microservices-based, paved the way over the last decade to make using open source technology tools and libraries mainstream. We have to tip our hats to Red Hat, right? For allowing organizations to embrace something as core as an operating system within the enterprise. But what everyone realized is that it's support that has to come with that. So we can allow our data scientists to use open source libraries, packages, and notebooks, but are we going to allow those to run in production? If the answer is no, well, then if we can't get support, we're not going to allow that. So where HPE Ezmeral is taking the lead here is, again, embracing those open source capabilities, but, if we deploy it, we're going to support it, or we're going to work with the organization that has the committers to support it. You call HPE, the same phone number you've been calling for years for tier-one, 24-by-7 support, and we will support your Kubernetes, your Spark, your Presto, your Hadoop ecosystem of components.
We're that throat to choke, and we'll provide support, all the way up to break/fix support, for some of these components and packages, giving these large enterprises the confidence to move forward with open source, knowing that they have a trusted partner to do so with. >> And that's why we've seen such success with, for instance, managed services in the cloud, versus throwing out all the animals in the zoo and saying, okay, figure it out yourself. But then, of course, what we saw, which was kind of ironic, was people finally said, "Hey, we can do this in the cloud more easily." So that's where you're seeing a lot of data land. However, the definition of cloud, the notion of cloud, is changing. No longer is it just this remote set of services, "somewhere out there in the cloud," some data center somewhere. No, it's moving to on-prem; on-prem is creating hybrid connections; you're seeing co-location facilities very proximate to the cloud. We're talking now about the edge, the near edge, and the far edge, deeply embedded. So that whole notion of cloud is changing. But I want to ask you, there's still a big push to cloud, everybody has a cloud-first mantra. How do you see HPE competing in this new landscape? >> I think collaborating is probably a better word, although you could certainly argue that if we're just leasing or renting hardware, then it would be competition. But I think, again, the workload is going to flow to where the data exists. So if the data's being generated at the edge and being pumped into the cloud, then cloud is prod. That's the production system. If the data is generated via on-premises systems, then that's where it's going to be executed. That's production. And so HPE's approach is very much coexist. It's a coexist model: if you need to do dev/test in the cloud and bring it back on-premises, fine, or vice versa.
The key here is not locking our customers and our prospective clients into any sort of proprietary stack, as we were talking about earlier, giving people the flexibility to move those workloads to where the data exists. That is what's going to allow us to continue to get share of wallet, mind share, continue to deploy those workloads. And yes, there's going to be competition that comes along. Do you run this on GCP, or do you run it on GreenLake on-premises? Sure, we'll have those conversations. But again, if we're using open source software as the foundation for that, then where you actually run it is less relevant. >> So there are a lot of choices out there when it comes to containers generally and Kubernetes specifically, and you may have answered this, you get the zero trust component, you've got the orchestrator, you've got the scale-out piece. But I'm interested in hearing, in your words, why an enterprise would or should consider Ezmeral instead of alternative Kubernetes solutions? >> It's a fair question, and it comes up in almost every conversation. "Oh, we already do Kubernetes, we have a Kubernetes standard," and that's largely true in most of the enterprises I speak to. They're using one of the many on-premises distributions or the cloud distributions, and they're all fine. They're all fine for what they were built for. Ezmeral was built for something a little different. Yes, everybody can run microservices-based applications, DevOps-based workloads, but where Ezmeral is different is for those data-intensive, clustered applications. Those sorts of applications require a certain degree of network awareness, persistent storage, et cetera, which requires a significant amount of intelligence: either you have to write your own operators in Golang, or Ezmeral can be that easy button. We deploy those stateful applications, because we bring a persistent storage layer that came from MapR.
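The stateful, clustered workloads described here lean on Kubernetes primitives that plain stateless deployments never need: a stable per-pod network identity and a persistent volume per replica. As a hedged illustration only (plain dicts, no cluster required; the application name, image, and sizes are invented and are not Ezmeral's actual packaging), this is roughly the shape of StatefulSet manifest such an application demands:

```python
def stateful_set_manifest(name: str, image: str, replicas: int, storage: str) -> dict:
    """Build a StatefulSet manifest with per-replica persistent storage."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # A headless service gives each pod a stable DNS name, which
            # clustered applications need in order to find their peers.
            "serviceName": f"{name}-headless",
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
            # Each replica gets its own volume claim, so its state
            # survives pod restarts and rescheduling.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }

# Hypothetical example: a three-node worker pool with 100Gi per replica.
manifest = stateful_set_manifest("worker-pool", "example/worker:latest", 3, "100Gi")
```

The point of the sketch is the contrast with a stateless Deployment: the `serviceName` and `volumeClaimTemplates` fields are exactly the "network awareness and persistent storage" concerns that otherwise push teams toward writing custom operators.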
We're really good at deploying those stateful, clustered applications, and, in fact, we've open sourced that as a project, KubeDirector, which came from BlueData. And we're really good at securing these, using SPIFFE and SPIRE to ensure that zero trust approach, which came from Scytale. And we've wrapped all of that in Kubernetes. So now you can take the most difficult, gnarly, complex, data-intensive applications in your enterprise and deploy them using open source. And if that means we have to co-exist with an existing Kubernetes distribution, that's fine. The most common scenario I walk into is that I start asking, "What about these other applications you haven't done yet?" The answer is usually, "We haven't gotten to them yet," or "We're thinking about it," and that's when we talk about the capabilities of Ezmeral. And I usually get the response, "Oh. A, we didn't know you existed, and B, well, let's talk about how exactly you do that." So again, it's more of a co-exist model than a compete model, Dave. >> Well, that makes sense. I mean, I think, again, a lot of people go, "Oh yeah, Kubernetes, no big deal. It's everywhere." But you're talking about a solution, kind of taking a platform approach with capabilities. You've got to protect the data. A lot of times these microservices aren't so micro, and things are happening really fast. You've got to be secure. You've got to be protected. And like you said, you've got a single phone number. You know, people say "one throat to choke." Somebody in the media the other day said, "No, no. Single hand to shake." It's more of a partnership. I think that's apropos for HPE, Matt, with your heritage. >> That one's better. >> So, you know, thinking about this whole space, we've gone through the pre-big-data days, and then big data was the hot buzzword. People don't necessarily use that term anymore, although the data is bigger and getting bigger, which is kind of ironic.
Where do you see this whole space going? We've talked about that sort of trend toward breaking down the silos, decentralization, maybe these hyper-specialized roles that we've created getting more embedded or aligned with the line of business. How do you see... it feels like the next 10 years are going to be different than the last 10 years. How do you see it, Matt? >> I completely agree. I think we are entering this next era, and I don't know if it's well-defined. I wouldn't go out on a limb to say exactly what the trend is going to be. But as you said earlier, data lakes really turned into data swamps. We ended up with lots of them in the enterprise, and enterprises had to allow that to happen. They had to let each business unit or each group of users collect the data that they needed, and IT sort of had to deal with that down the road. I think the more progressive organizations are leading the way. They are, again, taking those lessons from cloud and application development, microservices, and they're allowing a freedom of choice. They're allowing data to move to where those applications are, and I think this decentralized approach is really going to be king. You're going to see traditional software packages, you're going to see open source, you're going to see a mix of those. But what I think will probably be common throughout all of that is this sense of automation, this sense that we can't just build an algorithm once, release it, and then wish it luck. We've got to treat these analytics, and these data systems, as living things with life cycles that we have to support. Which means we need DevOps for our data science. We need CI/CD for our data analytics. We need to provide engineering at scale, like we do for software engineering. That's going to require automation, and an organizational thinking process, to allow that to actually occur. I think all of those things.
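The "CI/CD for data analytics" idea above — never building an algorithm once, releasing it, and wishing it luck — can be sketched as a promotion gate in a pipeline: a candidate model is released only if it clears an absolute quality floor and improves on the model currently in production. This is a hedged illustration; the metric, floor, and gain thresholds are invented for the example, not any particular product's defaults:

```python
def should_promote(candidate_score: float,
                   production_score: float,
                   floor: float = 0.80,
                   min_gain: float = 0.01) -> bool:
    """CI gate: promote a model only if it clears an absolute quality
    floor AND measurably improves on the model serving traffic today."""
    if candidate_score < floor:
        return False  # fails the quality floor outright
    return candidate_score >= production_score + min_gain

# A pipeline would call this after its evaluation step, e.g.:
#   promote = should_promote(evaluate(candidate), evaluate(production))
```

The design choice worth noting is the second condition: without the comparison against production, automated retraining can silently ship a model that is "good enough" in absolute terms but worse than what users already have.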
The sort of people, process, products: all three of those things are going to have to come into play. But stealing those best ideas from cloud and application development, I think we're going to end up with probably something new over the next decade or so. >> Again, I'm loving this conversation, so I'm going to stick with it for a sec. It's hard to predict, but some takeaways that I have, Matt, from our conversation, I wonder if you could comment? I think the future is more open source. You mentioned automation; devs are going to be key. I think governance as code, security designed in at the point of code creation, is going to be critical. It's no longer going to be a bolt-on. I don't think we're going to throw away the data warehouse or the data hubs or the data lakes. I think they become a node. I like this idea, I don't know if you know Zhamak Dehghani, but she has this idea of a global data mesh, where these tools, lakes, whatever, are nodes on the mesh. They're discoverable. They're shareable. They're governed in a way. I think the mistake a lot of people made early on in the big data movement is, "Oh, we've got data. We have to monetize our data." As opposed to thinking about what products can I build that are based on data, that then can lead to monetization. I think the other thing I would say is the business has gotten way too technical. (Dave chuckles) It's alienated a lot of the business lines. I think we're seeing that change, and I think things like Ezmeral that simplify that are critical. So I'll give you the final thoughts, based on my rant. >> No, your rant is spot on, Dave. I think we are in agreement about a lot of things. Governance is absolutely key. If you don't know where your data is, what it's used for, and can't apply policies to it, it doesn't matter what technology you throw at it; you're going to end up in the same state you're essentially in today, with lots of swamps. I did like that concept of a node or a data mesh.
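The governance point here — knowing who has access to what, when, and under what conditions, while providing audit and control — is what "governance as code" looks like in practice: access rules expressed as declarative data, with every decision recorded. A minimal, hypothetical sketch (the policy shape, roles, and field names are invented for illustration, not any product's policy language):

```python
from datetime import datetime

AUDIT_LOG = []  # every decision is recorded, granted or not

def allowed(user_role: str, dataset: str, action: str,
            policies: list, now: datetime) -> bool:
    """Evaluate declarative access policies and audit the decision."""
    decision = any(
        p["role"] == user_role
        and p["dataset"] == dataset
        and action in p["actions"]
        # optional time-of-day condition, defaulting to "always"
        and p.get("hours", (0, 24))[0] <= now.hour < p.get("hours", (0, 24))[1]
        for p in policies
    )
    AUDIT_LOG.append({"role": user_role, "dataset": dataset,
                      "action": action, "at": now.isoformat(),
                      "allowed": decision})
    return decision

policies = [
    # Analysts may read sales data, but only during business hours.
    {"role": "analyst", "dataset": "sales", "actions": ["read"], "hours": (8, 18)},
]
```

Because the policies are plain data rather than scattered application logic, they can be versioned, reviewed, and traced, which is the auditability argument being made for open, code-based governance.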
It kind of goes back to the same thing as a service mesh, or a set of APIs that you can use. I think we're going to have something similar with data. The trick is always: how heavy is it? How easy is it to move about? I think there's always going to be that latency issue, maybe not within the data center, but across the WAN. Latency is still going to be key, which means we need to have really good processes to be able to move data around. As you said, govern it. Determine who has access to what, when, and under what conditions, and then allow it to be free. Allow people to bring their choice of tools, provision them how they need to, while providing that audit, compliance, and control. And then again, as you need to provision data across those nodes for those use cases, do so in a well-measured and governed way. I think that's sort of where things are going. But we keep using that term governance; I think that's so key, and there's nothing better than using open source software, because that provides traceability, auditability, and, frankly, an openness that allows you to say, "I don't like where this project's going. I want to go in a different direction." And it gives those enterprises a control over these platforms that they've never had before. >> Matt, thanks so much for the discussion. I really enjoyed it. Awesome perspectives. >> Well, thank you for having me, Dave. Excellent conversation, as always. Thanks for having me again. >> You're very welcome. And thank you for watching, everybody. This is theCube's continuous coverage of HPE Discover 2021, of course, the virtual version. Next year, we're going to be back live. My name is Dave Volante. Keep it right there. (upbeat music)