Bill Peterson, MapR - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, this is theCUBE, the leader in live tech coverage. We're here in Boston, in snowy Boston. This is Spark Summit. Spark Summit does an East Coast version, they do a West Coast version, they've got one in Europe this year. theCUBE has been a partner with Databricks as the live broadcast partner. Our friend Bill Peterson is here. He's the head of partner marketing at MapR. Bill, good to see you again. >> Thank you, thanks for having me. >> So how's the show going for you? >> It's great. >> Give us the vibe. We're kind of windin' down day two. >> It is. The show's been great, we've got a lot of traffic coming by, a lot of deep technical questions which is-- >> Dave: Hardcore at the show-- >> It is, it is. I spend a lot of time there smiling and going, "Yeah, talk to him." (laughs) But it's great. We're getting those deep technical questions and it's great. We actually just got one on Lustre, which I had to think for a minute, oh, HPC. It was like way back in there. >> Dave: You know, Cray's on the floor. >> Oh, yeah that's true. But a lot of our customers as well. UnitedHealth Group, Wells Fargo, AMEX coming by. Which is great to see them and talk to them, but also they've got some deep technical questions for us. So it's moving the needle with existing customers but also new business, which is great. >> So I got to ask a basic question. What is MapR? MapR started in the early days of Hadoop distro, vendor, one of the big three. When somebody says to you what is MapR, what do you say? >> My answer today is MapR is an enterprise software company that delivers a converged data platform. That converged data platform consists of a file system, a NoSQL database, a Hadoop distribution, a Spark distribution, and a set of data management tools. 
And as a customer of MapR, you get all of those. You can turn 'em all on if you'd like. You can just turn on the file system, for example, if you wanted to just use the file system for storage. But the enterprise software piece of that is all the hardening we do behind the scenes on things like snapshots, mirroring, data governance, multi-tenancy, ease of use performance, all of that baked in to the solution, or the platform as we're calling it now. So as you're kind of alluding to, a year ago now we kind of got out of that business of saying okay, lead 100% with Hadoop and then while we have your attention, or if we don't, hey wait, we got all this other stuff in the basket we want to show you, we went the platform play and said we're going to include everything and it's all there and then the baseline underneath is the hardening of it, the file system, the database, and the streaming product, actually, which I didn't mention, which is kind of the core, and everything plays off of there. And that honestly has been really well-received. And it just, I feel, makes it so much easier because-- It happened here, we get the question, okay, how are you different from Cloudera or Hortonworks? And some of it here, given the nature of the attendees, is very technical, but there's been a couple of business users that I've talked to. And when I talk about us as an enterprise software company delivering a plethora of solutions versus just Hadoop, you can see the light going on sometimes in people's eyes. And I got it today, earlier, "I had no idea you had a file system," which, to me, just drives me insane because the file system is pretty cool, right? >> Well you guys are early on in investing in that file system and recovery capabilities and all the-- >> Two years in stealth writing it. >> Nasty, gnarly, hard stuff that was kind of poo-pooed early on. >> Yeah, yeah. MapR was never patient about waiting for the open source community to just figure it out and catch up. 
You always just said all right, we're going to solve this problem and go sell. >> And I'm glad you said that. I want to be clear. We're not giving up on open source or anything, right? Open source is still a big piece. 50% of our engineers' time is working on open source projects. That's still super important to us. And then back in November-ish last year we announced the MapR Ecosystem Packs, which is our effort to help our customers that are using open source components to stay current. 'Cause that's a pain in the butt. So this is a set of packages that have a whole bunch of components. We lead with Spark and Drill, and that was by customer request, that they were having a hard time keeping current with Spark and Drill. So the packs allow them to come up to current level within the converged data platform for all of their open source components. And that's something we're going to do at dot Level, so I think we're at 2.1 or 2 now. The dot levels will bring you up on everything and then the big ones, like the 3.0s, the 4.0s, will bring Spark and Drill current. And so we're going to kind of leapfrog those. So that's still a really important part of our business and we don't want to forget that part, but what we're trying here to do is, via the platform, is deliver all of that in one entity, right? >> So the converged data platform is relevant presumably because you've got the history of Hadoop, 'cause you got all these different components and you got to cobble 'em together and they're different interfaces and different environments, you're trying to unify that and you have unified that, right? >> Yeah, yeah. >> So what is your customer feedback with regard to the converged data platform? >> Yeah so it's a great question because for existing customers, it was like, ah, thank you. It was one of those, right, because we're listening. Actually, again, glad you said that. 
This week, in addition to Spark Summit we're doing our yearly customer advisory board so we've got, like a lot of vendors, we've got a 30 plus company customer advisory board that we bring in and we sit down with them for a couple of days and they give us feedback on what we should and shouldn't be doing and where, directional and all that, which is super important. And that's where a lot of this converged data platform came out of is the need for... There was just too much, it's kind of confusing. I'll give the example of streams, right? We came out with our streaming product last year and okay, I'm using Hadoop, I'm using your file system, I'm using NoSQL, now you're adding streams, this is great, but now, like MEP, the Ecosystem Packages, I have to keep everything current. You got to make it easier for me, you got to make my life easier for me. So for existing customers it's a stay current, I like this, the model, I can turn on and off what I want when I want. Great model for them, existing business. For new business it gets us out of that Hadoop-only mode, right? I kind of jokingly call us Hadoop plus plus plus plus. We keep adding solutions and add it to a single, cohesive data platform that we keep updated. And as I mentioned here, talking to new customers or new prospects, our potential new business, when I describe the model you can just see the light going on and they realize wow, there's a lot more to this than I had imagined. I got it earlier today, I thought you guys only did Hadoop. Which is a little infuriating as a marketer, but I think from a mechanism and a delivery and a message and a story point of view, it's really helped. >> More Cube time will help get this out there. (laughs) >> Well played, well played. >> It's good to have you back on. Okay, so Spark comes along a couple years ago and it was like ah, what's going to happen to Hadoop? So you guys embraced Spark. 
Talk more specifically about Spark, where it fits in your platform and the ecosystem generally. >> Spark, Hadoop, others as a entity to bring data into the converged data platform, that's one way to think about it. Way oversimplified, obviously, but that's a really great way, I think, to think about it is if we're going to provide this platform that anybody can query on, you can run analytics against. We talk a lot about now converged applications. So taking historical data, taking operational data, so streaming data, great example. Putting those together and you could use the Data Lake example if you want, that's fine. But putting them into a converged application in the middle where they overlap, kind of typical Venn diagram where they overlap, and that middle part is the converged application. What's feeding that? Well, Spark could be feeding that, Hadoop could be feeding that. Just yesterday we announced a Docker for containers, that could be feeding into the converged data platform as well. So we look at all of these things as an opportunity for us to manage data and to make data accessible at the enterprise level. And then that enterprise level goes back to what I was talkin' before, it's got to have all of those things, like multi-tenancy and snapshots and mirroring and data governance, security, et cetera. But Spark is a big component of that. All of the customers who came by here that I mentioned earlier, which are some really good names for us, are all using Spark to drive data into the converged data platform. So we look at it as we can help them build new applications within converged data platform with that data. So whether it's Spark data, Hadoop data, container data, we don't really care. 
>> So along those lines, if the focus of intense interest right now is on Spark, and Spark says oh, and we work with all these databases, data stores, file systems, if you approach a customer who's Spark first, what's the message relative to all the other data stores that they can get to through, without getting too techy, their API? >> Sure, sure. I think as you know, George, we support a whole bunch of APIs. So I guess for us it's the breadth. >> But I'm thinking of Spark in particular. If someone says specifically, I want to run Databricks, but I need something underneath it to capture the data and to manage it. >> Well I think that's the beauty of our file system there. As I mentioned, if you think about it from an architectural point of view, our file system along the bottom, or it could be our database or our streaming product, but in this instance-- >> George: That's what I'm getting at too, all three. >> Picture that as the bottom layer as your storage-- I shouldn't say storage layer but as the bottom layer. 'Cause it's not just storage, it's more than storage. Middle layer is maybe some of your open source tools and the like, and then above that is what I called your data delivery mechanisms. Which would be Spark, for example, one bucket. Another bucket could be Hadoop, and another bucket could be these microservices we're talking about. Let me draw the picture another way using a partner, SAP. One of the things we've had some success with SAP is SAP HANA sitting up here. SAP would love to have you put all your data in HANA. It's probably not going to happen. >> George: Yeah, good luck. >> Yeah, good luck, right? But what if you, hey customer, what if you put zero to two years worth of data, historical data, in HANA. Okay, maybe the customer starts nodding their head like you just did. Hey customer, what if you put two to five years worth of data in Business Warehouse. Guess what, you already own that. 
You've been an SAP customer for awhile, you already have it. Okay, the customer's now really nodding their head. You got their attention. To your original question, whether it's Spark or whatever, five plus years, put it in MapR. >> Oh, and then like HANA Vora could do the query. >> Drill can query across all of them. >> Oh, right including the Business Warehouse, okay. >> So we're running in the file system. That, to me, and we do this obviously with our joint SAP MapR customers, that to me is kind of a really cool vision. And to your original question, if that was Spark at the top feeding it rather than SAP, sure, right? Why not? >> What can you share with us, Bill, about business metrics around MapR? However you choose to share it, head count, want to give us gross margins by product, that's great, but-- (laughs) >> Would you like revenues too, Dave? >> We know they're very high because you're a software company, so that's actually a bad question. I've already profit-- (laughs) >> You don't have to give us top line revenues-- >> So what are you guys saying publicly about the company, its growth. >> That's fair. >> Give us the latest. >> Fantastic, number one. Hiring like crazy, we're well north of 500 people now. I actually, you want to hear a funny story? I yesterday was texting in the booth, with a candidate from my team, back and forth on salary. Did the salary negotiation on text right there in the booth and closed her, she starts on the 27th, so. >> Dave: Congratulations. >> I'm very excited about that. So moving along on that. Seven, 800 plus customers as we talk about... We just finished our fiscal year on January 31st, so we're on Feb one fiscal year. And we always do a momentum press release, which will be coming out soon. Hiring, again, like crazy, as I mentioned, executive staff is all filled in and built to scale which we're really excited about. 
We talk a lot about the kind of uptake of-- it used to be of the file system, Hadoop, et cetera on its own, but now in this one the momentum release we'll be doing, we'll talk about the converged data platform and the uplift we've seen from that. So we obviously can't talk revenue numbers and the like, but everything... David, I got to tell you, we've been doin' this a long time, all of that is just all moving in the right direction. And then the other example I'll give you from my world, in the partner world. Last year I rebranded our partner to the converged partner program. We're going with this whole converged thing, right? And we established three levels, elite, preferred, and affiliate with different levels there. But also, there's revenue requirements at each level, so elite, preferred, and affiliate, and there's resell and influence revenues, we have MDF funds, not only from the big guys coming to us, but we're paying out MDF funds now to select partners as well. So all of this stuff I always talk about as the maturity of the company, right? We're maturing in our messaging, we're maturing in the level of people who are joining, and we're maturing in the customers and the deals, the deal sizes and volumes that we're seeing. It's all movin' in the right direction. >> Dave: Great, awesome, congratulations. >> Bill: Thank you, yeah, I'm excited. >> Can you talk about number of customers or number of employees relative to last year? >> Oh boy. Honestly, George, I don't know off the top of my head. I apologize, I don't know the metric, but I know it's north of 500 today, of employees, and it's like seven, 800 customers. >> Okay, okay. >> Yeah, yeah. >> And a little bit more on this partner, elite, preferred, and affiliate. >> Affiliate, yeah. >> What did you call it, the converged partners program? >> Converged-- Yeah, yeah. >> What are some of the details of that? >> Sure. So the elites are invite only, and those are some of the bigger ones. 
So for us, we're-- >> Dave: Like, some examples. >> Cisco, SAP, AWS, others, but those are some of the big ones. And they were looking at things like resell and influence revenue. That's what I track in my... I always jokingly say at MapR, even though we're kind of a big startup now, I always jokingly say at MapR you have three jobs. You have the job you were hired for, you have your Thursday night job, and you have your Sunday night job. (Dave and George laugh) In the job that I was hired for, partner marketing, I track influence and resell revenue. So at the elite level, we're doing both. Like Cisco resells us, so this S-Series, we're in their SKU, their sales reps can go sell an S-Series for big data workloads or analytical workloads, MapR, on it, off you go. Our job then is cashing checks, which I like. That's a good job to have in this business. At the preferred level it's kind of that next tier of big players, but revenue thresholds haven't moved into the elite yet. Partners in there, like the MicroStrategies of the world, we're doing a lot with them, Tableau, Talend, a lot of the BI vendors in there. And then the affiliates are the smaller guys who maybe we'll do one piece of a campaign during the year with them. So I'll give you an example, Attunity, you guys know those guys right here? >> Sure >> Yeah, yeah. >> Last year we were doing a campaign on DWO, data warehouse offload. We wanted to bring them in but this was a MapR campaign running for a quarter, and we're typical, like a lot of companies, we run four campaigns a year and then my partner in field stuff kind of opts into that and we run stuff to support it. And then corporate marketing does something. Pretty traditional. But what I try and do is pull these partners into those campaigns. So we did a webinar with Attunity as part of that campaign. 
So at the affiliate level, the lower level, we're not doing a full go-to-market like we would with the elites at the top, but they're being brought into our campaigns and then obviously hopefully, we hope on the other side they're going to pull us in as well. >> Great, last question. What should we pay attention to, what's comin' up? >> Yeah, so-- >> Let's see, we got some events, we got Strata coming up you'll be out your way, or out MapR way. >> As my Twitter handle says, seat 11A. That's where I am. (laughs) Yeah, I mean the Docker announcement we're really excited about, and microservices. You'll see more from us on the whole microservices thing. Streaming is still a big one, we think, for this year. You guys probably agree. That's why we announced the MapR streaming product last year. So again, from a go-to-market point of view and kind of putting some meat behind streaming not only MapR but with partners, so streaming as a component and a delivery model for managing data in CDP. I think that's a big one. Machine learning is something that we're seeing more and more touching us from a number of customers but also from the partner perspective. I see all the partner requests that come in to join the partner program, and there's been an uptick in the machine learning customers that want to come in and-- Excuse me, partners, that want to be talking to us. Which I think is really interesting. >> Where you would be the sort of prediction serving layer? >> Exactly, exactly. Or a data store. A lot of them are looking for just an easy data store that the MapR file system can do. >> Infrastructure to support that, yeah. >> Commodity, right? The whole old promise of Hadoop or just a generic file system is give me easy access to storage on commodity hardware. The machine learning-- >> That works. >> Right. The existing machine learning vendors need an answer for that. 
When the customer asks them, they want just an easy answer, say oh, we just use MapR FS for that and we're done. Okay, that's fine with me, I'll take that one. >> So that's the operational end of that machine learning pipeline that we call DevOps for data scientists? >> Correct, right. I guess the nice synergy there is the whole, going back to the Docker microservices one, there's a DevOps component there as well. So, might be interesting marrying those together. >> All right, we got to go, Bill, thanks very much, good to see you again. >> All right, thank you. >> All right, George and I will be back to wrap. We're going to part two of our big data forecast right now, so stay with us, right back. (digital music) (synth music)

Published Date : Feb 9 2017



Jack Norris, MapR - Spark Summit East 2016 #SparkSummit #theCUBE

>> From New York, extracting the signal from the noise, it's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Dave Vellante and George Gilbert. >> Right here in Midtown at the Hilton hotel. This is Spark Summit and this is theCUBE. theCUBE goes out to the events. We extract the signal from the noise. Jack Norris is here. He's the CMO of MapR, longtime CUBE alum. Jack, it's great to see you again. Hey, you've been here since the beginning of this whole big data-- >> Meme, and it might've started here, I don't know. I think we've, yeah. >> I think you're right. I mean, it really did start, I think, in this building. It was our first big data show, at the original, you know, Hadoop World. And you guys, like I say, have been there from the start. You were kind of impatient early on. You said, you know, we're just going to go build solutions and ignore the noise, and you built a really nice, nice business. You guys have been growing, you're growing your sales force, and things are good, and all of a sudden, boom, the Spark thing comes in. So we're seeing the evolution. I remember saying to George, in the early days of Hadoop we were geeking out talking about all the bits and bytes, and then it turned into a business discussion. It's like we're back to the hardcore bits and bytes. So give us the update from MapR's point of view, where are we in the whole big data space? >> Well, I think it has transitioned. I mean, if you look at the typical large Fortune company, the Web 2.0s, it's really, how do we best leverage our data, and how do we leverage our data so that we can make decisions much faster, right? That high-frequency decision-making process. And typically that involves taking production data and analytics and joining them together so that you're actually impacting business as it happens, and to do that effectively requires innovations. 
So the exciting thing about Spark is having a distributed compute engine that's much easier to develop on, and much faster. >> So, I remember in the early days we'd be at these shows and the big question was, you know, can you take the humans out of the equation? It's like, no, no, humans are the last mile. Is that changing, or do we still need that human interaction? >> Humans are an important part of the process, but increasingly, if you can adjust and make, you know, small algorithmic decisions, and make those decisions at that kind of moment of truth, you've got big impact, and I'll give you a few examples. So, ad platforms, you know, Rubicon Project, over a hundred billion ad auctions a day. Humans are part of that process in terms of setting that up and reviewing the process, but each supply and demand decision there is an automated decision, and optimizing that has a huge impact on the bottom line. Fraud, you know, credit card swiping, that transaction, and deciding is this fraudulent or not, avoiding false positives, et cetera, a big leveraged item. So we're seeing things like that across manufacturing, across retail, healthcare. And it isn't about asking bigger questions or doing reports and looking back at, you know, what happened last week. It's more, how can I have an infrastructure in place that allows this organization to be agile? Because it's not the companies with the most data that are going to win. It's the companies that are the most agile and making intelligent-- >> So there's so much data. Humans can't ingest it any faster. I mean, we just can't keep up. So the world needs data scientists, it needs trained developers. You've got some news I want to talk about on the training side, but even then, we can only throw so many bodies at the problem. So it's really software that's going to allow us to scale it. Software's hard. Software takes time. 
So we've seen a lot of the spend in the analytics, big data world on services. And obviously you guys and others have been working hard to shift it towards software. I want to come back to that training issue. We heard this morning that Databricks launched a MOOC. They trained 20,000 people. That's a lot, but still a long way to go. You guys are putting some investment into training. Talk about that news. >> Yeah. Well, it starts at the underlying software. If you can do things in the platform to make it much easier, and do things that are hard to surround with services, like data protection, right? If you've lost data, it doesn't matter how many people you throw at it, you can't recover it. Right. So that's kind of the starting point. >> You're gonna get fired. >> The approach we've taken is to take a software product approach to the training as well. So we rolled out on-demand training. So it's free, it's on demand. You work at your own pace. It's got different modules, there's some training associated with that, or some hands-on labs, if you will. We launched that last January. So it's basically coming up on the one-year anniversary. We recently celebrated, we trained 50,000 people on Hadoop and big data. Today we're announcing expansion on Spark classes. We've got full curriculum around Spark, including a certification. So you can get Spark certification through this MapR on-demand training. >> Okay. >> Gotcha. You said something really, really intriguing that I want to dive into a little bit, where we were talking about the small decisions that can be made really, really fast, where a human in the loop might have to train them, but at runtime now, where you said it's not about asking bigger questions, it's finding faster answers, what had to change in your platform or in the underlying technology to make that possible? >> You know, there's a lot that goes into it. 
It's typically a series of functions, a kind of breadth that needs to be brought to the problem, as well as squeezing out latencies. So instead of the traditional approach, which is that different applications and different analytic techniques dictate a separate silo, a separate, you know, schema of data, and you've got those all around the organization, and data kind of travels, and you get an answer at the end of some period of time, it's converging that all together into a single platform, squeezing out those latencies so that you can have an informed action at the speed of business, if you will. And-- >> Let's say Spark never came along. Would that be possible? >> Yes. Yes. Would you-- how would you-- So if you look at kind of the different architectures that are out there, there's typically deep analytics in terms of, you know, let's go look at the trends, you know, the last seven years, what happened. And then let's look at doing actions on a streaming set, say for instance, Storm, and then let's do real-time database operations. So you could do that with HBase or MapR-DB, and all of that together. What Spark has really done is made that whole development process just much easier and much more streamlined. And that's where a lot of the excitement's happening. >> So you mentioned earlier two use cases, ad tech and fraud detection. And I want to ask you about those and the state of those. So ad tech obviously has come a long way, but it's still got a ways to go. I mean, you look at who's making money on ads. Obviously Google makes tons of money. Everybody else is sorta chasing them. Facebook's making money. It's probably 'cause they didn't let Google in. Okay. So how will Spark affect sort of that business? And what's MapR's sort of role in evolving that, you know, to the next level? >> So, there's different kinds of compute and types of things you can do on the data. 
I think increasingly we're seeing the kind of streaming analytics and making those decisions as the data arrives, right? And then there's the whole ecosystem in terms of how do you coordinate those flows of data? It's not just a simple, here's the origin, here's the destination. There's typically a complex data flow. That's where we've kind of focused on MapR Streams, this huge publish and subscribe infrastructure so that you can get real-time data to the appropriate location and then do the right operations, a lot of that involved with Spark, but not exclusively. >> Okay. And then on fraud detection, obviously it's come a long way. Sampling kind of died, yes. But now we're getting too many false positives. You get the call and, you know, I mean, I get a lot of calls because we buy so much equipment, but now what about the next level? What are you guys doing to take fraud detection to the next level? So that when I get on the plane in Boston and I land in London, it knows. Is that a database problem? Is it an integration problem, a systems problem, and what role do you guys play in solving that? >> Well, you know, there's a lot of details and techniques that probably go beyond, you know, what we'll share publicly or what our customers talk about publicly. I think in general, the more data that you can apply to a problem, the more context, the better off you are. That's the way I kind of summarize it. So that instead of a sampling, or instead of a "boy, that's a strange purchase over there," it's understanding, well, this is Dave Vellante and this is the full body of expenditures he's done, and the types of things, and here's who he frequently purchases from. And here's kind of a transaction trend, started in San Francisco, went to New York, et cetera. So in context it would make more sense. So-- >> Part of that is more data. 
And the other part of that is just better algorithms and better learnings, and applying that on a continuous basis. How are your customers dealing with that constraint? I mean, if they've got a hundred dollars to spend, they can only spend so much on each of those: gathering more data, cleaning the data, where they spend so much time getting it ready, versus improving their machine learning algorithms or whatever other techniques they use. What are you seeing there as best practice? It probably varies, I'm sure, but give us some color on it. >>I'll actually go back to Google. Excellent, excellent insights coming from Google. They wrote a paper called "The Unreasonable Effectiveness of Data," and in it they squarely addressed that problem. Given the choice to invest in either a more complex model and algorithm or put more data at it, putting more data at it had a huge impact. And, you know, my simple explanation is, if you're sampling the data, you have to have a model that tries to recreate reality. If you're looking at all of the data, then the anomalies can pop up and be more apparent. And the more context you can bring, the more data from other sources, so you get a more rounded, better picture of what's happening, the better off you are. And so that requires scale. It requires speed, and it requires different techniques that can be brought to bear, right? Here's a database operation, here's a streaming operation, here's a deep, you know, file-based machine learning algorithm. >>So there's a lot of vendors in the big data ecosystem coming at Spark from different angles, trying to add value to it and sort of bathe themselves in the halo. >>Yep. >>Now you guys took some time up front to build a converged platform so that you weren't trying to wrap your arms around 37 different projects. 
Can you tell us how, having perhaps not anticipated Spark, this converged platform allows you to add more value to it than other approaches? >>So, to simplify, if you look at the Hadoop ecosystem, it's basically separated into the components for compute and management on top of the data layer, right? The Hadoop Distributed File System. So how do you scale data? How do you protect it? That's very simply what's going on. Spark really does a great job at that top layer. It doesn't do anything about defining the underlying storage layer, and in the Hadoop community that underlying storage layer is a batch system. So you're trying to do micro-batch kind of streaming operations on top of batch-oriented data. What we addressed was to take that whole data layer and make it real time, make it random read-write, and converge enterprise storage together with Hadoop support and Spark support on a single platform. And that's basically the difference. >>And to make it enterprise grade. You guys were really the first to lead the charge. Everybody started talking about enterprise grade after you were kind of delivering it. So you've had a lead there. Do you feel like you still have a lead there, or is that the kind of thing where you hit the top of the S-curve and start innovating elsewhere? >>NC State did a study just this past year. That recent study identified that only 25% of data corruption issues are identified and properly handled by the Hadoop Distributed File System, and 42% of those are silent. So there's a huge gap in terms of quote-unquote enterprise-grade features and what we think-- >>Yes, silent data corruption has been a problem for decades now. And you're saying it's no different in the Hadoop ecosystem, especially as mainstream businesses start to adopt this. What's happening in the Valley? We're seeing, you know, in the Wall Street Journal every day you read about down rounds, flat rounds, people can't get B rounds. 
You guys are funded, you're growing, you're talking about investments. What do you see? Do you feel like you're achieving escape velocity? Maybe give us an update on the state of the business. >>Yeah. I think the state of the business is best represented by the customers, right? And the customers kind of vote. They vote in terms of how well this technology is driving their business. So we've got a recent study that shows the returns that customers are getting. We've got a 1% churn, a 99% retention rate with our customers. We've got an expansion rate that's unbelievable. We've got multi-million dollar customers in seven of the top verticals and nine out of the top $10 million customers. So we're seeing significant investments, and more importantly, significant returns on the part of customers, where they're not just doing a single application on the platform, but multiple applications. >>Jack Norris, MapR, always focused. Always a pleasure having you on theCUBE. Thanks very much for coming on. Appreciate it. Keep right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Spark Summit. We'll be right back.

Published Date : Feb 17 2016

ENTITIES

Entity	Category	Confidence
Dave Valenti	PERSON	0.99+
Jack Norris	PERSON	0.99+
Dave Volante	PERSON	0.99+
New York	LOCATION	0.99+
London	LOCATION	0.99+
George	PERSON	0.99+
San Francisco	LOCATION	0.99+
Boston	LOCATION	0.99+
George Gilbert	PERSON	0.99+
99%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
42%	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
50,000 people	QUANTITY	0.99+
nine	QUANTITY	0.99+
20,000 people	QUANTITY	0.99+
last week	DATE	0.99+
Datto	ORGANIZATION	0.99+
last January	DATE	0.99+
$10 million	QUANTITY	0.98+
seven	QUANTITY	0.98+
each	QUANTITY	0.98+
first	QUANTITY	0.98+
Mapbox	ORGANIZATION	0.98+
today	DATE	0.97+
1%	QUANTITY	0.97+
Hadoop	TITLE	0.97+
Matt	PERSON	0.96+
single platform	QUANTITY	0.96+
NC	ORGANIZATION	0.95+
this morning	DATE	0.95+
single application	QUANTITY	0.94+
25%	QUANTITY	0.94+
Midtown	LOCATION	0.93+
first big	QUANTITY	0.92+
Rubicon	ORGANIZATION	0.92+
37 different projects	QUANTITY	0.92+
last seven years	DATE	0.89+
over a hundred billion ad auctions a day	QUANTITY	0.88+
this past year	DATE	0.86+
spark	ORGANIZATION	0.85+
multi-million dollar	QUANTITY	0.84+
decades	QUANTITY	0.83+
a hundred dollars	QUANTITY	0.79+
data corruption	QUANTITY	0.7+
HBase	TITLE	0.67+
Hilton	ORGANIZATION	0.67+
RDB	TITLE	0.64+
Spark	ORGANIZATION	0.57+
MapR	ORGANIZATION	0.57+
map	TITLE	0.57+
Salesforce	ORGANIZATION	0.53+
2016	EVENT	0.51+
- Spark Summit	EVENT	0.46+
East	LOCATION	0.42+

Opher Kahane, Sonoma Ventures | CloudNativeSecurityCon 23

(uplifting music) >> Hello, welcome back to theCUBE's coverage of CloudNativeSecurityCon, the inaugural event, in Seattle. I'm John Furrier, host of theCUBE, here in the Palo Alto Studios. We're calling it theCUBE Center. It's kind of like our Sports Center for tech. It's kind of remote coverage. We've been doing this now for a few years. We're going to amp it up this year as more events are remote, and happening all around the world. So, we're going to continue the coverage with this segment focusing on the data stack, entrepreneurial opportunities around all things security, and as, obviously, data's involved. And our next guest is a friend of theCUBE, and CUBE alumni from 2013, entrepreneur himself, turned, now, venture capitalist angel investor, with his own firm, Opher Kahane, Managing Director, Sonoma Ventures. Formerly the founder of Origami, sold to Intuit a few years back. Focusing now on having a lot of fun, angel investing on boards, focusing on data-driven applications, and stacks around that, and all the stuff going on in, really, in the wheelhouse for what's going on around security data. Opher, great to see you. Thanks for coming on. >> My pleasure. Great to be back. It's been a while. >> So you're kind of on Easy Street now. You did the entrepreneurial venture, you've worked hard. We were on together in 2013 when theCUBE just started. Accel Partners had an event at Stanford, Accel, and they had all the features there. We interviewed Satya Nadella, who was just a manager at Microsoft at that time, he was there. He's now the CEO of Microsoft. >> Yeah, he was. >> A lot's changed in nine years. But congratulations on your venture you sold, and you got an exit there, and now you're doing a lot of investments. I'd love to get your take, because this is really the biggest change I've seen in the past 12 years, around an inflection point around a lot of converging forces. 
Data, which, big data, 10 years ago, was a big part of your career, but now it's accelerated, with cloud scale. You're seeing people building scale on top of other clouds, and becoming their own cloud. You're seeing data being a big part of it. Cybersecurity kind of has not really changed much, but it's the most important thing everyone's talking about. So, developers are involved, data's involved, a lot of entrepreneurial opportunities. So I'd love to get your take on how you see the current situation, as it relates to what's gone on in the past five years or so. What's the big story? >> So, a lot of big stories, but I think a lot of it has to do with a promise of making value from data, whether it's for cybersecurity, for Fintech, for DevOps, for RevTech startups and companies. There's a lot of challenges in actually driving and monetizing the value from data with velocity. Historically, the challenge has been more around, "How do I store data at massive scale?" And then you had the big data infrastructure company, like Cloudera, and MapR, and others, deal with it from a scale perspective, from a storage perspective. Then you had a whole layer of companies that evolved to deal with, "How do I index massive scales of data, for quick querying, and federated access, et cetera?" But now that a lot of those underlying problems, if you will, have been solved, to a certain extent, although they're always being stretched, given the scale of data, and its utility is becoming more and more massive, in particular with AI use cases being very prominent right now, the next level is how to actually make value from the data. How do I manage the full lifecycle of data in complex environments, with complex organizations, complex use cases? And having seen this from the inside, with Origami Logic, as we dealt with a lot of large corporations, and post-acquisition by Intuit, and a lot of the startups I'm involved with, it's clear that we're now onto that next step. 
And you have fundamental new paradigms, such as data mesh, that attempt to address that complexity, and responsibly scaling access, and democratizing access in the value monetization from data, across large organizations. You have a slew of startups that are evolving to help the entire lifecycle of data, from the data engineering side of it, to the data analytics side of it, to the AI use cases side of it. And it feels like the early days, to a certain extent, of the revolution that we've seen in transition from traditional databases, to data warehouses, to cloud-based data processing, and big data. It feels like we're at the genesis of that next wave. And it's super, super exciting, for me at least, as someone who's sitting more in the coach seat, rather than being on the pitch, and building startups, helping folks as they go through those motions. >> So that's awesome. I want to get into some of these data infrastructure dynamics you mentioned, but before that, talk to the audience around what you're working on now. You've been a successful entrepreneur, you're focused on angel investing, so, super-early seed stage. What kind of deals are you looking at? What's interesting to you? What is Sonoma Ventures looking for, and what are some of the entrepreneurial dynamics that you're seeing right now, from a startup standpoint? >> Cool, so, at a macro level, this is a little bit of background of my history, because it shapes very heavily what it is that I'm looking at. So, I've been very fortunate with entrepreneurial career. I founded three startups. All three of them are successful. Final two were sold, the first one merged and went public. And my third career has been about data, moving data, passing data, processing data, generating insights from it. And, at this phase, I wanted to really evolve from just going and building startup number four, from going through the same motions again. A 10 year adventure, I'm a little bit too old for that, I guess. 
But the next best thing is to sit from a point whereby I can be more elevated in where I'm dealing with, and broaden the variety of startups I'm focused on, rather than just do your own thing, and just go very, very deep into it. Now, what specifically am I focused on at Sonoma Ventures? So, basically, looking at what I refer to as a data-driven application stack. Anything from the low-level data infrastructure and cloud infrastructure, that helps any persona in the data universe maximize value for data, from their particular point of view, for their particular role, whether it's data analysts, data scientists, data engineers, cloud engineers, DevOps folks, et cetera. All the way up to the application layer, in applications that are very data-heavy. And what are very typical data-heavy applications? FinTech, cyber, Web3, revenue technologies, and product and DevOps. So these are the areas we're focused on. I have almost 23 or 24 startups in the portfolio that span all these different areas. And this is in terms of the aperture. Now, typically, focus on pre-seed, seed. Sometimes a little bit later stage, but this is the primary focus. And it's really about partnering with entrepreneurs, and helping them make, if you will, original mistakes, avoid the mistakes I made. >> Yeah. >> And take it to the next level, whatever the milestone they're driving with. So I'm very, very hands-on with many of those startups. Now, what is it that's happening right now, initially, and why is it so exciting? So, on one hand, you have this scaling of data and its complexity, yet lagging value creation from it, across those different personas we've touched on. So that's one fundamental opportunity which is secular. 
The other one, which is more a cyclic situation, is the fact that we're going through a down cycle in tech, as is very evident in the public markets, and everything we're hearing about funding going slower and lower, terms shifting more into the hands of typical VCs versus entrepreneur-friendly market, and so on and so forth. And a very significant amount of layoffs. Now, when you combine these two trends together, you're observing a very interesting thing, that a lot of folks, really bright folks, who have sold a startup to a company, or have been in the guts of the large startup, or a large corporation, have, hands-on, experienced all those challenges we've spoken about earlier, in terms of maximizing value from data, irrespective of their role, in a specific angle, or vantage point they have on those challenges. So, for many of them, it's an opportunity to, "Now, let me now start a startup. I've been laid off, maybe, or my company's stock isn't doing as well as it used to, as a large corporation. Now I have an opportunity to actually go and take my entrepreneurial passion, and apply it to a product and experience as part of this larger company." >> Yeah. >> And you see a slew of folks who are emerging with these great ideas. So it's a very, very exciting period of time to innovate. >> It's interesting, a lot of people look at, I mean, I look at Snowflake as an example of a company that refactored data warehouses. They just basically took data warehouse, and put it on the cloud, and called it a data cloud. That, to me, was compelling. They didn't pay any CapEx. They rode Amazon's wave there. So, a similar thing going on with data. You mentioned this, and I see it as an enabling opportunity. So whether it's cybersecurity, FinTech, whatever vertical, you have an enablement. Now, you mentioned data infrastructure. It's a super exciting area, as there's so many stacks emerging. 
We got an analytics stack, there's real-time stacks, there's data lakes, AI stack, foundational models. So, you're seeing an explosion of stacks, different tools probably will emerge. So, how do you look at that, as a seasoned entrepreneur, now investor? Is that a good thing? Is that just more of the market? 'Cause it just seems like more and more kind of decomposed stacks targeted at use cases seems to be a trend. >> Yeah. >> And how do you vet that, is it? >> So it's a great observation, and if you take a step back and look at the evolution of technology over the last 30 years, maybe longer, you always see these cycles of expansion, fragmentation, contraction, expansion, contraction. Go decentralize, go centralize, go decentralize, go centralize, as manifested in different types of technology paradigms. From client server, to storage, to microservices, to et cetera, et cetera. So I think we're going through another big bang, to a certain extent, whereby we end up with more specialized data stacks for specific use cases, as you need performance, the data models, the tooling to best adapt to the particular task at hand, and the particular personas at hand. As the needs of the data analysts are quite different from the needs of an ML engineer, it's quite different from the needs of the data engineer. And what happens is, when you end up with these siloed stacks, you end up with new fragmentation, and new gaps that need to be filled with a new layer of innovation. And I suspect that, in part, that's what we're seeing right now, in terms of the next wave of data innovation. Whether it's in a service of FinTech use cases, or cyber use cases, or other, is a set of tools that end up having to try and stitch together those elements and bridge between them. So I see that as a fantastic gap to innovate around. 
I see, also, a fundamental need in creating a common data language, and common data management processes and governance across those different personas, because ultimately, the same underlying data these folks need, albeit in different mediums, different access models, different velocities, et cetera, the subject matter, if you will, the underlying raw data, and some of the taxonomies right on top of it, do need to be consistent. So, once again, a great opportunity to innovate, whether it's about semantic layers, whether it's about data mesh, whether it's about CICD tools for data engineers, and so on and so forth. >> I got to ask you, first of all, I see you have a friend you brought into the interview. You have a dog in the background who made a little cameo appearance. And that's awesome. Sitting right next to you, making sure everything's going well. On the AI thing, 'cause I think that's the hot trend here. >> Yeah. >> You're starting to see, that ChatGPT's got everyone excited, because it's kind of that first time you see kind of next-gen functionality, large-language models, where you can bring data in, and it integrates well. So, to me, I think, connecting the dots, this kind of speaks to the beginning of what will be a trend of really blending of data stacks together, or blending of models. And so, as more data modeling emerges, you start to have this AI stack kind of situation, where you have things out there that you can compose. It's almost very developer-friendly, conceptually. This is kind of new, but kind of the same concept's been working on with Google and others. How do you see this emerging, as an investor? What are some of the things that you're excited about, around the ChatGPT kind of things that's happening? 'Cause it brings it mainstream. Again, a million downloads, fastest applications get a million downloads, even among all the successes. So it's obviously hit a nerve. People are talking about it. What's your take on that? 
>> Yeah, so, I think that's a great point, and clearly, it feels like an iPhone moment, right, to the industry, in this case, AI, and lots of applications. And I think there's, at a high level, probably three different layers of innovation. One is on top of those platforms. What use cases can one bring to the table that would drive on top of a ChatGPT-like service? Whereby, the startup, the company, can bring some unique datasets to infuse and add value on top of it, by custom-focusing it and purpose-building it for a particular use case or particular vertical. Whether it's applying it to customer service, in a particular vertical, applying it to, I don't know, marketing content creation, and so on and so forth. That's one category. And I do know that, as one of my startups is in Y Combinator, this season, winter '23, they're saying that a very large chunk of the YC companies in this cycle are about GPT use cases. So we'll see a flurry of that. The next layer, the one below that, is those who actually provide those platforms, whether it's ChatGPT, whatever will emerge from the partnership with Microsoft, and any competitive players that emerge from other startups, or from the big cloud providers, whether it's Facebook, if they ever get into this, and Google, which clearly will, as they need to, to survive around search. The third layer is the enabling layer. As you're going to have more and more of those different large-language models and use case running on top of it, the underlying layers, all the way down to cloud infrastructure, the data infrastructure, and the entire set of tools and systems, that take raw data, and massage it into useful, labeled, contextualized features and data to feed the models, the AI models, whether it's during training, or during inference stages, in production. Personally, my focus is more on the infrastructure than on the application use cases. 
And I believe that there's going to be a massive amount of innovation opportunity around that, to reach cost-effective, quality, fair models that are deployed easily and maintained easily, or at least with as little pain as possible, at scale. So there are startups that are dealing with it, in various areas. Some are about focusing on labeling automation, some about fairness, about, speaking about cyber, protecting models from threats through data and other issues with it, and so on and so forth. And I believe that this will be, too, a big driver for massive innovation, the infrastructure layer. >> Awesome, and I love how you mentioned the iPhone moment. I call it the browser moment, 'cause it felt that way for me, personally. >> Yep. >> But I think, from a business model standpoint, there is that iPhone shift. It's not the BlackBerry. It's a whole 'nother thing. And I like that. But I do have to ask you, because this is interesting. You mentioned iPhone. iPhone's mostly proprietary. So, in these machine learning foundational models, >> Yeah. >> you're starting to see proprietary hardware, bolt-on, acceleration, bundled together, for faster uptake. And now you got open source emerging, as two things. It's almost iPhone-Android situation happening. >> Yeah. >> So what's your view on that? Because there's pros and cons for either one. You're seeing a lot of these machine learning models are very proprietary, but they work, and do you care, right? >> Yeah. >> And then you got open source, which is like, "Okay, let's get some open source code, and let people verify it, and then build with that." Is it a balance? >> Yes, I think- >> Is it mutually exclusive? What's your view? >> I think it's going to be, markets will drive the proportion of both, and I think, for a certain use case, you'll end up with more proprietary offerings. 
With certain use cases, I guess the fundamental infrastructure for ChatGPT-like, let's say, large-language models and all the use cases running on top of it, that's likely going to be more platform-oriented and open source, and will allow innovation. Think of it as the equivalent of iPhone apps or Android apps running on top of those platforms, as in AI apps. So we'll have a lot of that. Now, when you start going a little bit more into the guts, the lower layers, then it's clear that, for performance reasons, in particular, for certain use cases, we'll end up with more proprietary offerings, whether it's advanced silicon, such as some of the silicon that emerged from entrepreneurs who have left Google, around TensorFlow, and all the silicon that powers that. You'll see a lot of innovation in that area as well. It hopefully intends to improve the cost efficiency of running large AI-oriented workloads, both in inference and in learning stages. >> I got to ask you, because this has come up a lot around Azure and Microsoft. Microsoft, pretty good move getting into the ChatGPT >> Yep. >> and the open AI, because I was talking to someone who's a hardcore Amazon developer, and they said, they swore they would never use Azure, right? One of those types. And they're spinning up Azure servers to get access to the API. So, the developers are flocking, as you mentioned. The YC class is all doing large data things, because you can now program with data, which is amazing, which is amazing. So, what's your take on, I know you got to be kind of neutral 'cause you're an investor, but you got, Amazon has to respond, Google, essentially, did all the work, so they have to have a solution. So, I'm expecting Google to have something very compelling, but Microsoft, right now, is going to just, might run the table on developers, this new wave of data developers. What's your take on the cloud responses to this? What's Amazon, what do you think AWS is going to do? 
What should Google be doing? What's your take? >> So, each of them is coming from a slightly different angle, of course. I'll say, Google, I think, has massive assets in the AI space, and their underlying cloud platform, I think, has been designed to support such complicated workloads, but they have yet to go as far as opening it up the same way ChatGPT is now in that Microsoft partnership, and Azure. Good question regarding Amazon. AWS has had a significant investment in AI-related infrastructure. Seeing it through my startups, through other lens as well. How will they respond to that higher layer, above and beyond the low level, if you will, AI-enabling apparatuses? How do they elevate to at least one or two layers above, and get to the same ChatGPT layer, good question. Is there an acquisition that will make sense for them to accelerate it, maybe. Is there an in-house development that they can reapply from a different domain towards that, possibly. But I do suspect we'll end up with acquisitions as the arms race around the next level of cloud wars emerges, and it's going to be no longer just about the basic tooling for basic cloud-based applications, and the infrastructure, and the cost management, but rather, faster time to deliver AI in data-heavy applications. Once again, each one of those cloud suppliers, their vendor is coming with different assets, and different pros and cons. All of them will need to just elevate the level of the fight, if you will, in this case, to the AI layer. >> It's going to be very interesting, the different stacks on the data infrastructure, like I mentioned, analytics, data lake, AI, all happening. It's going to be interesting to see how this turns into this AI cloud, like data clouds, data operating systems. So, super fascinating area. Opher, thank you for coming on and sharing your expertise with us. Great to see you, and congratulations on the work. I'll give you the final word here. 
Give a plug for what you're looking for in startups, seeds, pre-seeds. What's the kind of profile that gets your attention, from a seed, pre-seed candidate or entrepreneur? >> Cool, first of all, it's my pleasure. Enjoy our chats, as always. Hopefully the next one's not going to be in nine years. As to what I'm looking for, ideally, smart data entrepreneurs, who have come from a particular domain problem, or problem domain, that they understand, they felt it in their own 10 fingers, or millions of neurons in their brains, and they figured out a way to solve it. Whether it's a data infrastructure play, a cloud infrastructure play, or a very, very smart application that takes advantage of data at scale. These are the things I'm looking for. >> One final, final question I have to ask you, because you're a seasoned entrepreneur, and now coach. What's different about the current entrepreneurial environment right now, vis-a-vis, the past decade? What's new? Is it different, highly accelerated? What advice do you give entrepreneurs out there who are putting together their plan? Obviously, a global resource pool now of engineering. It might not be yesterday's formula for success to putting a venture together to get to that product-market fit. What's new and different, and what's your advice to the folks out there about what's different about the current environment for being an entrepreneur? >> Fantastic, so I think it's a great question. So I think there's a few axes of difference, compared to, let's say, five years ago, 10 years ago, 15 years ago. First and foremost, given the amount of infrastructure out there, the amount of open-source technologies, amount of developer toolkits and frameworks, trying to develop an application, at least at the application layer, is much faster than ever. So, it's faster and cheaper, to the most part, unless you're building very fundamental, core, deep tech, where you still have a big technology challenge to deal with. 
And absent that, the challenge shifts more to how do you manage your resources to product-market fit, how are you integrating the GTM lens, the go-to-market lens, as early as possible in the product-market fit cycle, such that you reach from pre-seed to seed, from seed to A, from A to B, with an optimal amount of velocity, and a minimal amount of resources. One big difference, specifically as of, let's say, beginning of this year, late last year, is that money is no longer free for entrepreneurs, which means that you need to operate and build a startup in an environment with a lot more constraints. And in my mind, some of the best startups that have ever been built, and some of the big market-changing, generational-changing, if you will, technology startups, in their respective industry verticals, have actually emerged from these times. And these tend to be the smartest, best startups that emerge because they operate with a lot less money. Money is not as available for them, which means that they need to make tough decisions, and make verticals every day. What you don't need to do, you can kick the can down the road. When you have plenty of money, and it cushions for a lot of mistakes, you don't have that cushion. And hopefully we'll end up with companies that are more agile, more, if you will, resilient, with better cultures in making those tough decisions that startups need to make every day. Which is why I'm super, super excited to see the next batch of amazing unicorns, true unicorns, not just valuation, market rising with the water type unicorns that emerged from this particular era, which we're in the beginning of. And very much enjoy working with entrepreneurs during this difficult time, the times we're in. >> The next 24 months will be the next wave, like you said, best time to do a company. Remember, Airbnb's pitch was, "We'll rent cots in apartments, and sell cereal." 
Boy, a lot of people passed on that deal, in that last down market, that turned out to be a game-changer. So the crazy ideas might not be that bad. So it's all about the entrepreneurs, and >> 100%. >> this is a big wave, and it's certainly happening. Opher, thank you for sharing. Obviously, data is going to change all the markets. Refactoring, security, FinTech, user experience, applications are going to be changed by data, data operating system. Thanks for coming on, and thanks for sharing. Appreciate it. >> My pleasure. Have a good one. >> Okay, more coverage for the CloudNativeSecurityCon inaugural event. Data will be the key for cybersecurity. theCUBE's coverage continues after this break. (uplifting music)

Published Date : Feb 2 2023

ENTITIES

Entity	Category	Confidence
Satya Nadella	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
2013	DATE	0.99+
Opher	PERSON	0.99+
CapEx	ORGANIZATION	0.99+
Seattle	LOCATION	0.99+
John Furrier	PERSON	0.99+
Sonoma Ventures	ORGANIZATION	0.99+
BlackBerry	ORGANIZATION	0.99+
10 fingers	QUANTITY	0.99+
Airbnb	ORGANIZATION	0.99+
CUBE	ORGANIZATION	0.99+
nine years	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Origami Logic	ORGANIZATION	0.99+
Origami	ORGANIZATION	0.99+
Intuit	ORGANIZATION	0.99+
RevTech	ORGANIZATION	0.99+
each	QUANTITY	0.99+
Opher Kahane	PERSON	0.99+
CloudNativeSecurityCon	EVENT	0.99+
Palo Alto Studios	LOCATION	0.99+
yesterday	DATE	0.99+
One	QUANTITY	0.99+
First	QUANTITY	0.99+
third layer	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
two layers	QUANTITY	0.98+
Android	TITLE	0.98+
third career	QUANTITY	0.98+
two things	QUANTITY	0.98+
both	QUANTITY	0.98+
MapR	ORGANIZATION	0.98+
one	QUANTITY	0.98+
one category	QUANTITY	0.98+
late last year	DATE	0.98+
millions of neurons	QUANTITY	0.98+
a million downloads	QUANTITY	0.98+
three startups	QUANTITY	0.98+
10 years ago	DATE	0.97+
Fintech	ORGANIZATION	0.97+
winter '23	DATE	0.97+
first one	QUANTITY	0.97+
this year	DATE	0.97+
Stanford	LOCATION	0.97+
Cloudera	ORGANIZATION	0.97+
theCUBE Center	ORGANIZATION	0.96+
five years ago	DATE	0.96+
10 year	QUANTITY	0.96+
ChatGPT	TITLE	0.96+
three	QUANTITY	0.95+
first time	QUANTITY	0.95+
XCEL Partners	ORGANIZATION	0.95+
15 years ago	DATE	0.94+
24 startups	QUANTITY	0.93+

Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL-centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight. 
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up for two major conferences. There are several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science. So they were adding SQL extensions, the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introducing Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon. 
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in terms of that it has been an evolution just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data it was initially seen as a hardware platform play. In the end, it was basically all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory with all the data stewards, who are basically concerned about the sprawl and the lack of control and governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. 
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, they were such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, at Uber, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouses have become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves. 
To believe the Databricks hype, it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in terms of that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL because really, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature because these are all basically kind of, even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with, say, basically the same data targets, say cloud storage, cloud object storage, that you might not apportion the task to different compute engines. 
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to let's say the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, you could generalize that to all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there is a trade-off. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data for any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses, they provide you an excellent user experience. That is the main reason why Snowflake took off. 
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then the organization starts exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake is really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot of, and cogent analysis from Sanjeev there, the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse when it really gets to the tough SLAs; they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, minimizing data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files or how do you do your upserts, basically? So different focus; at the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lake in any world but Databricks; I'm hearing a lot of buzz about Apache Iceberg. 
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, which was the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same iron-clad hold on Delta Lake, but still the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see a very similar type of breakdown repeating itself here. >> So by the way, Mongo has a conference next week, another data platform that is kind of not really relevant to this discussion totally. But in a sense it is, because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it really comes down to that sort of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. There's really a great leveler today, which is that, I mean, basically Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL-centric companies and shops are going to do more data science on their database-centric platform. The data science driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev. 'Cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a band-aid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, the translytical databases, and that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed to Doug's earlier point. 
You can't be best of breed at everything, wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why I want to go back to something that Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now that opens a couple of Pandora's boxes here, which is, one, does Snowflake dare go into that space or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that, whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's got nowhere in the market, no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before, basically kind of like the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to. Of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other engine, Trino let's say, or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> Snowflake has been a sort of victim of its own success in some ways, they made it so easy to spin up single node instances, multi node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery that the cost surprises are going to follow as they make it easy to spin up new instances. 
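The Cartesian join risk mentioned above can be sketched in a few lines. This is a minimal illustration, not any vendor's engine or pricing: the toy tables, the column names, and the helpers `cross_join` and `keyed_join` are all hypothetical. The point is only that a join with no key pairs every row on one side with every row on the other, so the work grows as the product of the table sizes.

```python
from itertools import product


def cross_join(left, right):
    """Cartesian join: pair every left row with every right row."""
    return [{**l, **r} for l, r in product(left, right)]


def keyed_join(left, right, key):
    """Inner equi-join on `key`, using a hash index built on the right side."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical toy tables: 200 customers with one order each.
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(200)]
orders = [{"customer_id": i, "amount": i * 10} for i in range(200)]

cross_rows = len(cross_join(customers, orders))                  # 200 * 200 rows
keyed_rows = len(keyed_join(customers, orders, "customer_id"))   # one row per match
print(cross_rows, keyed_rows)
```

At a million rows per side the same mistake would produce a trillion output rows, which is the kind of runaway consumption that guardrails and FinOps reporting are meant to catch.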
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you, you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environment. Lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. As for the operational database, nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used let's say Teradata basically to do some machine learning or let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be under the microscope for Cloudera and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month, did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data regulatory and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated, I mean, Frank Slootman is famous for saying to be very focused and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS at the last moment at one of their re:Invents said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy who once, many times actually told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. Databricks' top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said, they need to keep working that. Snowflake, asked for their biggest data science customer, cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering job to be probably SQL centric, ETL style transformation work. So I want to see the real use of the Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things in what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be a case with basically, putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing the HTAP use case, both online transactional and analytical. I'm also seeing that these databases started in, let's say in case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed, versus all in one. And it's likely Mongo's path or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022

ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Klutman	PERSON	0.99+
Tony Baers	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Breaking Analysis: The Hybrid Cloud Tug of War Gets Real

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> Well, it looks like hybrid cloud is finally here. We've seen a decade of posturing, marchitecture, slideware and narrow examples of hybrid cloud, but there's little question that the definition of cloud is expanding to include on-premises workloads in hybrid models. Now depending on which numbers you choose to represent IT spending, public cloud only accounts for actually less than 5% of the total pie. So the big question is, how will this now evolve? Customers want control, they want governance, they want security, flexibility and a feature-rich set of services to build their digital businesses. It's unlikely that they can buy all that, so they're going to have to build it with partners, specifically vendors, SIs, consultancies and their own developers. The tug of war to win the new cloud day has finally started in earnest between the hyperscalers and the largest enterprise tech companies in the world. Hello and welcome to this week's Wikibon CUBE insights, powered by ETR. In this Breaking Analysis, we'll walk you through how we see the battle for hybrid cloud, how we got here, where we are and where it's headed. First, I want to go back to 2009, in a blog post by a man named Chuck Hollis. Chuck Hollis, at the time, was a CTO and marketing guru inside of EMC who, remember, owned VMware. Chuck was kind of this hybrid, multi-tool player, pun intended. EMC at the time had a big stake, a lot at stake, as the ascendancy of AWS was threatening the historical models, which had defined enterprise IT. Now around that time, NIST published its first draft of a cloud computing definition which, as I recall, included language, something to the effect of accessing remote services over the public network, i.e., public IP networks. 
Now, NIST has since evolved that definition, but the original draft was very favorable to the public cloud. And the vendor community, the traditional vendor community, said hang on, we're in this game too. So that was 2009 when Chuck Hollis published this slide. He termed it Private Cloud, a term which he saw buried inside of a Gartner research post or research note that was not really fleshed out and defined. The idea was pretty compelling. The definition of cloud centered on control, where you, as the customer, had on-prem workloads that could span public and on-prem clouds, if you will, with federated security and a data plane that spanned the states. Essentially, you had an internal and an external cloud with a single point of control. This is basically what the hybrid cloud vision has become. An abstraction layer that spans on-prem and public clouds and we can extend that across clouds and out to the edge, where a customer has a single point of control and federated governance and security. Now we know this is still aspirational, but we're now seeing vendor offerings that put forth this promise and a roadmap to get there from different points of view, that we're going to talk about today. The NIST definition now reads cloud computing is a model for enabling ubiquitous, convenient on-demand network access to a shared pool of configurable computing resources, e.g., networks, servers, storage, applications and services, that can be rapidly provisioned and released with minimal management effort or service provider interaction. So there you have it, that is inclusive of on-prem, but it took the industry a decade plus to actually get where we are today. And they did so by essentially going to school with the public cloud offerings. Now in 2018, AWS announced Outposts and that was another wake up call to the on-prem community. Externally, they pointed to the validation that hybrid cloud was real. 
Hey, AWS is doing it so clearly they've capitulated, but most on-prem vendors at the time didn't have a coherent offering for hybrid, but the point is the on-prem vendors responded as they saw AWS moving past the demilitarized zone into enemy lines. And here's what the competitive landscape of hybrid offerings looks like today. All three US-based hyperscalers have an offering or multiple offerings in various forms, Outposts from Amazon and other services that they offer, Google Anthos and Azure Arc, they're also prominent, but the real action today is coming from the on-prem vendors. Every major company has an offering. Now most of these stemmed from services-led and finance-led initiatives, but they're evolving to true as-a-service models. HPE GreenLake is prominent and the company's CEO, Antonio Neri, is putting the whole company behind as-a-service. HPE claims to be the first, it uses that in its marketing, with such an as-a-service offering, but actually Oracle was there first with Cloud@Customer. You know, possibly Microsoft could make a claim to being early as well, but it really doesn't matter. Let's see, Dell has responded with Apex and is going hard after this opportunity. Cisco has Cisco Plus and Lenovo has TruScale. IBM also has a long services and finance-led history and has announced pockets of as-a-service in areas like storage. And Pure Storage is an example that we chose of a segment player, of course within storage, that has a strong as-a-service offering, and there are others like that. So the landscape is getting very busy. And so, let's break this down a bit. AWS is bringing its programmable infrastructure model and its own hardware to what it calls the edge. And it looks at on-prem data centers as just another edge node. 
So that's how they're de-positioning the on-prem crowd, but the fact is, when you really look at what Outposts can do today, it's limited, but AWS will move quickly so expect a continued rapid evolution of their model and the services that are supported on Outposts. Azure gets its hardware from partners and has relationships with virtually everyone that matters. Anthos is, as well, a software layer and Google created Kubernetes as the great equalizer in cloud. And it was a nice open source gift to the industry and has obviously taken off. So the cloud guys have the advantage of owning a cloud. The pure on-prem players, they don't, but the on-prem crowd has rich stacks, much richer and more mature in a lot of areas, as it relates to supporting on-premises workloads and much more so than the cloud players, but they don't have mature cloud stacks. They're kind of just getting started with things like subscription billing and API-based microservices offerings. They got to figure out sales force compensation and just the overall as-a-service mentality versus the historical product box mentality, and that takes time. And they're each coming at this from their respective different points of view and points of strength. HPE is doing a very good job of marketing and go-to-market. It probably has the cleanest model, enabled by the company's split from HP, but it has some gaps that it's needed to fill and it's doing so through acquisitions. Ezmeral, for example, is its new data play. It just bought Zerto to facilitate backup as a service. And it's expanded partnerships to fill gaps in the portfolio. Some partnerships, which they couldn't do before because it created conflicts inside of HPE or HP. Dell is all about the portfolio, the breadth of the portfolio, the go-to-market prowess and its supply chain advantage. It's very serious about as-a-service with Apex and it's driving hard to win that day. 
Cisco comes at this from a huge portfolio and of course, a point of strength in networking, which maybe is a bit tougher to offer as a service, but Cisco has a large and fast growing subscription business in collaboration, security and other areas, so it's cloud-like in that regard. And Oracle, of course, has the huge advantage of an extremely rich functional stack and it owns a cloud, which has dramatically improved in the past few years, but Oracle is narrow to the red stack, at least today. Oracle, if it wanted to, we think, could dominate the database cloud, it could be the database cloud, especially if it decided to open its cloud to competitive database offerings and run them in the Oracle cloud. Hmm. Wonder if Oracle will ever move in that direction. Now a big part of this shift is the appeal of OPEX versus CAPEX. Let's take a look at some ETR data that digs a bit deeper into this topic. This data is from an August ETR drill down, asking CIOs and IT buyers how their budgets are split between OPEX and CAPEX. The mid point of the yellow line shows where we are today, 57% OPEX, expecting to grow to 63% one year from now. There's not a huge difference when you drill into Global 2000, which kind of surprised me. I thought Global 2000 would be heavier CAPEX, but they seem to be accelerating the shift to OPEX slightly faster than the overall base, but not really in a meaningful way. So I didn't really discern big differences there. Now, when you dig further into industries and look at subscription versus consumption models for OPEX, you see about 60/40 favoring subscription models, with most industries slowly moving toward consumption or usage based models over time. There are a couple of outliers, but generally speaking, that's the trend. What's perhaps more interesting is when you drill into subscription versus usage based models by product area, and that's what this chart shows. 
It shows by tech segment, the percent subscription, that's the blue, versus consumption or usage based, that's the gray bars, yellow being indifferent or maybe it's I don't know. What stands out are two areas that are more usage heavy, consumption heavy. That's database, data warehousing, and IaaS. So database is surely weighted by companies like Snowflake and offerings like Redshift and other cloud databases from Azure and Google and other managed services, but the IaaS piece, while not surprising, is, we think, relevant because most of the legacy vendor as-a-service offerings are borrowing from a SaaS-oriented subscription model with a hardware twist. In other words, as a customer, you're committing to a term and a minimum spend over the life of that term. You're locked in for a year or three years, whatever it is, to account for the hardware and headroom the vendor has to install because they want to allow you to increase your usage. So that's the usage based model. See, you're then paying by the drink for that consumption above that minimum threshold. So it's a hybrid subscription consumption model, which is actually quite interesting. And we've been saying, what would really be cool is if one of the on-prem penguins on the iceberg would actually jump in and offer a true consumption model right out of the box, as a disruptive move to the industry and to the cloud players, and take that risk. And I think that might happen once they feel comfortable with the financial model and they have nailed the product market fit, but right now, the model is what it is. And even AWS with Outposts requires a threshold and a minimum commitment. So we'd love to see someone take that chance and offer true cloud consumption pricing to facilitate more experimentation and lower risk for the customer entry points. Now let's take a look at some of these players and see what kind of spending momentum they have. 
This is our popular XY chart-view that plots net score or spending velocity on the x-axis and market share or pervasiveness in the data set on the... Oh, sorry, net score or spending momentum on the y-axis and pervasiveness or market share on the x-axis. Now this is cut by cloud computing vendors, as defined by the customers responding. There were nearly 1,500 respondents in the ETR survey, so a couple of points here. Note the red line is the elevated line. In other words, anything above that is considered really robust momentum. And no surprise, Azure, AWS and Google are above that line. Azure and AWS always battle it out for top share of voice in the x-axis in this survey. Now this, remember, is the July survey, but ETR, they gave me a sneak peek at the October results that they're going to be releasing in the coming week and Dell cloud and VMware cloud, which is VCF and maybe some other components, not VMware Cloud on AWS, that's a separate beast, but those two are moving up in the y-axis. So they're demonstrating spending momentum. IBM is moving down and Oracle is at a respectable 20% on the y-axis. Now, interestingly, HPE and Lenovo don't show up in the cloud taxonomy, in that cloud cut, and neither does Cisco. I believe I'm correct in that this is an open-ended question, i.e., who are your cloud suppliers? So the customers are not resonating with that messaging yet, but I'm going to double check on that. Now to widen the aperture a bit, we said let's do a cut of the on-prem and cloud players within cloud accounts, so we can include HPE and Cisco and see how they're doing inside of cloud accounts. So that's what this chart does. It's a filter on 975 customers who identify themselves as cloud accounts. So here we were able to add in Cisco and HPE. Now, Lenovo still doesn't show up on the data. 
It shows up in laptops and desktops, but not as prominent in the enterprise, not prominent at all, but HPE Ezmeral did show up and it's moving forward in the October survey, again, part of the sneak peek. Ezmeral is HPE's data platform that they've introduced, combining the assets of MapR, BlueData and some other organic development. Now, as you can see, HPE and Cisco, they show up on the chart, as I said, and you can see the rope in the tug of war is starting to get a little bit more taut. The cloud guys have momentum and big account presence, but the on-prem folks also have big footprints, rich stacks and many have strong services arms, and a lot of customer affinity. So let's wrap with some comments about how this will shake out and what's some of the markers we can watch. Now, the first thing I'll say is we're starting to hear the right language come out of the vendor community. The idea that they're investing in a layer to abstract the underlying complexity of the clouds and on-prem infrastructure and turning the world into, essentially, a programmable interface to resources. The question is, what about giving access through that layer to underlying primitives in the public cloud? VMware has been very clear on this. They will facilitate that access. I believe Red Hat as well. So watch to the degree in which the large on-prem players are enabling that access for developers. We believe this is the right direction overall, but it's also very hard and it's going to require lots of resources and R & D. I would say at this point that each company has its respective strengths and weaknesses. I see HPE mostly focused today on making its on-prem offerings work like a cloud, whereas some of the others, VMware, Dell and Cisco, are stressing to a greater degree, in my view, enabling multi-cloud and edge connections, cross connections. Not that HPE isn't open to that when you ask them about it, but its marketing is more on-prem leaning, in my opinion. 
Now all of the traditional vendors, in my view, are still defensive about the cloud, although I would say much less so each day. Increasingly, they look at the public cloud as an opportunity to build value on top of that abstraction layer, if you will. As I said earlier, these on-prem guys, they all have ways to go. They're in the early stages of figuring out what a cloud operating model looks like, how it works, what services to offer, how to pay sellers and partners, but the public cloud vendors, they're miles ahead in that regard, but at the same time, they're navigating into on-prem territory. And they're very immature, in most cases. So how do they service all this stuff? How do they establish partnerships and so forth? And how do they build stacks on prem that are as rich as they are in the cloud? And what's their motivation to do that? Are they getting pulled, digging their heels in? Or are they really serious about it? Now, in some respects, Oracle is in the best position here in terms of hybrid maturity, but again, it's narrowly focused on the Red Stack. I would say the same for Pure Storage, more mature as a service, but narrowly focused, of course, on storage. Let's talk marketplace and ecosystems. One of the hallmarks of public clouds is optionality of tooling. Just all you do is go to the AWS Marketplace and you'll see what I mean. It's got this endless bevy of choices. It's got one of everything in there and you can buy directly from your AWS Console. So watch how the hybrid cloud plays out in terms of partner inclusion and ease of doing business, that's another sign of maturity. Let's talk developers and edge. This is by far the most important and biggest hole in the hybrid portfolios, outside the public cloud players. If you're going to build infrastructure as code, who do you expect to code it? How are the on-prem players cultivating developer communities? IBM paid 34 billion to buy its way in. 
Actually, in today's valuation terms, you might say that's looking like a good play, but still, that cash outlay is equal to one third of IBM's revenue. So big, big bet on OpenShift, but IBM's infrastructure strategy is fragmented and its cloud business, as IBM reports in its financial statements, is a services-heavy, kitchen sink set of offerings. It's very confusing. So they got to still do some clean up there, but they're serious about the architectural battle for hybrid cloud, as Arvind Krishna calls it. Now VMware, by cobbling together the misfit developer toys of the remnants from the EMC Federation, including Pivotal, is trying to get there. You know, but when you talk to customers, they're still not all in on VMware's developer affinity. Now Cisco has DevNet, but that's basically CCIE's and other trained networking engineers learning to code in languages like Python. It's not necessarily true devs, although they're upskilling. It's a start and they're investing, Cisco, that is, investing in the community, leveraging their champions, and I would say Dell could do the same with, for example, the numerous EMC storage admins that are out there. Now Oracle bought Sun to get Java, and that's a large community of developers, but even so, when you compare AWS and Microsoft ecosystems to the others, it's not even close in terms of developer affinity. So lots of work to be done there. One other point is Pure's acquisition of Portworx, again, while narrowly focused, is a good move and instructive of the changes going on in infrastructure. Now how does this all relate to the edge? Well, I'm not going to talk much about that today, but suffice to say, developers, in our view, will win the edge. And right now, they're coding in the cloud. Now they're often coding in the cloud and moving work on prem, wrapping them in containers, but watch how sticky that model is for the respective players. The other thing to watch is cadence of offerings. 
Another hallmark of cloud is a rapid expansion of features. The public cloud players don't appear to be slowing down and the on-prem folks seem to be accelerating. I've been watching HPE and GreenLake and their cadence of offerings, and watch how quickly the newbies of as-a-service can add functionality, I have no doubt Dell is going to be right there as well, as is Cisco and others. Also pay attention to financial metrics, watch how as-a-service impacts the income statements and how the companies deal with that because as you shift to deferred revenue models, it's going to hurt profitability. And I'm not worried about that at all because it won't hurt cashflow, or at least it shouldn't. As long as the companies communicate to Wall Street and they're transparent, i.e., they don't shift reporting definitions every year and a half or two years, but watch for metrics around retention and churn, RPO or Remaining Performance Obligations, billing versus bookings, increased average contract values, cohort selling, the impact on both gross margin and operating margin. These are the things you watch with SaaS companies and essentially, these big hardware players are becoming as-a-service slash SaaS companies. These are going to be the key indicators of success and the proof in the pudding of the transition to as-a-service. It should be positive for these companies, assuming they get the product market fit right, and can create a flywheel effect with their respective ecosystems and partner channels. Now I'm sure you can think of other important factors to watch, but I'm going to leave it here for now. Remember these episodes, they're all available as podcasts, wherever you listen. All you got to do is search Breaking Analysis podcast and please subscribe, check out ETR's website at etr.plus. We also publish a full report every week on wikibon.com and siliconangle.com. You can get in touch with me, email david.vellante@siliconangle.com or you can DM me @dvellante. 
You can comment on our LinkedIn posts. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, everybody, stay safe, be well. And we'll see you next time. (soft music)

Published Date : Oct 15 2021

SUMMARY :

From the theCUBE Studios and a data plan that spanned the states.

ENTITIES

Entity	Category	Confidence
Chuck Hollis	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Lenovo	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
October	DATE	0.99+
2009	DATE	0.99+
Antonio Neri	PERSON	0.99+
2018	DATE	0.99+
Dell	ORGANIZATION	0.99+
975 customers	QUANTITY	0.99+
NIST	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
HP	ORGANIZATION	0.99+
July	DATE	0.99+
Arvind Krishna	PERSON	0.99+
Palo Alto	LOCATION	0.99+
20%	QUANTITY	0.99+
EMC	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
Chuck	PERSON	0.99+
August	DATE	0.99+
34 billion	QUANTITY	0.99+
57%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
63%	QUANTITY	0.99+
VMware	ORGANIZATION	0.99+
two	QUANTITY	0.99+
EMC Federation	ORGANIZATION	0.99+
a year	QUANTITY	0.99+
First	QUANTITY	0.99+
Python	TITLE	0.99+
first	QUANTITY	0.99+
Java	TITLE	0.99+
One	QUANTITY	0.99+
theCUBE Studios	ORGANIZATION	0.99+
less than 5%	QUANTITY	0.99+
Pivotal	ORGANIZATION	0.99+
Azure Service	TITLE	0.99+
two years	QUANTITY	0.99+
first draft	QUANTITY	0.99+
Gartner	ORGANIZATION	0.98+

Breaking Analysis: How JPMC is Implementing a Data Mesh Architecture on the AWS Cloud

>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> A new era of data is upon us, and we're in a state of transition. You know, even our language reflects that. We rarely use the phrase big data anymore, rather we talk about digital transformation or digital business, or data-driven companies. Many have come to the realization that data is not the new oil, because unlike oil, the same data can be used over and over for different purposes. We still use terms like data as an asset. However, that same narrative, when it's put forth by the vendor and practitioner communities, includes further discussions about democratizing and sharing data. Let me ask you this, when was the last time you wanted to share your financial assets with your coworkers or your partners or your customers? Hello everyone, and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we want to share our assessment of the state of the data business. We'll do so by looking at the data mesh concept and how a leading financial institution, JP Morgan Chase, is practically applying these relatively new ideas to transform its data architecture. Let's start by looking at what the data mesh is. As we've previously reported many times, data mesh is a concept and set of principles that was introduced in 2018 by Zhamak Dehghani, who's director of technology at ThoughtWorks, a global consultancy and software development company. And she created this movement because her clients, who were some of the leading firms in the world, had invested heavily in predominantly monolithic data architectures that had failed to deliver desired outcomes and ROI. So her work went deep into trying to understand that problem. 
And her main conclusion that came out of this effort was the world of data is distributed and shoving all the data into a single monolithic architecture is an approach that fundamentally limits agility and scale. Now a profound concept of data mesh is the idea that data architectures should be organized around business lines with domain context. That the highly technical and hyper specialized roles of a centralized cross functional team are a key blocker to achieving our data aspirations. This is the first of four high level principles of data mesh. So first again, that the business domain should own the data end-to-end, rather than have it go through a centralized big data technical team. Second, a self-service platform is fundamental to a successful architectural approach where data is discoverable and shareable across an organization and an ecosystem. Third, product thinking is central to the idea of data mesh. In other words, data products will power the next era of data success. And fourth, data products must be built with governance and compliance that is automated and federated. Now there's a lot more to this concept and there are tons of resources on the web to learn more, including an entire community that has formed around data mesh. But this should give you a basic idea. Now, the other point is that, in observing Zhamak Dehghani's work, she has deliberately avoided discussions around specific tooling, which I think has frustrated some folks because we all like to have references that tie to products and tools and companies. So this has been a two-edged sword in that, on the one hand it's good, because data mesh is designed to be tool agnostic and technology agnostic. On the other hand, it's led some folks to take liberties with the term data mesh and claim mission accomplished when their solution, you know, may be more marketing than reality. So let's look at JP Morgan Chase in their data mesh journey. 
That's why I got really excited when I saw this past week, a team from JPMC held a meet up to discuss what they called, data lake strategy via data mesh architecture. I saw that title, I thought, well, that's a weird title. And I wondered, are they just taking their legacy data lakes and claiming they're now transformed into a data mesh? But in listening to the presentation, which was over an hour long, the answer is a definitive no, not at all in my opinion. A gentleman named Scott Hollerman organized the session that comprised these three speakers here, James Reid, who's a divisional CIO at JPMC, Arup Nanda who is a technologist and architect and Serita Bakst who is an information architect, again, all from JPMC. This was the most detailed and practical discussion that I've seen to date about implementing a data mesh. And this is JP Morgan's approach, and we know they're extremely savvy and technically sound. And they've invested, it has to be billions in the past decade on data architecture across their massive company. And rather than dwell on the downsides of their big data past, I was really pleased to see how they're evolving their approach and embracing new thinking around data mesh. So today, we're going to share some of the slides that they used and comment on how it dovetails into the concept of data mesh that Zhamak Dehghani has been promoting, at least as we understand it. And dig a bit into some of the tooling that is being used by JP Morgan, particularly around its AWS cloud. So the first point is it's all about business value, JPMC, they're in the money business, and in that world, business value is everything. So James Reid, the CIO, showed this slide and talked about their overall goals, which centered on a cloud first strategy to modernize the JPMC platform. I think it's simple and sensible, but there's three factors on which he focused, cut costs, always, sure, you got to do that. 
Number two was about unlocking new opportunities, or accelerating time to value. But I was really happy to see number three, data reuse. That's a fundamental value ingredient in the slide that he's presenting here. And his commentary was all about aligning with the domains and maximizing data reuse, i.e. data is not like oil, and making sure there's appropriate governance around that. Now don't get caught up in the term data lake, I think it's just how JP Morgan communicates internally. It's invested in the data lake concept, so they use water analogies. They use things like data puddles, for example, which are single project data marts, or data ponds, which comprise multiple data puddles. And these can feed into data lakes. And as we'll see, JPMC doesn't strive to have a single version of the truth from a data standpoint that resides in a monolithic data lake, rather it enables the business lines to create and own their own data lakes that comprise fit for purpose data products. And they do have a single truth of metadata. Okay, we'll get to that. But generally speaking, each of the domains will own end-to-end their own data and be responsible for those data products, we'll talk about that more. Now the genesis of this was sort of a cloud first platform, JPMC is leaning into public cloud, which is ironic since in the early days of cloud, all the financial institutions were like never. Anyway, JPMC is going hard after it, they're adopting agile methods and microservices architectures, and it sees cloud as a fundamental enabler, but it recognizes that on-prem data must be part of the data mesh equation. Here's a slide that starts to get into some of that generic tooling, and then we'll go deeper. And I want to make a couple of points here that tie back to Zhamak Dehghani's original concept. The first is that unlike many data architectures, this puts data as products right in the fat middle of the chart. 
The data products live in the business domains and are at the heart of the architecture. The databases, the Hadoop clusters, the files and APIs on the left-hand side, they serve the data product builders. The specialized roles on the right hand side, the DBAs, the data engineers, the data scientists, the data analysts, we could have put in quality engineers, et cetera, they serve the data products. Because the data products are owned by the business, they inherently have the context that is the middle of this diagram. And you can see at the bottom of the slide, the key principles include domain thinking and end-to-end ownership of the data products. They build it, they own it, they run it, they manage it. At the same time, the goal is to democratize data with a self-service as a platform. One of the biggest points of contention of data mesh is governance. And as Serita Bakst said on the Meetup, metadata is your friend, and she kind of made a joke, she said, "This sounds kind of geeky, but it's important to have a metadata catalog to understand where data resides and the data lineage and overall change management." So to me, this really passed the data mesh stink test pretty well. Let's look at data as products. CIO Reid said the most difficult thing for JPMC was getting their heads around data product, and they spent a lot of time getting this concept to work. Here's the slide they use to describe their data products as it related to their specific industry. They said a common language and taxonomy is very important, and you can imagine how difficult that was. He said, for example, it took a lot of discussion and debate to define what a transaction was. But you can see at a high level, these three product groups around wholesale credit risk, party, and trade and position data as products, and each of these can have sub products, like, party will have know your customer, KYC, for example. 
So a key for JPMC was to start at a high level and iterate to get more granular over time. So lots of decisions had to be made around who owns the products and the sub-products. The product owners interestingly had to defend why that product should even exist, what boundaries should be in place and what data sets do and don't belong in the various products. And this was a collaborative discussion, I'm sure there was contention around that between the lines of business. And which sub products should be part of these circles? They didn't say this, but tying it back to data mesh, each of these products, whether in a data lake or a data hub or a data pond or data warehouse, data puddle, each of these is a node in the global data mesh that is discoverable and governed. And supporting this notion, Serita said that, "This should not be infrastructure-bound, logically, any of these data products, whether on-prem or in the cloud can connect via the data mesh." So again, I felt like this really stayed true to the data mesh concept. Well, let's look at some of the key technical considerations that JPM discussed in quite some detail. This chart here shows a diagram of how JP Morgan thinks about the problem, and some of the challenges they had to consider were how to write to various data stores, can you, and how can you, move data from one data store to another? How can data be transformed? Where's the data located? Can the data be trusted? How can it be easily accessed? Who has the right to access that data? These are all problems that technology can help solve. And to address these issues, Arup Nanda explained that the heart of this slide is the data ingestor instead of ETL. All data producers and contributors, they send their data to the ingestor and the ingestor then registers the data so it's in the data catalog. It does a data quality check and it tracks the lineage. 
Then, data is sent to the router, which persists the data in the data store based on the best destination as informed by the registration. This is designed to be a flexible system. In other words, the data store for a data product is not fixed, it's determined at the point of inventory, and that allows changes to be easily made in one place. The router simply reads that optimal location and sends it to the appropriate data store. Now you see the schema inferrer there, it is used when there is no clear schema on write. In this case, the data product is not allowed to be consumed until the schema is inferred, and then the data goes into a raw area, and the inferrer determines the schema and then updates the inventory system so that the data can be routed to the proper location and properly tracked. So that's some of the detail of how the sausage factory works in this particular use case, it was very interesting and informative. Now let's take a look at the specific implementation on AWS and dig into some of the tooling. As described in some detail by Arup Nanda, this diagram shows the reference architecture used by this group within JP Morgan, and it shows all the various AWS services and components that support their data mesh approach. So start with the authorization block right there underneath Kinesis. Lake Formation is the single point of entitlement and has a number of buckets including, you can see there the raw area that we just talked about, a trusted bucket, a refined bucket, et cetera. Depending on the data characteristics at the data catalog registration block where you see the Glue catalog, that determines in which bucket the router puts the data. 
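To make the ingestor, router and schema inferrer flow described above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not JPMC's implementation: the class and function names (Catalog, Router, ingest, infer_schema), the bucket names "raw" and "trusted", and the in-memory dictionaries standing in for data stores are all hypothetical, and no AWS service is actually called.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """One registered data product in a hypothetical, in-memory catalog."""
    product: str
    schema: Optional[dict]      # None until a schema is known
    destination: str            # assumed bucket names: "raw" or "trusted"
    lineage: list = field(default_factory=list)
    consumable: bool = False    # consumers must wait until a schema exists


class Catalog:
    """Stand-in for the registration / inventory system."""

    def __init__(self):
        self.entries = {}

    def register(self, product, schema):
        has_schema = schema is not None
        entry = CatalogEntry(
            product=product,
            schema=schema,
            destination="trusted" if has_schema else "raw",
            consumable=has_schema,
        )
        self.entries[product] = entry
        return entry


class Router:
    """Persists data wherever the catalog says it belongs."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.stores = {"raw": {}, "trusted": {}}  # dicts stand in for stores

    def route(self, product, rows):
        destination = self.catalog.entries[product].destination
        self.stores[destination][product] = rows
        return destination


def quality_check(rows):
    """Toy quality rule (an assumption): at least one non-empty dict row."""
    return len(rows) > 0 and all(isinstance(r, dict) and r for r in rows)


def infer_schema(rows):
    """Toy inferrer: map each column to the type name of its first value."""
    schema = {}
    for row in rows:
        for key, value in row.items():
            schema.setdefault(key, type(value).__name__)
    return schema


def ingest(catalog, router, product, rows, source, schema=None):
    """Register, quality-check, track lineage, then hand off to the router."""
    entry = catalog.register(product, schema)
    if not quality_check(rows):
        raise ValueError(f"quality check failed for {product}")
    entry.lineage.append(source)
    destination = router.route(product, rows)
    if entry.schema is None:
        # No schema on write: data sits in the raw area and is not consumable
        # until the inferrer updates the catalog and the data is re-routed.
        entry.schema = infer_schema(rows)
        entry.destination = "trusted"
        entry.consumable = True
        entry.lineage.append("schema-inferrer")
        router.stores["raw"].pop(product)
        destination = router.route(product, rows)
    return destination


catalog = Catalog()
router = Router(catalog)
final_destination = ingest(
    catalog, router, "trades", [{"id": 1, "amount": 9.5}], source="desk-feed"
)
```

The design point the sketch tries to capture is that the router never hard-codes a destination. It reads the destination from the catalog entry, so changing where a data product lives is a change in one place.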
And you can see the many AWS services in use here, identity, the EMR, the Elastic MapReduce cluster from the legacy Hadoop work done over the years, the Redshift Spectrum and Athena, JPMC uses Athena for single threaded workloads and Redshift Spectrum for nested types so they can be queried independent of each other. Now remember very importantly, in this use case, there is not a single Lake Formation, rather, multiple lines of business will be authorized to create their own lakes, and that creates a challenge. So how can that be done in a flexible and automated manner? And that's where the data mesh comes into play. So JPMC came up with this federated Lake Formation accounts idea, and each line of business can create as many data producer or consumer accounts as they desire and roll them up into their master line of business Lake Formation account. And they cross-connect these data products in a federated model. And these all roll up into a master Glue catalog so that any authorized user can find out where a specific data element is located. So this is like a superset catalog that comprises multiple sources and syncs up across the data mesh. So again to me, this was a very well thought out and practical application of data mesh. Yes, it includes some notion of centralized management, but much of that responsibility has been passed down to the lines of business. It does roll up to a master catalog, but that's a metadata management effort that seems compulsory to ensure federated and automated governance. As well at JPMC, the office of the chief data officer is responsible for ensuring governance and compliance throughout the federation. All right, so let's take a look at some of the suspects in this world of data mesh and bring in the ETR data. Now, of course, ETR doesn't have a data mesh category, there's no such thing as a data mesh vendor, you build a data mesh, you don't buy it. 
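The federated roll-up described above, where each line of business keeps its own catalog and a master catalog syncs across them so an authorized user can locate a data element, can be sketched roughly as follows. This is a simplified illustration in plain Python with hypothetical names (LineOfBusinessCatalog, MasterCatalog, the example locations and user names). It does not use the real Lake Formation or Glue APIs and is not a description of how JPMC wired it up.

```python
class LineOfBusinessCatalog:
    """Catalog owned by one line of business (hypothetical stand-in)."""

    def __init__(self, name):
        self.name = name
        self.products = {}  # data element -> location

    def publish(self, element, location):
        self.products[element] = location


class MasterCatalog:
    """Superset catalog: holds metadata only, never the data itself."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.index = {}  # data element -> (owning line of business, location)

    def sync(self, lob_catalogs):
        # Rebuild the index from every federated catalog.
        self.index = {
            element: (lob.name, location)
            for lob in lob_catalogs
            for element, location in lob.products.items()
        }

    def locate(self, element, user):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized")
        return self.index.get(element)


# Hypothetical lines of business and locations, for illustration only.
risk = LineOfBusinessCatalog("wholesale-credit-risk")
risk.publish("exposure", "risk-lake/trusted/exposure")

party = LineOfBusinessCatalog("party")
party.publish("kyc", "party-lake/refined/kyc")

master = MasterCatalog(authorized_users=["analyst-1"])
master.sync([risk, party])
kyc_location = master.locate("kyc", "analyst-1")
```

Note that ownership stays with the line of business in this sketch. The master catalog only answers the question of where something lives and who owns it, which mirrors the point that the central piece is a metadata management effort rather than a central data store.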
So, what we did is we used the ETR dataset to select and filter on some of the culprits that we thought might contribute to the data mesh to see how they're performing. This chart depicts a popular view that we often like to share. It's a two dimensional graphic with net score or spending momentum on the vertical axis and market share or pervasiveness in the data set on the horizontal axis. And we filtered the data on sectors such as analytics, data warehouse, and the adjacencies to things that might fit into data mesh. And we think that these pretty well reflect participation in data mesh, but it's certainly not all-encompassing. And it's a subset obviously, of all the vendors who could play in the space. Let's make a few observations. Now as is often the case, Azure and AWS, they're almost literally off the charts with very high spending velocity and large presence in the market. Oracle you can see also stands out because much of the world's data lives inside of Oracle databases. It doesn't have the spending momentum or growth, but the company remains prominent. And you can see Google Cloud doesn't have nearly the presence in the dataset, but its momentum is highly elevated. Remember that red dotted line there, that 40% line, anything over that indicates elevated spending momentum. Let's go to Snowflake. Snowflake is consistently shown to be the gold standard in net score in the ETR dataset. It continues to maintain highly elevated spending velocity in the data. And in many ways, Snowflake with its data marketplace and its data cloud vision and data sharing approach, fits nicely into the data mesh concept. Now, a caution, Snowflake has used the term data mesh in its marketing, but in our view, it lacks clarity, and we feel like they're still trying to figure out how to communicate what that really is. But there's really, we think, a lot of potential there to that vision. 
Databricks is also interesting because the firm has momentum and we expect further elevated levels in the vertical axis in upcoming surveys, especially as it readies for its IPO. The firm has a strong product and managed service, and is really one to watch. Now we included a number of other database companies for obvious reasons like Redis and Mongo, MariaDB, Couchbase and Teradata. SAP as well is in there, but that's not all database, but SAP is prominent so we included them. As is IBM, more of a traditional database player, also with a big presence. Cloudera includes Hortonworks and HPE Ezmeral comprises the MapR business that HPE acquired. So these guys got the big data movement started, between Cloudera, Hortonworks, which was born out of Yahoo, which was the early big data, sorry, early Hadoop innovator, and MapR, which went its kind of own course, and now that's all kind of come together in various forms. And of course, Talend and Informatica are there, they are two data integration companies that are worth noting. We also included some of the AI and ML specialists and data science players in the mix like DataRobot who just did a monster $250 million round. Dataiku, H2O.ai and ThoughtSpot, which is all about democratizing data and injecting AI, and I think fits well into the data mesh concept. And you know we put VMware Cloud in there for reference because it really is the predominant on-prem infrastructure platform. All right, let's wrap with some final thoughts here, first, thanks a lot to the JP Morgan team for sharing this data. I really want to encourage practitioners and technologists, go watch the YouTube of that meetup, we'll include it in the link of this session. And thank you to Zhamak Dehghani and the entire data mesh community for the outstanding work that you're doing, challenging the established conventions of monolithic data architectures. 
The JPM presentation, it gives you real credibility, it takes Data Mesh well beyond concept, it demonstrates how it can be and is being done. And you know, this is not a perfect world, you're going to start somewhere and there's going to be some failures, the key is to recognize that shoving everything into a monolithic data architecture won't support the massive scale and agility that you're after. It's maybe fine for smaller use cases in smaller firms, but if you're building a global platform in a data business, it's time to rethink data architecture. Now much of this is enabled by the cloud, but cloud first doesn't mean cloud only, doesn't mean you'll leave your on-prem data behind, on the contrary, you have to include non-public cloud data in your Data Mesh vision just as JPMC has done. You've got to get some quick wins, that's crucial so you can gain credibility within the organization and grow. And one of the key takeaways from the JP Morgan team is, there is a place for dogma, like organizing around data products and domains and getting that right. On the other hand, you have to remain flexible because technology is going to come, technology is going to go, so you got to be flexible in that regard. And look, if you're going to embrace the metaphor of water like puddles and ponds and lakes, we suggest maybe a little tongue in cheek, but still we believe in this, that you expand your scope to include data ocean, something John Furrier and I have talked about and laughed about extensively on theCUBE. Data oceans, it's huge. It's the new data lake, go transcend data lake, think oceans. And think about this, just as we're evolving our language, we should be evolving our metrics. Much of the last decade of big data was around just getting the stuff to work, getting it up and running, standing up infrastructure and managing massive, how much data you got? Massive amounts of data. 
And there were many KPIs built around, again, standing up that infrastructure, ingesting data, a lot of technical KPIs. This decade is not just about enabling better insights, it's more than that. Data mesh points us to a new era of data value, and that requires new metrics around monetizing data products, like how long does it take to go from data product conception to monetization? And how does that compare to what it is today? And what is the time to quality if the business owns the data, and the business has the context? The quality that comes out of them, out of the chute, should be at a basic level, pretty good, and at a higher mark than out of a big data team with no business context. Automation, AI, and very importantly, organizational restructuring of our data teams will heavily contribute to success in the coming years. So we encourage you, learn, lean in and create your data future. Okay, that's it for now, remember these episodes, they're all available as podcasts wherever you listen, all you got to do is search, Breaking Analysis podcast, and please subscribe. Check out ETR's website at etr.plus for all the data and all the survey information. We publish a full report every week on wikibon.com and siliconangle.com. And you can get in touch with us, email me david.vellante@siliconangle.com, you can DM me @dvellante, or you can comment on my LinkedIn posts. This is Dave Vellante for theCUBE insights powered by ETR. Have a great week everybody, stay safe, be well, and we'll see you next time. (upbeat music)

Published Date : Jul 12 2021

SUMMARY :

This is braking analysis and the adjacencies to things

ENTITIES

Entity	Category	Confidence
JPMC	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
2018	DATE	0.99+
Zhamak Dehghani	PERSON	0.99+
James Reid	PERSON	0.99+
JP Morgan	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Serita Bakst	PERSON	0.99+
IBM	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Scott Hollerman	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
40%	QUANTITY	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
Serita	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Arup Nanda	PERSON	0.99+
each	QUANTITY	0.99+
ThoughtWorks	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
each line	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Redis	ORGANIZATION	0.99+
$250 million	QUANTITY	0.99+
first point	QUANTITY	0.99+
three factors	QUANTITY	0.99+
Second	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
Talend	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
first platform	QUANTITY	0.98+
YouTube	ORGANIZATION	0.98+
fourth	QUANTITY	0.98+
single	QUANTITY	0.98+
One	QUANTITY	0.98+
Third	QUANTITY	0.97+
Couchbase	ORGANIZATION	0.97+
three speakers	QUANTITY	0.97+
two data	QUANTITY	0.97+
first strategy	QUANTITY	0.96+
one	QUANTITY	0.96+
one place	QUANTITY	0.96+
Jr Reid	PERSON	0.96+
single lake	QUANTITY	0.95+
SAP	ORGANIZATION	0.95+
wikibon.com	OTHER	0.95+
siliconangle.com	OTHER	0.94+
Azure	ORGANIZATION	0.93+

Matt Maccaux, HPE | HPE Discover 2021

(bright music) >> Data by its very nature is distributed and siloed, but most data architectures today are highly centralized. Organizations are increasingly challenged to organize and manage data, and turn that data into insights. This idea of a single monolithic platform for data, it's giving way to new thinking, where a decentralized approach, with open cloud native principles and federated governance, will become an underpinning of digital transformations. Hi everybody. This is Dave Vellante. Welcome back to HPE Discover 2021, the virtual version. You're watching theCube's continuous coverage of the event and we're here with Matt Maccaux, who's a field CTO for Ezmeral Software at HPE. We're going to talk about HPE software strategy, and Ezmeral and specifically how to take AI analytics to scale and ensure the productivity of data teams. Matt, welcome to theCube. Good to see you. >> Good to see you again, Dave. Thanks for having me today. >> You're welcome. So talk a little bit about your role as a CTO. Where do you spend your time? >> I spend about half of my time talking to customers and partners about where they are on their digital transformation journeys and where they struggle with this sort of last phase where we start talking about bringing those cloud principles and practices into the data world. How do I take those data warehouses, those data lakes, those distributed data systems, into the enterprise and deploy them in a cloud-like manner? Then the other half of my time is working with our product teams to feed that information back, so that we can continually innovate to the next generation of our software platform. >> So when I remember, I've been following HP and HPE, for a long, long time, theCube has documented, we go back to sort of when the company was breaking in two parts, and at the time a lot of people were saying, "Oh, HP is getting rid of their software business, they're getting out of software." I said, "No, no, no, hold on. 
They're really focusing", and the whole focus around hybrid cloud and now as a service, you're really retooling that business and sharpening your focus. So tell us more about Ezmeral, it's a cool name, but what exactly is Ezmeral software? >> I get this question all the time. So what is Ezmeral? Ezmeral is a software platform for modern data and analytics workloads, using open source software components. We came from some inorganic growth. We acquired a company called Scytale, that brought us a zero trust approach to doing security with containers. We bought BlueData who came to us with an orchestrator before Kubernetes even existed in mainstream. They were orchestrating workloads using containers for some of these more difficult workloads. Clustered applications, distributed applications like Hadoop. Then finally we acquired MapR, which gave us this scale out distributed file system and additional analytical capabilities. What we've done is we've taken those components and we've also gone out into the marketplace to see what open source projects exist to allow us to bring those cloud principles and practices to these types of workloads, so that we can take things like Hadoop, and Spark, and Presto, and deploy and orchestrate them using open source Kubernetes. Leveraging GPUs, while providing that zero trust approach to security, that's what Ezmeral is all about is taking those cloud practices and principles, but without locking you in. Again, using those open source components where they exist, and then committing and contributing back to the open source community where those projects don't exist. >> You know, it's interesting, thank you for that history, and when I go back, I have been there since the early days of Big Data and Hadoop and so forth and MapR always had the best product, but they couldn't get it out. Back then it was like kumbaya, open source, and they had this kind of proprietary system but it worked and that's why it was the best product. 
So at the same time they participated in open source projects because everybody did, that's where the innovation is going. So you're making that really hard to use stuff easier to use with Kubernetes orchestration, and then obviously, I'm presuming with the open source chops, sort of leaning into the big trends that you're seeing in the marketplace. So my question is, what are those big trends that you're seeing when you speak to technology executives which is a big part of what you do? >> So the trends are, I think, are a couplefold, and it's funny about Hadoop, but I think the final nails in the coffin have been hammered in with the Hadoop space now. So that leading trend, of where organizations are going, we're seeing organizations wanting to go cloud first. But they really struggle with these data-intensive workloads. Do I have to store my data in every cloud? Am I going to pay egress in every cloud? Well, what if my data scientists are most comfortable in AWS, but my data analysts are more comfortable in Azure, how do I provide that multi-cloud experience for these data workloads? That's the number one question I get asked, and that's probably the biggest struggle for these chief data officers, chief digital officers, is how do I allow that innovation but maintaining control over my data compliance especially when we talk international standards, like GDPR, to restrict access to data, the ability to be forgotten, in these multinational organizations how do I sort of square all of those components? Then how do I do that in a way that just doesn't lock me into another appliance or software vendor stack? I want to be able to work within the confines of the ecosystem, use the tools that are out there, but allow my organization to innovate in a very structured compliant way. >> I mean, I love this conversation and you just, to me, you hit on the key word, which is organization. I want to talk about what some of the barriers are. 
And again, you heard my wrap up front. I really do think that we've created, not only from a technology standpoint, and yes the tooling is important, but so is the organization, and as you said an analyst might want to work in one environment, a data scientist might want to work in another environment. The data may be very distributed. You might have situations where they're supporting the line of business. The line of business is trying to build new products, and if I have to go through this monolithic centralized organization, that's a barrier for me. And so we're seeing that change, that I kind of alluded to it up front, but what do you see as the big barriers that are blocking this vision from becoming a reality? >> It very much is organization, Dave. The technology's actually no longer the inhibitor here. We have enough technology, enough choices out there that technology is no longer the issue. It's the organization's willingness to embrace some of those technologies and put just the right level of control around accessing that data. Because if you don't allow your data scientists and data analysts to innovate, they're going to do one of two things. They're either going to leave, and then you have a huge problem keeping up with your competitors, or they're going to do it anyway. And they're going to do it in a way that probably doesn't comply with the organizational standards. So the more progressive enterprises that I speak with have realized that they need to allow these various analytical users to choose the tools they want, to self provision those as they need to and get access to data in a secure and compliant way. And that means we need to bring the cloud to generally where the data is because it's a heck of a lot easier than trying to bring the data where the cloud is, while conforming to those data principles, and that's HPE's strategy. You've heard it from our CEO for years now. Everything needs to be delivered as a service. 
It's Ezmeral Software that enables that capability, such as self-service and secure data provisioning, et cetera. >> Again, I love this conversation because if you go back to the early days of Hadoop, that was what was profound about a Hadoop. Bring five megabytes of code to a petabyte of data, and it didn't happen. We shoved it all into a data lake and it became a data swamp. And that's okay, it's a one dot oh, you know, maybe in data as is like data warehouses, data hubs, data lakes, maybe this is now a four dot oh, but we're getting there. But open source, one thing's for sure, it continues to gain momentum, it's where the innovation is. I wonder if you could comment on your thoughts on the role that open-source software plays for large enterprises, maybe some of the hurdles that are there, whether they're legal or licensing, or just fears, how important is open source software today? >> I think the cloud native developments, following the 12 factor applications, microservices based, paved the way over the last decade to make using open source technology tools and libraries mainstream. We have to tip our hats to Red Hat, right? For allowing organizations to embrace something so core as an operating system within the enterprise. But what everyone realized is that it's support that's what has to come with that. So we can allow our data scientists to use open source libraries, packages, and notebooks, but are we going to allow those to run in production? So if the answer is no, well? Then if we can't get support, we're not going to allow that. So where HPE Ezmeral is taking the lead here is, again, embracing those open source capabilities, but, if we deploy it, we're going to support it. Or we're going to work with the organization that has the committers to support it. You call HPE, the same phone number you've been calling for years for tier one 24 by seven support, and we will support your Kubernetes, your Spark your Presto, your Hadoop ecosystem of components. 
We're that throat to choke and we'll provide, all the way up to break/fix support, for some of these components and packages, giving these large enterprises the confidence to move forward with open source, but knowing that they have a trusted partner in which to do so. >> And that's why we've seen such success with say, for instance, managed services in the cloud, versus throwing out all the animals in the zoo and say, okay, figure it out yourself. But then, of course, what we saw, which was kind of ironic, was people finally said, "Hey, we can do this in the cloud more easily." So that's where you're seeing a lot of data land. However, the definition of cloud or the notion of cloud is changing. No longer is it just this remote set of services, "Somewhere out there in the cloud", some data center somewhere, no, it's moving to on-prem, on-prem is creating hybrid connections. You're seeing co-location facilities very proximate to the cloud. We're talking now about the edge, the near edge, and the far edge, deeply embedded. So that whole notion of cloud is changing. But I want to ask you, there's still a big push to cloud, everybody has a cloud first mantra, how do you see HPE competing in this new landscape? >> I think collaborating is probably a better word, although you could certainly argue if we're just leasing or renting hardware, then it would be competition, but I think again... The workload is going to flow to where the data exists. So if the data's being generated at the edge and being pumped into the cloud, then cloud is prod. That's the production system. If the data is generated via on-premises systems, then that's where it's going to be executed. That's production, and so HPE's approach is very much co-exist. It's a co-exist model of, if you need to do DevTests in the cloud and bring it back on-premises, fine, or vice versa. 
The key here is not locking our customers and our prospective clients into any sort of proprietary stack, as we were talking about earlier, giving people the flexibility to move those workloads to where the data exists, that is going to allow us to continue to get share of wallet, mind share, continue to deploy those workloads. And yes, there's going to be competition that comes along. Do you run this on a GCP or do you run it on a GreenLake on-premises? Sure, we'll have those conversations, but again, if we're using open source software as the foundation for that, then actually where you run it is less relevant. >> So there's a lot of choices out there, when it comes to containers generally and Kubernetes specifically, and you may have answered this, you get the zero trust component, you've got the orchestrator, you've got the scale-out piece, but I'm interested in hearing in your words why an enterprise would or should consider Ezmeral instead of alternative Kubernetes solutions? >> It's a fair question, and it comes up in almost every conversation. "Oh, we already do Kubernetes, we have a Kubernetes standard", and that's largely true in most of the enterprises I speak to. They're using one of the many on-premises distributions or their cloud distributions, and they're all fine. They're all fine for what they were built for. Ezmeral was generally built for something a little different. Yes, everybody can run microservices based applications, DevOps based workloads, but where Ezmeral is different is for those data intensive, clustered applications. Those sorts of applications require a certain degree of network awareness, persistent storage, et cetera, which requires a significant amount of intelligence. Either you have to write in Golang, or you have to write your own operators, or Ezmeral can be that easy button. We deploy those stateful applications, because we bring a persistent storage layer, that came from MapR. 
We're really good at deploying those stateful clustered applications, and, in fact, we've open sourced that as a project, KubeDirector, that came from BlueData, and we're really good at securing these, using SPIFFE and SPIRE, to ensure that there's that zero trust approach, that came from Scytale, and we've wrapped all of that in Kubernetes. So now you can take the most difficult, gnarly, complex data intensive applications in your enterprise and deploy them using open source. And if that means we have to co-exist with an existing Kubernetes distribution, that's fine. That's actually the most common scenario that I walk into. I start asking, "What about these other applications you haven't done yet?" The answer is usually, "We haven't gotten to them yet", or "We're thinking about it", and that's when we talk about the capabilities of Ezmeral and I usually get the response, "Oh. A, we didn't know you existed, and B, well, let's talk about how exactly you do that." So again, it's more of a co-exist model rather than a compete with model, Dave. >> Well, that makes sense. I mean, I think again, a lot of people, they go, "Oh yeah, Kubernetes, no big deal. It's everywhere." But you're talking about a solution, kind of taking a platform approach with capabilities. You got to protect the data. A lot of times, these microservices aren't so micro and things are happening really fast. You've got to be secure. You got to be protected. And like you said, you've got a single phone number. You know, people say one throat to choke. Somebody in the media the other day said, "No, no. Single hand to shake." It's more of a partnership. I think that's apropos for HPE, Matt, with your heritage. >> That one's better. >> So, you know, thinking about this whole, we've gone through the pre big data days and then big data was the hot buzzword. People don't maybe necessarily use that term anymore, although the data is bigger and getting bigger, which is kind of ironic. 
Where do you see this whole space going? We've talked about that sort of trend toward breaking down the silos, decentralization, maybe these hyper specialized roles that we've created, maybe getting more embedded or aligned with the line of business. How do you see... It feels like the next 10 years are going to be different than the last 10 years. How do you see it, Matt? >> I completely agree. I think we are entering this next era, and I don't know if it's well-defined. I don't know if I would go out on an edge to say exactly what the trend is going to be. But as you said earlier, data lakes really turned into data swamps. We ended up with lots of them in the enterprise, and enterprises had to allow that to happen. They had to let each business unit or each group of users collect the data that they needed and IT sort of had to deal with that down the road. I think that the more progressive organizations are leading the way. They are, again, taking those lessons from cloud and application developments, microservices, and they're allowing a freedom of choice. They're allowing data to move, to where those applications are, and I think this decentralized approach is really going to be king. You're going to see traditional software packages. You're going to see open source. You're going to see a mix of those, but what I think will probably be common throughout all of that is there's going to be this sense of automation, this sense that, we can't just build an algorithm once, release it and then wish it luck. That we've got to treat these analytics, and these data systems, as living things. That there's life cycles that we have to support. Which means we need to have DevOps for our data science. We need a CI/CD for our data analytics. We need to provide engineering at scale, like we do for software engineering. That's going to require automation, and an organizational thinking process, to allow that to actually occur. I think all of those things. 
The sort of people, process, products. It's all three of those things that are going to have to come into play, but stealing those best ideas from cloud and application developments, I think we're going to end up with probably something new over the next decade or so. >> Again, I'm loving this conversation, so I'm going to stick with it for a sec. It's hard to predict, but some takeaways that I have, Matt, from our conversation, I wonder if you could comment? I think the future is more open source. You mentioned automation, Devs are going to be key. I think governance as code, security designed in at the point of code creation, is going to be critical. It's no longer going to be a bolt on. I don't think we're going to throw away the data warehouse or the data hubs or the data lakes. I think they become a node. I like this idea, I don't know if you know Zhamak Dehghani, but she has this idea of a global data mesh where these tools, lakes, whatever, they're a node on the mesh. They're discoverable. They're shareable. They're governed in a way. I think the mistake a lot of people made early on in the big data movement is, "Oh, we got data. We have to monetize our data." As opposed to thinking about what products can I build that are based on data that then can lead to monetization? I think the other thing I would say is the business has gotten way too technical. (Dave chuckles) It's alienated a lot of the business lines. I think we're seeing that change, and I think things like Ezmeral that simplify that, are critical. So I'll give you the final thoughts, based on my rant. >> No, your rant is spot on Dave. I think we are in agreement about a lot of things. Governance is absolutely key. If you don't know where your data is, what it's used for, and can't apply policies to it, it doesn't matter what technology you throw at it, you're going to end up in the same state that you're essentially in today, with lots of swamps. I did like that concept of a node or a data mesh. 
It kind of goes back to the similar thing with a service mesh, or a set of APIs that you can use. I think we're going to have something similar with data. The trick is always, how heavy is it? How easy is it to move about? I think there's always going to be that latency issue, maybe not within the data center, but across the WAN. Latency is still going to be key, which means we need to have really good processes to be able to move data around. As you said, govern it. Determine who has access to what, when, and under what conditions, and then allow it to be free. Allow people to bring their choice of tools, provision them how they need to, while providing that audit, compliance and control. And then again, as you need to provision data across those nodes for those use cases, do so in a well measured and governed way. I think that's sort of where things are going. But we keep using that term governance, I think that's so key, and there's nothing better than using open source software because that provides traceability, auditability and this, frankly, openness that allows you to say, "I don't like where this project's going. I want to go in a different direction." And it gives those enterprises a control over these platforms that they've never had before. >> Matt, thanks so much for the discussion. I really enjoyed it. Awesome perspectives. >> Well thank you for having me, Dave. Excellent conversation as always. Thanks for having me again. >> You're very welcome. And thank you for watching everybody. This is theCUBE's continuous coverage of HPE Discover 2021. Of course, the virtual version. Next year, we're going to be back live. My name is Dave Vellante. Keep it right there. (upbeat music)

Published Date : Jun 22 2021

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Matt Maccaux	PERSON	0.99+
Matt	PERSON	0.99+
Dave Volante	PERSON	0.99+
HP	ORGANIZATION	0.99+
Cytec	ORGANIZATION	0.99+
Next year	DATE	0.99+
two parts	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Zhamak Dehghani	PERSON	0.99+
HPE	ORGANIZATION	0.99+
BlueData	ORGANIZATION	0.99+
today	DATE	0.99+
Hadoop	TITLE	0.99+
12 factor	QUANTITY	0.99+
each business unit	QUANTITY	0.99+
GDPR	TITLE	0.98+
Golang	TITLE	0.98+
each group	QUANTITY	0.98+
Ezmeral	ORGANIZATION	0.97+
three	QUANTITY	0.97+
zero trust	QUANTITY	0.97+
single phone number	QUANTITY	0.96+
Ezmeral	PERSON	0.96+
single	QUANTITY	0.96+
one	QUANTITY	0.96+
seven	QUANTITY	0.95+
kumbaya	ORGANIZATION	0.95+
one thing	QUANTITY	0.93+
Big Data	TITLE	0.91+
two things	QUANTITY	0.9+
theCube	ORGANIZATION	0.9+
next 10 years	DATE	0.89+
four dot	QUANTITY	0.89+
first mantra	QUANTITY	0.89+
last 10 years	DATE	0.88+
Ezmeral Software	ORGANIZATION	0.88+
one environment	QUANTITY	0.88+
MapR	ORGANIZATION	0.87+
Scytale	ORGANIZATION	0.87+
next decade	DATE	0.86+
first	QUANTITY	0.86+
Kubernetes	TITLE	0.86+
SPIFFE	TITLE	0.84+
SPIRE	TITLE	0.83+
tier one	QUANTITY	0.82+
Spark	TITLE	0.8+
five megabytes of code	QUANTITY	0.77+
KubeDirector	ORGANIZATION	0.75+
one question	QUANTITY	0.74+
Single hand	QUANTITY	0.74+
years	QUANTITY	0.73+
last decade	DATE	0.73+
2021	DATE	0.73+
Azure	TITLE	0.7+

Breaking Analysis: Chasing Snowflake in Database Boomtown

(upbeat music) >> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Database is the heart of enterprise computing. The market is both exploding and it's evolving. The major forces transforming the space include Cloud and data, of course, but also new workloads, advanced memory and IO capabilities, new processor types, a massive push towards simplicity, new data sharing and governance models, and a spate of venture investment. Snowflake stands out as the gold standard for operational excellence and go to market execution. The company has attracted the attention of customers, investors, and competitors, and everyone from entrenched players to upstarts wants in on the act. Hello everyone and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we'll share our most current thinking on the database marketplace and dig into Snowflake's execution, some of its challenges, and we'll take a look at how others are making moves to solve customer problems and try to get a piece of the growing database pie. Let's look at some of the factors that are driving market momentum. First, customers want lower license costs. They want simplicity. They want to avoid database sprawl. They want to run anywhere and manage new data types. These needs often are divergent and they pull vendors and technologies in different directions. It's really hard for any one platform to accommodate every customer need. The market is large and it's growing. Gartner has it at around 60 to 65 billion with a CAGR of somewhere around 20% over the next five years. But the market, as we know it, is being redefined. Traditionally, databases have served two broad use cases, OLTP or transactions and reporting like data warehouses. 
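As a rough illustration of the sizing quoted above (a market of around 60 to 65 billion growing at a CAGR of roughly 20% over five years), here is a minimal Python sketch of compound growth. The function name `project_market` is our own, the inputs are the approximate figures from the transcript, and the results are back-of-the-envelope arithmetic, not a forecast from Gartner or ETR.

```python
def project_market(start_billions: float, cagr: float, years: int) -> float:
    """Compound a starting market size forward at a constant annual growth rate."""
    return start_billions * (1.0 + cagr) ** years


# Approximate figures quoted in the transcript: 60 to 65 billion, ~20% CAGR, 5 years.
low = project_market(60.0, 0.20, 5)
high = project_market(65.0, 0.20, 5)

print(f"Low end after 5 years:  ~{low:.1f} billion")
print(f"High end after 5 years: ~{high:.1f} billion")
```

Under those assumptions the range would roughly two-and-a-half-fold in five years, which is the arithmetic behind calling the market "large and growing".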
But a diversity of workloads and new architectures and innovations have given rise to a number of new types of databases to accommodate all these diverse customer needs. Many billions have been spent over the last several years in venture money and it continues to pour in. Let me just give you some examples. Snowflake, prior to its IPO, raised around 1.4 billion. Redis Labs has raised more than half a billion dollars so far, Cockroach Labs, more than 350 million, Couchbase, 250 million, SingleStore, formerly MemSQL, 238 million, Yellowbrick Data, 173 million. And if you stretch the definition of database a little bit to include low-code or no-code, Airtable has raised more than 600 million. And that's by no means a complete list. Now, why is all this investment happening? Well, in large part, it's due to the TAM. The TAM is huge and it's growing and it's being redefined. Just how big is this market? Let's take a look at a chart that we've shown previously. We used this chart to look at Snowflake's TAM, and it focuses mainly on the analytics piece, but we'll use it here to really underscore the market potential. So the actual database TAM is larger than this, we think. Cloud and Cloud-native technologies have changed the way we think about databases. Virtually 100% of the database players that are in the market have pivoted to a Cloud first strategy. And many, like Snowflake, they're pretty dogmatic and have a Cloud only strategy. Databases have historically been very difficult to manage, they're really sensitive to latency. So that means they require a lot of tuning. Cloud allows you to throw virtually infinite resources on demand and attack performance problems and scale very quickly, minimizing the complexity and tuning nuances. This idea, this layer of data as a service, we think of it as a staple of digital transformation. It's this layer that's forming to support things like data sharing across ecosystems and the ability to build data products or data services. 
It's a fundamental value proposition of Snowflake and one of the most important aspects of its offering. Snowflake tracks a metric called edges, which are external connections in its data Cloud. And it claims that 15% of its total shared connections are edges and that's growing at 33% quarter on quarter. This notion of data sharing is changing the way people think about data. We use terms like data as an asset. This is the language of the 2010s. We don't share our assets with others, do we? No, we protect them, we secure them, we even hide them. But we absolutely don't want to share those assets but we do want to share our data. I had a conversation recently with Forrester analyst, Michele Goetz. And we both agreed we're going to scrub data as an asset from our phraseology. Increasingly, people are looking at sharing as a way to create, as I said, data products or data services, which can be monetized. This is an underpinning of Zhamak Dehghani's concept of a data mesh, make data discoverable, shareable and securely governed so that we can build data products and data services that can be monetized. This is where the TAM just explodes and the market is redefining. And we think it is in the hundreds of billions of dollars. Let's talk a little bit about the diversity of offerings in the marketplace. Again, databases used to be either transactional or analytic. The bottom lines and top lines. And this chart here describes those two, but the types of databases, as you can see in the middle, have mushroomed. Just looking at this list, blockchain is of course a specialized type of database and it's also finding its way into other database platforms. Oracle is notable here. Document databases that support JSON and graph data stores that assist in visualizing data, inference from multiple different sources. That is one of the ways in which adtech has taken off and been so effective. 
Key Value stores, log databases that are purpose-built, machine learning to enhance insights, spatial databases to help build the next generation of products, the next automobile, streaming databases to manage real time data flows and time series databases. We might've missed a few, let us know if you think we have, but this is a pretty comprehensive list that is somewhat mind boggling when you think about it. And these unique requirements, they've spawned tons of innovation and companies. Here's a small subset on this logo slide. And this is by no means an exhaustive list, but you have these companies here which have been around forever like Oracle and IBM and Teradata and Microsoft, these are kind of the tier one relational databases that have matured over the years. And they've got properties like atomicity, consistency, isolation, durability, what's known as ACID properties, ACID compliance. Some others that you may or may not be familiar with, Yellowbrick Data, we talked about them earlier. It's going after the best price, performance and analytics and optimizing to take advantage of both hybrid installations and the latest hardware innovations. SingleStore, as I said, formerly known as MemSQL, is a very high end analytics and transaction database, supports mixed workloads, extremely high speeds. We're talking about trillions of rows per second that could be ingested and queried. Couchbase with hybrid transactions and analytics, Redis Labs, open source, NoSQL, doing very well, as is Cockroach with distributed SQL, MariaDB with its managed MySQL, Mongo, the document database, has a lot of momentum, EDB, which supports open source Postgres. And if you stretch the definition a bit, Splunk, for log database, why not? 
ChaosSearch, really interesting startup that leaves data in S3 and is going after simplifying the ELK stack, New Relic, they have a purpose-built database for application performance management and we probably could have even put Workday in the mix as it developed a specialized database for its apps. Of course, we can't forget about SAP with HANA, trying to pry customers off of Oracle. And then the big three Cloud players, AWS, Microsoft and Google with extremely large portfolios of database offerings. The spectrum of products in this space is very wide. You've got AWS, which I think is up to like 16 database offerings, all the way to Oracle, which has like one database to do everything, notwithstanding MySQL, because it owns MySQL, got that through the Sun acquisition. And recently, it made some innovations there around the HeatWave announcement. But essentially Oracle is investing to make its database, Oracle Database, run any workload. While AWS takes the approach of the right tool for the right job and really focuses on the primitives for each database. A lot of ways to skin a cat in this enormous and strategic market. So let's take a look at the spending data for the names that make it into the ETR survey. Not everybody we just mentioned will be represented because they may not have quite the market presence or the Ns in the survey, but ETR does capture a pretty nice mix of players. So this chart here, it's one of the favorite views that we like to share quite often. It shows the database players across the 1500 respondents in the ETR survey this past quarter and it measures their net score. That's spending momentum and is shown on the vertical axis, and market share, which is the pervasiveness in the data set, is on the horizontal axis. Snowflake is notable because it's been hovering around 80% net score since the survey started picking them up. Anything above 40%, that red line there, is considered by us to be elevated. 
Microsoft and AWS, they also stand out because they have both market presence and they have spending velocity with their platforms. Oracle is very large but it doesn't have the spending momentum in the survey because nearly 30% of Oracle installations are spending less, whereas only 22% are spending more. Now as a caution, this survey doesn't measure dollars spent and Oracle will be skewed toward the big customers with big budgets. So you got to consider that caveat when evaluating this data. IBM is in a similar position although its market share is not keeping up with Oracle's. Google, they've got great tech especially with BigQuery and it has elevated momentum. So not a bad spot to be in although I'm sure it would like to be closer to AWS and Microsoft on the horizontal axis, so it's got some work to do there. And some of the others we mentioned earlier, like MemSQL, Couchbase. It's shown as MemSQL here, they're now SingleStore. Couchbase, Redis, Mongo, MariaDB, all very solid scores on the vertical axis. Cloudera just announced that it was selling to private equity and that will hopefully give it some time to invest in this platform and get off the quarterly shot clock. MapR was acquired by HPE and it's part of HPE's Ezmeral platform, their data platform which doesn't yet have the market presence in the survey. Now, something that is interesting in looking at Snowflake's earnings last quarter is this laser focus on large customers. This is a hallmark of Frank Slootman and Mike Scarpelli who I know they don't have a playbook but they certainly know how to go whale hunting. So this chart isolates the data that we just showed you to the Global 1000. Note that both AWS and Snowflake go up higher on the Y-axis, meaning large customers are spending at a faster rate for these two companies. The previous chart had an N of 161 for Snowflake, and a 77% net score. 
This chart shows the Global 1000, and the N there for Snowflake is 48 accounts and the net score jumps to 85%. We're not going to show it here but when you isolate the ETR data, it's nice, you can just cut it, when you isolate it on the Fortune 1000, the N for Snowflake goes to 59 accounts in the data set and Snowflake jumps another 100 basis points in net score. When you cut the data by the Fortune 500, the Snowflake N goes to 40 accounts and the net score jumps another 200 basis points to 88%. And when you isolate on the Fortune 100, the N is only 18 there, but it's still 18, and their net score jumps to 89%, almost 90%. So it's very strong confirmation that there's a proportional relationship between larger accounts and spending momentum in the ETR data set. So Snowflake's large account strategy appears to be working. And because we think Snowflake is sticky, this probably is a good sign for the future. Now we've been talking about net score, it's a key measure in the ETR data set, so we'd like to just quickly remind you what that is and use Snowflake as an example. This wheel chart shows the components of net score. That lime green is new adoptions. 29% of the customers in the ETR dataset are new to Snowflake. That's pretty impressive. 50% of the customers are spending more, that's the forest green, 20% are flat, that's the gray, and only 1%, the pink, are spending less. And 0%, zero, are replacing Snowflake, no defections. What you do here to get net score is you subtract the reds from the greens and you get a net score of 78%. Which is pretty sick, as in good sick, and has been steady for many, many quarters. So that's how the net score methodology works. And remember, it typically takes Snowflake customers many months, like six to nine months, to start consuming its services at the contracted rate. So those 29% new adoptions, they're not going to kick into high gear until next year, so that bodes well for future revenue. 
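The net score arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the subtraction as narrated in the transcript (greens minus reds, with flat spenders ignored); the function and parameter names are our own and are not ETR's published methodology or API.

```python
def net_score(new_adoption: float, spending_more: float, flat: float,
              spending_less: float, replacing: float) -> float:
    """Net score as described in the transcript: (greens) minus (reds).

    All inputs are percentages of respondents and are expected to sum to 100.
    Flat spenders count toward the total but not toward the score.
    """
    total = new_adoption + spending_more + flat + spending_less + replacing
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"percentages must sum to 100, got {total}")
    return (new_adoption + spending_more) - (spending_less + replacing)


# Snowflake breakdown quoted in the transcript:
# 29% new, 50% spending more, 20% flat, 1% spending less, 0% replacing.
snowflake = net_score(29, 50, 20, 1, 0)
print(f"Snowflake net score: {snowflake}%")  # prints "Snowflake net score: 78%"
```

The flat 20% is why the score (78%) is lower than the share of accounts that are not cutting spend (99%): only accounts that are actively adding or growing count as positive momentum.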
Now, it's worth taking a quick snapshot of Snowflake's most recent quarter. There's plenty of stuff out there that you can google and get a summary, but let's just do a quick rundown. The company's product revenue run rate is now at 856 million, and they'll surpass $1 billion on a run rate basis this year. The growth is off the charts, with very high net revenue retention. We've explained that before: with Snowflake's consumption pricing model, they have to account for retention differently than a SaaS company would. Snowflake added 27 net new $1 million accounts in the quarter and claims to have more than a hundred now. It also is just getting its act together overseas. Slootman says he's personally going to spend more time in Europe, given his belief that the market is huge and they can disrupt it, and of course he's from the continent. He was born there and lived there. And gross margins expanded, due in large part to renegotiation of its Cloud costs. We'll come back to that in a moment. Snowflake is also moving from a product led growth company to one that's more focused on core industries. Interestingly, media and entertainment is one of the largest along with financial services and several others. To me, this is really interesting because Disney's an example that Snowflake often puts in front of its customers as a reference. And it seems to me to be a perfect example of using data and analytics to both target customers and also build so-called data products through data sharing. Snowflake has to grow its ecosystem to live up to its lofty expectations and indications are that large SIs are leaning in big time. Deloitte crossed $100 million in deal flow in the quarter. And the balance sheet's looking good. Thank you very much, with $5 billion in cash. The snarks are going to focus on the losses, but this is all about growth. This is a growth story. It's about customer acquisition, it's about adoption, it's about loyalty and it's about lifetime value. 
Now, as I said at the IPO, and I always say this to young people, don't buy a stock at the IPO. There's probably almost always going to be better buying opportunities ahead. I'm not always right about that, but I often am. Here's a chart of Snowflake's performance since IPO. And I have to say, it's held up pretty well. It's trading above its first day close and as predicted there were better opportunities than day one, but if you have to make a call from here, I mean, don't take my stock advice, do your research. Snowflake, they're priced to perfection. So any disappointment is going to be met with selling. You saw that the day after they beat their earnings last quarter because their guidance in revenue growth wasn't in the triple digits, it sort of moderated down to the 80% range. And they pointed to a new storage compression feature that will lower customer costs and consequently, it's going to lower their revenue. I swear, I think that before earnings calls, Scarpelli sits back and says, "Okay, what kind of creative way can I introduce to dampen enthusiasm for the guidance?" Now I'm not saying lower storage costs won't translate into lower revenue for a period of time. But look, when you drop storage prices, customers are always going to buy more, that's the way the storage market works. And Snowflake did allude to that in all fairness. Let me introduce something that people in Silicon Valley are talking about, and that is the Cloud paradox for SaaS companies. And what is that? I was in a Clubhouse room with Martin Casado of Andreessen when I first heard about this. He wrote an article with Sarah Wang, calling into question the merits of SaaS companies sticking with Cloud at scale. Now the basic premise is that for startups in early stages of growth, the Cloud is a no brainer for SaaS companies, but at scale, the cost of Cloud, the Cloud bill, approaches 50% of the cost of revenue, and it becomes an albatross that stifles operating leverage. 
Their conclusion ended up saying that perhaps as much as (it was back of the napkin, they admitted that) half a trillion dollars in market cap is being vacuumed away by the hyperscalers that could go to the SaaS providers as cost savings from repatriation. And that Cloud repatriation is an inevitable path for large SaaS companies at scale. I was particularly interested in this as I had recently put out a post on the Cloud repatriation myth. I think in this instance, there's some merit to their conclusions. But I don't think it necessarily bleeds into traditional enterprise settings. But for SaaS companies, maybe ServiceNow has it right running their own data centers, or maybe a hybrid approach to hedge bets and save money down the road is prudent. What caught my attention in reading through some of the Snowflake docs, like the S-1 and its most recent 10-K, were comments regarding long-term purchase commitments and non-cancelable contracts with Cloud companies. In the company's S-1, for example, there was disclosure of $247 million in purchase commitments over a five-plus year period. And in the company's latest 10-K report, that same line item jumped to $1.8 billion. Now Snowflake is clearly managing these costs, as it alluded to on its earnings call. But one has to wonder, at some point, will Snowflake follow the example of, say, Dropbox, which Andreessen used in its blog, and start managing its own IT? Or will it stick with the Cloud and negotiate hard? Snowflake certainly has the leverage. It has to be one of Amazon's best partners and customers, even though it competes aggressively with Redshift. But on the earnings call, CFO Scarpelli said that Snowflake was working on a new chip technology to dramatically increase performance. What the heck does that mean? Snowflake is not becoming a hardware company, is it? So I'm going to have to dig into that a little bit and find out what it means.
I'm guessing it means that it's taking advantage of Arm-based processors like Graviton, with many ISVs allowing their software to run on that lower cost platform. Or maybe there's some deep, dark, in-the-weeds secret going on inside Snowflake, but I doubt it. We're going to leave all that there for now and keep following this trend. So it's clear, just in summary, that Snowflake, they're the pace setter in this new exciting world of data, but there's plenty of room for others. And they still have a lot to prove. For instance, one customer in an ETR CTO roundtable expressed skepticism that Snowflake will live up to its hype because its success is going to lead to more competition from well-established players. This is a common theme, you hear it all the time. It's pretty easy to reach that conclusion. But my guess is this is the exact type of narrative that fuels Slootman and sucked him back into this Game of Thrones. That's it for now, everybody. Remember, these episodes, they're all available as podcasts, wherever you listen. All you've got to do is search Breaking Analysis podcast, and please subscribe to the series. Check out ETR's website at etr.plus. We also publish a full report every week on wikibon.com and siliconangle.com. You can get in touch with me, email is David.vellante@siliconangle.com. You can DM me at @dvellante on Twitter or comment on our LinkedIn posts. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week everybody, be well and we'll see you next time. (upbeat music)

Published Date : Jun 5 2021

SUMMARY :

This is Breaking Analysis and the net score jumps to 85%.

ENTITIES

Entity	Category	Confidence
Michelle Goetz	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Mike Scarpelli	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Sarah Wang	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
50%	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
Andreessen	PERSON	0.99+
Europe	LOCATION	0.99+
40 accounts	QUANTITY	0.99+
$1 billion	QUANTITY	0.99+
Frank Slootman	PERSON	0.99+
Slootman	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Redis Labs	ORGANIZATION	0.99+
Scarpelli	PERSON	0.99+
TAM	ORGANIZATION	0.99+
six	QUANTITY	0.99+
33%	QUANTITY	0.99+
$5 billion	QUANTITY	0.99+
80%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
1.8 billion	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
59 accounts	QUANTITY	0.99+
Cockroach Labs	ORGANIZATION	0.99+
Disney	ORGANIZATION	0.99+
Teradata	ORGANIZATION	0.99+
18	QUANTITY	0.99+
77%	QUANTITY	0.99+
85%	QUANTITY	0.99+
29%	QUANTITY	0.99+
20%	QUANTITY	0.99+
Boston	LOCATION	0.99+
78%	QUANTITY	0.99+
Martin Casado	PERSON	0.99+
48 accounts	QUANTITY	0.99+
856 million	QUANTITY	0.99+
1500 respondents	QUANTITY	0.99+
nine months	QUANTITY	0.99+
Zhamak Dehghani	PERSON	0.99+
0%	QUANTITY	0.99+
wikinbon.com	OTHER	0.99+
88%	QUANTITY	0.99+
two	QUANTITY	0.99+
$100 million	QUANTITY	0.99+
89%	QUANTITY	0.99+
Airtable	ORGANIZATION	0.99+
next year	DATE	0.99+
Snowflake	ORGANIZATION	0.99+
two companies	QUANTITY	0.99+
Deloitte	ORGANIZATION	0.99+
200 basis points	QUANTITY	0.99+
First	QUANTITY	0.99+
HPE	ORGANIZATION	0.99+
15%	QUANTITY	0.99+
more than 600 million	QUANTITY	0.99+
last quarter	DATE	0.99+
161	QUANTITY	0.99+
David.vellante@siliconangle.com	OTHER	0.99+
$247 million	QUANTITY	0.99+
27 net	QUANTITY	0.99+
2010s	DATE	0.99+
siliconangle.com	OTHER	0.99+
Forrester	ORGANIZATION	0.99+
MemSQL	TITLE	0.99+
Yellowbrick Data	ORGANIZATION	0.99+
more than 1/2 billion dollars	QUANTITY	0.99+
Dropbox	ORGANIZATION	0.99+
MySQL	TITLE	0.99+
BigQuery	TITLE	0.99+

Robert Christiansen & Kumar Sreekanti | HPE Ezmeral Day 2021

>> Okay. Now we're going to dig deeper into HPE Ezmeral and try to better understand how it's going to impact customers. And with me to do that are Robert Christiansen, who is the Vice President of Strategy in the office of the CTO, and Kumar Sreekanti, who is the Chief Technology Officer and Head of Software, both of course with Hewlett Packard Enterprise. Gentlemen, welcome to the program. Thanks for coming on. >> Good seeing you, Dave. Thanks for having us. >> It's always good to see you guys. >> Thanks for having us. >> So, Ezmeral, kind of an interesting name, catchy name, but Kumar, what exactly is HPE Ezmeral? >> It's indeed a catchy name. Our branding team has done a fantastic job. I believe it's actually derived from esmeralda, which is the Spanish for emerald. Often it's supposed to have some very mythical powers, and they derived Ezmeral from there. And initially, when we all heard it, it was interesting. So, Ezmeral was our effort to take all the software, the platform tools that HPE has, and provide this modern operating platform to the customers and put it under one brand. So, it has a modern container platform, it does persistent storage with the data fabric and it doesn't include as many of our customers from that. So, think of it as a modern container platform for modernization and digitization for the customers. >> Yeah, it's interesting, you talk about platform, so it's not, you know, a lot of times people say product, but you're positioning it as a platform so that has a broader implication. >> That's very true. So, as the customers are thinking of this digitization, modernization, containers and microservices, as you know, there is, has become the stable all. So, it's actually a container orchestration platform with golfers open source going into this as well as the persistence already. >> So, by the way, Ezmeral, I think emerald in Spanish, I think in the culture, it also has immunity powers as well.
So immunity from lock-in, (Robert and Kumar laughing) and all those other terrible diseases, maybe it helps us with COVID too. Robert, when you talk to customers, what problems do you probe for that Ezmeral can do a good job solving? >> Yeah, that's a really great question because a lot of times they don't even know what it is that they're trying to solve for other than just a very narrow use case. But the idea here is to give them a platform by which they can bridge both the public and private environment for what they do, the application development, specifically on the data side. So, when you're looking to bring containerization, which originally got started on the public cloud, I should say it became popular in the public cloud, and it has moved its way on premises now, Ezmeral really opens the door to three fundamental things. Number one, you know, how do I maintain an open architecture like you're referring to, with low or no lock-in of my applications? Number two, how do I gain a data fabric or a data consistency of accessing the data so I don't have to rewrite those applications when I do move them around? And then lastly, where everybody's heading, the real value is in the AI ML initiatives that companies are really bringing, and that value of their data, and unlocking that data where the data is being generated and stored. And so the Ezmeral platform is those multiple pieces that Kumar was talking about stacked together to deliver the solutions for the client. >> So Kumar, how does it work? What's the sort of IP or the secret sauce behind it all? What makes HPE different? >> Yeah. Continuing on (indistinct) it's a modern platform for optimizing the data and workloads. But I think I would say there are three unique characteristics of this platform. Number one is that it actually provides you an ability to run both stateful and stateless workloads under the same platform.
And number two is, as we were thinking about, unlike other Kubernetes offerings, it is open source; it actually gives you all open-source Kubernetes as well as an orchestration behind it, so you can actually provide this hybrid thing that Robert was talking about. And then actually we built the workflows into it. For example, we actually announced, along with Ezmeral, ML Ops, so the customers can actually do the workflow management around specific data workloads. So, the magic, if you want to see the secret sauce, is all the efforts that have been going into some of the IP acquisitions that HPE has done over the years. We said BlueData, MapR, and Nimble, all these pieces are coming together and providing a modern digitization platform for the customers. >> So these pieces, they all have a little bit of machine intelligence in them. You have people who used to think of AI as this sort of separate thing, I mean the same thing with containers, right? But now it's getting embedded into the stack. What is the role of machine intelligence or machine learning in Ezmeral? >> I would take a step back and say, you know very well, the amount of data that is being generated, 95% or 98% of the data is machine generated. And it has a serious amount of gravity, and it is sitting at the edge, and we are the only one that has an edge-to-cloud data fabric built into it. So, number one is that we are bringing compute, or a cloud, to the data rather than taking the data to the cloud, right, if you will. It's a cloud-like experience that it provides the customer. AI is not much value to us if we don't harness the data. So, I said this in one of the blogs: we have gone from collecting the data to finding the insights in the data, right. So, people have used all sorts of analogies, that data is the new oil. So, the AI and the data.
And then now your applications have to be modernized, and nobody wants to write an application in a non-microservices fashion because you want to build the modernization. So, if you bring these three things: I want to have a data gravity with lots of data, I have to build AI applications, and I want to have modernization. Those three things, I think, we bring to the customer. >> So, Robert, let's stay on customers for a minute. I mean, I want to understand the business impact, the business case. I mean, why should all the cloud developers have all the fun? You've mentioned it, you're bridging the cloud and on-prem. Talk about when you talk to customers and what they are seeing as the business impact, what are the real drivers for that? >> That's a great question, 'cause at the end of the day, I think the recent survey was that cost and performance are still the number one requirement for this, and a real close second is agility, the speed at which they want to move, and so those two are top of mind every time. But the thing we find with Ezmeral, which is so impactful, is that nobody brings together the silicon, the hardware, the platform, and all of that stack to work together and combine like Ezmeral does with the platforms that we have. And specifically, we start getting 90, 92, 93% utilization out of AI ML workloads on very expensive hardware. It really, really is a competitive advantage over a public cloud offering, which does not offer those kinds of services, and the cost models are so significantly different. So, we do that by collapsing the stack. We take out as much intellectual property, excuse me, as many software pieces as are necessary, so we are closest to the silicon, closest to the applications, bringing it to the hardware itself, meaning that we can interleave the applications, meaning that you can get to true multitenancy on a particular platform that allows you to deliver a cost optimized solution.
So, when you talk about the money side, absolutely, there's just nothing out there. And then on the second side, which is agility, one of the things that we know today is that applications need to be built in pipelines, right? This is something that's been established now for quite some time. Now, that's really making its way on premises, and that's what Kumar was talking about with, how do we modernize? How do we do that? Well, there's going to be some that you want to break into microservices containers, and there's some that you don't. Now, the ones that are going to do that, they're going to get that speed and motion, et cetera, out of the gate and they can put that on premises, which is relatively new these days to the on-premises world. So, we think both will be the advantage. >> Okay. I want to unpack that a little bit. So, the cost is clearly really 90 plus percent utilization. >> Yes. >> I mean, Kumar, you know, even pre-virtualization, we know what it was like. Even with virtualization, you never really got that high. I mean, people would talk about it, but are you really able to sustain that in real world workloads? >> Yeah. I think when you make your executable cut up into smaller pieces, you can insert them into many areas. We have one customer that was running 18 containers on a single server, and each of those containers, as you know, early days of BlueData, you actually modernize what we consider week run containers or microbiome. So, if you actually build these microservices, and you have versioning all correct, you can pack these things extremely well. And we have seen this. Again, it's not a guarantee, it all depends on your application and, I mean, as an engineer, we want to always understand how all of these caveats work, but it is a very modern utilization of the platform with the data, and once you know where the data is, then it becomes very easy to match those two.
>> Now, the other piece of the value proposition that I heard, Robert, is it's basically an integrated stack. So I don't have to cobble together a bunch of open source components. There's legal implications, there's obviously performance implications. I would imagine that resonates, particularly with the enterprise buyer, because they don't have the time to do all this integration. >> That's a very good point. So there is an interesting question that enterprises have: they want to have open source so there is no lock-in, but they also need help to implement and deploy and manage it because they don't have the expertise. And we all know that Kubernetes has actually brought that API, the PaaS layer standardization. So what we have done is we have given the open source, and you write to the Kubernetes API, but at the same time orchestration, persistent storage, the data fabric, the AI algorithms, all of them are bolted into it. And on top of that, it's available as licensed software on-prem, and the same software runs on GreenLake. So you can actually pay as you go, and then we run it for them in a colo or in their own data center. >> Oh, good. That was one of my latter questions. So, I can get this as a service, pay by the drink, essentially. I don't have to install a bunch of stuff on-prem and pay for it perpetually... >> There is a lot of containers and is the reason and the lapse of service in the last discover and knowledge gone production. So Ezmeral is available both ways: you can run it on-prem, or on the cloud as well, as a container platform, or you can run it instead on GreenLake. >> Robert, are there any specific use case patterns that you see emerging amongst customers? >> Yeah, absolutely. So there's a couple of them. So we have a really nice relationship that we see with any of the Splunk operators that are out there today, right?
So Splunk containerized their operator. That operator is the number one operator, for example, for Splunk on the IT operations side or notifications, as well as on the security operations side. So we've found that that runs highly effectively on top of Ezmeral, on top of our platforms that Kumar just talked about, but I want to also give a little bit of background on that same operator platform. The way that the Ezmeral platform has done it is that we've been able to make it highly available, active-active, with HA availability at five nines for that same Splunk operator on premises, on Kubernetes open source, which is, as far as I'm concerned, very, very high end computer science work. You understand how difficult that is. That's number one. Number two is you'll see just Spark workloads as a whole. All right. Nobody handles Spark workloads like we do. So we put a container around them and we put them inside the pipeline of moving people through that basic ML AI pipeline of getting a model through its system, trained, and then actually deployed to our ML Ops pipeline. This is a key fundamental for delivering value in the data space as well. And then lastly, this is really important when you think about the data fabric that we offer. The data fabric itself doesn't necessarily have to be bolted to the container platform. The actual data fabric itself can be deployed underneath a number of our, you know, competitive platforms who don't handle data well. We know that they don't handle it very well at all. And we get lots and lots of calls from people saying, "Hey, can you take your Ezmeral data fabric and solve my large scale, highly challenging data problems?" And we say, "Yeah, and then when you're ready for a real world, full time, enterprise ready container platform, we'd be happy to provide that too."
>> So you're saying, if I'm inferring correctly, one of the values is you're simplifying that whole data pipeline and the whole data science, science project, pun intended, I guess. (Robert and Kumar laughing) >> That's true. >> Absolutely. >> So, where does a customer start? I mean, what are the engagements like? What's the starting point? >> It's means we're probably one of the most trusted and robust suppliers for many, many years, and we have a phenomenal workforce of both the (indistinct), a world leading support organization. There are many places to start with. One is obviously all these services that are available on GreenLake, as we just talked about, and they can start on a pay as you go basis. There are many customers, some of them from the early days of BlueData and MapR, that are already running, and they actually improvise on when, as they move into their next version more of a message. You can start with simple as well as container platform or system with the store, a computer's operation and can implement as an analyst to start working. And then finally as a big company like HPE as an everybody's company, that finance it's services, it's very easy for the customers to be able to get that support on day to day operations. >> Thank you for watching everybody. It's Dave Vellante for theCUBE. Keep it right there for more great content from Ezmeral.

Published Date : Mar 10 2021

SUMMARY :

in the office of the Thanks for having us. digitazation for the customers. so it's not, you know, a lot So, as the customers are So, by the way, Ezmeral, of accessing the data So, the magic is if you I mean the same thing and it is sitting at the edge is the business impact, One of the things that we know is today So, the cost is clearly really I mean, Kumar, you know, and you have versioning all correctly, of the value proposition and the same software service pay by the drink, and the lapse of service that operator is the number one operator, and the whole data science, that are available on the GreenLake, Thank you for watching everybody.

ENTITIES

Entity	Category	Confidence
Kumar	PERSON	0.99+
Robert	PERSON	0.99+
90	QUANTITY	0.99+
Dave Vellante	PERSON	0.99+
Robert Christiansen	PERSON	0.99+
Kumar Sreekanti	PERSON	0.99+
Splunk	ORGANIZATION	0.99+
Ezmeral	PERSON	0.99+
95%	QUANTITY	0.99+
Dave	PERSON	0.99+
HPE	ORGANIZATION	0.99+
98%	QUANTITY	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
IKEA	ORGANIZATION	0.99+
One	QUANTITY	0.99+
MAPR	ORGANIZATION	0.99+
one customer	QUANTITY	0.99+
BlueData	ORGANIZATION	0.99+
90 plus percent	QUANTITY	0.99+
both	QUANTITY	0.99+
each	QUANTITY	0.99+
Nimble	ORGANIZATION	0.99+
second side	QUANTITY	0.98+
one	QUANTITY	0.98+
GreenLake	ORGANIZATION	0.98+
today	DATE	0.98+
Ezmeral	ORGANIZATION	0.97+
Emerald	PERSON	0.97+
HPE Ezmeral	ORGANIZATION	0.97+
three unique characteristics	QUANTITY	0.96+
92	QUANTITY	0.95+
one brand	QUANTITY	0.94+
Number one	QUANTITY	0.94+
single server	QUANTITY	0.93+
Spanish	OTHER	0.92+
three things	QUANTITY	0.92+
nine	QUANTITY	0.9+
18 con	QUANTITY	0.89+
number two	QUANTITY	0.88+
Kubernetes	TITLE	0.86+
93%	QUANTITY	0.86+
Kubernetes	ORGANIZATION	0.85+
Number two	QUANTITY	0.83+
second	QUANTITY	0.8+
COVID	OTHER	0.79+
Ezmeral	TITLE	0.77+
couple	QUANTITY	0.75+
three fundamental things	QUANTITY	0.75+
Kubernete	TITLE	0.73+
GreenLake	TITLE	0.7+

Breaking Analysis: Five Questions About Snowflake’s Pending IPO

>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> In June of this year, Snowflake filed a confidential document suggesting that it would do an IPO. Now of course, everybody knows about it, found out about it, and it had a $20 billion valuation. So, many in the community and the investment community and so forth are excited about this IPO. It could be the hottest one of the year, and we're getting a number of questions from investors and practitioners and the entire Wikibon, ETR and CUBE community. So, welcome everybody. This is Dave Vellante. This is "CUBE Insights" powered by ETR. In this Breaking Analysis, we're going to unpack five critical questions around Snowflake's IPO, or pending IPO. And with me to discuss that is Erik Bradley. He's the Chief Engagement Strategist at ETR and he's also the Managing Director of VENN. Erik, thanks for coming on and great to see you as always. >> Great to see you too. Always enjoy being on the show. Thank you. >> Now for those of you who don't know Erik, VENN is a roundtable that he hosts, and he brings in CIOs, IT practitioners, CSOs, data experts, and they have an open and frank conversation, but it's private to ETR clients. But they know who the individual is, what their role is, what their title is, et cetera, and it's kind of an ask me anything. And I participated in one of them this past week. Outstanding. And we're going to share with you some of that. But let's bring up the agenda slide if we can here. And these are really some of the questions that we're getting from investors and others in the community. There's really five areas that we want to address. The first is what's happening in this enterprise data warehouse marketplace? The second thing is kind of a one area. What about the legacy EDW players like Oracle and Teradata and Netezza?
The third question we get a lot is can Snowflake compete with the big cloud players? Amazon, Google, Microsoft. I mean they're right there in the heart, in the thick of things there. And then what about that multi-cloud strategy? Is that viable? How much of a differentiator is that? And then we get a lot of questions on the TAM. Meaning the total available market. How big is that market? Does it justify the valuation for Snowflake? Now, Erik, you've been doing this now. You've run a couple VENNs, you've been following this, you've done some other work that you've done with Eagle Alpha. What's your, just your initial sort of takeaway from all this work that you've been doing. >> Yeah, sure. So my first take on Snowflake was about two and a half years ago. I actually hosted them for one of my VENN interviews and my initial thought was impressed. So impressed. They were talking at the time about their ability to kind of make ease of use of a multi-cloud strategy. At the time although I was impressed, I did not expect the growth and the hyper growth that we have seen now. But, looking at the company in its current iteration, I understand where the hype is coming from. I mean, it's 12 and a half billion private valuation in the last round. The least confidential IPO (laughs) anyone's ever seen (Dave laughs) with a 15 to $20 billion valuation coming out, which is more than Teradata, Margo and Cloudera combined. It's a great question. So obviously the success to this point is warranted, but we need to see what they're going to be able to do next. So I think the agenda you laid out is a great one and I'm looking forward to getting into some of those details. >> So let's start with what's happening in the marketplace and let's pull up a slide that I very much love to use. It's the classic X-Y. On the vertical axis here we show net score. And remember folks, net score is an indicator of spending momentum. 
ETR every quarter does, like clockwork, a survey where they're asking people, "Essentially are you spending more or less?" They subtract the less from the more and come up with a net score. It's more complicated than that, but like NPS, it's a very simple and reliable methodology. That's the vertical axis. And the horizontal axis is what's called market share. Market share is the pervasiveness within the data set. So it's calculated by the number of mentions of the vendor divided by the number of mentions within that sector. And what we're showing here is the EDW sector. And we've pulled out a few companies that I want to talk about. So the big three, obviously Microsoft, AWS and Google. And you can see Microsoft has a huge presence far to the right. AWS, very, very strong. A lot of Redshift in there. And they're pretty high on the vertical axis. And then Google, not as much share, but very solid in that. Close to 60% net score. And then you can see above all of them from a vertical standpoint is Snowflake with a 77.5% net score. You can see them in the upper right there in the green. One of the highest, Erik, in the entire data set. So, let's start with some sort of initial comments on the big guys and Snowflake. Your thoughts? >> Sure. Just first of all to comment on the data, what we're showing there is just the data warehousing sector, but Snowflake's actual net score is that high amongst the entire universe that we follow. Their data strength is unprecedented and we have forward-looking spending intention. So this bodes very well for them. Now, what you did say very accurately is there's a difference between their spending intentions on a net revenue level compared to AWS, Microsoft. No one's saying that this is an apples-to-apples comparison when it comes to actual revenue. So we have to be very cognizant of that. There is domination (laughs) quite frankly from AWS and from Azure.
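The two chart axes described above can be sketched in a few lines. This is only an illustration of the simplified definitions given in the conversation (more minus less for net score, vendor mentions over sector mentions for market share); the speaker notes the real methodology is more complicated. The function names and the "more" / "flat" / "less" response labels are hypothetical, chosen for this sketch rather than taken from ETR.

```python
from collections import Counter


def net_score(responses):
    """Simplified net score: percent spending more minus percent spending less.

    `responses` is a list of hypothetical survey answers, each one of
    "more", "flat", or "less". Flat answers count toward the total only.
    """
    if not responses:
        raise ValueError("at least one survey response is required")
    counts = Counter(responses)
    return 100.0 * (counts["more"] - counts["less"]) / len(responses)


def market_share(vendor_mentions, sector_mentions):
    """Pervasiveness: vendor mentions as a percentage of sector mentions."""
    if sector_mentions <= 0:
        raise ValueError("sector_mentions must be positive")
    return 100.0 * vendor_mentions / sector_mentions


# Made-up sample: 8 of 10 respondents spending more, 1 flat, 1 less.
sample = ["more"] * 8 + ["flat"] + ["less"]
print(net_score(sample))        # 80% more minus 10% less
print(market_share(25, 200))    # 25 vendor mentions out of 200 in the sector
```

The sample numbers are invented purely to exercise the functions; they are not survey results.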
And Snowflake is a necessary component for them not only to help facilitate a multi-cloud, but look what's happening right now in the US Congress, right? We have these tech leaders being grilled on their actual dominance. And one of the main concerns they have is the amount of data that they're collecting. So I think the environment is right to have another player like this. I think Snowflake really has a lot of longevity and our data is supporting that. And the commentary that we hear from our end users, the people that take the survey, are supporting that as well. >> Okay, and then let's stay on this X-Y slide for a moment. I want to just pull out a couple of other comments here, because one of the questions we're asking is, whither the legacy EDW players? So we've got in here IBM, Oracle, you can see Teradata, and then Hortonworks and MapR. We're going to talk a little bit about Hortonworks 'cause it's now Cloudera. We're going to talk a little bit about Hadoop and some of the data lakes. So you can see there they don't have nearly the net score momentum. Oracle obviously has a huge install base and is investing quite frankly in R&D and doing Exadata and it has its own cloud. So, it's got a lock on its customers and if it keeps investing and adding value, it's not going away. IBM with Netezza, there's really been some questions around their commitment to that base. And I know that a lot of the folks in the VENNs that we've talked to, Erik, have said, "Well, we're replacing Netezza." Frank Slootman has been very vocal about going after Teradata. And then we're going to talk a little bit about the Hadoop space. But, can you summarize for us your thoughts in your research and the commentary from your community, what's going on with the legacy guys? Are these guys cooked? Can they hang on? What's your take? >> Sure. We focus on this quite a bit actually.
So, I'm going to talk about it from the data perspective first, and then we'll go into some of the commentary and the panel. You even joined one yesterday. You know that it was touched upon. But, first on the data side, what we're noticing and capturing is a widening bifurcation between these cloud native and the legacy on-prem. It is undeniable. There is nothing that you can really refute. The data is concrete and it is getting worse. That gap is getting wider and wider and wider. Now, the one thing I will say is, nobody's going to rip out their legacy applications tomorrow. It takes years and years. So when you look at Teradata, right? Their market cap's only 2 billion, 2.3 billion. How much revenue growth do they need to stay where they are? Not much, right? No one's expecting them to grow 20%, which is what you're seeing on the left side of that screen. So when you look at the legacy versus the cloud native, there is very clear direction of what's happening. The one thing I would note from the data perspective is if you switched from net score or adoptions and you went to flat spending, you suddenly see Oracle and Teradata move over to that left a little bit, because again what I'm trying to say is I don't think they're going to catch up. No, but also don't think they're going away tomorrow. That these have large install bases, they have relationships. Now to kind of get into what you were saying about each particular one, IBM, they shut down Netezza. They shut it down and then they brought it back to life. How does that make you feel if you're the head of data architecture or you're DevOps and you're trying to build an application for a large company? I'm not going back to that. There's absolutely no way. Teradata on the other hand is known to be incredibly stable. They are known to just not fail. If you need to kind of re-architect or you do a migration, they work. Teradata also has a lot of compliance built in. 
So if you're in financials, if you have a regulated business or industry, there's still some data sets that you're not going to move up to the cloud. Whether it's a PII compliance or financial reasons, some of that stuff is still going to live on-prem. So Teradata still has a very good niche. And from what we're hearing from our panels, and this is a direct quote if you don't mind me looking off screen for one second. But this is a great one. Basically said, "Teradata is the only one from the legacy camp who is putting up a fight and not giving up." Basically from a CIO perspective, the rest of them aren't an option anymore. But Teradata is still fighting and that's great to hear. They have their own data as a service offering and listen, they're a small market cap compared to these other companies we're talking about. But, to summarize, the data is very clear. There is a widening bifurcation between the two camps. I do not think legacy will catch up. I think all net new workloads are moving to data as a service, moving to cloud native, moving to hosted, but there are still going to be some existing legacy on-prem applications that will be supported with these older databases. And of those, Oracle and Teradata are still viable options. >> I totally agree with you and my colleague David Floyer is actually quite high on Teradata Vantage because he really does believe that a key component, we're going to talk about the TAM in a minute, but a key component of the TAM he believes must include the on-premises workloads. And Frank Slootman has been very clear, "We're not doing on-prem, we're not doing this halfway house." And so that's an opportunity for companies like Teradata, certainly Oracle, I would put in that camp, is putting up a fight. Vertica is another one. They're very small, but another one that's sort of battling it out from the old MPP world. But that's great. Let's go into some of the specifics. 
Let's bring up here some of the specific commentary that we've curated here from the roundtables. I'm going to go through these and then ask you to comment. The first one is just, I mean, people are obviously very excited about Snowflake. It's easy to use, the whole thing zero to Snowflake in 90 minutes, but Snowflake is synonymous with cloud-native data warehousing. There are no equals. We heard that a lot from your VENN panelists. >> We certainly did. There was even more euphoria around Snowflake than I expected when we started hosting these series of data warehousing panels. And this particular gentleman that said that happens to be the global head of data architecture for a Fortune 100 financials company. And you mentioned earlier that we did a report alongside Eagle Alpha. And we noticed that among Fortune 100 companies that are also using the big three public cloud companies, Snowflake is growing market share faster than anyone else. They are positioned in a way where even if you're aligned with Azure, even if you're aligned with AWS, if you're a large company, they are gaining share right now. So that particular gentleman's comments were very interesting. He also made a comment that said, "Snowflake is the person who championed the idea that data warehousing is not dead yet. Use that old Monty Python line and you're not dead yet." And back in the day when Hadoop came along and the data lakes turned into a data swamp and everyone said, "We don't need warehousing anymore." Well, that turned out to be a head fake, right? Hadoop was an interesting technology, but it's a complex technology. And it ended up not really working the way people wanted it. I think Snowflake came in at that point at an opportune time and said, "No, data warehousing isn't dead. We just have to separate the compute from the storage layer and look at what I can do. That increases flexibility, security. It gives you that ability to run across multi-cloud." 
So honestly the commentary has been nothing but positive. We can get into some of the commentary about people thinking that there's competition catching up to what they do, but there is no doubt that right now Snowflake is the name when it comes to data as a service. >> The other thing we heard a lot was ETL is going to get completely disrupted, you sort of embedded ETL. You heard one panelist say, "Well, it's interesting to see that guys like Informatica are talking about how fast they can run inside a Snowflake." But Snowflake is making that easy. That data prep is sort of part of the package. And so that does not bode well for ETL vendors. >> It does not, right? So ETL is a legacy of on-prem databases and even when Hadoop came along, it still needed that extra layer to kind of work with the data. But this is really, really disrupting them. Now to Snowflake's credit, they partner well. All the ETL players are partnered with Snowflake, they're trying to play nice with them, but the writing's on the wall as more and more of these applications and workloads move to the cloud, you don't need the ETL layer. Now, obviously that's going to affect Talend and Informatica the most. We had a recent comment that said, this was a CIO who basically said, "The most telling thing about the ETL players right now is every time you speak to them, all they talk about is how they work in a Snowflake architecture." That's their only metric that they talk about right now. And he said, "That's very telling." That he basically used it as it's their existential identity to be part of Snowflake. If they're not, they don't exist anymore. So it was interesting to have sort of a philosophical comment brought up in one of my roundtables. 
But that's how important playing nice and finding a niche within this new data as a service is for ETL, but to be quite honest, they might be going the same way of, "Okay, let's figure out our niche on these still the on-prem workloads that are still there." I think over time we might see them maybe as an M&A possibility, whether it's Snowflake or one of these new up and comers, kind of bring them in and sort of take some of the technology that's useful and layer it in. But as a large market cap, solo existing niche, I just don't know how long ETL is for this world. >> Now, yeah. I mean, you're right that if it wasn't for the marketing, they're not fighting fashion. But >> No. >> really there're some challenges there. Now, there were some contrarians in the panel and they signaled some potential icebergs ahead. And I guarantee you're going to see this in Snowflake's Red Herring when we actually get it. Like we're going to see all the risks. One of the comments, I'll mention the two and then we can talk about it. "Their engineering advantage will fade over time." Essentially we're saying that people are going to copycat and we've seen that. And the other point is, "Hey, we might see some similar things that happened to Hadoop." The public cloud players giving away these offerings at zero cost. Essentially marginal cost of adding another service is near zero. So the cloud players will use their heft to compete. Your thoughts? >> Yeah, first of all one of the reasons I love doing panels, right? Because we had three gentlemen on this panel that all had nothing but wonderful things to say. But you always get one. And this particular person is a CTO of a well known online public travel agency. We'll put it that way. And he said, "I'm going to be the contrarian here. I have seven different technologies from private companies that do the same thing that I'm evaluating." So that's the pressure from behind, right? The technology, they're going to catch up. 
Right now Snowflake has the best engineering which interestingly enough they took a lot of that engineering from IBM and Teradata if you actually go back and look at it, which was brought up in our panel as well. He said, "However, the engineering will catch up. They always do." Now from the other side they're getting squeezed because the big cloud players just say, "Hey, we can do this too. I can bundle it with all the other services I'm giving you and I can squeeze your pay. Pretty much give it away at cost." So I do think that there is a very valid concern. When you come out with a $20 billion IPO valuation, you need to warrant that. And when you see competitive pressures from both sides, from private emerging technologies and from the more dominant public cloud players, you're going to get squeezed there a little bit. And if pricing gets squeezed, it's going to be very, very important for Snowflake to continue to innovate. That comment you brought up about possibly being the next Cloudera was certainly the best sound bite that I got. And I'm going to use it as clickbait in future articles, because I think everyone who starts looking to buy Snowflake stock and they see that, they're going to need to take a look. But I would take that with a grain of salt. I don't think that's happening anytime soon, but what that particular CTO was referring to was if you don't innovate, the technology itself will become commoditized. And he believes that this technology will become commoditized. So therefore Snowflake has to continue to innovate. They have to find other layers to bring in. Whether that's through their massive war chest of cash they're about to have and M&A, whether that's them buying an analytics company, whether that's them buying an ETL layer, finding a way to provide more value as they move forward is going to be very important for them to justify this valuation going forward. >> And I want to comment on that. 
The Cloudera, Hortonworks, MapRs, Hadoop, et cetera. I mean, there are dramatic differences obviously. I mean, that whole space was so hard, very difficult to stand up. You needed science project guys and lab coats to do it. It was very services intensive. As well companies like Cloudera had to fund all these open source projects and it really squeezed their R&D. I think Snowflake is much more focused and you mentioned some of the background of their engineers, of course Oracle guys as well. However, you will see Amazon's going to trot out a ton of customers using their RA3 managed storage and their flash. I think it's the DC2 piece. They have a ton of action in the marketplace because it's just so easy. It's interesting one of the comments, you asked this yesterday, was with regard to separating compute from storage, which of course Snowflake basically invented, it was one of their claims to fame. The comment was what AWS has done to separate compute from storage for Redshift is largely a bolt on. Which I thought that was an interesting comment. I've had some other comments. My friend George Gilbert said, "Hey, despite claims to the contrary, AWS still hasn't separated storage from compute. What they have is really primitive." We got to dig into that some more, but you're seeing some data points that suggest there's copycatting going on. May not be as functional, but at the same time, Erik, like I was saying good enough is maybe good enough in this space. >> Yeah, and especially with the enterprise, right? You see what Microsoft has done. Their technology is not as good as all the niche players, but it's good enough and I already have a Microsoft license. So, (laughs) you know why am I going to move off of it. But I want to get back to the comment you mentioned too about that particular gentleman who made that comment about Redshift, their separation is really more of a bolt on than a true offering. 
It's interesting because I know who these people are behind the scenes and he has a very strong relationship with AWS. So it was interesting to me that in the panel yesterday he said he switched from Redshift to Snowflake because of that and some other functionality issues. So there is no doubt from the end users that are buying this. And he's again a fortune 100 financial organization. Not the same one we mentioned. That's a different one. But again, a fortune 100 well known financials organization. He switched from AWS to Snowflake. So there is no doubt that right now they have the technological lead. And when you look at our ETR data platform, we have that adoption reasoning slide that you show. When you look at the number one reason that people are adopting Snowflake is their feature set of technological lead. They have that lead now. They have to maintain it. Now, another thing to bring up on this to think about is when you have large data sets like this, and as we're moving forward, you need to have machine learning capabilities layered into it, right? So they need to make sure that they're playing nicely with that. And now you could go open source with the Apache suite, but Google is doing so well with BigQuery and so well with their machine learning aspects. And although they don't speak enterprise well, they don't sell to the enterprise well, that's changing. I think they're somebody to really keep an eye on because their machine learning capabilities that are layered into the BigQuery are impressive. Now, of course, Microsoft Azure has Databricks. They're layering that in, but this is an area where I think you're going to see maybe what's next. You have to have machine learning capabilities out of the box if you're going to do data as a service. Right now Snowflake doesn't really have that. Some of the other ones do. 
So I had one of my guest panelists basically say to me, because of that, they ended up going with Google BigQuery because he was able to run a machine learning algorithm within hours of getting set up. Within hours. And he said that that kind of capability out of the box is what people are going to have to use going forward. So that's another thing we should dive into a little bit more. >> Let's get into that right now. Let's bring up the next slide which shows net score. Remember this is spending momentum across the major cloud players and plus Snowflake. So you've got Snowflake on the left, Google, AWS and Microsoft. And it's showing three survey timeframes last October, April '20, which is right in the middle of the pandemic. And then the most recent survey which has just taken place this month in July. And you can see Snowflake very, very high scores. Actually improving from the last October survey. Google, lower net scores, but still very strong. Want to come back to that and pick up on your comments. AWS dipping a little bit. I think what's happening here, we saw this yesterday with AWS's results. 30% growth. Awesome. Slight miss on the revenue side for AWS, but look, I mean massive. And they're so exposed to so many industries. So some of their industries have been pretty hard hit. Microsoft pretty interesting. A little softness there. But one of the things I wanted to pick up on Erik, when you're talking about Google and BigQuery and its ML out of the box was what we heard from a lot of the VENN participants. There's no question about it that Google technically I would say is one of Snowflake's biggest competitors because it's cloud native. Remember >> Yep. >> AWS did a license one time. License deal with ParAccel and had to sort of refactor the thing to be cloud native. And of course we know what's happening with Microsoft. They basically were on-prem and then they put stuff in the cloud and then all the updates happen in the cloud. 
And then they pushed to on-prem. But they have that what Frank Slootman calls that halfway house, but BigQuery no question technically is very, very solid. But again, you see Snowflake right now anyway outpacing these guys in terms of momentum. >> Snowflake is outpacing everyone (laughs) across our entire survey universe. It really is impressive to see. And one of the things that they have going for them is they can connect all three. It's that multi-cloud ability, right? That portability that they bring to you is such an important piece for today's modern CIOs and data architects. They don't want vendor lock-in. They are afraid of vendor lock-in. And this ability to make their data portable and to do that with ease and the flexibility that they offer is a huge advantage right now. However, I think you're a hundred percent right. Google has been so focused on the engineering side and never really focusing on the enterprise sales side. That is why they're playing catch up. I think they can catch up. They're bringing in some really important enterprise salespeople with experience. They're starting to learn how to talk to enterprise, how to sell, how to support. And nobody can really doubt their engineering. How many open source projects have they given us, right? They invented Kubernetes and the entire container space. No one's really going to compete with them on that side if they learn how to sell it and support it. Yeah, right now they're behind. They're a distant third. Don't get me wrong. From a pure hosted ability, AWS is number one. Microsoft Azure, sometimes it looks like it's number one, but you have to recognize that a lot of that is simply because of their hosted 365. It's a SaaS app. It's not a true cloud type of infrastructure as a service. But Google is a distant third, but their technology is really, really great. And their ability to catch up is there. 
And like you said, in the panels we were hearing a lot about their machine learning capability is right out of the box. And that's where this is going. What's the point of having this huge data if you're not going to be supporting it on new application architecture. And all of those applications require machine learning. >> Awesome. So we're. And I totally agree with what you're saying about Google. They just don't have it figured out how to sell the enterprise yet. And a hundred percent AWS has the best cloud. I mean, hands down. But a very, very competitive market as we heard yesterday in front of Congress. Now we're on the point about, can Snowflake compete with the big cloud players? I want to show one more data point. So let's bring up, this is the same chart as we showed before, but it's new adoptions. And this is really telling. >> Yeah. >> You can see Snowflake with 34% in the yellow, new adoptions, down yes from previous surveys, but still significantly higher than the other players. Interesting to see Google showing momentum on new adoptions, AWS down on new adoptions. And again, exposed to a lot of industries that have been hard hit. And Microsoft actually quite low on new adoption. So this is very impressive for Snowflake. And I want to talk about the multi-cloud strategy now Erik. This came up a lot. The VENN participants who are sort of fans of Snowflake said three things: It was really the flexibility, the security which is really interesting to me. And a lot of that had to do with the flexibility. The ability to easily set up roles and not have to waste a lot of time wrangling. And then the third was multi-cloud. And that was really something that came through heavily in the VENN. Didn't it? >> It really did. And again, I think it just comes down to, I don't think you can ever overstate how afraid these guys are of vendor lock-in. They can't have it. They don't want it. 
And it's best practice to make sure your sensitive information is being kind of spread out a little bit. We all know that people don't trust Bezos. So if you're in certain industries, you're not going to use AWS at all, right? So yeah, this ability to have your data portability through multi-cloud is the number one reason I think people start looking at Snowflake. And to go to your point about the adoptions, it's very telling and it bodes well for them going forward. Most of the things that we're seeing right now are net new workloads. So let's go again back to the legacy side that we were talking about, the Teradatas, IBMs, Oracles. They still have the monolithic applications and the data that needs to support that, right? Like an old ERP type of thing. But anyone who's now building a new application, bringing something new to market, it's all net new workloads. There is no net new workload that is going to go to SAP or IBM. It's not going to happen. The net new workloads are going to the cloud. And that's why when you switch from net score to adoption, you see Snowflake really stand out because this is about new adoption for net new workloads. And that's really where they're driving everything. So I would just say that as this continues, as data as a service continues, I think Snowflake's only going to gain more and more share for all the reasons you stated. Now get back to your comment about security. I was shocked by that. I really was. I did not expect these guys to say, "Oh, no. Snowflake enterprise security not a concern." So two panels ago, a gentleman from a fortune 100 financials said, "Listen, it's very difficult to get us to sign off on something for security. Snowflake is past it, it is enterprise ready, and we are going full steam ahead." Once they got that go ahead, there was no turning back. We gave it to our DevOps guys, we gave it to everyone and said, "Run with it." So, when a company that's big, I believe their fortune rank is 28. 
(laughs) So when a company that big says, "Yeah, you've got the green light. That we were okay with the internal compliance aspect, we're okay with the security aspect, this gives us multi-cloud portability, this gives us flexibility, ease of use." Honestly there's a really long runway ahead for Snowflake. >> Yeah, so the big question I have around the multi-cloud piece and I totally and I've been on record saying, "Look, if you're going looking for an agnostic multi-cloud, you're probably not going to go with the cloud vendor." (laughs) But I've also said that I think multi-cloud to date anyway has largely been a symptom as opposed to a strategy, but that's changing. But to your point about lock-in and also I think people are maybe looking at doing things across clouds, but I think that certainly it expands Snowflake's TAM and we're going to talk about that because they support multiple clouds and they're going to be the best at that. That's a mandate for them. The question I have is how much of complex joining are you going to be doing across clouds? And is that something that is just going to be too latency intensive? Is that really Snowflake's expertise? You're really trying to build that data layer. You're probably going to maybe use some kind of Postgres database for that. >> Right. >> I don't know. I need to dig into that, but that would be an opportunity from a TAM standpoint. I just don't know how real that is. >> Yeah, unfortunately I'm going to just be honest with this one. I don't think I have great expertise there and I wouldn't want to lead anyone a wrong direction. But from what I've heard from some of my VENN interview subjects, this is happening. So the data portability needs to be agnostic to the cloud. I do think that when you're saying, are there going to be real complex kind of workloads and applications? Yes, the answer is yes. And I think a lot of that has to do with some of the container architecture as well, right? 
If I can just pull data from one spot, spin it up for as long as I need and then just get rid of that container, that ephemeral layer of compute. It doesn't matter where the cloud lies. It really doesn't. I do think that multi-cloud is the way of the future. I know that the container workloads right now in the enterprise are still very small. I've heard people say like, "Yeah, I'm kicking the tires. We got 5%." That's going to grow. And if Snowflake can make themselves an integral part of that, then yes. I think that's one of those things where, I remember the guy said, "Snowflake has to continue to innovate. They have to find a way to grow this TAM." This is an area where they can do so. I think you're right about that, but as far as my expertise, on this one I'm going to be honest with you and say, I don't want to answer incorrectly. So you and I need to dig in a little bit on this one. >> Yeah, as it relates to question four, what's the viability of Snowflake's multi-cloud strategy? I'll say unquestionably supporting multiple clouds, very viable. Whether or not portability across clouds, multi-cloud joins, et cetera, TBD. So we'll keep digging into that. The last thing I want to focus on here is the last question, does Snowflake's TAM justify its $20 billion valuation? And you think about the data pipeline. You go from data acquisition to data prep. I mean, that really is where Snowflake shines. And then of course there's analysis. You've got to bring in BI or AI and ML tools. That's not Snowflake's strength. And then you're obviously preparing that, serving that up to the business, visualization. So there's potential adjacencies that they could get into that they may or may not decide to. But so we put together this next chart which is kind of the TAM expansion opportunity. And I just want to briefly go through it. We published this stuff so you can go and look at all the fine print, but it kind of starts with the data lake disruption. 
You called it data swamp before. The Hadoop no schema on write. Basically the ROI of Hadoop became reduction of investment as my friend Abby Meadow would say. But so they're kind of disrupting that data lake which really was a failure. And then really going after that enterprise data warehouse which is kind of I have it here as a 10 billion. It's actually bigger than that. It's probably more like a $20 billion market. I'll update this slide. And then really what Snowflake is trying to do is be data as a service. A data layer across data stores, across clouds, really make it easy to ingest and prepare data and then serve the business with insights. And then ultimately this huge TAM around automated decision making, real-time analytics, automated business processes. I mean, that is potentially an enormous market. We got a couple of hundred billion. I mean, just huge. Your thoughts on their TAM? >> I agree. I'm not worried about their TAM and one of the reasons why as I mentioned before, they are coming out with a whole lot of cash. (laughs) This is going to be a red hot IPO. They are going to have a lot of money to spend. And look at their management team. Who is leading the way? A very successful, wise, intelligent, acquisitive type of CEO. I think there is going to be M&A activity, and I believe that M&A activity is going to be 100% for the mindset of growing their TAM. The entire world is moving to data as a service. So let's take that as a backdrop. I'm going to go back to the panel we did yesterday. The first question we asked was, there was an understanding or a theory that when the virus pandemic hit, people wouldn't be taking on any sort of net new architecture. They're like, "Okay, I have Teradata, I have IBM. Let's just make sure the lights are on. Let's stick with it." Every single person I've asked, there's now eight different experts, said to us, "Oh, no. Oh, no, no." There is the virus pandemic, the shift from work from home. 
Everything we're seeing right now has only accelerated and advanced our data as a service strategy in the cloud. We are building for scale, adopting cloud for data initiatives. So, across the board they have a great backdrop. So that's going to only continue, right? This is very new. We're in the early innings of this. So for their TAM, that's great because that's the core of what they do. Now on top of it you mentioned the type of things about, yeah, right now they don't have great machine learning. That could easily be acquired and built in. Right now they don't have an analytics layer. I for one would love to see these guys talk to Alteryx. Alteryx is red hot. We're seeing great data and great feedback on them. If they could do that business intelligence, that analytics layer on top of it, the entire suite as a service, I mean, come on. (laughs) Their TAM is expanding in my opinion. >> Yeah, your point about their leadership is right on. And I interviewed Frank Slootman right in the heart of the pandemic >> So impressed. >> and he said, "I'm investing in engineering almost sight unseen. More circumspect around sales." But I will caution people. That a lot of people I think see what Slootman did with ServiceNow. And he came into ServiceNow. I have to tell you. They didn't have their unit economics right, they didn't have their sales model and marketing model. He cleaned that up. Took it from 120 million to 1.2 billion and really did an amazing job. People are looking for a repeat here. This is a totally different situation. ServiceNow drove a truck through BMC's install base with IT help desk and then created this brilliant TAM expansion. That's the land and expand model. This is much different here. And Slootman also told me that he's a situational CEO. He doesn't have a playbook. And so that's what is most impressive and interesting about this. 
He's now up against the biggest competitors in the world: AWS, Google and Microsoft and dozens of other smaller startups that have raised a lot of money. Look at the company like Yellowbrick. They've raised I don't know $180 million. They've got a great team. Google, IBM, et cetera. So it's going to be really, really fun to watch. I'm super excited, Erik, but I'll tell you the data right now suggest they've got a great tailwind and if they can continue to execute, this is going to be really fun to watch. >> Yeah, certainly. I mean, when you come out and you are as impressive as Snowflake is, you get a target on your back. There's no doubt about it, right? So we said that they basically created the data as a service. That's going to invite competition. There's no doubt about it. And Yellowbrick is one that came up in the panel yesterday, one of our CIOs was doing a proof of concept with them. We had about seven others mentioned as well that are startups that are in this space. However, none of them despite their great valuation and their great funding are going to have the kind of money and the market lead that Slootman is going to have which Snowflake has as this comes out. And what we're seeing in Congress right now with some antitrust scrutiny around the large data that's being collected by AWS, Azure, Google, I'm not going to bet against this guy either. Right now I think he's got a lot of opportunity, there's a lot of additional layers and because he can basically develop this as a suite service, I think there's a lot of great opportunity ahead for this company. >> Yeah, and I guarantee that he understands well that customer acquisition cost and the lifetime value of the customer, the retention rates. Those are all things that he and Mike Scarpelli, his CFO learned at ServiceNow. Not learned, perfected. (Erik laughs) Well Erik, really great conversation, awesome data. It's always a pleasure having you on. Thank you so much, my friend. 
I really appreciate it. >> I appreciate talking to you too. We'll do it again soon. And stay safe everyone out there. >> All right, and thank you for watching everybody this episode of "CUBE Insights" powered by ETR. This is Dave Vellante, and we'll see you next time. (soft music)

Published Date : Jul 31 2020


ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Frank Slootman	PERSON	0.99+
George Gilbert	PERSON	0.99+
Erik Bradley	PERSON	0.99+
Erik	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Mike Scarpelli	PERSON	0.99+
Google	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
David Floyd	PERSON	0.99+
Slootman	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Abby Meadow	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
$180 million	QUANTITY	0.99+
$20 billion	QUANTITY	0.99+
Netezza	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
77.5%	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
10 billion	QUANTITY	0.99+
12 and a half billion	QUANTITY	0.99+
120 million	QUANTITY	0.99+
Oracles	ORGANIZATION	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Yellowbrick	ORGANIZATION	0.99+

Buno Pati, Infoworks io | CUBEConversation January 2020

>> From the SiliconANGLE media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hello everyone, and welcome to this CUBE Conversation. You know, theCUBE has been following the trends in the so-called big data space since 2010. And one of the things that we reported on for a number of years is the complexity involved in wrangling and making sense out of data. The allure of this idea of no schema on write and very low cost platforms like Hadoop became a data magnet. And for years, organizations would shove data into a data lake. And of course the joke was it became a data swamp. And organizations really struggled to realize the promised return on their big data investments. Now, while the cloud certainly simplified infrastructure deployment, it really introduced a much more complex data environment and data pipeline, with dozens of APIs and a mind-boggling array of services that required highly skilled data engineers to properly ingest, shape, and prepare that data, so that it could be turned into insights. This became a real time suck for data pros, who spent 70 to 80% of their time wrestling data. A number of people saw the opportunity to solve this problem and automate the heavy lift of data, and simplify the process to ingest, synchronize, transform, and really prepare data for analysis. And one of the companies that is attacking this challenge is InfoWorks. And with me to talk about the evolving data landscape is Buno Pati, CEO of InfoWorks. Buno, great to see you, thanks for coming in. >> Well thank you Dave, thanks for having me here. >> You're welcome. I love that you're in Palo Alto, you come to MetroWest in Boston to see us (Buno laughs), that's great. Well welcome. So, you heard my narrative. We're 10 years plus into this big data theme and meme. What did we learn, what are some of the failures and successes that we can now build on, from your point of view? 
>> All right, so Dave, I'm going to start from the top, with why big data, all right? I think this big data movement really started with the realization by companies that they need to transform their customer experience and their operations, in order to compete effectively in this increasingly digital world, right? And in that context, they also realized very quickly that data was the key asset on which this transformation would be built. So given that, you look at this and say, "What is digital transformation really about?" It is about competing with digital disruption, or fending off digital disruption. And this has become, over time, an existential imperative. You cannot survive and be relevant in this world without leveraging data to compete with others who would otherwise disrupt your business. >> You know, let's stay on that for a minute, because when we started the whole big data, covering that big data space, you didn't really hear about digital transformation. That's sort of a more recent trend. So I got to ask you, what's the difference between a business and a digital business, in your view? >> That is the foundational question behind big data. So if you look at a digital native, there are many of them that you can name. These companies start by building a foundational platform on which they build their analytics and data programs. It gives them a tremendous amount of agility and the right framework within which to build a data-first strategy. A data-first strategy where business information is persistently collected and used at every level of the organization. Furthermore, they take this and they automate this process. Because if you want to collect all your data and leverage it at every part of the business, it needs to be a highly automated system, and it needs to be able to seamlessly traverse on-premise, cloud, hybrid, and multi-cloud environments. Now, let's look at a traditional business. 
In a traditional enterprise, there is no foundational platform. There are things like point tools for ETL, and data integration, and you can name a whole slew of other things, that need to be stitched together and somehow made to work to deliver data to the applications that consume it. The strategy is not a data-first strategy. It is use case by use case. When there is a use case, people go and find the data, they gather the data, they transform that data, and eventually feed an application. A process that can take months to years, depending on the complexity of the project that they're trying. And they don't automate this. This is heavily dependent, as you pointed out, on engineering talent, highly skilled engineering talent that is scarce. And they have not seamlessly traversed the various clouds and on-premise environments, but rather fragmented those environments, where individual teams are focused on a single environment, building different applications, using different tools, and different infrastructure. >> So you're saying the digital native company puts data at the core. They organize around that data, as opposed to maybe around a bottling plant, or around people. And then they leverage that data for competitive advantage through a platform that's kind of table stakes. And then obviously there's cultural aspects and other skills that they need to develop, right? >> Yeah, they have an ability which traditional enterprises don't. Because of this choice of a data-first strategy with a foundational platform, they have the ability to rapidly launch analytics use cases and iterate on them. That is not possible in a traditional or legacy environment. >> So their speed to market and time to value is going to be much better than their competition. This gets into the risk of disruption. Sometimes we talk about cloud native and cloud naive. You could talk about digital native and digital naive. 
So it's hard for incumbents to fend off the disrupters, and then ultimately become disrupters themselves. But what are you seeing in terms of some of the trends where organizations are having success there? >> One of the key trends that we're seeing, or key attributes of companies that are seeing a lot of success, is when they have organized themselves around their data. Now, what do I mean by that? This is usually a high-level mandate coming down from the top of the company, where they're forming centralized groups to manage the data and make it available for the rest of the organization to use. There are a variety of names that are being used for this. People are calling it their data fabric. They're calling it data as a service, which is pretty descriptive of what it ends up being. And those are terms that are all sort of representing the same concept of a centralized environment and, ideally, a highly automated environment that serves the rest of the business with data. And the goal, ultimately, is to get any data at any time for any application. >> So, let's talk a little bit about the cloud. I mentioned up front that the cloud really simplified infrastructure deployment, but it really didn't solve this problem of, we talked about in terms of data wrangling. So, why didn't it solve that problem? And you got companies like Amazon and Google and Microsoft, who are very adept at data. They're some of these data-first companies. Why is it that the cloud sort of in and of itself has not been able to solve this problem? >> Okay, so when you say solve this problem, it sort of begs the question, what's the goal, right? And if I were to very simply state the goal, I would call it analytics agility. It is gaining agility with analytics. 
Companies are going from a traditional world, where they had to generate a handful of BI and other reporting type of dashboards in a year, to where they literally need to generate thousands of these things in a year, to run the business and compete with digital disruption. So agility is the goal. >> But wait, the cloud is all about agility, is it not? >> It is, when you talk about agility of compute and storage infrastructure. So, there are three layers to this problem. The first is, what is the compute and storage infrastructure? The cloud is wonderful in that sense. It gives you the ability to rapidly add new infrastructure and spin it down when it's not in use. That is a huge blessing, when you compare it to the six to nine months, or perhaps even longer, that it takes companies to order, install, and test hardware on premise, and then find that it's only partially used. The next layer on that is what is the operating system on which my data and analytics are going to be run? This is where Hadoop comes in. Now, Hadoop is inherently complex, but operating systems are complex things. And Spark falls in that category. Databricks has taken some of the complexity out of running Spark because of their sort of managed service type of offering. But there's still a missing layer, which leverages that infrastructure and that operating system to deliver this agility where users can access data that they need anywhere in the organization, without intensely deep knowledge of what that infrastructure is and what that operating system is doing underneath. >> So, in my up front narrative, I talked about the data pipeline a little bit. But I'm inferring from your comments on platform that it's more than just this sort of narrow data pipeline. There's a macro here. I wonder if you could talk about that a little bit. >> Yeah. So, the data pipeline is one piece of the puzzle. What needs to happen? Data needs to be ingested. It needs to be brought into these environments. 
It has to be kept fresh, because the source data is persistently changing. It needs to be organized and cataloged, so that people know what's there. And from there, pipelines can be created that ultimately generate data in a form that's consumable by the application. But even surrounding that, you need to be able to orchestrate all of this. Typical enterprise is a multi-cloud enterprise. 80% of all enterprises have more than one cloud that they're working on, and on-premise. So if you can't orchestrate all of this activity in the pipelines, and the data across these various environments, that's not a complete solution either. There's certainly no agility in that. Then there's governance, security, lineage. All of this has to be managed. It's not simply creation of the pipeline, but all these surrounding things that need to happen in order for analytics to run at-scale within enterprises. >> So the cloud sort of solved that layer one problem. And you certainly saw this in the, not early days, but sort of mid-days of Hadoop, where the cloud really became the place where people wanted to do a lot of their Hadoop workloads. And it was kind of ironic that guys like Hortonworks, and Cloudera and MapR really didn't have a strong cloud play. But now, it's sort of flipping back where, as you point out, everybody's multi-cloud. So you have to include a lot of these on-prem systems, whether it's your Oracle database or your ETL systems or your existing data warehouse, those are data feeds into the cloud, or the digital incumbent who wants to be a digital native. They can't just throw all that stuff away, right? So you're seeing an equilibrium there. >> An equilibrium between ... ? >> Yeah, between sort of what's in the cloud and what's on-prem. 
Let me ask it this way: If the cloud is not a panacea, is there an approach that does really solve the problem of different datasets, the need to ingest them from different clouds, on-prem, and bring them into a platform that can be analyzed and drive insights for an organization? >> Yeah, so I'm going to stay away from the word panacea, because I don't think there ever is really a panacea to any problem. >> That's good, that means we got a good roadmap for our business then. (both laugh) >> However, there is a solution. And the solution has to be guided by three principles. Number one, automation. If you do not automate, the dependence on skilled talent is never going to go away. And that talent, as we all know, is very very scarce and hard to come by. The second thing is integration. So, what's different now? All of these capabilities that we just talked about, whether it's things like ETL, or cataloging, or ingesting, or keeping data fresh, or creating pipelines, all of this needs to be integrated together as a single solution. And that's been missing. Most of what we've seen is point tools. And the third is absolutely critical. For things to work in multi-cloud and hybrid environments, you need to introduce a layer of abstraction between the complexity of the underlying systems and the user of those systems. And the way to think about this, Dave, is to think about it much like a compiler. What does a compiler do, right? You don't have to worry about what Intel processor is underneath, what version of your operating system you're running on, what memory is in the system. Ultimately, you might-- >> As much as we love assembly code. >> As much as we love assembly code. Now, so take the analogy a little bit further, there was a time when we wrote assembly code because there was no compiler. So somebody had to sit back and say, "Hey, wouldn't it be nice if we abstracted away from this?" 
(both laugh) >> Okay, so this sort of sets up my next question, which is, is this why you guys started InfoWorks? Maybe you could talk a little bit about your why, and kind of where you fit. >> So, let me give you the history of InfoWorks. Because the vision of InfoWorks, believe it or not, came out of a rear view mirror. Looking backwards, not forwards. And then predicting the future in a different manner. So, Amar Arsikere is the founder of InfoWorks. And when I met him, he had just left Zynga, where he was the general manager of their gaming platform. What he told me was very very simple. He said he had been at Google at a time when Google was moving off of the legacy systems of, I believe it was Netezza, and Oracle, and a variety of things. And they had just created Bigtable, and they wanted to move and create a data warehouse on Bigtable. So he was given that job. And he led that team. And that, as you might imagine, was this massive project that required a high degree of automation to make it all come together. And he built that, and then he built a very similar system at Zynga, when he was there. These foundational platforms, going back to what I was talking about before digital days. When I met him, he said, "Look, looking back, Google may have been the only company that needed such a platform. But looking forward, I believe that everyone's going to need one." And that has, you know, absolute truth in it, and that's what we're seeing today. Where, after going through this exercise of trying to write machine code, or assembly code, or whatever we'd like to call it, down at the detailed, complex level of an operating system or infrastructure, people have realized, "Hey, I need something much more holistic. I need to look at this from an enterprise-wide perspective. And I need to eliminate all of this dependence on," kind of like the cloud plays a role because it eliminates some of the dependence, or the bottlenecks around hardware and infrastructure. 
"And ultimately gain a lot more agility than I'm able to do with legacy methodology." So you were asking early on, what are the lessons learned from that first 10 years? And a lot of technology goes through these types of cycles of hype and disillusionment, and we all know the curve. I think there are two key lessons. One is, just having a place to land your data doesn't solve your problem. That's the beginning of your problems. And the second is that legacy methodologies do not transfer into the future. You have to think differently. And looking to the digital natives as guides for how to think, when you're trying to compete with them is a wonderful perspective to take. >> But those legacy technologies, if you're an incumbent, you can't just rip 'em and throw 'em out and convert. You're going to use them as feeders to your digital platform. So, presumably, you guys have products. You call this space Enterprise Data Ops and Orchestration, EDO2. Presumably you have products and a portfolio to support those higher layer challenges that we talked about, right? >> Yeah, so that's a really important question. No, you don't rip and replace stuff. These enterprises have been built over years of acquisitions and business systems. These are layers, one on top of another. So think about the introduction of ERP. By the way, ERP is a good analogy to what happened, because those were point tools that were eventually combined into a single system called ERP. Well, these are point capabilities that are being combined into a single system for EDO2, or Enterprise Data Operations and Orchestration. The old systems do not go away. And we are seeing some companies wanting to move some of their workloads from old systems to new systems. But that's not the major trend. The major trend is that new things that get done, the things that give you holistic views of the company, and then analytics based on that holistic view, are all being done on the new platforms. So it's a layer on top. 
It's not a rip and replace of the layers underneath. What's in place stays in place. But for the layer on top, you need to think differently. You cannot use all the legacy methodologies and just say that's going to apply to the new platform or new system. >> Okay, so how do you engage with customers? Take a customer who's got, you know, on-prem, they've got legacy infrastructure, they don't want to get disrupted. They want to be a digital native. How do you help them? You know, what do I buy from you? >> Yeah, so our product is called DataFoundry. It is an EDO2 system. It is built on the three principles, founding principles, that I mentioned earlier. It is highly automated. It is integrated in all the capabilities that surround pipelines, perhaps. And ultimately, it's also abstracting. So we're able to very easily traverse one cloud to another, or on-premise to the cloud, or even back. There are some customers that are moving some workloads back from the cloud. Now, what's the benefit here? Well first of all, we lay down the foundation for digital transformation. And we enable these companies to consolidate and organize their data in these complex hybrid, cloud, multi-cloud environments. And then generate analytics use cases 10x faster with about a tenth of the resources. And I'm happy to give you some examples on how that works. >> Please do. I mean, maybe you could share some customer examples? >> Yeah, absolutely. So, let me talk about Macy's. >> Okay. >> Macy's is a customer of ours. They've been a customer for about, I think about 14 months at this point in time. And they had built a number of systems to run their analytics, but then recognized what we're seeing other companies recognize. And that is, there's a lot of complexity there. And building it isn't the end game. Maintaining it is the real challenge, right? So even if you have a lot of talent available to you, maintaining what you built is a real challenge. So they came to us. 
And within a period of 12 months, I'll just give you some numbers that are just mind-blowing. They are currently running 165,000 jobs a month. Now, what's a job? A job is an ingestion job, or a synchronization job, or a transformation. They have launched 431 use cases over a period of 12 months. And you know what? They're just ramping. They will get to thousands. >> Scale. >> Yeah, scale. And they have ingested a lot of data, brought in a lot of data sources. So to do that in a period of 12 months is unheard of. It does not happen. Why is it important for them? So what problem are they trying to solve? They're a retailer. They are being digitally disrupted like (chuckles) no one else. >> They have an Amazon war room-- >> Right. >> No doubt. >> And they have had to build themselves out as an omni-channel retailer now. They are online, they are also with brick and mortar stores. So you take a look at this. And the key to competing with digital disrupters is the customer experience. What is that experience? You're online, how does that meld with your in-store experience? What happens if I buy online and return something in a store? How does all this come together into a single unified experience for the consumer? And that's what they're chasing. So that was the first application that they came to us with. They said, "Look, let us go into a customer 360. Let us understand the entirety of that customer's interaction and touchpoints with our business. And having done so, we are in a position to deliver a better experience." >> Now that's a data problem. I mean, different data sources, and trying to understand 360, I mean, you got data all over the place. >> All over the place. (speaking simultaneously) And there's historical data, there's stuff coming in from, you know, what's online, what's in the store. And then they progress from there. I mean, they're not restricting it to customer experience and selling. 
They're looking at merchandising, and inventory, and fulfillment, and store operations. Simple problem. You order something online, where do I pull this from? A store or a warehouse? >> So this is, you know, big data 2.0, just to use a sort of silly term. But it's really taking advantage of all the investment. I've often said, you know, Hadoop, for all the criticism it gets, it did lower our cost of getting data into, you know, at least one virtual place. And it got us thinking about how to get insights out of data. And so, what you're describing is the ability to operationalize your data initiatives at scale. >> Yeah, you can absolutely get your insights off of Hadoop. And I know people have different opinions of Hadoop, given their experience. But what they don't have, what these customers have not achieved yet, most of them, is that agility, right? So, how easily can you get your insights off of Hadoop? Do I need to hire a boatload of consultants who are going to write code for me, and shovel data in, and create these pipelines, and so forth? Or can I do this with a click of a button, right? And that's the difference. That is truly the difference. The level of automation that you need, and the level of abstraction that you need, away from this complexity, has not been delivered. >> We did, in, it must have been 2011, I think, the very first big data market study from anybody in the world, and put it out on, you know, Wikibon, free research. And one of the findings was (chuckles) this is a huge services business. I mean, the professional service is where all the money was going to flow because it was so complicated. And that's kind of exactly what happened. But now we're entering, really it seems like a phase where you can scale, and operationalize, and really simplify, and really focus your attention on driving business value, versus making stuff work. >> You are absolutely correct. So I'll give you the numbers. 55% of this industry is services. 
About 30% is software, and the rest is hardware. Break it down that way. 55%. So what's going on? People will buy a big data system. Call it Hadoop, it could be something in the cloud, it could be Databricks. And then, this is welcome to the world of SIs. Because at this point, you need these SIs to write code and perform these services in order to get any kind of value out of that. And look, we have some dismal numbers that we're staring at. According to Gartner, only 17% of those who have invested in Hadoop have anything in production. This is after how many years? And you look at surveys from, well, pick your favorite. They all look the same. People have not been able to get the value out of this, because it is too hard. It is too complex and you need too many consultants (laughs) delivering services for you to make this happen. >> Well, what I like about your story, Buno, is you're not, I mean, a lot of the data companies have pivoted to AI. Sort of like, we have a joke, ya know, same wine, new bottle. But you're not talking about, I mean sure, machine intelligence, I'm sure, fits in here, but you're talking about really taking advantage of the investments that you've made in the last decade and helping incumbents become digital natives. That sounds like it's at least a part of your mission here. >> Not become digital natives, but rather compete with them. >> Yeah, right, right. >> Effectively, right? >> Yep, okay. >> So, yeah, that is absolutely what needs to get done. So let me talk for a moment about AI, all right? Way back when, there was another wave of AI in the late 80s. I was part of that, I was doing my PhD at the time. And that obviously went nowhere, because we didn't have any data, we didn't have enough compute power or connectivity. Pretty inert. So here it is again. Very little has changed. Except for we do have the data, we have the connectivity, and we have the compute power. But do we really? So what's AI without the data? Just A, right? 
There's nothing there. So what's missing, even for AI and ML to be, and I believe these are going to be powerful game changers. But for them to be effective, you need to provide data to it, and you need to be able to do so in a very agile way, so that you can iterate on ideas. No one knows exactly what AI solution is going to solve your problem or enhance your business. This is a process of experimentation. This is what a company like Google can do extraordinarily well, because of this foundational platform. They have this agility to keep iterating, and experimenting, and trying ideas. Because without trying them, you will not discover what works best. >> Yeah, I mean, for 50 years, this industry has marched to the cadence of Moore's Law, and that really was the engine of innovation. And today, it's about data, applying machine intelligence to that data. And the cloud brings, as you point out, agility and scale. That's kind of the new cocktail for innovation, isn't it? >> The cloud brings agility and scale to the infrastructure. >> In low risk, as you said, right? >> Yeah. >> Experimentation, fail fast, et cetera. >> But without an EDO2 type of system, that gives you a great degree of automation, you could spend six months to run one experiment with AI. >> Yeah, because-- >> In gathering data and feeding it to it. >> 'Cause if the answer is people and throwing people at the problem, then you're not going to scale. >> You're not going to scale, and you're never going to really leverage AI and ML capabilities. You need to be able to do that not in six months, in six days, right, or less. >> So let's talk about your company a little bit. Can you give us the status, you know, where you're at? As their newly minted CEO, what your sort of goals are, milestones that we should be watching in 2020 and beyond? >> Yeah, so newly minted CEO, I came in July of last year. This has been an extraordinary company. I started my journey with this company as an investor. 
And it was funded by actually two funds that I was associated with, first being Nexus Venture Partners, and then Centerview Capital, where I'm still a partner. And myself and my other two partners looked at the opportunity and what the company had been able to do. And in July of last year, I joined as CEO. My partner, David Dorman, who used to be CEO of AT&T, he joined as chairman. And my third partner, Ned Hooper, joined as President and Chief Operating Officer. Ned used to be the Chief Strategy Officer of Cisco. So we pushed pause on the funding, and that's about as all-in as a fund can get. >> Yeah, so you guys were operational experts that became investors, and said, "Okay, we're going to dive back in and actually run the business." >> And here's why. So we obviously see a lot of companies as investors, as they go out and look for funding. There are three things that come together very rarely. One is a massive market opportunity combined with the second, which is the right product to serve that opportunity. But the third is pure luck, timing. (Dave chuckles) It's timing. And timing, you know, it's a very very challenging thing to try to predict. You can get lucky and get it right, but then again, it's luck. This had all three. It was the absolute perfect time. And it's largely because of what you described, the 10 years of time that had elapsed, where people had sort of run the experiment and were not going to get fooled again by how easy this is supposed to be by just getting one piece or the other. They recognized that they need to take this holistic approach and deploy something as an enterprise-wide platform. >> Yeah, I mean, you talk about a large market, I don't even know how you do a TAM, what's the TAM? It's data. (laughs) You know, it's the data universe, which is just, you know, massive. So, I have to ask you a question as an investor. I think you've raised, what 50 million, is that right? >> We've raised 50 million. The last round was led by NEA. 
>> Right, okay. You got great investors, hefty amount. Although, you know, in this day and age, you know, you're seeing just outrageous amounts being raised. Software obviously is a capital efficient business, but today you need to raise a lot of money for promotion, right, to get your name out there. What's your thoughts on, as a Silicon Valley investor, as this wave, I mean, get it while you can, I guess. You know, we're in the 10th year of this boom market. But your thoughts? >> You're asking me to put on my other hat. (Dave laughs) I think companies have, in general, raised too much money at too high a value too fast. And there's a penalty for that. And the down round IPO, which has become fashionable these days, is one of those penalties. It's a clear indication. Markets are very rational, public markets are very rational. And the pricing in a public market, when it's significantly below the pricing in a private market, is telling you something. So, we are a little old-fashioned in that sense. We believe that a company has to lay down the right foundation before it adds fuel to the mix and grows. You have to have evidence that the machinery that you build, whether it's for sales, or marketing, or other go-to-market activities, or even product development, is working. And if you do not see all of those signs, you're building a very fragile company. And adding fuel in that setting is like flooding the carburetor. You don't necessarily go faster. (laughs) You just-- >> Consume more. >> You consume more. So there's a little bit of, perhaps, old-fashioned discipline that we bring to the table. And you can argue against it. You can say, "Well, why don't you just raise a lot of money, hire a lot of sales guys, and hope for the best?" >> See what sticks? (laughs) >> Yeah. We are fully expecting to build a large institution here. And I use that word carefully. And for that to happen, you need the right foundation down first. 
>> Well, that resonates with us east coast people. So, Buno, thanks very much for comin' on theCUBE and sharing with us your perspectives on the marketplace. And best of luck with InfoWorks. >> Thank you, Dave. This has been a pleasure. Thank you for having me here. >> All right, we'll be watching, thank you. And thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see ya next time. (upbeat music fades out)

Published Date : Jan 14 2020


ENTITIES

Entity	Category	Confidence
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
David Dorman	PERSON	0.99+
Google	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Zynga	ORGANIZATION	0.99+
Cisco	ORGANIZATION	0.99+
January 2020	DATE	0.99+
Ned Hooper	PERSON	0.99+
Amar Arsikere	PERSON	0.99+
six months	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
2020	DATE	0.99+
six	QUANTITY	0.99+
AT&T	ORGANIZATION	0.99+
Buno	PERSON	0.99+
Centerview Capital	ORGANIZATION	0.99+
Ned	PERSON	0.99+
Nexus Venture Partners	ORGANIZATION	0.99+
third partner	QUANTITY	0.99+
2011	DATE	0.99+
80%	QUANTITY	0.99+
10 years	QUANTITY	0.99+
12 months	QUANTITY	0.99+
two partners	QUANTITY	0.99+
55%	QUANTITY	0.99+
70	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
50 years	QUANTITY	0.99+
six days	QUANTITY	0.99+
thousands	QUANTITY	0.99+
first application	QUANTITY	0.99+
one piece	QUANTITY	0.99+
10th year	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
InfoWorks	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
nine months	QUANTITY	0.99+
50 million	QUANTITY	0.99+
two funds	QUANTITY	0.99+
Buno Pati	PERSON	0.99+
third	QUANTITY	0.99+
three things	QUANTITY	0.99+
first	QUANTITY	0.99+
431 use cases	QUANTITY	0.99+
Boston	LOCATION	0.99+
Netezza	ORGANIZATION	0.99+
second	QUANTITY	0.99+
two key lessons	QUANTITY	0.99+
One	QUANTITY	0.99+
single	QUANTITY	0.98+
three layers	QUANTITY	0.98+
late 80s	DATE	0.98+
MapR	ORGANIZATION	0.98+
Boston, Massachusetts	LOCATION	0.98+
dozens	QUANTITY	0.98+
three principles	QUANTITY	0.98+
10x	QUANTITY	0.98+
one	QUANTITY	0.98+
second thing	QUANTITY	0.98+
17%	QUANTITY	0.98+
2010	DATE	0.97+
first 10 years	QUANTITY	0.97+
Cloudera	ORGANIZATION	0.97+
today	DATE	0.97+
Gardner	PERSON	0.96+
about 14 months	QUANTITY	0.96+

Greg Tinker, SereneIT | CUBEConversation, November 2019

(upbeat music) >> Hi, and welcome to another Cube Conversation where we speak with thought leaders in depth about the topics that are most important to the overall technology community. I'm Peter Burris, your host. Every business that aspires to be a digital business, which is every business, faces a significant challenge. They need to use their data in new and value creating ways. But some of that data is not lending itself to new applications, new uses because it's locked up in formats, in technologies and applications that don't lend themselves to change. That's one of the big challenges that every business faces. What can they do to help unlock, to help liberate their data from older formats and older approaches so they can create new sources of value with it. To have that conversation, we're joined by a great guest. Greg Tinker is the CTO and Founder of SereneIT. Greg, welcome back to theCube. >> Thank you Peter, very much appreciate it, buddy. >> It's been a long time. This is your first time here with SereneIT so why don't you tell us a little bit about SereneIT. >> Sure, so at a high level we are a technology partner. SereneIT focuses on the next generation model structures of engineering first. There's a lot of VARs; in simplest terms, I would say we're a value-added reseller, sure. But we capitalize and focus just on the VA. Anybody can bring the R. The legacy approach of just being a reseller is no longer valid in our industry. Complexities and trying to have a situation where you can liberate data, try to take it from a legacy entrenched model, process, procedure and go into a new modern IT software defined ecosystem is very complex. And our objective is to make the enablement of IT serene or simple and that's where SereneIT comes from. >> You know, I love the name but if you go back 20 years as you said, the asset that IT was focused on and took care of was the hardware. >> That's right. 
>> And we bought the hardware from a reseller, they just made the installations, configurations and what not. But as you said, today we're focused on the data. That's the asset. >> That's correct. >> And just as we used to have challenges uplifting and all the things we had to do with hardware, we're having similar types of challenges when you think about how to apply data to new uses, sustain that asset feature of it but apply it in new ways to create new value. As you talk to customers, what is the problem that you find they're encountering as they try to think what to do with their high value traditional data? >> So there's actually, I'll call it three strategic problems. Whether it be workload-optimized model structures or your data-driven intelligence, trying to pull something out of the data model, trying to pull something out of the data, make it tangible to the business. And then trying to figure out a way to make it easy to enable the users, that is the employees, to do something with the data they have. Making it more of a cloud-centric approach. Everybody wants that easy button now. So at a high level, trying to make that a possibility is where we spend our time today. A quick example of that would be legacy block storage. We do a lot in the storage world. And we focus on software defined storage apparatus or solutions. So a lot of our clients are kind of mired down with legacy block, via Fibre Channel basics that were great for their era. But today with cost being a big factor in trying to be able to leverage an ecosystem where I can take my data, wherever my data sits and leverage it on multiple different apparatuses, be it BlueData, be it Kubernetes, be it name your favorite Docker solution. Trying to be able to use that in an ecosystem in a software defined hyper cloud, doing that on a legacy block is very problematic. 
And that's where we help customers transition from that legacy mindset, legacy IT infrastructure into more of a modern software defined data program. >> So let's talk about that. Because there's a more modern technology, but really what they're doing is they're saying, look I've got this data, using these protocols like Fibre Channel with these applications and it's doing its job. >> That's right. >> But I want to create options on how I might use that data in the future, options that aren't available to me or aren't available to my business if it stays locked inside Fibre Channel for example. >> That's correct. >> So what you're really doing, is you're giving them paths to new options with their data that can be sustained whatever the technology is. Have I got that right? >> In a nutshell, frankly, I would agree with your sentiment on that, your comment is spot on. We take customers' data, we look at the business as a whole. And we focus on, what is the core of the business? Be it, maybe it's a High-Performance Computing Cluster. Maybe it's Oracle, Sybase, Informix, name your favorite database structure. Maybe it's MapR, maybe it's Hadoop. We look at the business and determine, how are we using that data? How much data do we need? What's my data working set size? Understanding that and then we actually would design a solution that will be a software defined ecosystem that we can move that data in. And nine times out of 10 we can do it on the fly. Rarely, rarely ever do we have an outage to do it. Or that might be a small few minute outage window when we do a cut over, where we keep everything in mirroring lockstep. >> Well that's one of the beauties of software defined is that you have those kinds of flexibilities. >> That's right. >> But talk to me a little bit about the, you know, the customer realizes they have a problem. They find you guys. >> Sure. >> So how do they find you? 
>> So we do a lot with large scale Fortune 50, Fortune 100, the large scale enterprise businesses. And we do that with our, we're known in the engineering world, big accounts, because of our backdrop in HP engineering. And so HP brings us a lot into these accounts to help them solve a big business problem. So that's how a lot of our customers are finding us today. We are reaching out with media, like theCube here to talk to clients about the fact that we do exist and that we exist to help them consume a more modern IT footprint. To help them go from that legacy model into that more modern model. >> Okay, so the customer realizes they have a problem, HPE and others, help identify you guys, matches you together. You show up, how do you work with the customer? Is it your big brains and the customer passive? Or you're working side by side to help them accelerate their journey? >> We find it best that we do it in a cohesive manner. We sit down and have a long discussion with their, usually their Chief Executive officer, their CTO, Chief Technology Officer, we'll sit down and talk about the business constraints. And then we'll go down to the directors, the guys on the front lines that see the problems on a day to day basis. And we look at where their constraints are. Is it performance, IOPS driven? Nine times out of 10, those problems are no longer there. They were solved years ago. Today it's more about the legacy model of, let me log a ticket to stand up a new virtual machine to a SQL database to do this application. So I've logged the ticket, a week or two later I finally get a virtual machine. And now I got to get five more teams engaged, I get it online. Total business takes about a month to get some new apparatus up. Where if we go into a software defined ecosystem where we have these playbooks and this model written for the business, we can do that in 10 minutes. Be it on Nutanix, do it with SimpliVity, VMware models, we don't differentiate that. 
We let the customer tell us which one they use. 'Cause everybody has their liking. Be it some are VMware shops, some are Hyper-V, some are KVM. We do all of them. >> But the point is you want to help them move from an old world that was focused on executing the tasks associated with bringing the system up to a new world that's focused on the resources being able to configure themselves, being able to bring themselves up, test themselves in a software defined manner, introducing some of those DevOps processes. Whatever the technology is, they have the people and the process to execute the technology. >> That's exactly right, because the technology in a nutshell. If you look at just technology itself that's not the hard part. Not for us anyway, 'cause we're an engineering team, that's what we do well. The data driven intelligence stuff and helping customers bring more value out of their data. We can help them with that and show them exactly how we would do it. Be it different technologies and stuff and we'll get into that discussion later. But the biggest problem we see is the people and processes which you just mentioned. Pushing the button, achieve an objective. That is where the old way of being a very ticket-driven, siloed approach really slowed down the economics of business. Was a huge driving force of not achieving the ROI that you actually set out to do years ago. Where we have one client that has a little over 4,000 servers and how my team and I explain it to the clients is: come out to the Golden Gate Bridge. January 1 you start painting. December 30th you're done painting and January 1 you start painting again. You never get done. It's always getting painted. Patching of these large scale enterprises is the exact same way. You can't patch all the servers on a Saturday. You can't patch three thousand machines, BIOS, firmware, the list goes on. 
What we do for them is we actually put in an apparatus engine, basically an automation engine and instead of an army of 10 people doing firmware or BIOS and all the stuff updates, we automated 100% of that entire process. That's what SereneIT does. Help a customer take a, could be a legacy model, bare metal machine and show them how we can automate the bare metal machine. We can do the exact same thing in any hypervisor on the planet today. >> So that it's done faster, simpler. The outcome is more predictable. The result is more measurable. >> Yes. >> That's really great stuff. Let's go back to this notion of data because we kind of started with this idea of data and having to evolve the formats, increasing the flexibility of its utilization. We talked about hypervisors and all that technology is kind of sucking it forward, bringing that data forward, making it possible to do things with it, but still the data itself is a major challenge. How are you working with customers to get them to envision the new data world independent of some of these other technologies? >> Sure, okay. So yeah, we have clients right now, we have (mumbled) systems, these are global file systems that have enormous amounts of data in them, some of it is compiled code logic for drivers and firmware and kernel code structure that are forthcoming technologies that aren't even released yet. We have clients that have database structures with ASCII text, which is very common, row driven. We have customers that have flat ASCII files that are just flat text files. So we help the customers grab data from that existing data footprint for new lines of business. Determine what are we touching, how are we touching and how often are we touching it and why are we touching it? Case in point, when you have a large manufacturer doing chip design and you're looking at a global file system you're trying to give assertation data as to what drivers are our developers working on most frequently. 
In the medical community, we have a client we're working on at global scale, we're doing real time data analytics to figure out if we're seeing SQL injection from a hacker. So we show them exactly how we can do this in an inline driver stack and show them how to do it with the technology, reducing their actual CapEx spend. There's legacy tools out there that work great. You know one of these is like, I won't give names of product and stuff, but there's a lot of cool technologies that's been around for a long time. >> That works. >> That works. >> And it just needs a smart person, or a smart team to put it together so it can be applied. >> That's what we've been doing with our clients is trying to show them that we can take the data that you have, be it flat ASCII files or binary data structures. And we can show them that we can give you data analytics and pull that back. We have another client in law industry that we manage worldwide and we do e-discovery. On trying to figure out phrases and things that are maybe concerning to them in a financial world that is the global market. And we're able to give them that data structure on their own intellectual property and we give that to them in real time. We give them a dashboard so they can log in to the dashboard and they can see real time data transparency at a moment's notice, so they can tell what the market is doing in Britain or they can tell what the market is doing in Singapore or U.S. by just looking at a dashboard and we're pulling data back. And we're pulling it from outside-world data points, this could be Facebook. Real time feeds, news, media and we pull it from internal data feeds. Email transactions that are going from their financial, they have like CIOs, the Chief Investment Officers. Most people think of that as an information officer, right? 
So we're able to pull data from that and show them that they have a great deal of intellectual property at their fingertips that honestly they've never used before and that's what we're helping customers do today. >> Greg Tinker, Founder, CTO SereneIT. Thanks so much for being on theCube. >> Thank you very much Peter. >> And once again want to thank you for listening to this Cube Conversation. Until next time. (upbeat music)

Published Date : Nov 6 2019


ENTITIES

Entity	Category	Confidence
David Nicholson	PERSON	0.99+
Chris	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Joel	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Peter	PERSON	0.99+
Mona	PERSON	0.99+
Dave Vellante	PERSON	0.99+
David Vellante	PERSON	0.99+
Keith	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Jeff	PERSON	0.99+
Kevin	PERSON	0.99+
Joel Minick	PERSON	0.99+
Andy	PERSON	0.99+
Ryan	PERSON	0.99+
Cathy Dally	PERSON	0.99+
Patrick	PERSON	0.99+
Greg	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Stephen	PERSON	0.99+
Kevin Miller	PERSON	0.99+
Marcus	PERSON	0.99+
Dave Alante	PERSON	0.99+
Eric	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Dan	PERSON	0.99+
Peter Burris	PERSON	0.99+
Greg Tinker	PERSON	0.99+
Utah	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
Raleigh	LOCATION	0.99+
Brooklyn	LOCATION	0.99+
Carl Krupitzer	PERSON	0.99+
Lisa	PERSON	0.99+
Lenovo	ORGANIZATION	0.99+
JetBlue	ORGANIZATION	0.99+
2015	DATE	0.99+
Dave	PERSON	0.99+
Angie Embree	PERSON	0.99+
Kirk Skaugen	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
2014	DATE	0.99+
Simon	PERSON	0.99+
United	ORGANIZATION	0.99+
Stu Miniman	PERSON	0.99+
Southwest	ORGANIZATION	0.99+
Kirk	PERSON	0.99+
Frank	PERSON	0.99+
Patrick Osborne	PERSON	0.99+
1984	DATE	0.99+
China	LOCATION	0.99+
Boston	LOCATION	0.99+
California	LOCATION	0.99+
Singapore	LOCATION	0.99+

Patrick Osborne, HPE | CUBE Conversation, September 2019

>> From the SiliconANGLE media office, in Boston, Massachusetts, it's theCUBE. Now, here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and welcome to a Cube Conversation here in our new Boston area studio, happy to welcome back to the program a VIP from our community, Patrick Osborne, who's the Vice President and General Manager for big data and secondary storage at Hewlett Packard Enterprise, Patrick, great to talk to you. >> Great to be back, thanks, Stu. >> All right, we're talking about the big thing, hundredth year of the NFL kicking off here. Maybe we're talking a little bit about the changing role of infrastructure and, we've been talking about it at the Wikibon team for a number of years, data is at the center of the universe today, when we talk about IT and businesses and what they're thinking about, and in some ways everything's changed, and in other ways it feels like I go to some of these shows and the people that have even more experience than me are like "Oh, geez, we've recreated the mainframe." So, we're fresh off of VMworld, you skipped the show this year, but I know HPE had a large presence at the show, and let me start there, I guess, we look at data centers and cloud, and the mission VMware has is how do they remain relevant as customers are changing their applications? They just made billions of dollars of acquisitions to be more in the cloud native environment, so when you look at, HPE's very well known in the infrastructure space, had some changes as to what pieces are in the company versus partnered with the company, so when you talk to your customers and they're changing what I call the long pole in the tent of modernization, it's the applications. Where are they today, where are some of the areas that they're doing well, and where are the areas that it's challenging and struggling? 
>> Yeah, so I'd say from an HPE perspective, we've made a number of investments as well, over the last couple years, both inorganic and organic investments in the space, and I think that even though we've historically been known as an infrastructure company, we're very quickly pivoting towards being known as an enterprise workload company, and so from my perspective, the things that we're trying to do, especially in our division around AI and ML and analytics is being able to provide a platform for customers, especially application developers. I think when we talk about how the world is changing, the buyer personas we're selling to now have completely, drastically changed, right? There's no more dedicated backup teams, there's rarely now dedicated storage teams, maybe only in very large organizations, and so now you're catering to a different set of folks, and for instance, over the last two or three years, we've seen the advent of folks like a chief data officer, the CDO, data scientists, data engineers, and so for us, we have a whole new buyer persona and user persona that we not only have to cater to in our UX design, but also present the value, which is a much different conversation than we've had in the past. >> Yeah, you know I actually had a number of conversations with customers at the VMworld show, and they talked about, organizationally they often still have hardware-defined roles, yet they live in a software-defined world. So, even groups that are like "I still have some storage headcount and some networking headcount," but virtualization and cloud are slowly eating over pieces of it, but there's still some turf battles, which I was sad to hear because, I've worked for the last couple of decades to try to eliminate silos and get people working together, so we know those organizational changes often take even longer than the long cycles of technology that we're trying to roll out here. 
You mentioned some of the big data pieces, and yeah, HPE's made a number of acquisitions. Most recently MapR. Wonder if you could help us connect the dots. When we covered heavily, the big data wave, and Dave Vellante would say, "Look, the people that deploy these technologies, the end users will create way more value than the distributions of Hadoop will." When we did our forecast, they were there, but the promise of big data was, data was going to go from that burden, how do I keep it, how do I maintain it, how do I back it up, to new value for the company, new revenue that we could have along the way, and whether or not that happened often mattered on deployment, but when you go into the AI space, like what you're doing with BlueData, is that a continuation of what we were seeing with the big data space, are there some new waves that are drastically changing the outcomes in what we're seeing, how does that all fit together? >> Yeah, so I mean I think it's definitely an extension of, all these things are accretive and incremental at the end of the day. I think some of the things around how people are operationalizing AI and ML are pretty unique, and so from our perspective, we made some investments around BlueData, and we've had some recent product announcements in that area around helping folks operationalize machine learning, which is, at this point it's becoming very real and people are putting it into a number of different use cases, and then to come along with that, the need to store data, right, so we talk about this often, which nobody talks about storage anymore, everyone talks about data, right? The need to store all this data that's coming in in a persistent data layer is super important, more important than it ever was, and it comes in multiple different forms and multiple different factors, and also protocols. 
So, to have a data platform that is very scalable, has enterprise resiliency to it, the ability to take data and manifest it in different ways, right, is important for that entire ecosystem, we felt that MapR was a great platform, they have a great data platform, that started with Hadoop, moved into supporting things like streams, Kafka and Spark, and then certainly now have been shifted into a Kubernetes and container deployment, and then mapping their file system and their database and streams to servicing AI and ML workloads, so it's kind of along the same vein and being able to live in that world that you're still separating compute and storage, and being able to scale those independently, but work together from a security perspective, I think it's really important. >> One of the boundaries that I've always been fascinated with is, some of the underlying components that we're changing, so when we rolled out virtualization, the whole storage and networking industry had to work to kind of put the pieces back together as we took advantage of that. You mentioned Kubernetes, at the KubeCon show, there's lots of that same plumbing things that need to be understood and worked. But on the other hand, we've seen a massive transformation in the database market. 10 years ago, everybody had one database to rule them all, and now most companies we talk to, it's like "Oh, well I've got lots of little databases and now pulling them together differently." But that boundary between what's happening at the infrastructure layer and what's happening at the application layer. 
On the one hand, they seem to be pulling apart, you know, I should just be able to use cloud or serverless and makes it easy, but at the same time, you're talking everybody's like "I've got the best infrastructure for your AI deployment," so can you talk a little bit about some of the hard challenges that HPE's looking at solving, what do you look to actually create, whether that be a box or a service or some offering, 'cause I know HPE has lots of different areas that you look at those solutions. >> Yeah, we're trying to, when we go and have a successful deployment at our customers, and we have deployments in most verticals, right, in the Fortune 500 Global 2000, whether it's financial services, automotive, manufacturing, you name it, healthcare, right? I think what we've seen is that the successful deployments are the ones that bring together the application owner's line of business, even the data scientist engineers, along with the infrastructure folks, right? I think sometimes they're at odds. And so when you can bring together a platform that at the end of the day is going to provide something, right, as a service, it's either an analytic sandbox, big data and analytics as a service, AI as a service, right, there is a set of folks that are trying to service a number of application developers and data scientists internally, that's a platform that can have a uniform data structure, where you can grab all this data and have access to it securely, and be able to deploy your workflow on top of that, in a virtualized, multitenant way, deployed in containers with the toolsets and the applications that they want to have access to, but not have to deal with the infrastructure, right? 
And then that can be the province of the CIO and the data center team and the infrastructure folks working with those teams, and that's where we've seen the magic happen for successful deployments, and those are the ones that, they end up growing and scaling very quickly, and they can be deployed on-prem, they can be deployed, we have some of our pilots and POCs that start completely in the cloud and then come back on prem for different reasons, security, data locality, governance, what have you, but it provides the flexibility, but I think what we found is that, taking an outcome and a services-based approach that bring everyone to the table, that's where we see the projects really get a big business benefit for our customers. >> You know, I was having a conversation earlier today, and when we watch the adoption of virtualization, it's been almost 20 years now, since most people are doing it. When we'd reached about 10 years in, we felt that most people were doing it and were on their journey, but things like converged and hyperconverged infrastructure really helped accelerate us past that kind of early majority into the late majority because it was the simplicity of that offering. We wonder, are we reaching some of that same point when we look at cloud, and when I say cloud, not just public cloud, but what we're doing in private, where the hybrid, multicloud mixup that we have, because while cloud is definitely real and here to stay, I don't think anybody would really say that cloud, circa 2019, is easy. So, how does HPE and its partners, how do we make it even easier so that customers can move down that journey to modernize themselves even more, and get out of what we call that undifferentiated heavy lifting? 
>> Yeah, so definitely want to avoid the undifferentiated heavy lifting, because that's certainly a weight on many organizations, and so what we are trying to provide is a platform that increases customers' time to value, and by providing, by abstracting a lot of difficult things. I mean there's a lot of data gravity in this space, right, you're talking about, we have projects right now for autonomous cars where they ingest two, five, 10 petabytes a day, for example, and it's not, it's very difficult to migrate and move that data, right, so you want to be able to bring that data in, tap into it securely, there's a lot of networking that goes on that's very difficult from a security perspective as well as multitenancy and making sure that that model is set up correctly. So for us, it's all about providing a platform that can service multiple tenants and multiple organizations that are all using similar toolsets at the end of the day, but you can have your specific data scientists and data engineers operating on a platform that they don't have to worry about infrastructure. Right, 'cause at the end of the day, when we go visit those folks who own those applications, oftentimes they don't want to deal with, "I need to go request a VM, I need to go request a block of IP addresses, I need some LUNs for my storage, I need a server deployment to run bare metal," you know, some bare metal tooling. They really want to establish a service, just like we saw with virtualization, and so right now it's sort of the fight for, how can I make my infrastructure as invisible as possible and fight for the eyeballs of the developers? >> Great. Want to just give you the final word, Patrick, what's exciting you, kind of second half of the year, things you're looking forward to? 
>> Yeah, so the things that excite me is certainly customer acquisition, right, we've been marching along that very quickly with some of these new acquisitions and some of the net new development we've done within HPE, I think that the, we've got a lot of stuff cooking with Kubernetes in that area, and so we'll make some big announcements at KubeCon, and that's always very exciting to talk in these new ecosystems, and speaking of ecosystems, we're establishing, I think there's new ecosystems that are forming in the market, especially around AI and ML, it's still a very nascent market, and so we're bringing on new partners every week from an application development perspective, and so for me it's really exciting to talk with all these new apps, these new tool chains, new toolsets, libraries, algorithms, and I think it's really exciting to kind of move up stack and be in this very cool world of application development. >> I know when I see the market landscape of some of the AI space, you need to have a big monitor or be able to zoom in, cause there's a lot of players, there's a lot of pieces, we always worry about things like API sprawls and the like, but absolutely super exciting space, Patrick Osborne, thanks so much for giving us an update on what's happening, especially how AI is driving a lot of new innovation in the area. >> Yeah, very exciting, thanks for having me. >> All right, Patrick Osborne with HPE, and I'm Stu Miniman, thanks as always for joining theCUBE.

Published Date : Sep 5 2019

SUMMARY :

From the SiliconANGLE media office, and secondary storage at Hewlett Packard Enterprise, and the people that have even more experience than me and for instance, over the last two or three years, that are drastically changing the outcomes and being able to live in that world that you're still On the one hand, they seem to be pulling apart, and the applications that they want to have access to, and when we watch the adoption of virtualization, and so right now it's sort of the fight for, Want to just give you the final word, Patrick, and so for me it's really exciting to talk with of some of the AI space, you need to have a big monitor and I'm Stu Miniman, thanks as always for joining theCUBE.

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Patrick Osborne	PERSON	0.99+
Patrick	PERSON	0.99+
September 2019	DATE	0.99+
Stu Miniman	PERSON	0.99+
Boston	LOCATION	0.99+
HPE	ORGANIZATION	0.99+
two	QUANTITY	0.99+
five	QUANTITY	0.99+
Stu	PERSON	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
KubeCon	EVENT	0.99+
BlueData	ORGANIZATION	0.99+
Hadoop	TITLE	0.98+
10 years ago	DATE	0.97+
Kafka	TITLE	0.97+
Boston, Massachusetts	LOCATION	0.97+
both	QUANTITY	0.97+
this year	DATE	0.97+
VMworld	ORGANIZATION	0.97+
one database	QUANTITY	0.97+
Wikibon	ORGANIZATION	0.96+
almost 20 years	QUANTITY	0.96+
about 10 years	QUANTITY	0.96+
Spark	TITLE	0.96+
today	DATE	0.95+
MapR	ORGANIZATION	0.9+
hundredth year	QUANTITY	0.9+
Cube	ORGANIZATION	0.9+
MapR	TITLE	0.89+
One	QUANTITY	0.89+
circa 2019	DATE	0.85+
Kubernetes	TITLE	0.83+
10 petabytes a day	QUANTITY	0.83+
billions of dollars	QUANTITY	0.82+
three years	QUANTITY	0.81+
last couple years	DATE	0.79+
NFL	EVENT	0.79+
last couple	DATE	0.78+
one	QUANTITY	0.73+
second half	QUANTITY	0.71+
lot of pieces	QUANTITY	0.69+
Fortune 500 Global 2000	TITLE	0.68+
Kubernetes	PERSON	0.67+
wave	EVENT	0.66+
earlier today	DATE	0.64+
SiliconANGLE	ORGANIZATION	0.63+
lot of players	QUANTITY	0.62+
Vice President	PERSON	0.62+
decades	DATE	0.62+
every week	QUANTITY	0.56+
HPE	TITLE	0.46+
last	DATE	0.39+

Colin Mahony, Vertica | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts everybody, you're watching theCUBE, the leader in tech coverage. My name is Dave Vellante here with my cohost Paul Gillin. This is day one of our two day coverage of the MIT CDOIQ conferences. CDO, Chief Data Officer, IQ, information quality. Colin Mahony is here, he's a good friend and long time CUBE alum. I haven't seen you in a while, >> I know >> But thank you so much for taking some time, you're like a special guest here >> Thank you, yeah it's great to be here, thank you. >> Yeah, so, this is not, you know, something that you would normally attend. I caught up with you, invited you in. This conference has started as, like back office governance, information quality, kind of wonky stuff, hidden. And then when the big data meme took off, kind of around the time we met. The Chief Data Officer role emerged, the whole Hadoop thing exploded, and then this conference kind of got bigger and bigger and bigger. Still intimate, but very high level, very senior. It's kind of come full circle as we've been saying, you know, information quality still matters. You have been in this data business forever, so I wanted to invite you in just to get your perspectives, we'll talk about what's new with what's going on in your company, but let's go back a little bit. When we first met and even before, you saw it coming, you kind of invested your whole career into data. So, take us back 10 years, I mean it was so different, remember it was Batch, it was Hadoop, but it was cool. There was a lot of cool >> It's still cool. (laughs) projects going on, and it's still cool. But, take a look back. >> Yeah, so it's changed a lot, look, I got into it a while ago, I've always loved data, I had no idea, the explosion and the three V's of data that we've seen over the last decade. 
But, data's really important, and it's just going to get more and more important. But as I look back I think what's really changed, and even if you just go back a decade I mean, there's an insatiable appetite for data. And that is not slowing down, it hasn't slowed down at all, and I think everybody wants that perfect solution that they can ask any question and get an immediate answer to. We went through the Hadoop boom, I'd argue that we're going through the Hadoop bust, but what people actually want is still the same. You know, they want real answers, accurate answers, they want them quickly, and they want it against all their information and all their data. And I think that Hadoop evolved a lot as well, you know, it started as one thing 10 years ago, with MapReduce and I think in the end what it's really been about is disrupting the storage market. But if you really look at what's disrupting storage right now, public clouds, S3, right? That's the new data lake. So there's always a lot of hype cycles, everybody talks about you know, now it's Cloud, everything, for maybe the last 10 years it was a lot of Hadoop, but at the end of the day I think what people want to do with data is still very much the same. And a lot of companies are still struggling with it, hence the role for Chief Data Officers to really figure out how do I monetize data on the one hand and how do I protect that asset on the other hand. >> Well so, and the cool thing is, so this conference is not a tech conference, really. And we love tech, we love talking about this, this is why I love having you on. We kind of have a little Vertica thread that I've created here, so Colin essentially, is the current CEO of Vertica, I know that's not your title, you're GM and Senior Vice President, but you're running Vertica. So, Michael Stonebraker's coming on tomorrow, >> Yeah, excellent. >> Chris Lynch is coming on tomorrow, >> Oh, great, yeah. >> we've got Andy Palmer >> Awesome, yeah. 
>> coming up as well. >> Pretty cool. (laughs) >> So we have this connection, why is that important? It's because, you know, Vertica is a very cool company and is all about data, and it was all about disrupting, sort of the traditional relational database. It's kind of doing more with data, and if you go back to the roots of Vertica, it was like how do you do things faster? How do you really take advantage of data to really drive new business? And that's kind of what it's all about. And the tech behind it is really cool, we did your conference for many, many years. >> It's coming back by the way. >> Is it? >> Yeah, this March, so March 30th. >> Oh, wow, mark that down. >> At Boston, at the new Encore Hotel. >> Well we better have theCUBE there, bro. (laughs) >> Yeah, that's great. And yeah, you've done that conference >> Yep. >> haven't you before? So very cool customers, kind of leading edge, so I want to get to some of that, but let's talk the disruption for a minute. So you guys started with the whole architecture, MPP and so forth. And you talked about Cloud, Cloud really disrupted Hadoop. What are some of the other technology disruptions that you're seeing in the market space? >> I think, I mean, you know, it's hard not to talk about AI machine learning, and what one means versus the other, who knows right? But I think one thing that is definitely happening is people are leveraging the volumes of data and they're trying to use all the processing power and storage power that we have to do things that humans either are too expensive to do or simply can't do at the same speed and scale. And so, I think we're going through a renaissance where a lot more is being automated, certainly on the Vertica roadmap, and our path has always been initially to get the data in and then we want the platform to do a lot more for our customers, lots more analytics, lots more machine-learning in the platform. 
So that's definitely been a lot of the buzz around, but what's really funny is when you talk to a lot of customers they're still struggling with just some basic stuff. Forget about the predictive thing, first you've got to get to what happened in the past. Let's give accurate reporting on what's actually happening. The other big thing I think as a disruption is, I think IOT, for all the hype that it's getting it's very real. And every device is kicking off lots of information, the feedback loop of AB testing or quality testing for predictive maintenance, it's happening almost instantly. And so you're getting massive amounts of new data coming in, it's all this machine sensor type data, you got to figure out what it means really quick, and then you actually have to do something and act on it within seconds. And that's a whole new area for so many people. It's not their traditional enterprise data network warehouse and you know, back to your comment on Stonebraker, he got a lot of this right from the beginning, you know, and I think he looked at the architectures, he took a lot of the best in class designs, we didn't necessarily invent everything, but we put a lot of that together. And then I think the other thing you've got to do is constantly re-invent your platform. We came out with our Eon Mode to run cloud native, we just got rated the best cloud data warehouse from a net promoter score rating perspective, so, but we got to keep going you know, we got to keep re-inventing ourselves, but leverage everything that we've done in the past as well. >> So one of the things that you said, which is kind of relevant for here, Paul, is you're still seeing a real data quality issue that customers are wrestling with, and that's a big theme here, isn't it? >> Absolutely, and the, what goes around comes around, as Dave said earlier, we're still talking about information quality 13 years after this conference began. Have the tools to improve quality improved all that much? 
>> I think the tools have improved, I think that's another area where machine learning, if you look at Tamr, and I know you're going to have Andy here tomorrow, they're leveraging a lot of the augmented things you can do with the processing to make it better. But I think one thing that makes the problem worse now, is it's gotten really easy to pour data in. It's gotten really easy to store data without having to have the right structure, the right quality, you know, 10 years ago, 20 years ago, everything was perfect before it got into the platform. Right, everything was, there was quality, everything was there. What's been happening over the last decade is you're pumping data into these systems, nobody knows if it's redundant data, nobody knows if the quality's any good, and the amount of data is massive. >> And it's cheap to store >> Very cheap to store. >> So people keep pumping it in. >> But I think that creates a lot of issues when it comes to data quality. So, I do think the technology's gotten better, I think there's a lot of companies that are doing a great job with it, but I think the challenge has definitely upped. >> So, go ahead. >> I'm sorry. You mentioned earlier that we're seeing the death of Hadoop, but I'd like you to elaborate on that because (Dave laughs) Hadoop actually came up this morning in the keynote, it's part of what GlaxoSmithKline did. Came up in a conversation I had with the CEO of Experian last week, I mean, it's still out there, why do you think it's in decline? >> I think, I mean first of all if you look at the Hadoop vendors that are out there, they've all been struggling. I mean some of them are shutting down, two of them have merged and they've got killed lately. I think there are some very successful implementations of Hadoop. I think Hadoop as a storage environment is wonderful, I think you can process a lot of data on Hadoop, but the problem with Hadoop is it became the panacea that was going to solve all things data. 
It was going to be the database, it was going to be the data warehouse, it was going to do everything. >> That's usually the kiss of death, isn't it? >> It's the kiss of death. And it, you know, the killer app on Hadoop, ironically, became SQL. I mean, SQL's the killer app on Hadoop. If you want a SQL engine, you don't need Hadoop. But what we did was, in the beginning Mike sort of made fun of it, Stonebraker, and joked a lot about he's heard of MapReduce, it's called Group By, (Dave laughs) and that created a lot of tension between the early Vertica and Hadoop. I think, in the end, we embraced it. We sit next to Hadoop, we sit on top of Hadoop, we sit behind it, we sit in front of it, it's there. But I think what the reality check of the industry has been, certainly by the business folks in these companies is it has not fulfilled all the promises, it has not fulfilled a fraction of the promises that they bet on, and so they need to figure those things out. So I don't think it's going to go away completely, but I think its best success has been disrupting the storage market, and I think there's some much larger disruptions of technologies that frankly are better than HDFS to do that. >> And the Cloud was a gamechanger >> And a lot of them are in the cloud. >> Which is ironic, 'cause you know, Cloudera, (Colin laughs) they didn't really have a cloud strategy, neither did Hortonworks, neither did MapR and, it just so happened Amazon had one, Google had one, and Microsoft has one, so, it's just convenient to-- >> Well, how is that affecting your business? We've seen this massive migration to the cloud (mumbles) >> It's actually been great for us, so one of the things about Vertica is we run everywhere, and we made a decision a while ago, we had our own data warehouse as a service offering. It might have been ahead of its time, never really took off, what we did instead is we pivoted and we said "you know what? 
"We're going to invest in that experience "so it's a SaaS-like experience, "but we're going to let our customers "have full control over the cloud. "And if they want to go to Amazon they can, "if they want to go to Google they can, "if they want to go to Azure they can." And we really invested in that and that experience. We're up on the Amazon marketplace, we have lots of customers running up on Amazon Cloud as well as Google and Azure now, and then about two years ago we went down and did this endeavor to completely re-architect our product so that we could separate compute and storage so that our customers could actually take advantage of the cloud economics as well. That's been huge for us, >> So you scale independent-- >> Scale independently, cloud native, add compute, take away compute, and for our existing customers, they're loving the hybrid aspect, they love that they can still run on Premise, they love that they can run up on a public cloud, they love that they can run in both places. So we will continue to invest a lot in that. And it is really, really important, and frankly, I think cloud has helped Vertica a lot, because being able to provision hardware quickly, being able to tie in to these public clouds, into our customers' accounts, give them control, has been great and we're going to continue on that path. >> Because Vertica's an ISV, I mean you're a software company. >> We're a software company. >> I know you were a part of HP for a while, and HP wanted to mash that in and run it on it's hardware, but software runs great in the cloud. And then to you it's another hardware platform. >> It's another hardware platform, exactly. >> So give us the update on Micro Focus, Micro Focus acquired Vertica as part of the HPE software business, how many years ago now? Two years ago? >> Less than two years ago. >> Okay, so how's that going, >> It's going great. >> Give us the update there. 
>> Yeah, so first of all it is great, HPE and HP were wonderful to Vertica, but it's great being part of a software company. Micro Focus is a software company. And more than just a software company it's a company that has a lot of experience bridging the old and the new. Leveraging all of the investments that you've made but also thinking about cloud and all these other things that are coming down the pike. I think for Vertica it's been really great because, as you've seen Vertica has gotten its identity back again. And that's something that Micro Focus is very good at. You can look at what Micro Focus did with SUSE, the Linux company, which actually you know, now just recently spun out of Micro Focus but, letting organizations like Vertica that have this culture, have this product, have this passion, really focus on our market and our customers and doing the right thing by them has been just really great for us and operating as a software company. The other nice thing is that we do integrate with a lot of other products, some of which came from the HPE side, some of which came from Micro Focus, security products is an example. The other really nice thing is we've been doing this insource thing at Micro Focus where we open up our source code to some of the other teams in Micro Focus and they've been contributing now in amazing ways to the product. In ways that we would just never be able to scale, but with 4,000 engineers strong in Micro Focus, we've got a much larger development organization that can actually contribute to the things that Vertica needs to do. And as we go into the cloud and as we do a lot more operational aspects, the experience that these teams have has been incredible, and security's another great example there. So overall it's been great, we've had four different owners of Vertica, our job is to continue what we do on the innovation side in the culture, but so far Micro Focus has been terrific. 
>> Well, I'd like to say, you're kind of getting that mojo back, because you guys as an independent company were doing your own thing, and then you did for a while inside of HP, >> We did. >> And that obviously changed, 'cause they wanted more integration, but, and Micro Focus, they know what they're doing, they know how to do acquisitions, they've been very successful. >> It's a very well run company, operationally. >> The SUSE piece was really interesting, spinning that out, because now RHEL is part of IBM, so now you've got SUSE as the lone independent. >> Yeah. >> Yeah. >> But I want to ask you, go back to a technology question, is NoSQL the next Hadoop? Are these databases, it seems to be that the hot fad now is NoSQL, it can do anything. Is the promise overblown? >> I think, I mean NoSQL has been out almost as long as Hadoop, and I, we always say not only SQL, right? Mike's said this from day one, best tool for the job. Nothing is going to do every job well, so I think that there are, whether it's key value stores or other types of NoSQL engines, document DB's, now you have some of these DB's that are running on different chips, >> Graph, yeah. >> there's always, yeah, graph DBs, there's always going to be specialty things. I think one of the things about our analytic platform is we can do, time series is a great example. Vertica's a great time series database. We can compete with specialized time series databases. But we also offer a lot of, the other things that you can do with Vertica that you wouldn't be able to do on a database like that. So, I always think there's going to be specialty products, I also think some of these can do a lot more workloads than you might think, but I don't see as much around the NoSQL movement as say I did a few years ago. >> But so, and you mentioned the cloud before as kind of, your position on it I think is a tailwind, not to put words in your mouth, >> Yeah, yeah, it's a great tailwind. 
>> You're in the Amazon marketplace, I mean they have products that are competitive, right? >> They do, they do. >> But, so how are you differentiating there? >> I think the way we differentiate, whether it's Redshift from Amazon, or BigQuery from Google, or even what Azure DB does is, first of all, Vertica, I think from, feature functionality and performance standpoint is ahead. Number one, I think the second thing, and we hear this from a lot of customers, especially at the C-level is they don't want to be locked into these full stacks of the clouds. Having the ability to take a product and run it across multiple clouds is a big thing, because the stack lock-in now, the full stack lock-in of these clouds is scary. It's really easy to develop in their ecosystems but you get very locked into them, and I think a lot of people are concerned about that. So that works really well for Vertica, but I think at the end of the day it's just, it's the robustness of the product, we continue to innovate, when you look at separating compute and storage, believe it or not, a lot of these cloud-native databases don't do that. And so we can actually leverage a lot of the cloud hardware better than the native cloud databases do themselves. So, like I said, we have to keep going, those guys aren't going to stop, and we actually have great relationships with those companies, we work really well with the clouds, they seem to care just as much about their cloud ecosystem as their own database products, and so I think that's going to continue as well. >> Well, Colin, congratulations on all the success >> Yeah, thank you, yeah. >> It's awesome to see you again and really appreciate you coming to >> Oh thank you, it's great, I appreciate the invite, >> MIT. >> it's great to be here. >> All right, keep it right there everybody, Paul and I will be back with our next guest from MIT, you're watching theCUBE. (electronic jingle)

Published Date : Jul 31 2019

SUMMARY :

brought to you by SiliconANGLE Media. I haven't seen you in awhile, kind of around the time we met. It's still cool. but at the end of the day I think is the current CEO of Vertica, (laughs) and if you go back to the roots of Vertica, at the new Encore Hotel. Well we better have theCUBE there, bro. And yeah, you've done that conference but let's talk the disruption for a minute. but we got to keep going you know, Have the tools to improve quality the right quality, you know, But I think that creates a lot of issues but I'd like you to elaborate on that becuase I think you can process a lot of data on Hadoop, and so they need to figure those things out. so one of the things about Vertica is we run everywhere, and frankly, I think cloud has helped Vertica a lot, I mean you're a software company. And then to you it's another hardware platform. the Linux company, which actually you know, and Micro Focus, they know what they're doing, so now you've got SUSE as the lone independent. is NoSQL the next Hadoop? Nothing is going to do every job well, the other things that you can do with Vertica and so I think that's going to continue as well. Paul and I will be back with our next guest from MIT,

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Andy Palmer	PERSON	0.99+
Paul Gillin	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Colin Mahoney	PERSON	0.99+
Paul	PERSON	0.99+
Colin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Vertica	ORGANIZATION	0.99+
Chris Lynch	PERSON	0.99+
HPE	ORGANIZATION	0.99+
Michael Stonebraker	PERSON	0.99+
HP	ORGANIZATION	0.99+
Micro Focus	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
Colin Mahony	PERSON	0.99+
last week	DATE	0.99+
Andy	PERSON	0.99+
March 30th	DATE	0.99+
NoSQL	TITLE	0.99+
Mike	PERSON	0.99+
Experian	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
SQL	TITLE	0.99+
two day	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
4,000 engineers	QUANTITY	0.99+
Two years ago	DATE	0.99+
SUSE	TITLE	0.99+
Azure DB	TITLE	0.98+
second thing	QUANTITY	0.98+
20 years ago	DATE	0.98+
10 years ago	DATE	0.98+
one	QUANTITY	0.98+
Vertica	TITLE	0.98+
Hortonworks	ORGANIZATION	0.97+
MapReduce	ORGANIZATION	0.97+
one thing	QUANTITY	0.97+

Craig Taylor, Quantium | Cisco Live US 2019

>> Announcer: Live from San Diego, California, it's theCUBE, covering Cisco Live US 2019. Brought to you by Cisco and its ecosystem partners. >> Hey, welcome back to theCUBE's coverage. Day two of Cisco Live from San Diego. I'm Lisa Martin. Dave Vellante is my esteemed cohost. And we're pleased to welcome one of Cisco and Cohesity's customers from Quantium, Craig Taylor, Executive Manager at Business Technology and Platforms. Craig, welcome to theCUBE. >> Thank you. It's great to be here. >> Great seeing you. >> So, we love talking with customers. We love talking about data. Tell our audience a little bit about Quantium. I know you guys have expertise in two core domains, data science, AI, two really sexy topics that we talk about on theCUBE at every event. But give our audience a little bit of the flavor of who you guys are. >> Yeah, so Quantium's been around for 16 years, founded and headquartered in Sydney, Australia. And really, they are like you mentioned, the two main aspects of our business. So when you think of data science more as human intelligence, and then the AI side is how we can augment that with computers as much as possible. So, on the human intelligence side, we're looking at things like data curation, how can we work with a company to understand their data, perhaps monetize their data. And then on the AI side, we're more looking at things like, how do we do predictive modeling or predictive analytics, and how can we get that in front of maybe say a supply chain solution, or working with grocery stores around actually predicting how much fresh food they need. So we think of these things like, wouldn't it be great if we had a better idea of how much we needed? Less waste, less cost, everything else. So that's really how we kind of split the two sides of the company. >> You guys provide this as a service, is that right? >> Yeah, that's correct. 
So, with those two arms we focus on, whether it be a consulting engagement with a company, where that's a one-off, or an ongoing thing, and we have a range of products that we sell as well, with the idea that any of these companies, whether it be a bank or a retailer, can plug these tools into their existing solutions to give them some real data, some real impact, as opposed to the thoughts, or the feels, or the gut instincts, that we've been working on for so long, all right. >> So paint a picture of your environment. I mean, what does it look like? Cloud, not cloud, apps. >> Yeah. It's certainly a variety. So, if we think, on-premise is really where we do a lot of our work. And this is around, a lot of companies still feel a little bit sensitive around where their data is going, and they like that security of knowing physically where it's located. So on-premise stack we have a bit over 300 servers running a Hadoop cluster, that's where we do the majority of our AI work. And then what we augment that with is, and what we use the cloud a lot for, as we're doing work globally, we're doing a lot of work in North America, it's not feasible to bring all that data back to Sydney, process it, and send it all back, so then really, what we use the cloud for is to take our technology, take our analytics, to the data. So if we're working with a customer, West Coast, East Coast, and they're in Azure we'll deploy in Azure. If they're in GCP, we can deploy in GCP. And that's really how we use cloud is to offer our service, as much as we can, around the world. >> So you said, you got 300 servers, did I hear you right, in a Hadoop cluster, right? >> Yeah, correct. >> What's your distribution? >> We use MapR at the moment. I know there's certainly been a bit of news about them. >> I was going to ask you, well, all three of them. (Craig laughs) Well I guess Hortonworks now folded in, but-- >> Yeah, correct. Cloud has certainly shaken up that marketplace quite a bit. 
>> Dave: I'm sure, yeah. >> It's been something that we've been keeping a close eye on for quite a while. What's the future there? Is it another distribution? Will someone pick up MapR? Will they get through it? So it is interesting, it's certainly a challenge, but when you're playing in a more emerging space, these are some of the risks you take, but we've always felt that they're worth it. We've had many great years of that and we don't really see any reason that we're not going to get more great years out of that Hadoop environment. >> Yeah, I mean, the IP's going to survive, and it sounds like you guys were early on into it, you got a lot of value out of it. If you had to do it again, you'd probably do the same thing. >> Yeah, that's certainly true. I think, what we've built, there are cloud options on the hyperscale providers that you can use, but look, out of the box, they're not really capable of what we were trying to do. So if we had our time again, we probably would still build the same solution. We'd build it a little bit quicker, obviously, because it's a little bit more in the marketplace, it's not such an emerging technology, but I think we would do the same thing again. >> Dave: Right, and MapR was always ahead of the game with their approach. >> Correct. >> So, obvious question is, how do you protect that data? You're a Cohesity customer, but talk about the data protection aspect of that. >> Yeah, so this is where Cohesity really had a lot of synergies with us, was centralizing a whole raft of datasets into one location. And that's what we do with Hadoop. We take a lot of different datasets and we put it all there. We aggregate it there. So on the secondary data side we had the same problem. Siloed datasets all over the environment. Things like, the protection aspect, the compliance aspect, it's not impossible, but it's very hard to manage. So what we really wanted to do was, what do we do with the data when we're not using it anymore? 
So we might still want to use it in the future, we have to hold onto it. And we needed a better solution for how we manage that. So, having Cohesity, which, to us, being a hyper-converged solution, it's very similar to how Hadoop works. It's a lot of data, a lot of compute, and that's how you deploy it. So we found that actually having all of that, the secondary kind of data that we still needed to keep, combined into one location, for us, it matched on a technology level. And then being able to have all that data in one space, you can do some analytics on it. How often are we using it? What is the data? How many copies of it do we have? So there are a lot of synergies from the data science aspect, and also the technology aspect, which has worked really well for us. >> So what was profound about Hadoop was the idea of bringing five megabytes of code to a petabyte of data, leaving the data where it is, highly distributed environment, obviously challenge protecting that. Help us understand. You're saying that Cohesity architecture is well-suited for that type of environment? >> Yeah, it certainly is. I mean, it augments it quite well, is how I'd say. So at the moment we keep the environments quite separate, but the way we manage them is very similar. So there's great audit logging, great security controls that you can place on both environments. So the way that we structure Hadoop with role-based access, who can perform what action, the same thing applies in Cohesity. So now we sort of see that the way that we manage primary is the same way that we can manage secondary. So, it's easier for the staff, when we come to things like compliance or legislation, or, we value data, it's our lifeblood, so we have to be very careful with it. So if we want to do any audit reports or anything like this, we can do 'em the same way. Who has access, what they've done. >> So, Hadoop's been around a lot longer than Cohesity. 
So, what were you doing before Cohesity, and what were some of those challenges? >> Yeah, what we were doing was a lot. And that was really the only option we had. So we had four or five different solutions that had kind of organically grown over time, whether that was some secondary storage, multiple different backup products, throw a couple of NASes in there, just for good measure. >> Just in case. >> Yeah, just in case. And then really, what we were doing, and how we managed that, is we had close to one FTE dedicated to that environment. It's not great for that person, it's not really the funnest of jobs. And then obviously, the management of it becomes quite difficult. And so that was how we did it. We got by. But it certainly could have been a lot better. >> So that was one FTE dedicated to the backup? >> Just dedicated to the backup. >> Dedicated to data protection? >> Yeah, yeah, yeah. >> Okay. So then you bring in Cohesity, you do the business case, say oh wow, and part of that was we can free up this person to do other things, I presume, right? >> Yeah, yeah, definitely. That was actually certainly one of the key business cases. So, IT is a cost center. We certainly, we work for the business, we support the business, there's no doubt about that. But we are, at the end of the day, a cost center. So getting extra headcount or getting equipment, there has to be a really good business case behind this. And so we found that, so we freed up about 80% of time that we're spending on this, and so actually the two biggest things that we've seen as a benefit of that, staff engagement is actually a lot higher, right, because we don't have someone just dedicated to turning the screws on this old solution all the time. So they get to spend more time on newer tech, which is great, and obviously, if their time's freed-up, value-added activities. What can they be focusing on. >> So how's it work? Is it a self-service platform now? 
Or somebody, this individual, sets the overall policy, and then people apply it as they see fit, the application guys? >> Yeah, so we have a range. So our infrastructure team holds the overall management of it, and we have that one person who kind of, say rules it, so to speak, but the way we've done with this role-based access, we can give the service desk permission to search backups, so if someone needs a restore, or maybe legal and the compliance team want to know who was accessing what, we can give a lot more self-service to these teams. So the service desk, if they're dealing with an end-user that wants a restore, within 30 seconds, we can tell them, okay, here is the backup we have. Here are the dates that we have it. Which one do you want? Previously, that's a week-and-a-half turnaround. Escalate a ticket, spend three days doing restores and searching through it-- >> Dave: Working weekends. >> Right. Working weekends, and if you even do have the data. Typically what happens, by the time you've restored it, the customer has said, "Look, well I don't need it anymore." It's too late. >> So let's talk about some of the customer benefits. You've only deployed this about six months ago. >> Yeah, correct. >> You talked about a number of the benefits from a time perspective, allowing valuable FTEs to not only be reallocated for other projects, but also from a job satisfaction perspective-- >> Yeah definitely. >> Which is all the way up to the top end of the business. But in terms of helping customers extract more value from their data, monetizing their data, that example that you just gave of where it took too long to recover data before and the customer, the time has passed, what are some of the impacts that your customers are achieving so far? >> Yeah, so I think the biggest area of this that I think we actually look at the most, is that, like I mentioned earlier, we will do, say a piece of work with a customer, and then we'll keep that data. 
We might need it in the future, but there's not an ongoing engagement. What are we going to do with that? And so we tend to sort of put it aside. If a customer wants any further work done, or perhaps they want to come back with clarification, or anything like this, it then takes us quite a bit of time to find that data, get it back into production, get it back to the state that we were previously using it in. So, one of the biggest things that we've seen is actually now having all of that data always available on Cohesity, and being a hyper-converged platform, it has a lot of compute on it as well, so we can actually run some simple analytics on that data. So if a customer comes back and wants to query just a couple of small items, or perhaps we want to recheck a couple of things, super easy now for us to do that. And so we talk about time to market, or anything like this, is really big for us, and customer responsiveness. So if a customer is asking us a question and the answer is a five-minute answer, they don't want it in four days. So if we can turn that answer around a lot quicker, then obviously everyone's happier. >> And you've already been able to start achieving that? >> Yeah, we have been able to start achieving that already. Whether that be from a customer perspective, and certainly from a compliance perspective, if we have a customer that actually wants to know, where is our data, who has accessed it, everything else, we can turn that around straightaway. So obviously, when we talk about customer satisfaction, or that relationship, they feel a lot more comfortable that we're doing the right thing with their data, and that is obviously hugely invaluable for us as a business. >> And just another infrastructure question. These 300 servers, it's mostly UCS, is that right? Or a lot of UCS? >> Yeah, so we use Cisco for pretty much everything. 
We certainly are heavy, heavy users of UCS, and so, when we are looking at, I mean, implementing anything to the environment, you don't want it to be a lengthy process, because your return on investment is going to be hit. If you're spending three months installing something, you've already paid, you're getting no benefit out of it, it's now three months old before it's even implemented. So having this kit on Cisco UCS has been great for us, and we were having issues with our previous backup solution and we actually managed to implement the Cohesity solution on UCS and start using it before repairing our existing solution. So it's phenomenal how quickly, through UCS, we were able to bring it in. >> Dave: What kind of issues were you having? Just integration issues, or? >> Yeah, so with our previous backup solution, being a fragmented solution that we had stitched together, we had something as simple as a RAID controller failure caused a whole bunch of data corruption across multiple areas, and so, how the NAS saw the data corruption was different to how the SANs saw it, and trying to re-index everything, we were struggling to understand what was going on. And whilst we were working through that, we actually had some other members of the team implement Cohesity and get it into the environment quicker than we could repair our existing solution. That's the power of Cisco UCS, really. >> Looking at this massive transformation that Cisco has been undergoing for a while, from a traditional network appliance vendor to now hardware, software, what are your thoughts on how that transformation, which is, in part, you could say, accelerated by DevNet, how is it going to enable businesses like yours to be able to start getting value even faster from the technology? >> Yeah, that's a very good question, and that's something, I think, a few of us in the industry, if we go back two, three, four, five years, was Cisco going to reinvent itself? What was that place? 
With hyperscale cloud, all these kind of things. I think quite a few people had some questions around what was going to happen in that space. They weren't always the quickest to market. They had great products, but there was a bit of speed issues there. And what we've seen as they've reinvented themselves is, Cisco has this great name for really being ahead of the curve, or leading industry, and this is, I think, what they were built on, really. And so it's been great from our perspective to see them, say, almost getting back to their roots a little bit, in this regard, and so for us, we are a technology business, we are fast-moving, our customers want things to be fast-moving, and so being able to rely on a technology partner like Cisco, and knowing that they're looking for the latest and greatest even quicker than ourselves, I think that's probably where we start to see the biggest impact. In the past, we might have a challenge that we need to solve, you talk to some vendors, and you might hear something like, oh, we're working on that. Maybe in 12 to 18 months we'll have it in the marketplace. Well we need it now. We don't need it in 18 months, it's a today problem. And that's not what we're seeing anymore with Cisco. Typically, any conversation we have with our account reps around here are some of the challenges, here are what our customers want to do, more frequently than not, our Cisco account reps will say, I think we have a solution for that. And that really, being able to partner with players like that in the industry, that makes some of the biggest differences for us as a company, because we need to partner with all these people to do what we do. >> Exactly. So, with all the momentum that you guys have achieved in just six short months, what's next? >> Yeah, Quantium is certainly a fast-moving company, like I mentioned, and what we wanted, we always like to run close to the leading edge, we're similar with Hadoop, we like to be early adopters. 
We like technology to grow with us. And this is what we saw in Cohesity. So, they haven't been around for long, and they're already doing everything we need. So we think, well this is a great mix. If we've got someone who's already solving everything that we need, this question of what next is great. And so as we move more towards your hyperscale cloud, being able to run Cohesity across all those environments to manage all of that data across all of it, that's certainly a big one that we're investigating. Like I mentioned, we keep pretty much all of our data, and so actually being able to use cloud as an archive solution, it sounds great, but then it's another silo to manage, it's another solution that you need to implement, but Cohesity will manage all that for us. So, the what next, I think, is we'll see the scale out of the solution as our data requirement grows, we will see it expand into the cloud environments that we're going to start building, so we really see it growing with us from that aspect. And then we see a great idea of being able to repurpose a lot of our on-premise hardware by archiving out to the cloud as well. >> What about SaaS? Do you see a need to use a Cohesity to protect your SaaS data, or are you kind of not there yet? >> Yeah, I think it certainly has a play there, it's still something that I think we're exploring a little bit more to make sure that it's a right fit. But certainly, there is an opportunity there to be explored, yeah. >> Always opportunities. Well Craig, we appreciate you stopping by theCUBE-- >> Thank you for having me. >> And sharing how Quantium is leveraging your partnerships with Cisco, with Cohesity, to drive those core business drivers of data science and AI. >> Thank you. >> Our pleasure. For Dave Vellante, I'm Lisa Martin. You're watching theCUBE Live from Cisco Live, in San Diego. (light music)

Published Date : Jun 11 2019

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Craig Tayler	PERSON	0.99+
three months	QUANTITY	0.99+
Craig	PERSON	0.99+
North America	LOCATION	0.99+
five-minute	QUANTITY	0.99+
San Diego	LOCATION	0.99+
three days	QUANTITY	0.99+
Sydney	LOCATION	0.99+
12	QUANTITY	0.99+
four	QUANTITY	0.99+
300 servers	QUANTITY	0.99+
two arms	QUANTITY	0.99+
two sides	QUANTITY	0.99+
Sydney, Australia	LOCATION	0.99+
San Diego, California	LOCATION	0.99+
six short months	QUANTITY	0.99+
16 years	QUANTITY	0.99+
two	QUANTITY	0.99+
18 months	QUANTITY	0.99+
three	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
Cohesity	ORGANIZATION	0.99+
both environments	QUANTITY	0.99+
East Coast	LOCATION	0.99+
Quantium	ORGANIZATION	0.99+
a week	QUANTITY	0.99+
FTE	ORGANIZATION	0.98+
two main aspects	QUANTITY	0.98+
four days	QUANTITY	0.98+
Hadoop	TITLE	0.98+
West Coast	LOCATION	0.98+
earl	PERSON	0.98+
today	DATE	0.98+
one space	QUANTITY	0.97+
Craig Taylor	PERSON	0.97+
one location	QUANTITY	0.97+
five years	QUANTITY	0.97+
one person	QUANTITY	0.96+
two biggest things	QUANTITY	0.96+
about 80%	QUANTITY	0.96+
Hadoop	ORGANIZATION	0.96+
Day two	QUANTITY	0.96+
one	QUANTITY	0.95+
theCUBE	ORGANIZATION	0.95+
over 300 servers	QUANTITY	0.94+
two really sexy topics	QUANTITY	0.94+
30 seconds	QUANTITY	0.93+
two core domains	QUANTITY	0.92+
UCS	TITLE	0.91+
six months ago	DATE	0.9+
US	LOCATION	0.88+
Cisco UCS	ORGANIZATION	0.88+
five megabytes of code	QUANTITY	0.84+
five different	QUANTITY	0.84+
-a-half	QUANTITY	0.83+
2019	DATE	0.83+

Jim Long, Sarbjeet Johal, and Joseph Jacks | CUBEConversation, February 2019

(lively classical music) >> Hello everyone, welcome to this special Cube conversation, we are here at the Power Panel Conversation. I'm John Furrier, in Palo Alto, California, theCUBE studios, we have remote on the line here, talk about the cloud technology's impact on entrepreneurship and startups and overall ecosystem is Jim Long, who's the CEO of Didja, which is a startup around disrupting digital TV, also has been an investor and a serial entrepreneur, Sarbjeet Johal, who's the in-cloud influencer of strategy and investor out of Berkeley, California, The Batchery, and also Joseph Jacks, CUBE alumni, actually you guys are all CUBE alumni, so great to have you on. Joseph Jacks is the founder and general partner of OSS Capital, Open Source Software Capital, a new fund that's been raised specifically to commercialize and fund startups around open source software. Guys, we got a great panel here of experts, thanks for joining us, appreciate it. >> Go Bears! >> Nice to be here. >> So we have a distinguished panel, it's the Power Panel, we're on cloud technos, first I'd like to get you guys' reaction you know, you're seeing a lot of negative news around what Facebook has become, essentially their own hyper-scale cloud with their application. They were called the digital, you know, renegades, or digital gangsters in the UK by the Parliament, which was built on open source software. Amazon's continuing to win, Azure's doing their thing, bundling Office 365, making it look like they've got more revenue with their catching up, Google, and then you got IBM and Oracle, and then you got an ecosystem that's impacted by this large scale, so I want to get your thoughts on first point here. Is there room for more clouds? There's a big buzzword around multiple clouds. Are we going to see specialty clouds? 'Cause Salesforce is a cloud, so is there room for more cloud? Jim, why don't you start? >> Well, I sure hope so. 
You know, the internet has unfortunately become sort of the internet of monopolies, and that doesn't do anyone any good. In fact, you bring up an interesting point, it'd be kind of interesting to see if Facebook created a social cloud for certain types of applications to use. I've no idea whether that makes any sense, but Amazon's clearly been the big gorilla now, and done an amazing job, we love using them, but we also love seeing, trying out different services that they have and then figuring out whether we want to develop them ourselves or use a specialty service, and I think that's going to be interesting, particularly in the AI area, stuff like that. So I sure hope more clouds are around for all of us to take advantage of. >> Joseph, I want you to weigh in here, 'cause you were close to the Kubernetes trend, in fact we were at a OpenStack event when you started Kismatic, which is the movement that became KubeCon Cloud Native, many many years ago, now you're investing in open source. The world's built on open source, there's got to be room for more clouds. Your thoughts on the opportunities? >> Yeah, thanks for having me on, John. I think we need a new kind of open collaborative cloud, and to date, we haven't really seen any of the existing major sort of large critical mass cloud providers participate in that type of model. Arguably, Google has probably participated and contributed the most in the open source ecosystem, contributing TensorFlow and Kubernetes and Go, lots of different open source projects, but they're ultimately focused on gravitating huge amounts of compute and storage cycles to their cloud platform. 
So I think one of the big missing links in the industry is, as we continue to see the rise of these large vertically integrated proprietary control planes for computing and storage and applications and services, I think as the open source community and the open source ecosystem continues to grow and explode, we'll need a third sort of provider, one that isn't based on monopoly or based on a traditional proprietary software business like Microsoft kind of transitioning their enterprise customers to services, sort of Amazon in the first camp vertically integrated many a buffet of all these different compute, storage, networking services, application, middleware. Microsoft focused on sort of building managed services of their software portfolio. I think we need a third model where we have sort of an open set of interfaces and an open standards based cloud provider that might be a pure software company, it might be a company that builds on the rails and the infrastructure that Amazon has laid down, spending tens of billions in cap ex, or it could be something based on a project like Kubernetes or built from the community ecosystem. So I think we need something like that just to sort of provide, speed the innovation, and disaggregate the services away from a monolithic kind of closed vendor like Amazon or Azure. >> I want to come back to that whole startup opportunity, but I want to get Sarbjeet in here, because we've been in the B2B area with just last week at IBM Think 2019. Obviously they're trying to get back into the cloud game, but this digital transformation that has been the cliche for almost a couple of years now, if not five or plus. Business has got to move to the cloud, so there's a whole new ball game of complete cultural shift. They need stability. 
So I want to talk more about this open cloud, which I love that conversation, but give me the blocking and tackling capabilities first, 'cause I got to get out of that old cap ex model, move to an operating model, transform my business, whether it's multi clouds. So Sarbjeet, what's your take on the cloud market for say, the enterprise? >> Yeah, I think for the enterprise... you're just sitting in that data center and moving those to cloud, it's a cumbersome task. For that to work, they actually don't need all the bells and whistles which Amazon has in the periphery, if you will. They need just core things like compute, network, and storage, and some other sort of services, maybe database, maybe data share and stuff like that, but they just want to move those applications as is to start with, with some replatforming and with some changes. Like, they won't make changes to first when they start moving those applications, but our minds are polluted by this thinking. When we see a Facebook being formed by a couple of people, or a company of six people sold for a billion dollars, it just messes up with our mind on the enterprise side, hey we can do that too, we can move that fast and so forth, but it's sort of tragic that we think that way. Well, having said that, and I think we have talked about this in the past. If you are doing anything in the way of systems innovation, if your building those at, even at the enterprise, I think cloud is the way to go. To your original question, if there's room for newer cloud players, I think there is, provided that we can detach the platforms from the environments they are sitting on. So the proprietariness has to kinda, it has to be lowered, the degree of proprietariness has to be lower. It can be through open source I think mainly, it can be from open technologies, they don't have to be open source, but portable. >> JJ was mentioning that, I think that's a big point. 
Jim Long, you're an entrepreneur, you've been a VC, you know all the VCs, been around for a while, you're also, you're an entrepreneur, you're a serial entrepreneur, starting out at Cal Berkeley back in the day. You know, small ideas can move fast, and you're building on Amazon, and you've got a media kind of thing going on, there's a cloud opportunity for you, 'cause you are cloud native, 'cause you're built in the cloud. How do you see it playing out? 'Cause you're scaling with Amazon. >> Well, so we obviously, as a new startup, don't have the issues the enterprise folks have, and I could really see the enterprise customers, what we used to call the Fortune 500, for example, getting together and insisting on at least a base set of APIs that Amazon and Microsoft et cetera adopt, and for a startup, it's really about moving fast with your own solution that solves a problem. So you don't necessarily care too much that you're tied into Amazon completely because you know that if you need to, you can make a change some day. But they do such a good job for us, and their costs, while they can certainly be lower, and we certainly would like more volume discounts, they're pretty darn amazing across the network, across the internet, we do try to price out other folks just for the heck of it, been doing that recently with CDNs, for example. But for us, we're actually creating a hybrid cloud, if you will, a purpose-built cloud to support local television stations, and we do think that's going to be, along with using Amazon, a unique cloud with our own APIs that we will hopefully have lots of different TV apps use our hybrid cloud for part of their application to service local TV. So it's kind of a interesting play for us, the B2B part of it, we're hoping to be pretty successful as well, and we hope to maybe have multiple cloud vendors in our mix, you know. 
Not that our users will know who's behind us, maybe Amazon, for something, Limelight for another, or whatever, for example. >> Well you got to be concerned about lock-in as you become in the cloud, that's something that everybody's worried about. JJ, I want to get back to you on the investment thesis, because you have a cutting edge business model around investing in open source software, and there's two schools of thought in the open source community, you know, free contribution's great, and let that be organic, and then there's now commercialization. There's real value being created in open source. You had put together a chart with your team about the billions of dollars in exits from open source companies. So what are you investing in, what do you see as opportunities for entrepreneurs like Jim and others that are out there looking at scaling their business? How do you look at success, what's your advice, what do you see as leading indicators? >> I think I'll broadly answer your question with a model that we've been thinking a lot about. We're going to start writing publicly about it and probably eventually maybe publish a book or two on it, and it's around the sort of fundamental perspective of creating value and capturing value. So if you model a famous investor and entrepreneur in Silicon Valley who has commonly modeled these things using two different letter variables, X and Y, but I'll give you the sort of perspective of modeling value creation and value capture around open source, as compared to closed source or proprietary software. So if you look at value creation modeled as X, and value capture modeled as Y, where X and Y are two independent variables with a fully proprietary software company based approach, whether you're building a cloud service or a proprietary software product or whatever, just a software company, your value creation exponent is typically bounded by two things. 
Capital and fundraising into the entity creating the software, and the centralization of research and development, meaning engineering output for producing the software. And so those two things are tightly coupled to and bounded to the company. With commercial open source software, the exact opposite is true. So value creation is decoupled and independent from funding, and value creation is also decentralized in terms of the research and development aspect. So you have a sort of decentralized, community-based, crowd-sourced, or sort of internet, global phenomena of contributing to a code base that isn't necessarily owned or fully controlled by a single entity, and those two properties are sort of decoupled from funding and decentralized R and D, are fundamentally changing the value creation kind of exponent. Now let's look at the value capture variable. With proprietary software company, or proprietary technology company, you're primarily looking at two constituents capturing value, people who pay for accessing the service or the software, and people who create the software. And so those two constituents capture all the value, they capture, you know, the vendor selling the software captures maybe 10 or 20% of the value, and the rest of the value, I would express it, say, as the customer is capturing the rest of the value. Most economists don't express value capture as capturable by an end user or a customer. I think that's a mistake. >> Jim, you're-- >> So now... >> Okay, Jim, your reaction to that, because there's an article went around this weekend from Motherboard. "The internet was built on free labor "of open source developers. "Is that sustainable?" So Jim, what's your reaction to JJ's comments about the interactions and the dynamic between value creation, value capture, free versus sustainable funding? 
>> Well if you can sort of mix both together, that's what I would like, I haven't really ever figured out how to make open source work in our business model, but I haven't really tried that hard. It's an intriguing concept for sure, particularly if we come up with APIs that are specific to say, local television or something like that, and maybe some special processes that do things that are of interest to the wider community. So it's something I do plan to look at because I do agree that if you, I mean we use open source, we use this thing called FFmpeg, and several other things, and we're really happy that there's people out there adding value to them, et cetera, and we have our own versions, et cetera, so we'd like to contribute to the community if we could figure out how. >> Sarbjeet, your reactions to JJ's thesis there? >> I think two things. I will comment on two different aspects. One is the lack of standards, and then open source becoming the standard, right. I think open source kind of projects take birth and life in its own, because we have lack of standard, 'cause these different vendors can't agree on standards. So remember we used to have service-oriented architecture, we have Microsoft pushing some standards from one side and IBM pushing from other, SOAP versus xCBL and XML, different sort of paradigms, right, but then REST API became the de facto standard, right, it just took over, I think what REST has done for software in last about 10 years or so, nothing has done that for us. >> Well, Kubernetes is right now looking pretty good. So if you look at JJ, Kubernetes, the movement you really were pioneering on, it's having similar dynamic, I mean Kubernetes is becoming a forcing function for solidarity in the community of cloud native, as well as an actual interoperable orchestration layer for multiple clouds and other services. 
So JJ, your thoughts on how open source continues as some of these new technologies, like Kubernetes, continue to hit the scene. Is there any trajectory change in open source that you see, that you could share, I'd love to get your insights on what's next behind, you know, the rise of Kubernetes is happening, what's next? >> I think more abstractly from Kubernetes, we believe that if you just look at the rate of innovation as a primary factor for progress and forward change in the world, open source software has the highest rate of innovation of any technology creation phenomena, and as a consequence, we're seeing more standards emerge from the open source ecosystem, we're seeing more disruption happen from the open source ecosystem, we're seeing more new technology companies and new paradigms and shifts happen from the open source ecosystem, and kind of all progress across the largest, most difficult sort of compound, sensitive problems, influenced and kind of sourced from the open source ecosystem and the open source world overall. Whether it's chip design, machine learning or computing innovations or new types of architectures, or new types of developer paradigms, you know, biological breakthroughs, there's kind of things up and down the technology spectrum that have a lot to sort of thank open source for. 
We think that the future of technology and the future of software is really that open source is at the core, as opposed to the periphery or the edges, and so today, every software technology company, and cloud providers included, have closed proprietary cores, meaning that where the core is, the data path, the runtime, the core business logic of the company, today that core is proprietary software or closed source software, and yet what is also true, is at the edges, the wrappers, the sort of crust, the periphery of every technology company, we have lots of open source, we have client libraries and bindings and languages and integrations, configuration, UIs and so on, but the cores are proprietary. We think the following will happen over the next few decades. We think the future will gradually shift from closed proprietary cores to open cores, where instead of a proprietary core, an open core is where you have core open source software project, as the fundamental building block for the company. So for example, Hadoop caused the creation of MapR and Cloudera and Hortonworks, Spark caused the creation of Databricks, Kafka caused the creation of Confluent, Git caused the creation of GitHub and GitLab, and this type of commercial open source software model, where there's a core open source project as the kernel building block for the company, and then an extension of intellectual property or wrappers around that open source project, where you can derive value capture and charge for licensed product with the company, and impress customer, we think that model is where the future is headed, and this includes cloud providers, basically selling proprietary services that could be based on a mixture of open source projects, but perhaps not fundamentally on a core open source project. 
Now we think generally, like abstractly, with maybe somewhat of a reductionist explanation there, but that open core future is very likely, fundamentally because of the rate of innovation being the highest with the open source model in general. >> All right, that's great stuff. Jim, you're a historian of tech, you've lived it. Your thoughts on some of the emerging trends around cloud, because you're disrupting linear TV with Didja, in a new way using cloud technology. How do you see cloud evolving? >> Well, I think the long lines we discussed, certainly I think that's a really interesting model, and having the open source be the center of the universe, then figure out how to have maybe some proprietary stuff, if I can use that word, around it, that other people can take advantage of, but maybe you get the value capture and build a business on that, that makes a lot of sense, and could certainly fit in the TV industry if you will from where I sit... Bring services to businesses and consumers, so it's not like there's some reason it wouldn't work, you know, it's bound to, it's bound to figure out a way, and if you can get a whole mass of people around the world working on the core technology and if it is sort of unique to what mission of, or at least the marketplace you're going after, that could be pretty interesting, and that would be great to see a lot of different new mini-clouds, if you will, develop around that stuff would be pretty cool. >> Sarbjeet, I want you to talk about scale, because you also have experience working with Rackspace. Rackspace was early on, they were trying to build the cloud, and OpenStack came out of that, and guess what, the world was moving so fast, Amazon was a bullet train just flying down the tracks, and it just felt like Rackspace and their cloud, you know OpenStack, just couldn't keep up. So is scale an issue, and how do people compete against scale in your mind? 
>> I think scale is an issue, and software chops is an issue, so there's some patterns, right? So one pattern is that we tend to see that open source is now not very good at the application side. You will hardly see any applications being built as open source. And also on the extreme side, open source is pretty sort of lame if you will, at very core of the things, like OpenStack failed for that reason, right? But it's pretty good in the middle as Joseph said, right? So building pipes, building some platforms based on open source, so the hooks, integration, is pretty good there, actually. I think that pattern will continue. Hopefully it will go deeper into the core, which we want to see. The other pattern is I think the software chops, like one vendor has to lead the project for certain amount of time. If that project goes into sort of open, like anybody can grab it, lot of people contribute and sort of jump in very quickly, it tends to fail. That's what happened to, I think, OpenStack, and there were many other reasons behind that, but I think that was the main reason, and because we were smaller, and we didn't have that much software chops, I hate to say that, but then IBM could control like hundred parties a week, at the project >> They did, and look where they are. >> And so does HP, right? >> And look where they are. All right, so I'd love to have a Power Panel on open source, certainly JJ's been in the thick of it as well as other folks in the community. I want to just kind of end on lightweight question for you guys. What have you guys learned? Go down the line, start with Jim, Sarbjeet, and then JJ we'll finish with you. Share something that you've learned over the past three months that moved you or that people should know about in tech or cloud trends that's notable. What's something new that you've learned? 
>> In my case, it was really just spending some time in the last few months getting to know our end users a little bit better, consumers, and some of the impact that having free internet television has on their lives, and that's really motivating... (distorted speech) Something as simple as you might take for granted, but lower income people don't necessarily have a TV that works or a hotel room that has a TV that works, or heaven forbid they're homeless and all that, so it's really gratifying to me to see people sort of tuning back into their local media through television, just by offering it on their phone and laptops. >> And what are you going to do as a result of that? Take a different action, what's the next step for you, what's the action item? >> Well we're hoping, once our product gets filled out with the major networks, et cetera, that we actually provide a community attachment to it, so that we have over-the-air television channels is the main part of the app, and then a side part of the app could be any IP stream, from city council meetings to high schools, to colleges, to local community groups, local, even religious situations or festivals or whatever, and really try to tie that in. We'd really like to use local television as a way to strengthen all local media and local communities, that's the vision at least. >> It's a great mission you guys have at Didja, thanks for sharing that. Sarbjeet, what have you learned over the past quarter, three months that was notable for you and the impact and something that changed you a little bit? >> What actually I have gravitated towards in last three to six months is the blockchain, actually. I was light on that, like what it can do for us, and is there really a thing behind it, and can we leverage it. I've seen more and more actually usage of that, and sort of full SCM, supply chain management and healthcare and some other sort of use cases if you will. I'm intrigued by it, and there's a lot of activity there. 
I think there's some legs behind it, so I'm excited about that. >> And are you doing a blockchain project as a result, or are you still tire-kicking? >> No actually, I will play with it, I'm a practitioner, I play with it, I write code and play with it and see (Jim laughs) what does that level of effort it takes to do that, and as you know, I wrote the Alexa skill couple of weeks back, and play with AI and stuff like that. So I try to do that myself before I-- >> We're hoping blockchain helps even out the TV ad economy and gets rid of middle men and makes more trusting transactions between local businesses and stuff. At least I say that, I don't really know what I'm talking about. >> It sounds good though. You get yourself a new round of funding on that sound bite alone. JJ, what have you learned in the past couple months that's new to you and changed you or made you do something different? >> I've learned over the last few months, OSS Capital is a few months and change old, and so just kind of getting started on that, and it's really, I think potentially more than one decade, probably multi-decade kind of mostly consensus building effort. There's such a huge lack of consensus and agreement in the industry. It's a fascinatingly polarizing area, the sort of general topic of open source technology, economics, value creation, value capture. So my learnings over the past few months have just intensified in terms of the lack of consensus I've seen in the industry. So I'm trying to write a little bit more about observations there and sort of put thoughts out, and that's kind of been the biggest takeaway over the last few months for me. >> I'm sure you learned about all the lawyer conversations, setting up a fund, learnings there probably too right, (Jim laughs) I mean all the detail. All right, JJ, thanks so much, Sarbjeet, Jim, thanks for joining me on this Power Panel, cloud conversation impact, to entrepreneurship, open source. 
Jim Long, Sarbjeet Johal and Joseph Jacks, JJ, thanks for joining us, theCUBE Conversation here in Palo Alto, I'm John Furrier, thanks for watching. >> Thanks John. (lively classical music)

Published Date : Feb 20 2019

ENTITIES

Entity	Category	Confidence
Jim	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jim Long	PERSON	0.99+
JJ	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Sarbjeet	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Sarbjeet Johal	PERSON	0.99+
Joseph	PERSON	0.99+
John	PERSON	0.99+
Joseph Jacks	PERSON	0.99+
OSS Capital	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
February 2019	DATE	0.99+
Google	ORGANIZATION	0.99+
six people	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
10	QUANTITY	0.99+
two things	QUANTITY	0.99+
20%	QUANTITY	0.99+
CUBE	ORGANIZATION	0.99+
Palo Alto, California	LOCATION	0.99+
five	QUANTITY	0.99+
HP	ORGANIZATION	0.99+
two	QUANTITY	0.99+
two constituents	QUANTITY	0.99+
Open Source Software Capital	ORGANIZATION	0.99+
UK	LOCATION	0.99+
Office 365	TITLE	0.99+
last week	DATE	0.99+
Didja	ORGANIZATION	0.99+
two properties	QUANTITY	0.99+
both	QUANTITY	0.98+
two schools	QUANTITY	0.98+
One	QUANTITY	0.98+
first point	QUANTITY	0.98+
Rackspace	ORGANIZATION	0.98+
third model	QUANTITY	0.98+
first camp	QUANTITY	0.98+
Alexa	TITLE	0.98+

Patrick Osborne, HPE | CUBEConversation, November 2018

>> From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hi everybody, welcome to this preview of HPE's Discover Madrid storage news. We're gonna unpack that. My name is Dave Vellante and Hewlett Packard Enterprise has a six-month cadence of shows. They have one in the June timeframe in Las Vegas, and then one in Europe. This year, again, it's in Madrid and you always see them announce products and innovations coinciding with those big user shows. With me here is Patrick Osborne who's the Vice President and General Manager of Big Data and Secondary Storage at HPE. Patrick, great to see you again. >> Great to be here, love theCUBE, thanks for having us. >> Oh, you're very welcome. So let's, let's unpack some of these announcements. You guys, as I said, you're on this six-month cadence. You've got sort of three big themes that you're vectoring into, maybe you could start there. >> Yeah, so within HP Storage and Big Data where, you know, where our point of view is around intelligent storage and intelligent data management and underneath that we've kind of vectored in on three pillars that you talked about. AI driven, so essentially bringing the intelligence, self-managing, self-healing, to all of our storage platforms, and big-data platforms, built for the Cloud, right? We've got a lot of use cases, and user stories, and you've seen from an HPE perspective, Hybrid Cloud, you know, is a big investment we're making in addition to the edge. And the last is delivering all of our capabilities, from product perspective, solutions and services as a service, right? So GreenLake is something that we started a few years ago and being able to provide that type of elastic, you know, purchasing experience for our customers is gonna weave itself in further products and solutions that we announce. >> So I like your strategy around AI. AI of course gets a lot of buzz these days. You guys are taking a practical approach. 
The Nimble acquisition gave you some capabilities there in predictive maintenance. You've pushed it into your automation capabilities. So let's talk about the hard news specifically around InfoSight. >> Yeah, so InfoSight is an incredible platform and what you see is that we've been not only giving customers richer experiences on top of InfoSight that go further up into the stack so we're providing recommendation engines so we've got this whole concept of Cross-stack Analytics that go from, you know, your app and your virtualization layer through the physical infrastructure. So we've had a number of pieces of that, that we're announcing to give very rich, AI-driven guidance, to customers, you know, to fix specific problems. We're also extending it to more platforms. Right, we just announced last week the ability to run InfoSight on our server platforms, right? So we're starting off on a journey of providing that which we're doing at the storage and networking layer weaving in our server platform. So essentially platforms like ProLiant, Synergy, Apollo, all of our value compute platforms. So we are, we're doing some really cool stuff not only providing the experience on new platforms, but richer experiences certainly around performance bottlenecks on 3PAR so we're getting deeper AI-driven recommendation engines as well as what we call an AI-driven resource planner for Nimble. So if you take a look at it from a tops-down view this isn't AI marketing. We're actually applying these techniques and machine learning within our install base in our fleet which is growing larger as we extend support from our platforms that actually make people's lives easier from a storage administration perspective. >> And that was a big part of the acquisition that IP, that machine intelligence IP. Obviously you had to evaluate that and the complexity of bringing it across the portfolio. 
You know we live in this API-driven world, Nimble was a very modern platform so that facilitated that injection of that intelligence across the platform and that's what we're seeing now isn't it. >> Yeah, absolutely. You go from essentially tooling up these platforms for this very rich telemetry really delivering a differentiated support experience that takes a lot of the manual interactions and interventions from a human perspective out of it and now we're moving in with these three announcements that we've made into things that are doing predictive analytics, recommendations and automation at the end of the day. So we're really making, trying to make people's lives easier from an admin perspective and giving them time back to work on higher value activities. >> Well let's talk about Cloud. HP doesn't have a public Cloud like an Amazon or an Azure, you partner with those guys, but you have Cloud Volumes, which is Cloud-like, it's actually Cloud from a business model perspective. Explain what Cloud Volumes is and what's the news here? >> Yeah, so, we've got a great service, it's called HPE Cloud Volumes and you'll see throughout the year us extending more user stories and experiences for Hybrid Cloud, right. So we have CloudBank, which focuses on secondary storage, Cloud Volumes is for primary storage users, so it is a Cloud, public Cloud adjacent storage as a service and it allows you to go into the portal, into your credentials. You can enter in your credit card number and essentially get storage as a service as an adjacent, or replacement data service for, for example, EBS from Amazon. 
So you're able to stand up storage as a service within a co-location facility that we manage and it's completely delivered as a service and then our announcement for that is that, so what we've done in the Americas is you can essentially apply compute instances from the public Cloud to that storage, so it's in a co-location facility it's very close from a latency standpoint to the public Cloud. Now we're gonna be extending that service into Europe, so UK, Ireland, and for the EMEA users as well as now we can also support persistent storage work loads for Docker and Kubernetes and this is a big win for a lot of customers that wanna do continuous improvement, continuous development, and use those containerized frameworks and then you can essentially, you know, integrate with your on-prem storage to your off-prem and then pull in the compute from the Cloud. >> Okay so you got that, write once, run anywhere sort of model. I was gonna ask you well why would I do this instead of EBS, I think you just answered that question. It's because you now can do that anywhere, hybrid is a key theme here, right? >> Yeah, also too from a resiliency perspective, performance, and durability perspective, the service that we provide is, you know, certainly six-nines, very high performant, from a latency perspective. We've been in the enterprise-storage game for quite some time so we feel we've got a really good service just from the technology perspective as well. >> And the European piece, I presume a lot of that is, well of course, GDPR, the fines went into effect in May of 2018. There's a lot of discussion about okay, data can't leave a particular locality, it's especially onerous in Europe, but probably other places as well. So there's a, there's a data locality governance compliance angle here too, is there not? 
>> Yeah, absolutely, and for us if you take a specific industry like healthcare, you know, for example, so you have to have pretty clear line of sight for your data provenance so it allows us to provide the service in these locations for a healthcare customer, or a healthcare ISV, you know, SaaS provider to be able to essentially point to where that data is, you know, and so for us it's gonna be an entrance into that vertical for hybrid Cloud use cases. >> Alright so, so again, we've got the AI-driven piece, the Cloud piece, I see as a service, which is the third piece, I see Cloud as one, and as a service is one-A, it's almost like a feature of Cloud. So let's unpack that a little bit. What are you announcing in as a service and what's your position there? >> Yeah, so our vision is to be able to provide an as-a-service experience for almost everything we have that we provide our customers. Whether it's an individual product, whether it's a solution, or actually like a segment, right? So in the space that I work in, in Big Data and secondary service, secondary storage, backup is a service, for example, right, it's something that customers want, right? They don't want to be able to manage that on their own by piece parts, architect the whole thing, so what we're able to do is provide your primary storage, your secondary storage, your backup ISV, so in this case we're gonna be providing backup as a service through GreenLake with Veeam. And then we even can bring in your Cloud capacity, so for example, Azure Blob Storage which will be your tertiary storage, you know, from an archive perspective. 
So for us it really allows us to provide customers an experience that, you know, is more of an, it's an experience, Cloud is a destination, we're providing a multi-Cloud, a Hybrid-Cloud experience not only from a technology perspective, but also from a purchasing flex up, flex down, flex out experience and we're gonna keep on doing that over and over for the next, you know, foreseeable future. >> So you've been doing GreenLake for awhile here-- >> Yeah, absolutely. >> So how's that going and what's new here? >> Yeah, so that's been going great. We have well over, I think at this point, 500 petabytes on our management under GreenLake and so the service is, it's interesting when you think about it, when we were designing this we thought, just like the public Cloud, the compute as a service would take off, but from our perspective I think one of the biggest pain points for customers is managing data, you know, storage and Big Data, so storage as a service has grown very rapidly. So these services are very popular and we'll keep on iterating on them to create maximum velocity. One of the other things that's interesting about some of these accounting rules that have taken place, is that customers cede to us the, the ability to do architecture, right, so we're essentially creating no Snowflakes for our customers and they get better outcomes from a business perspective so we help them with the architecture, we help them with planning an architecture of the actual equipment and then they get a very defined business outcome in SLA that they pay for as a service, right? So it's a win-win across the board, is really good. >> Okay, so no Snowflakes as in, not everything's custom-- >> Absolutely. >> And then that, so that lowers not only your cost, it lowers the customer's cost. So let's take an example like that, let's take backup as a service which is part of GreenLake. How does that work if I wanna engage with you on backup as a service? 
>> Yeah, so we have a team of folks in Pointnext that can engage like very far up in the front end, right, so they say, hey, listen, I know that I need to do a major re-architecture for my secondary storage, HPE, can you help me out? So we provide advisory services, we have well-known architectures that fit a set of well-known mission critical, business critical applications at a typical customer site so we can drive that all the way from the inception of that project to implementation. We can take more customized view, or a road-mapped approach to customers where they want to bite off a little bit at a time and use things like Flex Capacity, and then weave in a full GreenLake implementation so it's very flexible in terms of the way we can implement it. So we can go soup to nuts, or we can get down to a very small granular pieces of infrastructure. >> Just sticking on data protection for a second, I saw a stat the other day, it's a fairly well, you know, popular, often quoted stat, it was Gartner I think, is 50% of customers are gonna change their backup platform by like 2023 or something. And you think about, and by the way, I think that's a legitimate stat and when you talk to customers about why, well things are changing, the Cloud, Multicloud, things like GDPR, Ransomware, digital transformation, I wanna get more out of my data than just insurance, my backup than just insurance, I wanna do analytics. So there's all these other sort of evolving things. I presume your backup as a service is evolving with that? >> Absolutely. >> What are you seeing there? >> Yeah, we're definitely seeing that the secondary storage market is very dynamic in terms of the expectations from customers, are, you know, they're changing, and changing very rapidly. 
And so not only are we providing things like GreenLake and backup as a service we're also seeking new partners in this space so one of the big announcements that we'll make at Discover is we are doing a pretty big amplification of our partnership in an OEM relationship with Cohesity, right, so a lot of customers are looking for a secondary platform from a consolidation standpoint, so being able to run a number of very different disparate workloads from a secondary storage perspective and make them, you know, work. So it's a great platform scale-out. It's gonna run on a number of our HPE platforms, right, so we're gonna be able to provide customers that whole solution from HPE partnering with Cohesity. So, you know, in general this secondary storage market's hot and we're making some bets in our ecosystem right now. >> You also have Big Data in your title so you're responsible for that portfolio. I know Apollo in the HPC world has been at a foothold there. There's a lot of synergies between high-performance computing and Big Data-- >> Absolutely. >> What's going on in the Big Data world? >> Yeah, so Big Data is one of our fastest growing segments within HPE. I'd say Big Data and Analytics and some of the things that are going on with AI, and commercial high-performance applications. So for us we're, we have a new platform that we're announcing, our Gen10 version of Apollo 4200, it's definitely the workhorse of our Apollo server line for applications like, Cloudera, Hortonworks, MapR, we see Apache Spark, Kafka, a number of these as well as some of these newer workloads around HPC, so TensorFlow, Caffe, H2O, and so that platform allows us with a really good compute memory and storage mix, from a footprint perspective, and it certainly scales into rack-level infrastructure. That part of the business for us is growing very quickly. 
I think a lot of customers are using these Big Data Analytics techniques to transform their business and, you know, as we go along and help them it certainly, it's been a really cool ride to see all this implemented at customer sites. >> You know with all this talk about sort of Big Data and Analytics, and Cloud, and AI, you sort of, you know, get lost, the infrastructure kinda gets lost, but you know, the plumbing still matters, right, and so underneath this. So we saw the flash trend, and that really had a major impact on certainly the storage business specifically, but generally, the overall marketplace, I mean, you really, it'd be hard to support a lot of these emerging workloads without flash and that stack continues to evolve, the pyramid if you will. So you've got flash memory now replacing much of the spinning disk space, you've got DRAM which obviously is the most expensive, highest performance, and there seems to be this layer emerging in the middle, this storage-class memory layer. What are you guys doing there? Is there anything new there? >> Yeah, so we've got a couple things cooking in that space. In general, like when you talk about the infrastructure it is important, right, and we're trying to help customers not only by providing really good product in scalable infrastructure, things like Apollo, you know our system's Nimble 3PAR. We're also trying to provide experience around that too. So, you know, combining things like InfoSight, InfoSight on storage, InfoSight on servers and Apollo for Big Data workloads is something that we're gonna be delivering in the future. The platforms really matter. So we're gonna be introducing NVME and storage class memory into our, what we feel is the industry-leading portfolio for our, for flash storage. So between Nimble and 3PAR we'll have, those platforms will be, and they're NVME ready and we'll be making some product announcements on the availability of that type of medium. 
So if you think about using it in a platform like 3PAR, right, industry leading from a performance perspective allows to get sub 200 millisecond performance for very mission-critical latency intolerant applications and it's a great architecture. It scales in parallel, active, active, active, right, so you can get quite a bit of performance from a very, a large 3PAR system and we're gonna be introducing NVME into that equation as a part of this announcement. >> So, we see this as critical, for years, in the storage business, you talk about how storage is growing, storage is growing, storage is growing, and we'd show the charts upper to the right, and, but it always like yeah, and somehow you gotta store it, you gotta manage it, you might have to move it, it's a real pain. The whole equation is changing now because of things like flash, things like GPU, storage class memory, NVME, now you're seeing, and of course all this ML and deep learning tech, and now you're seeing things that you're able to do with the data that you've never been able to do before-- >> Absolutely. >> And emerging use cases and so it's not just lots of data, it's completely new use cases and it's driving new demands for infrastructure isn't it? >> Absolutely, I mean, there's some macro economic tailwinds that we had this year, but HP had a phenomenal year this year and we're looking at some pretty good outlooks into next year as well. So, yeah, from our perspective the requirement for customers, for latency improvements, bandwidth improvements, and total addressable capacity improvements is, never stops, right? So it's always going on and it's the data pipeline is getting longer. The amount of services and experiences that you're tying on to, existing applications, keeps on augmenting, right? So for us there's always new capabilities, always new ways that we can improve our products. 
We use for things like InfoSight, and a lot of the predictive Analytics, we're using those techniques for ourselves to improve our customers experience with our products. So it's been, it's a very, you know, virtuous cycle in the industry right now. >> Well Patrick, thanks for coming in to theCUBE and unpacking these announcements at Discover Madrid. You're doing a great job sort of executing on the storage plan. Every time I see you there's new announcements, new innovations, you guys are hittin' all your marks, so congratulations on that. >> HPE, intelligent storage, intelligent data management, so if you guys have data needs you know where to come to. >> Alright, thanks again Patrick. >> Great, thank you so much. >> Talk to you soon. Alright, thanks for watching everybody. This is Dave Vellante from theCUBE. We'll see ya next time. (upbeat music)

Published Date : Nov 27 2018

ENTITIES

Entity	Category	Confidence
Patrick	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Europe	LOCATION	0.99+
Madrid	LOCATION	0.99+
Patrick Osborne	PERSON	0.99+
Boston	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
Ireland	LOCATION	0.99+
HPE	ORGANIZATION	0.99+
six-month	QUANTITY	0.99+
50%	QUANTITY	0.99+
HP	ORGANIZATION	0.99+
May of 2018	DATE	0.99+
Americas	LOCATION	0.99+
November 2018	DATE	0.99+
UK	LOCATION	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
next year	DATE	0.99+
Discover	ORGANIZATION	0.99+
Apollo	ORGANIZATION	0.99+
Nimble	ORGANIZATION	0.99+
last week	DATE	0.99+
500 petabytes	QUANTITY	0.99+
third piece	QUANTITY	0.99+
this year	DATE	0.99+
This year	DATE	0.99+
EBS	ORGANIZATION	0.99+
three announcements	QUANTITY	0.98+
Discover Madrid	ORGANIZATION	0.98+
June	DATE	0.98+
Cohesity	ORGANIZATION	0.98+
InfoSight	ORGANIZATION	0.98+
one	QUANTITY	0.98+
Gartner	ORGANIZATION	0.98+
GDPR	TITLE	0.97+
Big Data	ORGANIZATION	0.97+
SAS	ORGANIZATION	0.96+
Kafka	TITLE	0.96+
Cloud	TITLE	0.96+
One	QUANTITY	0.95+
Synergy	ORGANIZATION	0.95+
SiliconANGLE Media Office	ORGANIZATION	0.95+
Cloud Volumes	TITLE	0.94+
few years ago	DATE	0.93+
Massachusets	LOCATION	0.93+
EMEA	ORGANIZATION	0.91+
Apache	ORGANIZATION	0.91+
GreenLake	ORGANIZATION	0.91+
Vim	ORGANIZATION	0.85+
six-nines	QUANTITY	0.84+
Pointnext	ORGANIZATION	0.83+
GreenLake	TITLE	0.83+
MapR	TITLE	0.82+
three	QUANTITY	0.79+
ProLiant	ORGANIZATION	0.79+
theCUBE	ORGANIZATION	0.79+

Evan Kaplan, InfluxData | CUBEConversation, Sept 2018

(intense orchestral music) >> Hey welcome back everybody, Jeff Frick here with theCUBE. We are taking a short break from the madness of the conference season to do some CUBE Conversations here in the Palo Alto studio, which we always like to do and meet new people, and hear new stories, learn about new companies. And today we've got a new company, we've never had 'em on theCUBE before, it's Evan Kaplan, he's the CEO of InfluxData. Evan, great to see you. >> Yeah, hey thanks for having me. >> Absolutely. So for people that aren't familiar with the company, give 'em kind of the 101 on Influx. >> Yeah so, InfluxData is an opensource platform for collecting metrics and events at scale. The company is almost four years old, has a large selection of tier one customers, is broadly accepted by developers as the number one time-series platform out there, so. >> So a lot of people talk about collecting data, so we've been doing Splunk since 2012, and, they really found something interesting on log files, and took it a whole 'nother level, so there's a lot of people that are capturing events. So what do you guys do that's a little bit different, how are you slicing and dicing this opportunity? >> Yeah, to put this in the broader context, what we're looking at is the 20-year break-up of the Oracle, DB2 and Informix franchise that dominated, when relational databases were the answer to all problems. And so if you look at a company like Splunk working on logs, they optimized a platform for those logs, for that data set, Elastic also, really interesting space. I think our innovation has been in saying, "Hey, where is the world going, where are all of these complex systems going?" Particularly with IoT, it's to a real-time view of the data, and so, rather than collect verbose logs, historical views of the data and things like that, real system operators, real developers and builders want to instrument their applications, their infrastructure, so you can view 'em in real time.
The place where the rubber hits the road is IoT. Sensors spit out metrics and events, period, full stop. And so if you want to be performant in how you handle your instrumentation of the physical world, and how you do your machine learning, and how you want to manage these systems, you use a fundamentally time-series based database. As opposed to Splunk or Elastic, which are primarily search-based databases. >> And are you primarily capturing and standardizing the data to feed other analytics tools, or do you have the whole suite, where you're doing some of the analytics as well? >> Yeah, such a great question. So, the fundamental platform is called the TICK Stack, and it stands for Telegraf which is a collector, which has about 200 different collectors that sit out there in the world and collect everything from SNMP data, to Oracle data, to application, to micro-service data, to Kubernetes, to that sort of stuff. There's Influx, which is the DB, which is highly optimized for millions and millions of writes a second, so collecting data points and samples. There's Chronograf which is the visualization engine and so, it allows you, as soon as the data comes in, to see how it's graphed, see it on time-series oriented graphing, and then there's Kapacitor which takes action on the data. What we don't do is the super high sophisticated analytics. There are lots of companies in Silicon Valley who take our data, pump it up, and then we put it back on the platform to build a control loop for it. >> Right. So with the Kapacitor, does your application then take action on those things? >> Yes. Yeah, so, it'd do everything from alerting, to sending out another machine request, to spinning up a new Kubernetes pod, to basically scaling the application, self healing. >> Right. So does it fit in between a lot of those other types of applications that are sending off notifications, and those types of things? >> Yes, yeah. >> So you're in between?
>> And usually, we're instrumented the way a standard developer, or an architect or CTO, does it: they look at a complex application, or a complex set of sensors, they instrument with Influx and Telegraf, and collect that data, they view it in real time, and then they build control loops, automation loops, to make that easier so when you see a problem, it's got a tolerance you can self adjust for. So it's the beginning of kind of the self-healing system. >> Okay, and I know that Telegraf is definitely opensource, are the other three? >> All four are open-source. Everything, in our world, everything for a developer is free, so, and a single node of Influx can handle a couple million writes a second, which is really really performant to run in production. Where our business model is, where we make money is, our closed source clustering, sharding, distributing the database, if you decide you want to run highly available in the production environment, you would buy our closed-source stuff. We have about 430 customers who run our closed source stuff on top of the opensource. >> So, it is kind of like a MapR to Hadoop if you will, where, you know, it's built on, built on the opensource, and then they've got their proprietary stuff kind of wrapped around it, almost like an open core? Or is that a? >> Yeah, it's a little different than the normal Hadoop stuff. One is, our stuff doesn't have any external dependencies. It can work with other third party projects, but just, it's a platform unto itself, there aren't 25 projects. There are four different projects, we own them all, they come across as a single binary, and it's not part of Apache. >> So they're integrated? So the TICK is the full TICK? >> Yes, and then you put the clustering on top. So there's some similarity, but not being part of Apache, we can control and keep clean what that experience is.
And we're about, the thing that's been most successful for us is, well, Paul, our founder, who is my partner, it's called time to awesome, the idea that a developer in 10 minutes can very quickly be up and instrumenting an application or a set of sensors, and see that data pouring in within 10 minutes from going to the site and downloading the opensource. >> So it's interesting, the giant opportunity is really around IoT, just in terms of the explosion of the sensor data, and we see that coming, and we were at AT&T show a couple weeks ago, talking about 5G which is, slowly, slowly coming down the road, (Evan laughs) they've got the standards fixed. But in terms of the, you said the shorter term, nobody has budget, I always like to joke, nobody has budget for a new platform, they do have budget for new applications, because they've got real problems. So you said you're seeing, your main success now, your go to market application, is around application monitoring? Would that be accurate, or what is kind of your? >> Yeah, there are two broad things, and they're both very similar technology as a service. One is the sensor monitoring stuff so, Tesla's Powerwall, Siemens' windmills, a variety of solar companies build Telegraf into their platforms and then use InfluxData to collect and store that information and analyze it. On the software side, people like IBM's Cloud Service running their network and their fabric, SAP with Ariba, Cisco with all their collaboration stuff, they instrument their software applications. And that's the idea is it's a general purpose platform for collecting and instrumenting the applications or the sensors, either one, or both. >> Okay, and so what are you guys working on now, what's next, kind of raise the profile, get some new stuff >> Yeah, so we are-- before the whole IoT thing completely explodes, we're not quite there yet but it's coming down the pike.
>> But we're starting to see it really happen, so that's really exciting for us. And this is just a really, really big market, it's certainly a super set of the log market, it should be. As you think about just the instrumentation of the physical world, how much instrumentation is going on, your clothes, your cars, your homes, your industrial devices, my watch, how much sensor data there is. We think this is a tremendously large market, so we're doing a couple of things. One is, we're about to introduce a new language for querying these kinds of time-series data that's going to be opensource, that a bunch of other people can use with their data stores. We're rolling out a new API-driven service, so that people can store these things directly in the cloud natively, so all they have to do is know our API. So we're really trying to push from the technology limit. We're a product-driven company, and an opensource-driven company, so we're trying to push that, that community is super important to us. >> It's so wild to me, the opportunity to have a closed feedback loop between someone's product back to the barn, you're barely starting to see it, Tesla obviously, is a good example, they're slowly seeing it in other places. But what a fundamental change in manufacturing, from building a product, making some assumptions about use, shipping that product to your distribution, and then, maybe you get some feedback now and then, versus actually monitoring the way that that thing is actually used by your end user, whether it's a product like a car, or even a software application, as you're rolling out all these different apps and features in the apps, how are people using it, are they using it? Where do you double down, where do you back off? And that loop has not really been >> That's pretty insightful. >> opened up very wide.
>> Yeah, no it's just starting to open up, and that whole notion of product telemetry, my prediction is that, as development teams grow and things like that, you're going to have telemetry experts, people are going to be specializing. How do you instrument these products so you get maximum engagement, and usage, and things like that? So I think that's pretty insightful on your part. If you think about it from a systems point of view, right? Instrumentation is first. You can't do anything 'til you instrument, whether it's telemetry from a product, it's the engagement or this. So instrumentation is first, visibility in real time is second. So observability is the big thought in systems application and building now, this notion of observing your system in real time, because you don't know, a priori, it's impossible to know a complex system, how it's going to behave, then it's automation, right? So like, okay now I can see these behaviors, how do I automate something that makes the experience for you, the user, better? But lastly, we can see this with self-driving cars, it's autonomy. It's the idea that the system becomes self-healing, and AI, and those sorts of things, but that's kind of the last step. There's a lot of learning in that process to get there. >> And it has to be automated because at scale there's no way for people to keep up with this stuff, and then how do you separate signal from noise and how do you know what to do? So you've got to automate a whole bunch of this. >> And you know if we had an aspiration it would be we're not going to write the applications that do these things but what we want to do is be that system of record so that people have a really efficient, effective metrics and events store so they can really track and keep track of all that engagement. Time-stamped data, for lack of a better way to say it. >> It sounds like you're in a pretty good space, Evan. >> Pretty excited (chuckles), thank you.
Thanks for saying that, but yeah, we're pretty excited. >> Alright, well thanks for taking a few minutes out of your day and sharing the story, we look forward to watching the journey. >> Yeah. Thanks man. Alright, take care. >> Alright, thanks. He's Evan, I'm Jeff, you're watching theCUBE. We're having a CUBE Conversation in Palo Alto, we'll see you next time, thanks for watching. (intense orchestral music)
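
The sequence Evan describes in this conversation (instrument first, observe in real time, then automate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not InfluxData's API: `Point`, `MetricStore`, `mean_over_threshold`, and `control_loop` are hypothetical names, the store is an in-memory list standing in for a time-series database, and the "action" is just a callback standing in for something like an alert or a scale-up request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Point:
    """One time-stamped sample: measurement name, tags, value, timestamp."""
    measurement: str
    tags: Dict[str, str]
    value: float
    timestamp: int  # seconds since epoch (assumed unit)


@dataclass
class MetricStore:
    """Hypothetical in-memory stand-in for a time-series store."""
    points: List[Point] = field(default_factory=list)

    def write(self, point: Point) -> None:
        # Instrumentation step: collectors append samples as they arrive.
        self.points.append(point)

    def window(self, measurement: str, since: int) -> List[Point]:
        # Observation step: fetch samples at or after the given timestamp.
        return [p for p in self.points
                if p.measurement == measurement and p.timestamp >= since]


def mean_over_threshold(points: List[Point], threshold: float) -> bool:
    """True when the window is non-empty and its mean exceeds the threshold."""
    if not points:
        return False
    return sum(p.value for p in points) / len(points) > threshold


def control_loop(store: MetricStore, measurement: str, since: int,
                 threshold: float, action: Callable[[str], None]) -> bool:
    """Automation step: run the action when the recent mean is too high."""
    recent = store.window(measurement, since)
    if mean_over_threshold(recent, threshold):
        action(measurement)
        return True
    return False


# Example: three CPU samples from one host, then one pass of the loop.
store = MetricStore()
for ts, cpu in [(100, 0.42), (110, 0.91), (120, 0.97)]:
    store.write(Point("cpu_usage", {"host": "web-1"}, cpu, ts))

fired: List[str] = []
triggered = control_loop(store, "cpu_usage", since=110,
                         threshold=0.9, action=fired.append)
```

A real deployment would replace the list with a database tuned for high write rates and the callback with an alerting or orchestration call; the sketch only shows how time-stamped samples feed a rule that triggers an action.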

Published Date : Sep 28 2018

SUMMARY :

it's Evan Kaplan, he's the CEO of InfluxData. So for people that aren't familiar with the company, is broadly accepted by developers as the number one So what do you guys do and so if you look at a company like Splunk working on logs, and then there's Kapacitor which takes action on the data. So when the Kapacitor, to basically scaling the application, self healing. and those types of things? so you're in between? So it's the beginning of kind of the self-healing system. All four are open-source in the production environment, It's a little different than the normal Hadoop stuff. Yes, and then you put the clustering on top. So you said you're seeing, And that's the idea is it's a general purpose platform before the whole IoT thing completely explodes, so all they have to do is know our API. the opportunity to have a closed feedback loop between There's a lot of learning in that process to get there. and then how do you separate signal from noise and And you know if we had an aspiration it would be Thanks for saying that, but yeah, we're pretty excited. and sharing the story, Alright, take care. we'll see you next time,

ENTITIES

Entity	Category	Confidence
Vadim	PERSON	0.99+
Pravin Pillai	PERSON	0.99+
Vadim Supitskiy	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Pravin	PERSON	0.99+
Dave	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Rickard Söderberg	PERSON	0.99+
Jeff	PERSON	0.99+
Peter Burris	PERSON	0.99+
Thomas	PERSON	0.99+
Rickard	PERSON	0.99+
Evan	PERSON	0.99+
John Furrier	PERSON	0.99+
Micheline Nijmeh	PERSON	0.99+
Google	ORGANIZATION	0.99+
Peter	PERSON	0.99+
Abdul Razack	PERSON	0.99+
Micheline	PERSON	0.99+
Sept 2018	DATE	0.99+
March 2019	DATE	0.99+
Evan Kaplan	PERSON	0.99+
Hong Kong	LOCATION	0.99+
11	QUANTITY	0.99+
80%	QUANTITY	0.99+
New York City	LOCATION	0.99+
1949	DATE	0.99+
GANT	ORGANIZATION	0.99+
Tesla	ORGANIZATION	0.99+
Zscaler	ORGANIZATION	0.99+
30%	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
six months	QUANTITY	0.99+
Cisco	ORGANIZATION	0.99+
G Suite	TITLE	0.99+
Paul	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
two	QUANTITY	0.99+
73%	QUANTITY	0.99+
Mongo	ORGANIZATION	0.99+
58%	QUANTITY	0.99+
one	QUANTITY	0.99+
GDPR	TITLE	0.99+
Informix	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Palo Alto, California	LOCATION	0.99+
three years	QUANTITY	0.99+
10 minutes	QUANTITY	0.99+
fourth	QUANTITY	0.99+
InfluxData	ORGANIZATION	0.99+
Abdul	PERSON	0.99+

Sameer Nori, Cohesity | Microsoft Ignite 2018

>> Live from Orlando, Florida, it's theCUBE, covering Microsoft Ignite. Brought to you by Cohesity and theCUBE's ecosystem partners. >> Welcome back everyone to theCUBE's live coverage of Microsoft Ignite. I'm your host, Rebecca Knight, along with my co-host Stu Miniman. We're joined by Sameer Nori, he is the director of product marketing at Cohesity. Thanks so much for coming on theCUBE, Sameer. >> Thanks Rebecca, and thanks Stu, and thanks for having me on theCUBE. I'm excited to be here. >> So you are a tech veteran, you've been in this industry for a long time, you've worked at a lot of kinds of companies, what drew you to Cohesity? >> That's a great question, so when I was at MapR, and as you are familiar, MapR sits at the intersection of big data and storage, and what I saw very interesting and fascinating about Cohesity was a similar hypothesis in terms of, having built its own file system but then really applying it to a different realm of the market, in terms of secondary data and applications starting with backup as the foundation. And as you are aware, analytics is a part of our road map and a solution that we enabled. But you know, some starter things there, but that's kind of actually what really drove me and the opportunity to really try and be part of another hybrid company and apply my experience from prior industries into different segments. >> So you lead outbound marketing for Cohesity's Cloud Solutions, tell our viewers a little about what you do, and what your day is like? >> Sure, so my day oscillates and changes between developing value prop and messaging for our solutions, working with customers to understand their pain points and challenges, being able to translate that into tangible benefits for customers to parley off of, and then really enabling and working with our sales teams closely to help them, arm them with the necessary things they need to go succeed in the market.
>> Sameer you've got an interesting space, questions I think we all get in the industry is, things are changing a lot, what do I do with my applications? If you look at the enterprise, most enterprises have hundreds, if not thousands of applications, and it's not a trivial thing to say, oh well, yeah we'll just put everything in the cloud, that'll be real easy, right? You've got SaaS, Microsoft opened the door of the flood gates really pushing everybody to Office 365 to SaaS-ify a lot of what you are doing. Public cloud is a big growth and then private cloud really modernizing the environment, what are you seeing and hearing from customers as to how they deal with the portfolio of their applications mobility of what they're doing? Where does Cohesity play and advise and help with those solutions? >> Sure, that's a great question too. So I think really what we see from customers is a combination of a couple of things. As you said, they've got thousands or probably hundreds of applications, they are not going to take a big chunk of those and just move them to the cloud as is, right? You got to select the right workloads and the right data and do the assessment and the viability fit, in terms of what makes sense. I think where our sweet spot really is is kind of back to what I said earlier. Really sort of using back up as the foundation for what customers can do and our core hypothesis has always been that backup should not be just an insurance policy. You can do a whole lot more with it. So what we see customers doing is taking their backups on premises, which are often times just idle with alternative solutions, reusing them in the cloud for test dev purposes, where it makes sense. So the easy way to convert formats, so if you are in VMDK format to the VHD format in Azure. Spin it up, run your test of processes. At the end when you're done, you can move those things back on premises and really use it in that context. 
So for us really I think it's a combination of assessing the right use case and the right application of the workload. And then making that, helping customers with understanding that and making that shift in that case. >> So talk a little bit about the biggest customer pain points and how you develop the right solution for them in this customized and tailored way. >> Sure, so I think for us, from our perspective, what we are seeing with challenges, customers, their backup data is sitting idle. It's shocking actually sometimes to hear that, if we're talking to IT and storage teams, and the application test and development teams with their peers, they often have to wait weeks, or sometimes even months, to get a copy of data that they need. I think in today's world that shouldn't be the case, right? Our value prop really there is to help eliminate those expensive copies of data that are getting made. And because our platform is so purpose built and agile with the effect of reusing that backup data for test dev, that's actually where we really see the sweet spot coming together. Customers have even asked us, for example in our UI, can you actually provision test dev data and make it more self-service in nature right from that view point. I think that is something we are looking into. Into what makes sense there from a capability. But that's kind of really actually where we see the challenge and how we are enabling customers into solving that. >> Yeah Sameer, I want to go back to something you said at the beginning about the premise of, I've got all my applications, and I'm going to have intelligence, usually called machine learning and the like. How are these going to come together? We hear Microsoft really talking about that's the future. Satya Nadella is well-known, you know AI, AI, AI, is one of the main things that he talks about. How is that similar with the Cohesity vision?
>> Yeah, so I think when we think about the application world and how we are taking advantage of things, like AI and machine learning. Our recently announced capability and product are on Helios, which is from our perspective the global management piece to manage all your secondary apps. We've injected machine learning and AI capabilities there to help customers with smart assistant type of mode and capability to help them predict their, how much, when they need more capacity. Smart alerts to tell them what's happening in their system. And that's kind of both on premises and in the cloud. For us really, I think where we see specifically AI and machine learning coming together. I think as customers are injecting those in the applications they are using. I think definitely the data side of it and how that affects the underlying data landscape will make a difference, in terms of how we accommodate that. But I think from a core ML and AI perspective, Helios is our focal point in terms of what we are doing to bring those capabilities to bear. >> So what has the customer response been to them? It sounds very cool. Are customers using it? Are they finding that it is being very, that it is helping them a lot, in terms of, as you said, notifying if they need more capacity. >> Yeah, so I think it's early days for us when it comes to Helios, right? It's a pretty new product, but we're working with customers actively, especially our existing base, to get them really on board with a product and really the service. Be able to collect and assimilate all of their data, and help them with the usage of it. I think the more data we collect, as you know with machine learning and AI, the more data you collect the richer sample set you have. You can do a whole lot more with it. I think when it comes to the application side of it, the discussion we had earlier on application mobility and making that.
I think the University of Pennsylvania is an interesting example of a customer we have where they have about forty different websites internally and externally. They had a planned power, building shutdown for like a day. They had a problem where they couldn't get to recreate those sites easily from their prior infrastructure. So our CloudSpin capability, which is what really helps customers take their on premises VMs and reuse them in the cloud, really came to their rescue with helping them very easily make these websites quickly operational. For them it's been a few simple clicks, and then when they are done with that, when the disaster, in this case, a planned disaster was done, they both go back to their operational on premises. So that's I think a great example of our capabilities coming to light, and really shining in the app mobility arena. But also actually spill over in a sort of disaster recovery. >> Sameer, I'm curious. One of the other things in the application space is a lot of the new applications, call them cloud native apps if you will, what are you hearing from customers, and does Cohesity, is there something different about new type of architectures and how that ties into Cohesity's solutions? >> Yeah, absolutely so I think what we are seeing from customers is when it comes to everything that's our applications born in the cloud. Often times I think what we see as backup is kind of actually a rear guard, it's not even thought of in the context of cloud native. I think we see that being a challenge because customers, I think what they've got with backup in the cloud today, they've got either a combination of manual scripts, they've got some, you know, processes they're running there. There is a lack of automation. So we have actually integrated with the snapshot API of Azure, for instance, and Azure disks.
We bring through a combination of what we do on the platform side and that integration we're actually able to bring enterprise class backup capabilities to that cloud native app. The backup they can experience there. So that's kind of I think where we are looking actually to do more, in terms of that. I think we are starting to see more demand, in terms of more cloud native backup as it relates to applications that are more born in the cloud. I think with us the beauty is it's a single platform that's going to work on premises and in the cloud. And not have a separate solution that's quote unquote, just for the cloud versus one that's for on premises. >> One of the biggest challenges that so many companies have regardless of their industry is getting employees to adopt new technologies. I'm wondering how closely you work with your customers to make sure that there is a wide spread adoption and a real embrace of the Cohesity solutions. >> Sure, I think to me what's fascinating is a big value prop and message for our customers with us is the simplicity. That ranges all the way from the way they can do their upgrades with us. The way actually our interface presents itself. So in most cases, actually I think what we've heard from customers is with little to minimal training they're able to actually get up and going with Cohesity. That actually speaks volumes, to the fact, in terms of how the product and the UI and everything else was designed. We definitely have a support and services team that is, as we are starting to grow more enterprise and work with larger customers, it's starting to have those programs in place to enable customers to get up to speed quickly. But actually, in often times, more often than not, it's more the case of, I could just set it up get it up and going and I'm off to the races. 
>> Sameer, it's our first time here at Microsoft Ignite, over 30,000 people are here, 5,000 organizations, what takeaways do you have for people that haven't been able to attend this show? What have you seen so far? >> Sure, no, I think what I've seen is that from our viewpoint, we've seen lot of customers. We've had some great sessions with customers. We've got a couple more coming up. I think the hybrid cloud message is definitely mainstream, right? I think for customers who are not taking advantage of the services that A, Microsoft has to offer, and then B, other ISVs like us plug into that ecosystem very closely. I think customers definitely should be embracing that in full steam and moving forward with their hybrid cloud initiatives. >> Great, well Sameer, thank you so much for coming on theCUBE, it was great having you. >> Thanks, Rebecca. Thanks, Stu, I appreciate it. >> I'm Rebecca Knight, for Stu Miniman. We will have more from Microsoft Ignite in just a little bit. (upbeat music)

Published Date : Sep 25 2018

SUMMARY :

Brought to you by Cohesity and theCUBE's ecosystem partners. he is the director of product marketing at Cohesity. I'm excited to be here. and the opportunity to really try for customers to parley off of, to Office 365 to SaaS-ify a lot of what you are doing. is kind of back to what I said earlier. So talk a little bit about the biggest customer and the application test and development teams and I'm going to have intelligence, and how that affects the underlying data landscape So what has the customer response been to them? and really the service. is a lot of the new applications, because customers, I think what they've got with backup and a real embrace of the Cohesity solutions. to actually get up and going with Cohesity. Sure, no, I think what I've seen is that Great, well Sameer, thank you so much Thanks, Rebecca. in just a little bit.

ENTITIES

Entity	Category	Confidence
Rebecca	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Satya Nadella	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Stu Miniman	PERSON	0.99+
Sameer	PERSON	0.99+
Stu	PERSON	0.99+
thousands	QUANTITY	0.99+
Cohesity	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
University of Pennsylvania	ORGANIZATION	0.99+
Office 365	TITLE	0.99+
Sameer Nori	PERSON	0.99+
first time	QUANTITY	0.99+
Orlando, Florida	LOCATION	0.99+
hundreds of applications	QUANTITY	0.98+
over 30,000 people	QUANTITY	0.98+
5,000 organizations	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
One	QUANTITY	0.98+
today	DATE	0.98+
about forty different websites	QUANTITY	0.97+
MapR	ORGANIZATION	0.97+
one	QUANTITY	0.96+
both	QUANTITY	0.96+
Azure	TITLE	0.95+
single platform	QUANTITY	0.95+
a day	QUANTITY	0.92+
CloudSpin	TITLE	0.91+
thousands of applications	QUANTITY	0.79+
Helios	TITLE	0.73+
Microsoft Ignite	ORGANIZATION	0.69+
Helios	ORGANIZATION	0.57+
2018	DATE	0.56+
SaaS	TITLE	0.53+
Cohesity	TITLE	0.53+
Ignite	TITLE	0.52+
Ignite	EVENT	0.39+

Kickoff | theCUBE NYC 2018

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present at the creation of the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a storage networking world show and you called and said, "Hey, we're doing Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abhi Mehta, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage.
>> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money and big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- >> Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a prediction. The other prediction was that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term.
I always liked data ocean because I think it was much more fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way it, people thought it would be. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners who are the, it was hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components on their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand where you guys are coming from." 
We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything betted on Impala and this data model that they had and being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, that we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game and I think in that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo! Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about, Dave, they had that competitive battle for years. The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. 
They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. 
Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud ends or some sort of high compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years. >> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. 
You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with its two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? 
The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. 
I mean, that's why you're hearing the messaging of hybrid cloud and from the big distro vendors, and the other thing is you're hearing from a lot of the NoSQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018


ENTITIES

Entity	Category	Confidence
Apple	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Diane Greene	PERSON	0.99+
Google	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
John	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
$30	QUANTITY	0.99+
New York	LOCATION	0.99+
2010	DATE	0.99+
IBM	ORGANIZATION	0.99+
Doug Cutting	PERSON	0.99+
Mike Olson	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Dallas	LOCATION	0.99+
O'Reilly	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
five	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Abi Mehda	PERSON	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
$2.5 billion	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
MapR	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
$40 billion	QUANTITY	0.99+
17 employees	QUANTITY	0.99+
VMworld	ORGANIZATION	0.99+
Today	DATE	0.99+
Impala	ORGANIZATION	0.99+
Nine years	QUANTITY	0.99+
four years ago	DATE	0.98+
last night	DATE	0.98+
last decade	DATE	0.98+
Strata Data Conference	EVENT	0.98+
Strata Conference	EVENT	0.98+
Hadoop Summit	EVENT	0.98+
ninth year	QUANTITY	0.98+
Four years ago	DATE	0.98+
two worlds	QUANTITY	0.97+
five companies	QUANTITY	0.97+
today	DATE	0.97+
Strata Hadoop	EVENT	0.97+
Hadoop World	EVENT	0.96+
CUBE	ORGANIZATION	0.96+
Google Next	ORGANIZATION	0.95+
Twitter	ORGANIZATION	0.95+
this year	DATE	0.95+
Spark	ORGANIZATION	0.95+
US	LOCATION	0.94+
CUBENYC	EVENT	0.94+
Strata O'Reilly	ORGANIZATION	0.93+
next decade	DATE	0.93+

Infrastructure For Big Data Workloads

>> From the SiliconANGLE media office in Boston, Massachusetts, it's theCUBE! Now, here's your host, Dave Vellante. >> Hi, everybody, welcome to this special CUBE Conversation. You know, big data workloads have evolved, and the infrastructure that runs big data workloads is also evolving. Big data, AI, other emerging workloads need infrastructure that can keep up. Welcome to this special CUBE Conversation with Patrick Osborne, who's the vice president and GM of big data and secondary storage at Hewlett Packard Enterprise, @patrick_osborne. Great to see you again, thanks for coming on. >> Great, love to be back here. >> As I said up front, big data's changing. It's evolving, and the infrastructure has to also evolve. What are you seeing, Patrick, and what's HPE seeing in terms of the market forces right now driving big data and analytics? >> Well, some of the things that we see in the data center, there is a continuous move to move from bare metal to virtualized. Everyone's on that train. To containerization of existing apps, your apps of record, business, mission-critical apps. But really, what a lot of folks are doing right now is adding additional services to those applications, those data sets, so, new ways to interact, new apps. A lot of those are being developed with a lot of techniques that revolve around big data and analytics. We're definitely seeing the pressure to modernize what you have on-prem today, but you know, you can't sit there and be static. You gotta provide new services around what you're doing for your customers. A lot of those are coming in the form of this Mode 2 type of application development. >> One of the things that we're seeing, everybody talks about digital transformation. It's the hot buzzword of the day. To us, digital means data first. Presumably, you're seeing that. Are organizations organizing around their data, and what does that mean for infrastructure? >> Yeah, absolutely. 
We see a lot of folks employing not only technology to do that. They're doing organizational techniques, so, peak teams. You know, bringing together a lot of different functions. Also, too, organizing around the data has become very different right now, that you've got data out on the edge, right? It's coming into the core. A lot of folks are moving some of their edge to the cloud, or even their core to the cloud. You gotta make a lot of decisions and be able to organize around a pretty complex set of places, physical and virtual, where your data's gonna lie. >> There's a lot of talk, too, about the data pipeline. The data pipeline used to be, you had an enterprise data warehouse, and the pipeline was, you'd go through a few people that would build some cubes and then they'd hand off a bunch of reports. The data pipeline, it's getting much more complex. You've got the edge coming in, you've got, you know, core. You've got the cloud, which can be on-prem or public cloud. Talk about the evolution of the data pipeline and what that means for infrastructure and big data workloads. >> For a lot of our customers, and we've got a pretty interesting business here at HPE. We do a lot with the Intelligent Edge, so, our Edgeline servers and Aruba, where a lot of the data is sitting outside of the traditional data center. Then we have what's going on in the core, which, for a lot of customers, they are moving from either traditional EDW, right, or even Hadoop 1.0 if they started that transformation five to seven years ago, to, a lot of things are happening now in real time, or a combination thereof. The data types are pretty dynamic. Some of that is always getting processed out on the edge. Results are getting sent back to the core. We're also seeing a lot of folks move to real-time data analytics, or some people call it fast data. That sits in your core data center, so utilizing things like Kafka and Spark. A lot of the techniques for persistent storage are brand new. 
What it boils down to is, it's an opportunity, but it's also very complex for our customers. >> What about some of the technical trends behind what's going on with big data? I mean, you've got sprawl, with both data sprawl, you've got workload sprawl. You got developers that are dealing with a lot of complex tooling. What are you guys seeing there, in terms of the big mega-trends? >> We have, as you know, HPE has quite a few customers in the mid-range in enterprise segments. We have some customers that are very tech-forward. A lot of those customers are moving from this, you know, Hadoop 1.0, Hadoop 2.0 system to a set of essentially mixed workloads that are very multi-tenant. We see customers that have, essentially, a mix of batch-oriented workloads. Now they're introducing these streaming type of workloads to folks who are bringing in things like TensorFlow and GPGPUs, and they're trying to apply some of the techniques of AI and ML into those clusters. What we're seeing right now is that that is causing a lot of complexity, not only in the way you do your apps, but the number of applications and the number of tenants who use that data. It's getting used all day long for various different, so now what we're seeing is it's grown up. It started as an opportunity, a science project, the POC. Now it's business-critical. Becoming, now, it's very mission-critical for a lot of the services that drives. >> Am I correct that those diverse workloads used to require a bespoke set of infrastructure that was very siloed? I'm inferring that technology today will allow you to bring those workloads together on a single platform. Is that correct? >> A couple of things that we offer, and we've been helping customers to get off the complexity train, but provide them flexibility and elasticity is, a lot of the workloads that we did in the past were either very vertically-focused and integrated. 
One app server, networking, storage, to, you know, the beginning of the analytics phase was really around symmetrical clusters and scaling them out. Now we've got a very rich and diverse set of components and infrastructure that can essentially allow a customer to make a data lake that's very scalable. Compute, storage-oriented nodes, GPU-oriented nodes, so it's very flexible and helps us, helps the customers take complexity out of their environment. >> In thinking about, when you talk to customers, what are they struggling with, specifically as it relates to infrastructure? Again, we talked about tooling. I mean, Hadoop is well-known for the complexity of the tooling. But specifically from an infrastructure standpoint, what are the big complaints that you hear? >> A couple things that we hear is that my budget's flat for the next year or couple years, right? We talked earlier in the conversation about, I have to modernize, virtualize, containerizing my existing apps, that means I have to introduce new services as well with a very different type of DevOps, you know, mode of operations. That's all with the existing staff, right? That's the number one issue that we hear from the customers. Anything that we can do to help increase the velocity of deployment through automation. We hear now, frankly, the battle is for whether I'm gonna run these type of workloads on-prem versus off-prem. We have a set of technology as well as services, enabling services with Pointnext. You remember the acquisition we made around Cloud Technology Partners to right-place where those workloads are gonna go and become like a broker in that conversation and assist customers to make that transition and then, ultimately, give them an elastic platform that's gonna scale for the diverse set of workloads that's well-known, sized, easy to deploy. >> As you get all this data, and the data's, you know, Hadoop, it sorta blew up the data model. 
Said, "Okay, we'll leave the data where it is, we'll bring the compute there." You had a lot of skunk works projects growing. What about governance, security, compliance? As you have data sprawl, how are customers handling that challenge? Is it a challenge? >> Yeah, it certainly is a challenge. I mean, we've gone through it just recently with, you know, GDPR is implemented. You gotta think about how that's gonna fit into your workflow, and certainly security. The big thing that we see, certainly, is around if the data's residing outside of your traditional data center, that's a big issue. For us, when we have Edgeline servers, certainly a lot of things are coming in over wireless, there's a big buildout in advent of 5G coming out. That certainly is an area that customers are very concerned about in terms of who has their data, who has access to it, how can you tag it, how can you make sure it's secure. That's a big part of what we're trying to provide here at HPE. >> What specifically is HPE doing to address these problems? Products, services, partnerships, maybe you could talk about that a little bit. Maybe even start with, you know, what's your philosophy on infrastructure for big data and AI workloads? >> I mean, for us, we've over the last two years have really concentrated on essentially two areas. We have the Intelligent Edge, which is, certainly, it's been enabled by fantastic growth with our Aruba products in the networking space and our Edgeline systems, so, being able to take that type of compute and get it as far out to the edge as possible. The other piece of it is around making hybrid IT simple, right? In that area, we wanna provide a very flexible, yet easy-to-deploy set of infrastructure for big data and AI workloads. We have this concept of the Elastic Platform for Analytics. It helps customers deploy that for a whole myriad of requirements. Very compute-oriented, storage-oriented, GPUs, cold and warm data lakes, for that matter. 
And the third area, what we've really focused on is the ecosystem that we bring to our customers as a portfolio company is evolving rapidly. As you know, in this big data and analytics workload space, the software development portion of it is super dynamic. If we can bring a vetted, well-known ecosystem to our customers as part of a solution with advisory services, that's definitely one of the key pieces that our customers love to come to HP for. >> What about partnerships around things like containers and simplifying the developer experience? >> I mean, we've been pretty public about some of our efforts in this area around OneSphere, and some of these, the models around, certainly, advisory services in this area with some recent acquisitions. For us, it's all about automation, and then we wanna be able to provide that experience to the customers, whether they want to develop those apps and deploy on-prem. You know, we love that. I think you guys tag it as true private cloud. But we know that the reality is, most people are embracing very quickly a hybrid cloud model. Given the ability to take those apps, develop them, put them on-prem, run them off-prem is pretty key for OneSphere. >> I remember Antonio Neri, when you guys announced Apollo, and you had the astronaut there. Antonio was just a lowly GM and VP at the time, and now he's, of course, CEO. Who knows what's in the future? But Apollo, generally at the time, it was like, okay, this is a high-performance computing system. We've talked about those worlds, HPC and big data coming together. Where does a system like Apollo fit in this world of big data workloads? >> Yeah, so we have a very wide product line for Apollo that helps, you know, some of them are very tailored to specific workloads. If you take a look at the way that people are deploying these infrastructures now, multi-tenant with many different workloads. We allow for some compute-focused systems, like the Apollo 2000. 
We have very balanced systems, the Apollo 4200, that allow a very good mix of CPU, memory, and now customers are certainly moving to flash and storage-class memory for these type of workloads. And then, Apollo 6500 were some of the newer systems that we have. Big memory footprint, NVIDIA GPUs allowing you to do very high calculation rates for AI and ML workloads. We take that and we aggregate that together. We've made some recent acquisitions, like Plexxi, for example. A big part of this is around simplification of the networking experience. You can probably see into the future of automation of the networking level, automation of the compute and storage level, and then having a very large and scalable data lake for customers' data repositories. Object, file, HDFS, some pretty interesting trends in that space. >> Yeah, I'm actually really super excited about the Plexxi acquisition. I think it's because flash, it used to be the bottleneck was the spinning disk, flash pushes the bottleneck largely to the network. Plexxi gonna allow you guys to scale, and I think actually leapfrog some of the other hyperconverged players that are out there. So, super excited to see what you guys do with that acquisition. It sounds like your focus is on optimizing the design for I/O. I'm sure flash fits in there as well. >> And that's a huge accelerator for, even when you take a look at our storage business, right? So, 3PAR, Nimble, All-Flash, certainly moving to NVMe and storage-class memory for acceleration of other types of big data databases. Even though we're talking about Hadoop today, right now, certainly SAP HANA, scale-out databases, Oracle, SQL, all these things play a part in the customer's infrastructure. >> Okay, so you were talking before about, a little bit about GPUs. What is this HPE Elastic Platform for big data analytics? What's that all about? 
>> I mean, we have a lot of the sizing and scalability falls on the shoulders of our customers in this space, especially in some of these new areas. What we've done is, we have, it's a product/a concept, and what we do is we have this, it's called the Elastic Platform for Analytics. It allows, with all those different components that I rattled off, all great systems in of their own, but when it comes to very complex multi-tenant workloads, what we do is try to take the mystery out of that for our customers, to be able to deploy that cookie-cutter module. We're even gonna get to a place pretty soon where we're able to offer that as a consumption-based service so you don't have to choose for an elastic type of acquisition experience between on-prem and off-prem. We're gonna provide that as well. It's not only a set of products. It's reference architectures. We do a lot of sizing with our partners. The Hortonworks, Cloudera's, MapR's, and a lot of the things that are out in the open source world. It's pretty good. >> We've been covering big data, as you know, for a long, long time. The early days of big data was like, "Oh, this is great, we're just gonna put white boxes out there and off the shelf storage!" Well, that changed as big data got, workloads became more enterprise, mainstream, they needed to be enterprise-ready. But my question to you is, okay, I hear you. You got products, you got services, you got perspectives, a philosophy. Obviously, you wanna sell some stuff. What has HPE done internally with regard to big data? How have you transformed your own business? >> For us, we wanna provide a really rich experience, not just products. To do that, you need to provide a set of services and automation, and what we've done is, with products and solutions like InfoSight, we've been able to, we call it AI for the Data Center, or certainly, the tagline of predictive analytics is something that Nimble's brought to the table for a long time. 
To provide that level of services, InfoSight, predictive analytics, AI for the Data Center, we're running our own big data infrastructure. It started a number of years ago even on our 3PAR platforms and other products, where we had scale-up databases. We moved and transitioned to batch-oriented Hadoop. Now we're fully embedded with real-time streaming analytics that come in every day, all day long, from our customers and telemetry. We're using AI and ML techniques to not only improve on what we've done that's certainly automating for the support experience, and making it easy to manage the platforms, but now introducing things like learning, automation engines, the recommendation engines for various things for our customers to take, essentially, the hands-on approach of managing the products and automate it and put into the products. So, for us, we've gone through a multi-phase, multi-year transition that's brought in things like Kafka and Spark and Elasticsearch. We're using all these techniques in our system to provide new services for our customers as well. >> Okay, great. You're practitioners, you got some street cred. >> Absolutely. >> Can I come back on InfoSight for a minute? It came through an acquisition of Nimble. It seems to us that you're a little bit ahead, and maybe you say a lot a bit ahead of the competition with regard to that capability. How do you see it? Where do you see InfoSight being applied across the portfolio, and how much of a lead do you think you have on competitors? >> I'm paranoid, so I don't think we ever have a good enough lead, right? You always gotta stay grinding on that front. But we think we have a really good product. You know, it speaks for itself. A lot of the customers love it. We've applied it to 3PAR, for example, so we came out with some, we have VMVision for a 3PAR that's based on InfoSight. We've got some things in the works for other product lines that are imminent pretty soon. 
You can think about what we've done for Nimble and 3PAR, we can apply similar type of logic to Elastic Platform for Analytics, like running at that type of cluster scale to automate a number of items that are pretty pedantic for the customers to manage. There's a lot of work going on within HPE to scale that as a service that we provide with most of our products. >> Okay, so where can I get more information on your big data offerings and what you guys are doing in that space? >> Yeah, so, we have, you can always go to hp.com/bigdata. We've got some really great information out there. We're in our run-up to our big end user event that we do every June in Las Vegas. It's HPE Discover. We have about 15,000 of our customers and trusted partners there, and we'll be doing a number of talks. I'm doing some work there with a British telecom. We'll give some great talks. Those'll be available online virtually, so you'll hear about not only what we're doing with our own InfoSight and big data services, but how other customers like BTE and 21st Century Fox and other folks are applying some of these techniques and making a big difference for their business as well. >> That's June 19th to the 21st. It's at the Sands Convention Center in between the Palazzo and the Venetian, so it's a good conference. Definitely check that out live if you can, or if not, you can all watch online. Excellent, Patrick, thanks so much for coming on and sharing with us this big data evolution. We'll be watching. >> Yeah, absolutely. >> And thank you for watching, everybody. We'll see you next time. This is Dave Vellante for theCUBE. (fast techno music)

Published Date : Jun 12 2018

SUMMARY :

From the SiliconANGLE media office and the infrastructure that in terms of the market forces right now to modernize what you have on-prem today, One of the things that we're seeing, of their edge to the cloud, of the data pipeline A lot of the techniques What about some of the technical trends for a lot of the services that drives. Am I correct that a lot of the workloads for the complexity of the tooling. You remember the acquisition we made the data where it is, is around if the data's residing outside Maybe even start with, you know, of the Elastic Platform for Analytics. Given the ability to take those apps, GM and VP at the time, automation of the compute So, super excited to see what you guys do in the customer's infrastructure. Okay, so you were talking before about, and a lot of the things But my question to you and automate it and put into the products. you got some street cred. bit ahead of the competition for the customers to manage. that we do every June in Las Vegas. Definitely check that out live if you can, We'll see you next time.

ENTITIES

Entity	Category	Confidence
Patrick	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Aruba	LOCATION	0.99+
Antonio	PERSON	0.99+
BTE	ORGANIZATION	0.99+
Patrick Osborne	PERSON	0.99+
HPE	ORGANIZATION	0.99+
June 19th	DATE	0.99+
Antonio Neri	PERSON	0.99+
Las Vegas	LOCATION	0.99+
Pointnext	ORGANIZATION	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
NVIDIA	ORGANIZATION	0.99+
third area	QUANTITY	0.99+
21st Century Fox	ORGANIZATION	0.99+
Apollo 4200	COMMERCIAL_ITEM	0.99+
@patrick_osborne	PERSON	0.99+
Apollo 6500	COMMERCIAL_ITEM	0.99+
InfoSight	ORGANIZATION	0.99+
MapR	ORGANIZATION	0.99+
Sands Convention Center	LOCATION	0.99+
Boston, Massachusetts	LOCATION	0.98+
Apollo 2000	COMMERCIAL_ITEM	0.98+
CloudEra	ORGANIZATION	0.98+
HP	ORGANIZATION	0.98+
Nimble	ORGANIZATION	0.98+
Spark	TITLE	0.98+
SAP HANA	TITLE	0.98+
next year	DATE	0.98+
GDPR	TITLE	0.98+
One app	QUANTITY	0.98+
Venetian	LOCATION	0.98+
two areas	QUANTITY	0.98+
today	DATE	0.98+
hp.com/bigdata	OTHER	0.97+
one	QUANTITY	0.97+
Hortonworks	ORGANIZATION	0.97+
Mode 2	OTHER	0.96+
single platform	QUANTITY	0.96+
SQL	TITLE	0.96+
One	QUANTITY	0.96+
21st	DATE	0.96+
Elastic Platform	TITLE	0.95+
3PAR	TITLE	0.95+
Hadoop 1.0	TITLE	0.94+
seven years ago	DATE	0.93+
CUBE Conversation	EVENT	0.93+
Palazzo	LOCATION	0.93+
Hadoop	TITLE	0.92+
Kafka	TITLE	0.92+
Hadoop 2.0	TITLE	0.91+
Elasticsearch	TITLE	0.9+
Plexxi	ORGANIZATION	0.87+
Apollo	ORGANIZATION	0.87+
of years ago	DATE	0.86+
Elastic Platform for Analytics	TITLE	0.85+
Oracle	ORGANIZATION	0.83+
TensorFlow	TITLE	0.82+
Edgeline	ORGANIZATION	0.82+
Intelligent Edge	ORGANIZATION	0.81+
about 15,000 of	QUANTITY	0.78+
one issue	QUANTITY	0.77+
five	DATE	0.74+
HPE Discover	ORGANIZATION	0.74+
both data	QUANTITY	0.73+
data	ORGANIZATION	0.73+
years	DATE	0.72+
SiliconANGLE	LOCATION	0.71+
EDW	TITLE	0.71+
Edgeline	COMMERCIAL_ITEM	0.71+
HPE	TITLE	0.7+
OneSphere	ORGANIZATION	0.68+
couple	QUANTITY	0.64+
3PAR	ORGANIZATION	0.63+

John Kreisa, Hortonworks | Dataworks Summit EU 2018

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. 
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be 1,250 to 1,300 range. The final numbers, but a great sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. 
Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IoT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the market's continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there was really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. 
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. >> We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. 
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. 
And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. of course, the host company of Dataworks Summit. to reconnect with you guys at Hortonworks. the sessions have been, you know, oversubscribed, you guys have all continued to evolve to address the platform based on Hadoop, as you said, in the Apache NiFi, you know, the meetup here so do you guys see that as an opportunity, and really the overall operations of those Oh, by the way, the Data Steward Studio demo today is the second of the services under the DataPlane, being the first to bring that kind of solution that the data subject makes through in that regard. an increasing number of the customers, Focus, in terms of the use-cases and workloads for the preparation, modeling, training and so forth? Building on that is the data science, machine learning, in terms of adoption in the U.S. the data engineers, the Hadoop cluster managers in the core platform, in Hadoop, and you would have This has been John Kreisa, he's the Great, thanks for your time.

ENTITIES

Entity	Category	Confidence
Alan	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
Rob Bearden	PERSON	0.99+
IBM	ORGANIZATION	0.99+
John Kreisa	PERSON	0.99+
Europe	LOCATION	0.99+
John	PERSON	0.99+
Asia	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
yesterday	DATE	0.99+
Africa	LOCATION	0.99+
South America	LOCATION	0.99+
SiliconAngle Media	ORGANIZATION	0.99+
U.S.	LOCATION	0.99+
1,250	QUANTITY	0.99+
Scott Gnau	PERSON	0.99+
1,300	QUANTITY	0.99+
Berlin, Germany	LOCATION	0.99+
seven years	QUANTITY	0.99+
six and a half years	QUANTITY	0.99+
Japan	LOCATION	0.99+
Hadoop	TITLE	0.99+
Asian	LOCATION	0.99+
second	QUANTITY	0.98+
over 2,300 partners	QUANTITY	0.98+
today	DATE	0.98+
two-thirds	QUANTITY	0.98+
19 different countries	QUANTITY	0.98+
Dataworks Summit	EVENT	0.98+
more than 51 countries	QUANTITY	0.98+
Hadoop 3.0	TITLE	0.98+
first	QUANTITY	0.98+
James	PERSON	0.98+
Data Steward Studio	ORGANIZATION	0.98+
Dataworks Summit EU 2018	EVENT	0.98+
Dataworks Summit 2018	EVENT	0.97+
Cloudera	ORGANIZATION	0.97+
MapR	ORGANIZATION	0.96+
GDPR	TITLE	0.96+
DataPlane Services	ORGANIZATION	0.96+
Singapore	LOCATION	0.96+
year six	QUANTITY	0.95+
2018	EVENT	0.95+
Wikibon SiliconAngle Media	ORGANIZATION	0.94+
India	LOCATION	0.94+
Hadoop	ORGANIZATION	0.94+
APAC	ORGANIZATION	0.93+
Big Data Analytics	ORGANIZATION	0.93+
3.1	TITLE	0.93+
Wall Street Journal	TITLE	0.93+
one	QUANTITY	0.93+
Apache	ORGANIZATION	0.92+
Wikibon	ORGANIZATION	0.92+
NiFi	TITLE	0.92+

Steve Wilkes, Striim | Big Data SV 2018

>> Narrator: Live from San Jose it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat music) >> Welcome back to San Jose everybody, this is theCUBE, the leader in live tech coverage and you're watching BigData SV, my name is Dave Vellante. In the early days of Hadoop everything was batch oriented. About four or five years ago the market really started to focus on real time and streaming analytics to try to really help companies affect outcomes while things were still in motion. Steve Wilkes is here, he's the co-founder and CTO of a company called Striim, a firm that's been in this business for around six years. Steve welcome to theCUBE, good to see you. Thanks for coming on. >> Thanks Dave it's a pleasure to be here. >> So tell us more about that, you started about six years ago, a little bit before the market really started talking about real time and streaming. So what led you to that conclusion that you should co-found Striim way ahead of its time? >> It's partly our heritage. So the four of us that founded Striim, we were executives at GoldenGate Software. In fact our CEO Ali Kutay was the CEO of GoldenGate Software. So when we were acquired by Oracle in 2009, after having to work for Oracle for a couple years, we were trying to work out what to do next. And GoldenGate was replication software right? So it's moving data from one place to another. But customers would ask us in customer advisory boards, that data seems valuable, it's moving. Can you look at it while it's moving and analyze it while it's moving, get value out of that moving data? And so that was kind of set in our heads. And then we were thinking about what to do next, that was kind of the genesis of the idea. 
So the concept around Striim when we first started the company was we can't just give people streaming data, we need to give them the ability to process that data, analyze it, visualize it, play with it and really truly understand the data. As well as being able to collect it and move it somewhere else. And so the goal from day one was always to build a full end-to-end platform that did everything customers needed to do for streaming integration analytics out of the box. And that's what we've done after six years. >> I got to ask a really basic question, so you're talking about your experience at GoldenGate moving data from point a to point b and somebody said well why don't we put that to work. But is there change data or was it static data? Why couldn't I just analyze it in place? >> GoldenGate works on change data. >> Okay so that's why, there was changes going through. Why wait until it hits its target, let's do some work in real time and learn from that, get greater productivity. And now you guys have taken that to a new level. That new level being what? Modern tools, modern technologies? >> A platform built from the ground up to be inherently distributed, scalable, reliable with exactly-once processing guarantees. And to be a complete end-to-end platform. There's a recognition that the first part of being able to do streaming data integration or analytics is that you need to be able to collect the data right? And while change data capture from databases is the way to get data out of databases in a streaming fashion, you also have to deal with files and devices and message queues and anywhere else the data can reside. So you need a large number of different data collectors that all turn the enterprise data sources into streaming data. And similarly if you want to store data somewhere you need a large collection of target adapters that deliver to things. Not just on premise but also in the cloud. 
So things like Amazon S3 or the cloud databases like Redshift and Google BigQuery. So the idea was really that we wanted to give customers everything they need and that everything they need isn't trivial. It's not just, well we take Apache Kafka and then we stuff things into it and then we take things out. Pretty often, for example, you need to be able to enrich data and that means you need to be able to join streaming data with additional context information, reference data. And that reference data may come from a database or from files or somewhere else. So you can't call out to the database and maintain the speeds of streaming data. We have customers that are doing hundreds of thousands of events per second. So you can't call out to a database for every event and ask for records to enrich it with. And you can't even do that with an external cache because it's just not fast enough. So we built in an in-memory data grid as part of our platform. So you can join streaming data with the context information in real time without slowing anything down. So when you're thinking about doing streaming integration, it's more than just moving data around. It's ability to process it and get it in the right form, to be able to analyze it, to be able to do things like complex event processing on that data. And also to be able to visualize it and play with it is an essential part of the whole platform. >> So I wanted to ask you about end-to-end. I've seen a lot of products from larger, maybe legacy companies that will say it's end-to-end but what it really is, is a cobbled together pieces that they bought in and then, this is our end-to-end platform, but it's not unified. Or I've seen others "Well we've got an end-to-end platform" oh really, can I see the visualization? "Well we don't have visualization "we use this third party for visualization". So convince me that you're end-to-end. >> So our platform when you start with it you go into a UI, you can start building data flows. 
Those data flows start from connectors, we have all the connectors that you need to get your enterprise data. We have wizards to help you build those. And so now you have a data stream. Now you want to start processing that, we have SQL-based processing so you can do everything from filtering, transformation, aggregation, enrichment of data. If you want to load reference data into memory you use a cache component to drag that in, configure that. You now have data in-memory you can join with your streams. If you want to now take the results of all that processing and write it somewhere, use one of our target connectors, drag that in so you've got a data flow that's getting bigger and bigger, doing more and more processing. So now you're writing some of that data out to Kafka, oh I'm going to also add in another target adaptor write some of it into Azure Blob Storage and some of it's going to Amazon Redshift. So now you have a much bigger data flow. But now you say okay well I also want to do some analytics on that. So you take the data stream, you build another data flow that is doing some aggregation over windows, maybe some complex event processing, and then you use that dashboard builder to build a dashboard to visualize all of that. And that's all in one product. So it literally is everything you need to get value immediately. And you're right, the big vendors they have multiple different products and they're very happy to sell you consulting to put them all together. Even if you're trying to build this from open source and you know, organizations try and do that, you need five or six major pieces of open source, a lot of supporting libraries, and a huge team of developers to just build a platform that you can start to build applications on. And most organizations aren't software platform companies, they're finance companies, oil and gas companies, healthcare companies. 
And they really want to focus on solving business problems and not on reinventing the wheel by building a software platform. So we can just go in there and say look; value immediately. And that really, really helps. >> So what are some of your favorite use cases, examples, maybe customer examples that you can share with me? >> So one of the great examples, one of my customers they have a lot of data in an HP NonStop system. And they needed to be able to get visibility into that immediately. And this was like order processing, supply chain, ERP data. And it would've taken a very large amount of time to do analytics directly on the HP NonStop. And finding resources to do that is hard as well. So they needed to get the data out and they need to get it into the appropriate place. And they recognize that use the right technology to ask the right question. So they wanted some of it in Hadoop so they could do some machine learning on that. They wanted some of it to go into Kafka so they could get real time analytics. And they wanted some of it to go into HBase so they could query it immediately and use that for reference purposes. So they utilized us to do change data capture against the HP NonStop, deliver that datastream out immediately into Kafka and also push some of it into HDFS and some of it into HBase. So they immediately got value out of that, because then they could also build some real-time analytics on it. It would send out alerts if things were taking too long in their order processing system. And allowed them to get visibility directly into their process that they couldn't get before with much fewer resources and more modern technologies than they could have used before. So that's one example. >> Can I ask you a question about that? So you talked about Kafka, HBase, you talk about a lot of different open source projects. You've integrated those or you've got entries and exits into those? >> So we ship with Kafka as part of our product. 
It's an optional messaging bus. So, our platform has two different ways of moving data around. We have a high-speed, in-memory only message bus and that works almost network speed and it's great for a lot of different use cases. And that is what backs our data streams. So when you build a data flow, you have streams in between each step, that is backed by an in-memory bus. Pretty often though, in use cases, you need to be able to potentially rewind data for recovery purposes or have different applications running at different speeds and that's where a persistent message bus like Kafka comes in but you don't want to use a persistent message bus for everything because it's doing IO and it's slowing things down. So you typically use that at the beginning, at the sources, especially things like IoT where you can't rewind into them. Things like databases and files, you can rewind into them and replay and recover but IoT sources, you can't do that. So you would push that into a Kafka backed stream and then subsequent processing is in-memory. So we have that as part of our product. We also have Elastic as part of our product for results storage. You can switch to other results storage but that's our default. And we have a few other key components that are part of our product but then on the periphery, we have adapters that integrate with a lot of the other things that you mentioned. So we have adapters to read and write HDFS, Hive, HBase, across Cloudera, Hortonworks, even MapR. So we have the MapR versions of the file system and MapR Streams and MapR-DB and then there's lots of other more proprietary connectors like CDC from Oracle, and SQL Server, and MySQL and MariaDB. And then database connectors for delivery to virtually any JDBC compliant database. >> I took you down a tangent before you had a chance. You were going to give us another example. We're pretty much out of time but if you can briefly share either that or the last word, I'll give it to you. 
>> I think the last word would be that that is one example. We have lots and lots of other types of use cases that we do including things like: migrating data from on-premise to the cloud, being able to distribute log data, and being able to analyze that log data being able to do in-memory analytics and get real-time insights immediately and send alerts. It's a very comprehensive platform but each one of those use cases are very easy to develop on their own and you can do them very quickly. And of course as the use case expands within a customer, they build more and more and so they end up using the same platform for lots of different use cases within the same account. >> And how large is the company? How many people? >> We are around 70 people right now. >> 70 People and you're looking for funding? What rounds are you in? Where are you at with funding and revenue and all that stuff? >> Well I'd have to defer to my CEO for those questions. >> All right, so you've been around for what, six years you said? >> Yeah, we have a number of rounds of funding. We had initial seed funding then we had the investment by Summit Partners that carried us through for a while. Then subsequent investment from Intel Capital, Dell EMC, Atlantic Bridge. And that's where we are right now. >> Good, excellent. Steve, thanks so much for coming on theCUBE, really appreciate your time. >> Great, it's awesome. Thank you Dave. >> Great to meet you. All right, keep it right there everybody, we'll be back with our next guest. This is theCUBE. We're live from BigData SV in San Jose. We'll be right back. (techno music)
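
Editor's sketch: the enrichment step Steve describes in this conversation (joining each streaming event with reference data already held in memory, rather than calling out to a database per event) can be illustrated roughly as below. This is a minimal Python sketch under stated assumptions, not Striim's implementation or API; the names `CUSTOMER_CACHE`, `enrich`, and the event fields are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data, assumed to be loaded once into memory
# (for example from a database or file) before streaming starts.
CUSTOMER_CACHE: Dict[str, Dict[str, str]] = {
    "c1": {"name": "Acme", "region": "EMEA"},
    "c2": {"name": "Globex", "region": "APAC"},
}


def enrich(
    events: Iterable[Dict[str, object]],
    cache: Dict[str, Dict[str, str]],
) -> Iterator[Dict[str, object]]:
    """Join each event with in-memory context by customer_id.

    The lookup is a local dictionary access, so no per-event round trip
    to an external database or cache is needed. Events with no matching
    reference record pass through unchanged.
    """
    for event in events:
        context = cache.get(str(event.get("customer_id")), {})
        # Merge the event fields with the reference fields into a new record.
        yield {**event, **context}


if __name__ == "__main__":
    stream = [
        {"order_id": 1, "customer_id": "c1", "amount": 40},
        {"order_id": 2, "customer_id": "c9", "amount": 15},  # no reference row
    ]
    for record in enrich(stream, CUSTOMER_CACHE):
        print(record)
```

The generator form keeps the sketch streaming-friendly: events are enriched one at a time as they arrive, and the reference data stays in process memory for the life of the flow.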

Published Date : Mar 9 2018

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Steve Wilks	PERSON	0.99+
Steve	PERSON	0.99+
2009	DATE	0.99+
Steve Wilkes	PERSON	0.99+
five	QUANTITY	0.99+
Intel Capital	ORGANIZATION	0.99+
GoldenGate Software	ORGANIZATION	0.99+
Ali Kutay	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
GoldenGate	ORGANIZATION	0.99+
Kafka	TITLE	0.99+
San Jose	LOCATION	0.99+
Stream	ORGANIZATION	0.99+
MySQL	TITLE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Atlantic Bridge	ORGANIZATION	0.99+
six years	QUANTITY	0.99+
Steam	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
MapR	TITLE	0.99+
HP	ORGANIZATION	0.99+
four	QUANTITY	0.99+
70 People	QUANTITY	0.99+
Dell EMC	ORGANIZATION	0.99+
MariaDB	TITLE	0.99+
Striim	PERSON	0.99+
SQL	TITLE	0.99+
one	QUANTITY	0.98+
each step	QUANTITY	0.98+
Summit Partners	ORGANIZATION	0.98+
two different ways	QUANTITY	0.97+
first part	QUANTITY	0.97+
around six years	QUANTITY	0.97+
around 70 people	QUANTITY	0.96+
HBase	TITLE	0.96+
one example	QUANTITY	0.96+
theCUBE	ORGANIZATION	0.95+
BigData SV	ORGANIZATION	0.94+
Big Data	ORGANIZATION	0.92+
Hadoop	TITLE	0.92+
one product	QUANTITY	0.92+
each one	QUANTITY	0.91+
six major pieces	QUANTITY	0.91+
About four	DATE	0.91+
CVC	TITLE	0.89+
first	QUANTITY	0.89+
about six years ago	DATE	0.88+
day one	QUANTITY	0.88+
Elastic	TITLE	0.87+
Silicon Valley	LOCATION	0.87+
Windows	TITLE	0.87+
five years ago	DATE	0.86+
S3	TITLE	0.82+
JDBC	TITLE	0.81+
Azure	TITLE	0.8+
CEO	PERSON	0.79+
one place	QUANTITY	0.78+
Redshift	TITLE	0.76+
Autumn	ORGANIZATION	0.75+
second	QUANTITY	0.74+
thousands	QUANTITY	0.72+
Big Data SV 2018	EVENT	0.71+
couple years	QUANTITY	0.71+
Google	ORGANIZATION	0.69+

Michael Weiss & Shere Saidon, NASDAQ | PentahoWorld 2017

>> Narrator: Live from Orlando, Florida, it's theCube covering PentahoWorld 2017 brought to you by Hitachi Vantara. >> Welcome back to theCube's live coverage of PentahoWorld brought to you by Hitachi Vantara. My name is Rebecca Knight, I'm your host along with my co-host, Dave Vellante. We're joined by Michael Weiss, he is the senior manager at NASDAQ, and Shere Saidon, who is analytics manager at NASDAQ. Thanks so much for coming back to theCube, I should say, you're Cube veterans now. >> We are, at least I am. This is his first year, this is his first time at PentahoWorld. So, excited to bring him along. >> Okay so you're a newbie but you're a veteran so. (laughing) >> Great. So, tell us a little bit about what has changed since the last time you came on, which was 2015, back then? >> So the biggest thing that's happened in the past 18 months is we've launched seven new exchanges. Integrated seven new exchanges. We bought the ISE, the International Stock Exchange, which is three options markets. We just completed that integration in August. We've also bought the Canadian, CHI-X, the Canadian Exchange, which also had three equities markets, so we integrated them, and we went live with a dark pool offering for Goldman back in June. So now we operate a dark pool for Goldman Sachs, and we're looking to kind of expand that offering at this point. >> So you're just getting bigger and bigger. So tell our viewers a little bit how Pentaho fits into this. >> So Pentaho is the engine that kind of does all our analytics behind the scenes at post trade, right. So we do a lot of traditional ETL, where we're doing batch processing. In the back-end we're doing a little bit more with the Hadoop ecosystem, leveraging things like EMR, Spark, Presto, that type of stuff, and Pentaho kind of helps blend that stuff together a little bit. We use it for reporting, we do some of the BA, we're actually now looking to have the data Pentaho generates plug in a little bit of Tableau. 
So, we're looking to expand it and really leverage that data in other ways at this point. Even doing some things more externally, doing more data offerings via Pentaho externally. >> So I got to do a NASDAQ 101 for my 13 year-old. Came up to me the other day and said, "Daddy, what's the NASDAQ index and how does it work?" Well, give us a 20 second answer. >> Michael: On the NASDAQ index? >> Yeah, what's the NASDAQ Index and how does it work? >> Probably the wrong person to answer that one but, the index is generally just a blend of various stocks. So the S&P 500 is a blend of different stocks, much like that. The Qs are NASDAQ's equivalent of the S&P, right, so we use a different algorithm to determine the companies that make up that blend, but it's an index just like the S&P. >> They're weighted by market cap- >> Michael: Right, yeah. >> And that determines the number at the end- >> Michael: Correct. >> And it goes up and down based on what the stock's index. >> Right, and that's how most people know NASDAQ, right. They see the S&P went up by 5 points, The Dow went down by 3 and the NASDAQ went up by a point, right. But most people don't realize that NASDAQ also operates 27 exchanges worldwide, I think it is now. So, probably a little bit more, maybe closer to 32, but... >> So you mentioned that you're doing a dark pool for Goldman >> Michael: Yes. >> So that's interesting. We were talking off camera about HFT and kind of the old days, and dark pools were criticized at the time. Now Goldman was one of the ones shown to be honest and above board, but what does that mean the dark pool for your business and how does that all tie in? >> Michael: So, dark pools are isolated markets, right, so they don't necessarily interact with the NASDAQ exchange themselves, it's all done within the pool. You interact with only people trading on that pool. 
What NASDAQ has done is we took our technology and we now host it for Goldman so, we have INET, our trading system, so we gave them INET, we built all the surrounding solutions, how you manage symbols, how you manage membership. Even the data, we curate their data in AWS. We do some Pentaho transformations for them. We do some analytics for them. And that's actually going to start expanding, but yeah, we've provided them an entire solution, so now they don't have to manage their own dark pool. And now we're going to look to expand that to other potential clients. >> Dave: So that's NASDAQ as a technology >> Yes. >> Dave: Provider. Very interesting. So I was saying, earlier, the Hong Kong Stock Exchange is basically closing the facility where they house humans, again another example of machines replacing humans. So the joining, well NASDAQ, kind of, but NYSE, London Stock Exchange, Singapore, now Hong Kong... Essentially, electronic trading. So, brings us to the sort of technology underpinnings of NASDAQ. Shere, maybe you can talk a little bit about your role, and paint a picture of the technology infrastructure. >> Yeah so I focus primarily on the financial side of corporate finance. So we leverage Pentaho to do a lot of data integration, allow us to really answer our business questions. So, previously it would take days to put basic reporting together, now you've got it all automated, or we're working towards getting it mostly automated, and it just answers the questions that we need. And no longer use our gut to drive decisions, we're using hard data. And so that's helped us instrumentally in a lot of different places. >> Dave: So, talk more about the data pipeline, where the data's coming from, how you're blending it, and how you're bringing it through the pipeline and operationalizing it. >> Yeah, so we've got a lot of different billing systems, so we integrate companies, and historically we've let them keep their billing systems. 
So just kind of bring it all together into our core ERP, seeing how quantities...and just getting the data, and just figuring out on the basic side, how much do we make from a certain customer? What are we making from them? What happens in different scenarios if they consolidate, or if they default? And some of the pipeline there is just blending it all together, normalizing the data, making sure it's all in the same format, and then putting it in a format where our executives or business managers can actually make decisions off of it. >> Well you're talking about the decision making process, and you said it's no longer gut, you're using data to drive your decisions, to know which direction is the right direction. How big a change is that, just culturally speaking? How has that changed? >> Yeah, it's huge, at least on our side, it's making us a lot more confident in the decisions we're making. We're no longer going in saying, hey this is probably how we should do it. No, the numbers are showing us that this is going to pay off, and we stick to it and look at the hard facts, rather than what do we think is going to happen? >> So, talk a little bit about what you guys are seeing here, and you're doing a lot of speaking here, we were joking earlier, you're kind of losing your voice. You're telling your story, what kind of reactions are you getting? Share with us the behind the scenes at the conference. >> I think at this conference you're seeing a lot of people kind of fall in line with similar ideas that we're trying to get to. Taking advantage more instead of your traditional MPPs, or your traditional relational databases, moving more towards this Hadoop ecosystem. Leveraging Spark, Presto, Flume, all these various new technologies that have emerged over the past two to five years, and are now more viable than ever. 
They're easier to scale, if you look at your traditional MPPs, like we're a big Redshift user, but every time you scale it there's a cost with that, and we don't necessarily need to maintain all that data all the time, so something in the Hadoop ecosystem now lets us maintain that data without all the unnecessary cost. I see a lot more of that than I did two years ago, a lot more people are following that trend. I think the other interesting trend I've seen this week is this idea of becoming more cloud agnostic. Where you operate, and how you store your data, should be irrelevant to the data processing, and I think it's going to be a tough nut to crack for Pentaho, or any vendor. But if you can figure out a way to either do some type of cloud parity, where you have support across all your services, but you don't have to know which service you deploy to when you design your pipelines, I think that's going to be huge. I think we're a little ways from that, but that's been a common theme this week as well, both private and your big three cloud providers right now, your Googles, your Azures, and your AWS. >> So when I asked you said cloud agnostic, that's great, good vision and aspiration. The follow up would be, am I correct that you don't see it as data location agnostic, right, you want to bring the cloud model to your data, versus try to force your data into a cloud? Or not necessarily? >> A lot of it I think is being driven by not wanting to be vendor locked in, so they want to have the ability to, and I think this is easier said than done, the ability to move your data to different cloud providers based on pricing or offerings, right, and right now going from AWS to Google to Azure would be a very painful process. So you move petabytes of data across, it's not cost efficient and all the savings you want to realize by moving to maybe a Google in the future, are not going to be realized cause of all the effort it's going to take to get there. 
>> Dave: We had CERN on earlier, and they were working on that problem... >> Yeah, it's not a trivial problem to solve, but if you can crack that, and you can then say hey I wanna...even if I have a service offering, like our operating a dark pool for Goldman. We also have a market tech side, where we sell our trading platform and various solutions to other exchanges worldwide. If we can come up with a way to be able to deploy to any cloud provider, even on an on-prem cloud, without having to do a bunch of customizations each time, that would be huge, it would revolutionize what we do. We're, as our own company, starting to look at that, and talking with Pentaho, they're also... are going to eye that as a potential way to go, with abstractions and things like that, but it's going to take some time. >> Were you guys here yesterday for the keynotes? >> Michael: Saw some of the keynotes, yes. >> The big messaging, like every conference that you go to, is be the disruptor, or you're going to get disrupted. We talked earlier off camera... Trading volumes are down, so the way you traditionally did business is changing, and made money is changing. >> Michael: Right. >> We talked earlier about you guys becoming a technology provider, I wonder if you could help us understand that a little bit, from the standpoint of NASDAQ strategy, when we hear your CEOs talk, real visionary, technology driven transformations. >> Yeah, I think Adena's coming in is definitely looking at that as a trend, right? Trading volumes are down, they've been going down, they've kind of stabilized a little bit, and we're still able to make money in that space, but the problem is there's not a ton of growth. We acquire the ISE, we acquire the CHI-X, we're buying market share at that point. So you increase revenue, but you also increase overhead in that way. 
And you can only do so many major acquisitions at a time, you can only do how many one billion dollar acquisitions a year before you have to call it a day. And we can look at more strategic, smaller acquisitions for exchanges, but that doesn't necessarily bring you the transformation, the net revenue you're looking for. So what Adena has started to look at is, how do we transform to more of a technology company? We're really good at operating exchanges, how do we take that, and we already have market tech doing it, but how do we make that more scalable, not just to the financial sector, but to your other exchanges, your Ubers or your StubHubs of the world? How do you become a service provider, or a platform as a service for these other companies, to come in and use your tech? So we're looking at how do we rewrite our entire platform, from trading to the back-end, to do things like: Can we deploy to any cloud provider? Can we deploy on-prem? Can we be a little bit more technology agnostic so to speak, and offer these as services, and offer a bunch of microservices, so that if a startup comes up and wants to set up an exchange, they can do it, they can leverage our services, then build whatever other applications they want on top of it. I think that's a transformation we need to go through, I think it's good vision, and I'm looking forward to executing it. It's going to be a couple years before we see the fruits of that labor, but Adena's really doing a great job of coming in, and really driving that innovation, and Brad Peterson as well, our CIO, has really been pushing this vision, and I think it's really going to work out for us, assuming we can execute. >> Well you know what's interesting about that, if I may, is financial services is usually so secretive about their technology, right? But your business, you guys are becoming a technology provider, so you got to face the world and start marketing your capabilities now, and opening about that. 
It's sort of an interesting change. >> I think you'll see that starting to become more of a thing over the next year or two, as we start actually looking to build out the platform and figure it out. We do market on the market tech side, I mean it's not a small business, but we're more strategic about who we market to, cause we're still targeting your financial exchanges, more internationally than in the U.S., but there's only so many of them, again you have to start looking at rebranding, rebuilding, and rethinking how we think about exchanges in general, and not thinking of them as just a financial thing. >> Well that's what I wanted to get into, because you're talking about this rebranding, and this rebuilding, this transformation, to the backdrop within an industry that is changing rapidly, and we have sort of the threat of legislative reform, perhaps some administrative reforms coming down all the time, so how do you manage that? I mean, those are a lot of pressures there, are you constantly trying to push the envelope right up until any changes take place? Or what would you say Shere and Michael? >> Probably again not the right person to ask about this, but we're definitely trying to stay on top of the cutting edge in innovation and the technologies out there that, whether it be Blockchain, or different types of technologies. I mean we're definitely trying to make sure we're investing in them, while maintaining our core businesses. >> Right, it's trying to find that balance right now of when to make the next step in the technology food chain, and when to balance that with regulatory obligations. And if you look at it, going back to the idea of being able to launch marketplaces, I think what you're ending up seeing over the coming years is your Ubers, your StubHubs, I think they're going to become more regulated at some level. 
And we're good at operating more regulated markets, so I think that's where we can kind of come in and play a role, and help wade through those regulations a little bit more, and help build software to adhere to those regulations. >> Since you brought up Blockchain, Jamie Dimon craps all over Blockchain, or you know, Bitcoin, and then clarifies his remarks, saying look, technology underneath is here to stay. Thoughts on Blockchain? Obviously Financial Services is looking at it very closely, doing some really advanced stuff, what can you tell us? >> Yeah, I think there's no argument that it's definitely an innovation and a disruptive technology. I think that it's definitely in its early stages across the board, so we're investing in it where we can, and trying to keep a close eye on it. We think that there's a lot of potential in a lot of different applications. >> As the NASDAQ transforms its business, how does that affect the sort of back-end analytics activity and infrastructure? >> The data is just growing, that's like the biggest challenge we have now. Data that used to be done in Excel, it's just no longer an option, so now in order to get the insights that we used to get just from having a couple people doing Excel transformations, you need to now invest in the infrastructure in the back-end, and so there's a lot that needs to go into building out an infrastructure to be able to ingest the data, and then also having the UI on the front-end, so that the business can actually view it the way they want. >> So skills wise, how's that affecting who you guys are hiring and training? And how's that transformation going? >> Michael: I'll let you go first. >> I think there's definitely, data analytics is a hot field. It's very new, there's definitely a big skills gap in administrative work and in the analytics side. 
Usually you had people who could perform analytical functions just by being administrative or operational, and now it's really, we're investing in analysts, and making sure that we have the right people in place to be able to do these transformations, or pull the data and get the answers that we need from them. >> I mean from the tech side, I think what you're seeing is where we traditionally would just plug a developer in there, whether a Java developer, or an ETL developer, I think what you're seeing now is we're looking to bring more of a business minded data analyst to the tech side, right? So we're looking to bring a data engineer, so to speak, more to the tech side. So we're not looking to hire a traditional four year Computer Science degree, or Software Engineering degree, you're looking for a different breed of person, cause quite honestly your traditional Java dev or C++ developer, they're not skilled or geared towards data. And when we've tried to plug that paradigm in, it just doesn't really work, so we're looking now to hiring more of an analyst, but someone who's a little bit more techie as well. They still need to have those skills to do some level of coding, and what we are finding is that skill gap is still very much... There's a gap there. There's a huge gap. And I think it's closing, but- >> And as you have to fund those for the new areas, I presume, like many companies in your business, you're trying to move away from the sort of undifferentiated low-level infrastructure deployment hassles, and the IT labor costs there, especially as we move to the cloud, presumably, so is that shift palpable? I mean, can you see that going on? >> Yeah, I think we made a lot of progress over the past couple years in doing that. We do more one button deployments, where the operation cost is a lot lower, a lot more automation around alerting, around when things go wrong, so there's not necessarily a human being sitting there watching a computer. 
We've invested a lot in that area to kind of reduce the costs, and make the experience better for our end user. And even from a development side, the cost of a new application is a lot less every time you have to do a release. The question is, how do you balance that with the regulations, and make sure you still have a good process in place. The idea of putting single button deployments in place is a great one, but you still have to balance that with making sure that what you push to production's been tested, well defined, and it meets the need, and you're not just arbitrarily throwing things out there. So we're still trying to hit that balance a little bit, it's more on the back-end side. The trading system is not quite there for obvious reasons, we're way more protective of what goes out there, than surrounding it a lot of the times, but I can see a future where, again going back to this idea of transforming our business, where you can stand up and do an exchange with the click of a button. I think that's a trend we're looking at. >> Rebecca: It's not too far in the future. >> No, I don't think it is. >> Last question, Pentaho report card. What are they doing really well? What do you want to see them do better? >> I think they continue to focus in the right areas, focusing more on the data processing side, and with the big data technologies, trying to fill that gap in the big data, and be the layer that you don't have to tie yourself to like vCloud Air or MapR, you can kind of be a little bit more plug and play. I think they still need to do some improvements on their visualizations in their front-ends. I think they've been so much more focused on the data processing, that part of it, that the visualization's kind of lagged behind, so I think they need to put a little more focus into that, but all in all, they're an A, and we've been extremely happy with them as a software provider. >> Great. 
>> Shere: I think the visualization part is the part that allows people to understand that value being created at Pentaho. So I think being able to maybe improve a little bit on the visualization could go a far way. >> Michael, Shere, it's been so much fun having you on theCube, and having this conversation, keep that bull market coming please, do whatever you can. >> We'll do our best. >> I'm Rebecca Knight. We are here at PentahoWorld, sponsored by Hitachi Vantara. For Dave Vellante, we will have more from theCube in just a little bit.

Published Date : Oct 27 2017

ENTITIES

Entity	Category	Confidence
Michael Weiss	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Rebecca	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Michael	PERSON	0.99+
Dave	PERSON	0.99+
NYSE	ORGANIZATION	0.99+
NASDAQ	ORGANIZATION	0.99+
August	DATE	0.99+
Jamie Dimon	PERSON	0.99+
June	DATE	0.99+
AWS	ORGANIZATION	0.99+
London Stock Exchange	ORGANIZATION	0.99+
Goldman	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
2015	DATE	0.99+
Excel	TITLE	0.99+
Shere	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
Shere Saidon	PERSON	0.99+
Hong Kong Stock Exchange	ORGANIZATION	0.99+
20 second	QUANTITY	0.99+
Googles	ORGANIZATION	0.99+
four year	QUANTITY	0.99+
27 exchanges	QUANTITY	0.99+
Brad Peterson	PERSON	0.99+
5 points	QUANTITY	0.99+
Ubers	ORGANIZATION	0.99+
Adena	ORGANIZATION	0.99+
Orlando, Florida	LOCATION	0.99+
seven new exchanges	QUANTITY	0.99+
Pentaho	ORGANIZATION	0.99+
CERN	ORGANIZATION	0.99+
first year	QUANTITY	0.99+
yesterday	DATE	0.99+
International Stock Exchange	ORGANIZATION	0.99+
three options	QUANTITY	0.99+
two years ago	DATE	0.99+
Java	TITLE	0.99+
first time	QUANTITY	0.98+
Hitachi Vantara	ORGANIZATION	0.98+
one	QUANTITY	0.98+
Dav	PERSON	0.98+
U.S.	LOCATION	0.98+
a day	QUANTITY	0.98+
3	QUANTITY	0.98+
this week	DATE	0.98+
both	QUANTITY	0.97+
each time	QUANTITY	0.97+
StubHubs	ORGANIZATION	0.97+
Spark	ORGANIZATION	0.97+
ISE	ORGANIZATION	0.97+
Hitachi Ventara	ORGANIZATION	0.97+

Prakash Nanduri, Paxata | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Hey, welcome back, everyone. Here live in New York City, this is theCUBE from SiliconANGLE Media Special. Exclusive coverage of the Big Data World at NYC. We call it Big Data NYC in conjunction also with Strata Hadoop, Strata Data, Hadoop World all going on kind of around the corner from our event here on 37th Street in Manhattan. I'm John Furrier, the co-host of theCUBE with Peter Burris, Head of Research at SiliconANGLE Media, and General Manager of WikiBon Research. And our next guest is one of our famous CUBE alumni, Prakash Nanduri, co-founder and CEO of Paxata, who launched his company here on theCUBE at our first inaugural Big Data NYC event in 2013. Great to see you. >> Great to see you, John. >> John: Great to have you back. You've been on every year since, and it's been the lucky charm. You guys have been doing great. It's not broke, don't fix it, right? And so theCUBE is working with you guys. We love having you on. It's been a pleasure, you as an entrepreneur, launching your company. Really, the entrepreneurial mojo. It's really what it's all about. Getting access to the market, you guys got in there, and you got a position. Give us the update on Paxata. What's happening? >> Awesome, John and Peter. Great to be here again. Every time I come here to New York for Strata I always look forward to our conversations. And every year we have something exciting and new to share with you. So, if you recall in 2013, it was a tiny little show, and it was a tiny little company, and we came in with big plans. And in 2013, I said, "You know, John, we're going to completely disrupt the way business consumers and business analysts turn raw data into information and they do self-service data preparation." That's what we brought to the market in 2013. 
Ever since, we have gone on to do something really exciting and new for our customers every year. In '14, we came in with the first Apache Spark-based platform that allowed business analysts to do data preparation at scale interactively. Every year since, last year we did enterprise grade and we talked about how Paxata is going to be delivering our self-service data preparation solution in a highly-scalable enterprise grade deployment world. This year, what's super exciting is in addition to the recent announcements we made on Paxata running natively on the Microsoft Azure HDI Spark system. We are truly now the only information platform that allows business consumers to turn data into information in a multi-cloud hybrid world for our enterprise customers. In the last few years, I came and I talked to you and I told you about work we're doing and what great things are happening. But this year, in addition to the super-exciting announcements with Microsoft and other exciting announcements that you'll be hearing. You are going to hear directly from one of our key anchor customers, Standard Chartered Bank. 150-year-old institution operating in over 46 countries. One of the most storied banks in the world with 87,500 employees. >> John: That's not a start up. >> That's not a start up. (John laughs) >> They probably have a high bar, high bar. They got a lot of data. >> They have lots of data. And they have chosen Paxata as their information fabric. We announced our strategic partnership with them recently and you know that they are going to be speaking on theCUBE this week. And what started as a little experiment, just like our experiment in 2013, has actually mushroomed now into Michael Gorriz, and Shameek Kundu, and the entire leadership of Standard Chartered choosing Paxata as the platform that will democratize information in the bank across their 87,500 employees. We are going in a very exciting way, a very fast way, and now delivering real value to the bank. 
And you can hear all about it on our website-- >> Well, he's coming on theCUBE so we'll drill down on that, but banks are changing. You talk about a transformation. What is a teller? An Internet of Things device. The watch potentially could be a terminal. So, the Internet of Things of people changes the game. Are the ATMs going to go away and become like broadcast points? >> Prakash: And you're absolutely right. And really what it is about is, it doesn't matter if you're a Standard Chartered Bank or if you're a pharma company or if you're the leading healthcare company, what it is is that every one of our customers is really becoming an information-inspired business. And what we are driving our customers to is moving from a world where they're data-driven. I think being data-driven is fine. But what you need to be is information-inspired. And what does that mean? It means that you need to be able to consume data, regardless of format, regardless of source, regardless of where it's coming from, and turn it into information that actually allows you to get insight and decisions. And that's what Paxata does for you. So, this whole notion of being information-inspired, I don't care if you're a bank, if you're a car company, or if you're a healthcare company today, you need to have-- >> Prakash, for the folks watching that might not know our history, as you launched on theCUBE in 2013 and have been successful every year since. You guys are really deploying the classic entrepreneurial success formula, be fast, walk the talk, listen to customers, add value. Take a minute quickly just to talk about what you guys do. Just for the folks that don't know you. >> Absolutely, let's just actually give it in the real example of you know, a customer like Standard Chartered. Standard Chartered operates in multiple countries. They have a significant number of lines of businesses. 
And whether it's in risk and compliance, whether it is in their marketing department, whether it's in their corporate banking business, what they have to do is, a simple example could be I want to create a customer list to be able to go and run a marketing campaign. And the customer list in a particular region is not something easy for a bank like Standard Chartered to come up with. They need to be able to pull from multiple sources. They need to be able to clean the data. They need to be able to shape the data to get that list. And if you look at what is really important, the people who understand the data are actually not the folks in IT but the folks in business. So, they need to have a tool and a platform that allows them to pull data from multiple sources, to be able to massage it, to be able to clean it-- >> John: So, you sell to the business person? >> We sell to the business consumer. The business analyst is our consumer. And the person who supports them is the chief data officer and the person who runs the Paxata platform on their data lake infrastructure. >> So, IT sets the data lake and you guys just let the business guys go to town on the data. >> Prakash: Bingo. >> Okay, what's the problem that you solve? If you can summarize the problem that you solve for the customers, what is it? >> We take data and turn it into information that is clean, that's complete, that's consumable and that's contextual. The hardest problem in every analytical exercise is actually taking data and cleaning it up and getting it ready for analytics. That's what we do. >> It's the prep work. >> It's the prep work. >> As companies gain experience with Big Data, John, what they need to start doing increasingly is move more of the prep work or have more of the prep work flow closer to the analyst. And the reason's actually pretty simple. It's because of that context. 
Because the analyst knows more about what they're looking for and is a better evaluator of whether or not they get what they need. Otherwise, you end up in this strange cycle time problem between people in the back end that are trying to generate the data that they think they want. And so, by making the whole concept of data preparation simpler, more straightforward, you're able to have the people who actually consume the data and need it do a better job of articulating what they need, how they need it and making it presentable to the work that they're performing. >> Exactly, Peter. What does that say about how roles are starting to merge together? 'Cause you've got to be at the vanguard of seeing how some of these mature organizations are working. What do you think? Are we seeing roles start to become more aligned? >> Yes, I do think. So, first and foremost, I think what's happening is there is no such thing as having just one group that's doing data science and another group consuming. I think what you're going to be going into is the world of data and information is all-consuming and that's everybody's role. Everybody has a role in that. And everybody's going to consume. So, if you look at a business analyst that was spending 80% of their time living in Excel or working with self-service BI tools like our partners Tableau and Power BI from Microsoft, others. What you find is these people today are living in a world where either they have to live in coding and scripting hell or they have to rely on IT to get them the real data. So, the role of a business analyst or a subject matter expert, first and foremost, the fact that they work with data and they need information, that's a given. There is no business role today where you can't deal with data. >> But it also makes them real valuable, because there aren't a lot of people who are good at dealing with data. 
And they're very, very reliant on these people to turn that data into something that is regarded as consumable elsewhere. So, you're trying to make them much more productive. >> Exactly. So, four years ago, when we launched on theCUBE, the whole premise was that in order to be able to really drive towards a world where you can make information and data-driven decisions, you need to ensure that the business analyst community, or what I like to call the business consumer, needs to have the power of being able to, A, get access to data, B, make sense of the data, and then turn that data into something that's valuable for her or for him. >> Peter: And others. >> And others, and others. Absolutely. And that's what Paxata is doing. In a collaborative, in a 21st Century world where I don't work in a silo, I work collaboratively. And then the tool, and the platform that helps me do that is actually a 21st Century platform. >> So, John, at the beginning of the session you and Jim were talking about what is going to be one of the themes here at the show. And we observed that it used to be that people were talking about setting up the hardware, setting up the clusters, getting Hadoop to work, and Jim talked about going up the stack. Well, this is one of the indicators that, in fact, people were starting to go up the stack because they're starting to worry more about the data, what it can do, the value of how it's going to be used, and how we distribute more of that work so that we get more people using data that's actually good and useful to the business. >> John: And drives value. >> And drives value. >> Absolutely. And if I may, just put a chronological aspect to this. When we launched the company we said the business analyst needs to be in charge of the data and turning the data into something useful. Then right at that time, the world of creating data lakes came in thanks to our partners like Cloudera and Hortonworks, and others, and MapR and others. 
In the recent past, the world of moving from on-premise data lakes to hybrid, multicloud data lakes is becoming reality. Our partners at Microsoft, at AWS, and others are having customers come in and build cloud-based data lakes. So, today what you're seeing is on one hand this complete democratization within the business, like at Standard Chartered, where all these business analysts are getting access to data. And on the other hand, from the data infrastructure moving into a hybrid multicloud world. And what you need is a 21st Century information management platform that serves the need of the business and to make that data relevant as information and ready for their consumption. While at the same time we should not forget that enterprises need governance. They need lineage. They need scale. They need to be able to move things around depending on what their business needs are. And that's what Paxata is driving. That's why we're so excited about our partnership with Microsoft, with AWS, with our customer partnerships such as Standard Chartered Bank, rolling this out in an enterprise-- >> This is a democratization that you were referring to with your customers. We see this-- >> Everywhere. >> When you free the data up, good things happen but you don't want to have IT be the constraint, you want to let them enable-- >> Peter: And IT doesn't want to be the constraint. >> They don't. >> This is one of the biggest problems that they have on a daily basis. >> They're happy to let it go free as long as it's, in their mind, DevOps-like related, this is cool for them. >> Well, they're happy to let it go with policy and security in place. >> Our customers, our most strategic customers, the folks who are running the data lakes, the folks who are managing the data lakes, they are the first ones that say that we want business to be able to access this data, and to be able to go and make use out of this data in the right way for the bank. 
And not have us be the impediment, not have us be the roadblock. While at the same time we still need governance. We still need security. We still need all those things that are important for a bank or a large enterprise. That's what Paxata is delivering to the customers. >> John: So, what's next? >> Peter: Oh, I'm sorry. >> So, really quickly. An interesting observation. People talk about data being the new fuel of business. That really doesn't work because, as Bill Schmarzo says, it's not the new fuel of business, it's the new sunlight of business. And the reason why is because fuel can only be used once. >> Prakash: That's right. >> The whole point of data is that it can be used a lot, in a lot of different ways, and a lot of different contexts. And so, in many respects what we're really trying to facilitate or if someone who runs a data lake when someone in the business asks them, "Well, how do you create value for the business?" The more people, the more users, the more context that they're serving out of that common data, the more valuable the resource that they're administering. So, they want to see more utilization, more contexts, more data being moved out. But again, governance, security have to be in place. >> You bet, you bet. And using that analogy of data, and I've heard this term about data being the new oil, etc. Well, if data is the oil, information is really the refined fuel or sunlight as we like to call it. >> Peter: Yeah. >> John: Well, you're riffing on semantics, but the point is it's not a one trick pony. Data is part of the development, I wrote a blog post in 1997, I mean 2007, that said data's the new development kit. And it was kind of riffing on this notion of the old days. >> Prakash: You bet. >> Here's your development kit, SDK, or whatever was how people did things back then. Enter the cloud, >> Prakash: That's right. >> And boom, there it is. The data now is in the process of the refinery the developers wanted. 
The developers want the data libraries. Whatever that means. That's where I see it. And that is the democratization where data is available to be integrated into apps, into feeds, into ... >> Exactly, and so it brings me to our point about what was the exciting, new product innovation announcement we made today about Intelligent Ingest. You want to be able to access data in the enterprise regardless of where it is, regardless of the cloud where it's sitting, regardless of whether it's on-premise or in the cloud. You don't need to as a business worry about whether that is a JSON file or whether that's an XML file or that's a relational file. That's irrelevant. What you want is, do I have the access to the right data? Can I take that data, can I turn it into something valuable and then can I make a decision out of it? I need to do that fast. At the same time, I need to have the governance and security, all of that. That's at the end of the day the objective that our customers are driving towards. >> Prakash, thanks so much for coming on and being a great member of our community. >> Fantastic. >> You're part of our smart network of great people out there and the entrepreneurial journey continues. >> Yes. >> Final question. Just observation. As you pinch yourself and you go down the journey, you guys are walking the talk, adding new products. We're in a global landscape. You're seeing a lot of new stuff happening. Customers are trying to stay focused. A lot of distractions whether security or data or app development. What's your state of the industry? How do you view the current market, from your perspective and also how the customer might see it from their impact? >> Well, the first thing is that I think in the last four years we have seen significant maturity both on the providers of software technology and solutions, and also amongst the customers. 
I do think that going forward what is really going to make a difference is, one, really driving towards business outcomes by leveraging data. We've talked about a lot of this over the last few years. What real business outcomes are you delivering? What we are super excited about is when we see our customers, each one of them actually subscribes to Paxata, we're a SaaS company, they subscribe to Paxata not because they're doing a science experiment but because they're trying to deliver real business value. What is that? Whether that is a risk and compliance solution which is going to drive towards real cost savings. Or whether that's a top line benefit because they know what their customer 360 is and how they can go and serve their customers better or how they can improve supply chains or how they can optimize their entire efficiency in the company. I think if you take it from that lens, what is going to be important right now is there's lots of new technologies coming in, and what's important is how is it going to drive towards those top three business drivers that I have today for the next 18 months? >> John: So, that's foundational. >> That's foundational. Those are the building blocks-- >> That's what is happening. Don't jump... If you're a customer, it's great to look at new technologies, etc. There's always innovation projects-- >> R&D, POCs, whatever. Kick the tires. >> But now, if you are really going to talk the talk about saying I'm going to be, call your word, data-driven, information-driven, whatever it is. If you're going to talk the talk, then you better walk the walk by delivering the real kind of tools and capabilities that your business consumers can adopt. And they better adopt that fast. If they're not up and running in 24 hours, something is wrong. >> Peter: Let me ask one question before you close, John. 
So, your argument, which I agree with, suggests that one of the big changes in the next 18 months, three years as this whole thing matures and gets more consistent in its application of the value that it generates, we're going to see an explosion in the number of users of these types of tools. >> Prakash: Yes, yes. >> Correct? >> Prakash: Absolutely. >> 2X, 3X, 5X? What do you think? >> I think we're just at the cusp. I think it is going to grow at least 10X and beyond. >> Peter: In the next two years? >> In the next, I would give that the next three to five years. >> Peter: Three to five years? >> Yes. And we're on the journey. We're just at the tip of the high curve taking off. That's what I feel. >> Yeah, and there's going to be a lot more consolidation. You're going to start to see people who are winning. It's becoming clear as the fog lifts. It's a cloud game, a scale game. It's democratization, community-driven. It's open source software. Just solve problems, outcomes. I think outcome is going to be much faster. I think outcomes as a service will be a model that we'll probably be talking about in the future. You know, real time outcomes. Not eight-month projects or year projects. >> Certainly, we started writing research about outcome-based management. >> Right. >> Wikibon Research... Prakash, one more thing? >> I also just want to say that in addition to this business outcome thing, I think in the last five years I've seen a lot of shift in our customers' world where the initial excitement about analytics, predictive, AI, machine-learning to get to outcomes. They've all come to a reality that none of that is possible if you're not able to handle, first get a grip on your data, and then be able to turn that data into something meaningful that can be analyzed. So, that is also a major shift. That's why you're seeing the growth we're seeing-- >> John: 'Cause it's really hard. >> Prakash: It's really hard. >> I mean, it's a cultural mindset. You have the personnel. 
It's an operational model. I mean this is not like, throw some pixie dust on it and it magically happens. >> That's why I say, before you go into any kind of BI, analytics, AI initiative, stop, think about your information management strategy. Think about how you're going to democratize information. Think about how you're going to get governance. Think about how you're going to enable your business to turn data into information. >> Remember, you can't do AI without IA? You can't do AI without information architecture. >> There you go. That's a great point. >> And I think this all points to why Wikibon's research, all the analysts, got it right with true private cloud, because people have got to take care of their business here to have a foundation for the future. And you can't just jump to the future. There's too much just to come and use a scale, too many cracks in the foundation. You got to do your, take your medicine now. And do the homework and lay down a solid foundation. >> You bet. >> All right, Prakash. Great to have you on theCUBE. Again, congratulations. And again, it's great for us. I totally have a great vibe when I see you. Thinking about how you launched on theCUBE in 2013, and how far you continue to climb. Congratulations. >> Thank you so much, John. Thanks, Peter. That was fantastic. >> All right, live coverage continuing day one of three days. It's going to be a great week here in New York City. Weather's perfect and all the players are in town for Big Data NYC. I'm John Furrier with Peter Burris. Be back with more after this short break. (upbeat techno music).

Published Date : Sep 27 2017

ENTITIES

Entity	Category	Confidence
Peter Burris	PERSON	0.99+
John	PERSON	0.99+
Jim	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
2013	DATE	0.99+
Peter	PERSON	0.99+
Prakash	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Prakash Nanduri	PERSON	0.99+
Bill Schmarzo	PERSON	0.99+
1997	DATE	0.99+
New York	LOCATION	0.99+
Three	QUANTITY	0.99+
80%	QUANTITY	0.99+
Michael Gorriz	PERSON	0.99+
Standard Chartered Bank	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
2007	DATE	0.99+
Hortonworks	ORGANIZATION	0.99+
87,500 employees	QUANTITY	0.99+
Paxata	ORGANIZATION	0.99+
NYC	LOCATION	0.99+
last year	DATE	0.99+
37th Street	LOCATION	0.99+
SAS	ORGANIZATION	0.99+
WikiBon Research	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Excel	TITLE	0.99+
24 hours	QUANTITY	0.99+
One	QUANTITY	0.99+
this year	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
This year	DATE	0.99+
21st Century	DATE	0.99+
one	QUANTITY	0.99+
eight month	QUANTITY	0.99+
one question	QUANTITY	0.99+
four years years ago	DATE	0.99+
3X	QUANTITY	0.99+
5X	QUANTITY	0.99+
first	QUANTITY	0.99+
three years	QUANTITY	0.99+
