Gian Merlino, Imply.io | AWS Startup Showcase S2 E2
(upbeat music) >> Hello, and welcome to theCUBE's presentation of the AWS Startup Showcase: Data as Code. This is Season 2, Episode 2 of the ongoing series covering exciting startups from the AWS ecosystem, and we're going to talk about the future of enterprise data analytics. I'm your host, John Furrier, and today we're joined by Gian Merlino, CTO and co-founder of Imply.io. Welcome to theCUBE. >> Hey, thanks for having me. >> Building analytics apps with Apache Druid and Imply is the focus of this talk, and your company is being showcased today. So thanks for coming on. You guys have been in streaming data at large scale for many, many years, pioneering going back a ways. This past decade has been the key focus. Druid's unique position in that market has been key, and you guys have been powering it. Take a minute to explain what you guys are doing over there at Imply. >> Yeah, for sure. So I guess to talk about Imply, I'll talk about Druid first. Imply is an open source based company, and Apache Druid is the open source project that the Imply product's built around. So what Druid's all about is it's a database to power analytical applications. And there's a couple things I want to talk about there. The first is: why do we need that? And the second is: why are we good at it? And I'll give just a little flavor of both. So why do we need a database to power analytical apps? It's the same reason we need databases to power transactional apps. The requirements of these applications are different. Analytical applications are apps where you have tons of data coming in, you have lots of different people wanting to interact with that data, to see what's happening both real time and historical. The requirements of that kind of application have sort of given rise to a new kind of database that Druid is one example of. There's others, of course, out there in both the open source and non open source world. And what makes Druid really good at it is... people often say, what is Druid's big secret? How is it so good? Why is it so fast? And I never know what to say to that. I always sort of go to, well, it's just getting all the little details right. It's a lot of pieces that individually need to be engineered. You build up software in layers, you build up a database in layers, just like any other piece of software. And to have really high performance and to do really well at a specific purpose, you kind of have to get each layer right and have each layer have as little overhead as possible. And so it's just a lot of kind of nitty gritty engineering work. >> What's interesting about the trends over the past 10 years in particular, and maybe you can go back 10, 15 years, is that state of the art database was: stream a bunch of data, put it into a pile, index it, interrogate it, get some reports; pretty basic stuff. And then all of a sudden now you have, with cloud, thousands of databases out there, potentially hundreds of databases living in the wild. So now, with Kafka and Kinesis, these kinds of technologies, streaming data is happening in real time, so you don't have time to put it in a pile or index it. You want real time analytics. And so, whether they're mobile apps, Instagrams of the world, this is now what people want in the enterprise. You guys are at the heart of this. Can you talk about that dynamic of getting data quickly at scale? >> So our thinking is that actually both things matter. Realtime data matters, but also historical context matters.
And the best way to get historical context out of data is to put it in a pile and index it, so to speak, and the best way to get realtime context on what's happening right now is to be able to operate on these streams. And so one of the things that we do in Druid, I wish I had more time to talk about it, but one of the things that we do in Druid is we kind of integrate this realtime processing and this historical processing. So we actually have a system that we call the historical system that does what you're saying: take all this data, put it in a pile, index it, for all your historical data. And we have a system that we call the realtime system that is pulling data in from things like Kafka and Kinesis, or getting data pushed into it as the case may be. And this system is responsible for all the data that's recent. Maybe the last hour or two of data will be handled by this system, and then the older stuff is handled by the historical system. And our query layer blends these two together seamlessly, so a user never needs to think about whether they're querying realtime data or historical data. It's presented as a blended view. >> It's interesting, and you know, a lot of people just say, hey, I don't really have the expertise, and now they're trying to learn it, so their default was to throw it into a data lake. So that brings back that historical. So the rise of the data lake; you're seeing Databricks and others out there doing very well with data lakes. How do you guys fit into that? 'Cause that makes a lot of sense too, 'cause that looks like historical information. >> So data lakes are great technology. We love that kind of stuff. I would say that with Druid there's actually two very popular patterns. One is, I would say, stream focused, where you connect up to something like Kafka and you load data from the stream, and then we will actually take that data, we'll store all the historical data that came from the stream, and sort of blend those two together. And the other pattern that's also very common is the data lake pattern. So you have a data lake and then you're sort of mirroring that data from the data lake into Druid. This is really common when you have a data lake that you want to be able to build an application on top of. You want to say: I have this data in the data lake, I have my table, I want to build an application that has hundreds of people using it, that has really fast response time, that is always online. And so you mirror that data into Druid and then build your app on top of that.
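To make that blended view concrete, here is a minimal sketch of querying Druid's SQL HTTP API from Python. The router address and the app_events datasource are illustrative assumptions, not details from the conversation; the point is that one query covering the last day of data touches both the realtime system and the historical system, and the broker blends the results transparently.

```python
# Minimal sketch, assuming a Druid router at localhost:8888 and a
# hypothetical "app_events" datasource fed by a Kafka stream. The same
# SQL covers recent stream data (realtime system) and older segments
# (historical system); the caller never distinguishes the two.
import json
import requests

sql = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hr, COUNT(*) AS events
FROM app_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""

resp = requests.post(
    "http://localhost:8888/druid/v2/sql/",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"query": sql}),
)
resp.raise_for_status()
for row in resp.json():  # one JSON object per hour bucket
    print(row["hr"], row["events"])
```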
>> Gian, take me through the progression of the maturity cycle here. As you look back even a few years, the pioneers and the hardcore streaming data folks using data analytics at scale, like you guys are doing with Druid, were really a small percentage of the population. And then as hyperscale became mainstream, it's now in the enterprise. How stable is it? What's the current state of the art relative to the stability and adoption of the techniques that you guys are seeing? >> I think what we're seeing right now at this stage in the game, and this is something that we kind of see at the commercial side of Imply, is this realization that you actually can get a lot of value out of data by building interactive apps around it and by allowing people to kind of slice and dice it and play with it, and that realization is just kind of getting out there to everybody: that there is a lot of value here, and that it is actually very feasible to do with current technology. So I've been working on this problem, just in my own career, for the past decade. Ten years ago, where we were is that even the most high tech of tech companies were like, well, I can sort of see the value, but it seems like it might be difficult. And we're kind of getting from there to the high tech companies realizing that it is valuable and it is very doable. And I think there was a tipping point that I saw a few years ago, when Druid and databases like it really started to blow up. And I think now we're seeing that beyond sort of the high tech companies, which is great to see. >> And a lot of people see the value of the data and they see the application; data as code means the application developers really want to have that functionality. Can you share the roadmap for the next 12 months for you guys? What's coming next? What's coming around the corner? >> Yeah, for sure. I mentioned the Apache open source community; we're one member of that community, a very prominent one, but one member. So I'll talk a bit about what we're doing for the Druid project as part of our effort to make Druid better and take it to the next level, and then I'll talk about some of the stuff we're doing on, I guess, the Imply sort of commercial side. So on the Druid side, stuff that we're doing to make Druid better, take it to the next level: the big thing is something that we really started writing about a few weeks ago, the multi-stage query engine that we're working on, a new multi-stage query engine. If you're interested, the full details are in a blog on our website and also on GitHub, on the Apache Druid GitHub, but the short version is: we're extending Druid's query engine to support more and varied kinds of queries, with a focus on reporting queries and more complex queries. Druid's core query engine has classically been extremely good at doing rapid fire queries very quickly, so think thousands of queries per second, where each query is maybe something that involves a filter and a group by, a relatively straightforward query, but we're doing thousands of them constantly. Historically, folks have not reached for technologies like Druid for really complex, thousand-line SQL queries, complex reporting needs, although people really do need to do both interactive stuff and complex stuff on the same dataset, and so that's why we're building out these capabilities in Druid. And then on the Imply commercial side, the big effort for this year is Polaris, which is our cloud based Druid offering. >> Talk about the relationship between Druid and Imply? Share with the folks out there how that works. >> So Druid is, like I mentioned before, Apache Druid, so it's a community based project. It's not a project that is owned by Imply; some open source projects are sort of owned or sponsored by a particular organization. Druid is not. Druid is an independent project. Imply is the biggest contributor to Druid.
So the Imply engineering team is contributing tons of stuff constantly, and we're really putting a lot of the work in to improve Druid, although it is a community effort. >> You guys are launching a new SaaS service on AWS. Can you tell me about what's happening there, what it's all about? >> Yeah, so we actually launched that a couple weeks ago. It's called Polaris. It's very cool. So historically there have been two ways to get started: you can get started with Apache Druid, it's open source, you install it yourself, or you can get started with Imply Enterprise, which is our enterprise offering. Those are the two ways you could get started historically. One of the issues with getting started with Apache Druid is that it is a very complicated distributed database. It's simple enough to run on a single server, but once you want to scale things out, once you get all these things set up, you may want someone to take some of that operational burden off your hands. And on the Imply Enterprise side, it says right there in the name: it's an enterprise product. It's something that may take a little bit of time to get started with. It's not something you can just roll up with a credit card and sign up for. So Polaris is really about having a cloud product that's designed to be really easy to get started with, really self-service, that kind of stuff. So it's about providing a really nice getting started experience that takes that maintenance burden and operational burden away from you, but is also as easy to get started with as a SaaS database would be. >> So, more developer friendly from an onboarding standpoint, classic. >> Exactly. Much more developer friendly is what we're going for with that product. >> So take me through the state of the art of data as code in your mind, 'cause infrastructure as code, DevOps, has been awesome, that's cloud scale, we've seen that. Data as Code is a term we coined, but it means data's in the developer process. How do you see data being integrated into the workflow for developers in the future? >> Great question. I mean, all kinds of ways. I kind of alluded to this earlier: building analytical applications, building applications based on data and based on letting people do analysis, is really valuable. And in that context, there's kind of two big ways that we see these things getting pushed out. One is developers building apps for other people to use. So think: I want to build something like Google Analytics, I want to build something that tracks my web traffic and then lets the marketing team slice and dice through it and make decisions about how well the marketing's doing. You can build something like that with databases like Druid and products like what we're building at Imply. I guess the other way is things that are actually helping developers do their own job. So kind of like, use your own product, or use it for yourself. And in this world, you kind of have things like... I'll just talk about one, my favorite use case. I'm really into performance; I've spent the last 10 years of my life working on a high performance database, so obviously I'm into this kind of stuff. I love when people use our product to help make their own products faster. So this concept of performance monitoring and performance management for applications.
One thing that I've seen some of our customers and some of our users do that I really love is when you take that performance data for your own app as far as it can possibly go, take it to the next level. I think the basic level of using performance data is: I collect performance data from my application deployed out there in the world, and I just use it for monitoring. I can say, okay, my response times are getting high in this region, maybe there's something wrong with that region. One of the very original use cases for Druid was Netflix doing performance analysis, and performance analysis is more exciting than monitoring, because you're not just understanding that performance is good or bad in whatever region; you're getting very fine grained. You're saying: in this region, on this server rack, for these devices, I'm seeing a degradation or I'm seeing an increase. You can see things like: Apple just rolled out a new version of iOS, and on that new version of iOS my app is performing worse than on the older version. And even though not many devices are on that new version yet, I can see that, because I have the ability to get really deep into the data, and then I can start slicing and dicing more. I can say, for those new iOS people, is it all iOS devices? Is it just the iPhone? Is it just the iPad? And that kind of stuff is just one example, but it's an example that I really like. >> It's kind of like data about the data; it was always good to have context. It's like data analytics for data analytics, to see how it's working at scale. This is interesting because now you're bringing up the classic finding the needle in the haystack of needles, so to speak, where you have so much data out there, like edge cases, edge computing, for instance; you have devices sending data off. There's so much data coming in, the scale is a big issue. This is kind of where you guys seem to be a nice fit: large scale data ingestion, large scale data management, large scale data insights, kind of all rolled into one. Is that kind of-? >> Yeah, for sure. One of the things that we knew we had to do with Druid was, we were building it for the sort of internet age, and so we knew it had to scale well. So the original use case for Druid, the very first one that we ended up building for, the reason we built it in the first place, is because that original use case had massive scale and we struggled finding something. We were literally trying to do what we see people doing now, which is trying to build an app on a massive data set, and we were struggling to do it. And so we knew it had to scale to massive data sets. And a little flavor of how that works is, like I was mentioning earlier, this realtime system and historical system. The realtime system is scalable; it scales out. If you're reading from Kafka, we scale out just like any other Kafka consumer. And then the historical system is all based on what we call segments, which are these files that have a few million rows per file. And a cluster that's really big might have thousands of servers and millions of segments, but it's a design that does scale to these multi-trillion row tables.
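As a rough sketch of what that Kafka-consumer-style scaling looks like in practice, here is a trimmed-down Druid Kafka ingestion supervisor spec submitted to the Overlord. The datasource, topic, and dimension names are hypothetical, and a production spec carries more schema and tuning detail; the point is that raising taskCount scales realtime ingestion out, much like adding consumers to a Kafka consumer group.

```python
# Hedged sketch: submit a Kafka ingestion supervisor to Druid's Overlord.
# Datasource, topic, and dimensions are hypothetical placeholders.
import json
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "app_events",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["region", "device", "os_version"]},
        },
        "ioConfig": {
            "topic": "app-events",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            "taskCount": 2,  # more tasks = more parallel realtime ingestion
        },
        "tuningConfig": {"type": "kafka"},
    },
}

resp = requests.post(
    "http://localhost:8081/druid/indexer/v1/supervisor",  # Overlord endpoint
    headers={"Content-Type": "application/json"},
    data=json.dumps(supervisor_spec),
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "app_events"} once the supervisor is accepted
```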
>> It's interesting, you go back to when you probably started: you had Twitter, Netflix, Facebook, I mean, a handful of companies that were at that scale. Now the trend is, you're on this wave where those hyperscalers, or these unique huge scale app companies, are now mainstream enterprise. So as you guys roll out the enterprise version of building analytics and applications with Druid and Imply, they've got to get religion on this. And I think it's not hard, because it's distributed computing, which they're used to. So how is that enterprise transition going? Because I can imagine people would want it and are just kicking the tires, or learning and then trying to put it into action. How are you seeing the adoption of the enterprise piece of it? >> The thing that's driving the interest is, for sure, doing more and more stuff on the internet, because anything that happens on the internet, whether it's apps or web based, there's more and more happening there, and anything that is connected to the internet, anything that's serving customers on the internet, is going to generate an absolute mountain of data. And the question is not if you're going to have that much data; you do, if you're doing anything on the internet. The only question is: what are you going to do with it? So that's, I think, what drives the interest: people want to try to get value out of this. And then what drives the actual adoption is, I think, and I don't want to necessarily talk about specific folks, but within every industry, I would say, there are people that are leaders, there are organizations that are leaders, teams that are leaders. What drives a lot of interest is seeing someone in your own industry that has adopted new technology and gotten a lot of value out of it. So a big part of what we do at Imply is identify those leaders, work with them, and then talk about how it's helped them in their business. And then also, I guess, the classic enterprise thing: what they're looking for is a sense of stability, a sense of supportability, a sense of robustness, and this is something that comes with maturity. I think the super high tech companies are comfortable using some open source software that rolled off the presses a few months ago; the big enterprises are looking for something that has corporate backing, they're looking for something that's been around for a while, and I think that Druid and technologies like it are reaching that level of maturity right now. >> It's interesting that supply chain has come up on the software side. That conversation is happening a lot now; you're hearing about open source being great, but at cloud scale, being able to get into the data to identify opportunities and also potentially vulnerabilities is a big discussion. Question for you on the cloud native side: how do you see cloud native, cloud scale, with services like serverless, Lambda, edge emerging? It's easier to get to cloud scale. How do you see the enterprise being hardened out with Druid and Imply? >> I think the cloud stuff is great. We love using it to build all of our own stuff; our product is of course built on other cloud technologies, and these technologies build on each other. Like I mentioned earlier, all software is built in layers, and cloud architecture is the same thing. What we see ourselves as doing is building the next layer of that stack. So we're building the analytics database layer. When people first started doing these things in the public cloud, the very first two services that came out were: you can get a virtual machine, and you can store some data and retrieve that data. But there's no real analytics on it; there's just kind of storage and retrieval.
And then as time goes on, higher and higher levels get built out, delivering more and more value, and the levels mature as they go up. And so the bottom layers are incredibly mature, the topmost layers are cutting edge, and there's kind of a maturity gradient between those two. And so what we're doing is building out one of those layers. >> Awesome. Abstraction layers, faster performance, great stuff. Final question for you, Gian: what's your vision for the future? Where do you see Imply and Druid going? What does it look like five years from now? >> I think for sure there are two big trends that are happening in the world, and it's going to sound a little bit self serving for me to say it, but I believe in what we're doing here; I'm here 'cause I believe it. I believe in open source and I believe in cloud stuff. That's why I'm really excited that what we're doing is building a great cloud product based on a great open source project. I think that's the kind of company that I would want to buy from if I wasn't at this company and I was just building something; I would want to buy a great cloud product that's backed by a great open source project. So the way I see the industry going, the way I see us going, and I think what would be a great place to end up, just as an engineering world, as an industry, is a lot of these really great open source projects doing things like what Kubernetes is doing in containers, what we're doing with analytics, et cetera, and then really first class, really well done cloud versions of each one of them, so you can kind of choose: do you want to get down and dirty with the open source, or do you want to just have the abstraction of the cloud? >> That's awesome. Cloud scale, cloud flexibility, community getting down and dirty open source, the best of both worlds. Great solution. Gian, thanks for coming on and thanks for sharing here in the Showcase. Thanks for coming on theCUBE. >> Thank you too. >> Okay, this is theCUBE Showcase Season 2, Episode 2. I'm John Furrier, your host. Data as Code is the theme of this episode. Thanks for watching. (upbeat music)
Wen Phan, Ahana & Satyam Krishna, Blinkit & Akshay Agarwal, Blinkit | AWS Startup Showcase S2 E2
(gentle music) >> Welcome, everyone, to theCUBE's presentation of the AWS Startup Showcase. The theme is Data as Code: The Future of Enterprise Data and Analytics. This is season two, episode two of the ongoing series covering the exciting startups in the AWS ecosystem around data analytics and cloud computing. I'm your host, John Furrier. Today we're joined by great guests here. Three guests: Wen Phan, who's a Director of Product Management at Ahana; Satyam Krishna, Engineering Manager at Blinkit; and we have Akshay Agarwal, Senior Engineer at Blinkit as well. We're going to get into the relationship there. Let's get into it. We're going to talk about how Blinkit's using the open data lakehouse with Presto on AWS. Gentlemen, thanks for joining us. >> Thanks for having us. >> So we're going to get into the deep dive on the open data lakehouse, but I want to just quickly get your thoughts on what it is for the folks out there. Set the table. What is the open data lakehouse? Why is it important? What's in it for the customers? Why are we seeing adoption around this? Because this is a big story. >> Sure. Yeah, the open data lakehouse is really being able to run a gamut of analytics, whether it be BI, SQL, machine learning, data science, on top of the data lake, which is based on inexpensive, low cost, scalable storage. And more importantly, it's also on top of open formats. And this, to the end customer, really offers a tremendous range of flexibility. They can run a bunch of use cases on the same storage and get great price performance. >> You guys have any other thoughts? What's your reaction to the lakehouse? What is your experience with it? What's going on with Blinkit? >> No, I think for us also, it has been the primary driver of how, as a company, we have shifted our complete delivery model from delivering in one day to delivering in 10 minutes, right? And a lot of this was made possible by having this kind of architecture in place, which helps us to be more open source, where the tools are open source and we have an open table format, which helps us be very modular in nature, meaning we can pick the solutions which work best for us, right? And that is the kind of architecture that we want to be in. >> Awesome. Wen, you know, last time we chatted with Ahana, we had a great conversation around Presto and data. The theme of this episode is Data as Code, which is interesting, because in all the conversations in these episodes it's all around developers; administrators are turning into developers. There's a developer vibe with data, and with open source, it's software. Now you've got data taking a similar trajectory to how software development was with code, but the people running data, they're not developers, they're administrators, they're operators. Now they're turning into DataOps. So it's kind of a similar vibe going on, with branches and taking stuff out and putting it back in, and testing it. Datasets are becoming much more stable, iterating on machine learning algorithms. This is a movement. What's your guys' reaction before we get into the relationships here with you guys? What's your reaction to this Data as Code movement? >> Yeah, so I think the folks at Blinkit are doing a great job there. I mean, they have a pretty compact data engineering team, and they have some pretty stringent SLAs in terms of time to value and reliability. And what that ultimately translates to for them is not only flexibility but reliability.
So they've done some very fantastic work on a lot of automation, a lot of integration with code, and their data pipelines. And I'm sure they can give the details on that. >> Yes. Satyam and Akshay, you guys are software engineers, but this is becoming a whole other paradigm, where the frontline coding and engineering work, data engineering, is implementing the operations as well. It's kind of like DevOps for data. >> For sure. Right. And I think whenever you're working, even as a software engineer, the understanding of business is equally important. You cannot be working on something and be away from the business, right? And that's where, like I mentioned earlier, we realized that we had to completely move our stack and start delivering analytics in 10 minutes, right? Because when you're delivering in 10 minutes, your leaders want to take decisions in real time. That means you need to move with them. You need to move with the business. And when you do that, the kind of flexibility these pieces of software give is what enables businesses at the end of the day. >> Awesome. This is really kind of like, is there going to be a book called agile data warehouses? I don't think so. >> I think so. (laughing) >> The agile cloud data. This is cool. So let's get into what you guys do. What is Blinkit up to? What do you guys do? Can you take a minute to explain the company and your product? >> Sure. I'll take that. So Blinkit is India's biggest 10 minute delivery platform. It pioneered the delivery model in the country, with over 10 million Indians shopping on our platform, ranging across everything: grocery staples, vegetables, emergency services, electronics, and much more. It currently delivers over 200,000 orders every day, and is in a hurry to bring the future of commerce to everyone in India. >> What's the relationship between Ahana and Blinkit? Wen, what's the tie-in? >> Yeah, so Blinkit had a pretty well formed stack. They needed a little bit more flexibility and control. They thought a managed service was the way to go. And here at Ahana, we provide a SaaS managed service for Presto. So they engaged us and they evaluated our offering. And more importantly, we were able to partner. As an early stage startup, we really rely on very strong partners with great use cases that are willing to collaborate. And the folks at Blinkit have been really great in helping us push our product, develop our product. And we've been very happy about the value that we've been able to deliver to them as well. >> Okay. So let's unpack the open data lakehouse. What is it? What's under the covers? Let's get into it. >> Sure. So if I can bring up a slide: like I said before, it's really a paradigm of being able to run a gamut of analytics on top of the open data lake. So what does that mean? How did it come about? So on the left hand side of the slide, we are coming out of this world where, for the last several decades, the primary workhorse for SQL based processing and reporting and dashboarding use cases was really the data warehouse. And what we're seeing is a shift due to the trends in inexpensive, scalable cloud storage; the proliferation of open formats to facilitate using this storage with certain amounts of reliability and performance; and the adoption of frameworks that can operate on top of this cloud data lake. So while here at Ahana we're primarily focused on SQL workloads and Presto, this architecture really allows for other types of frameworks. And you see the ML and AI side.
And, to Satyam's point earlier, it offers a great amount of flexibility and modularity for many use cases in the cloud. So really, that's the lakehouse, and people like it for the performance, the openness, and the price performance. >> How does the open source side of it play in? It's kind of open formats. What is the open source angle on this? Because there's a lot of different approaches. I'm hearing open formats. You know, you have data stores, which are a big part of this. You've got SQL, you mentioned SQL. There's a mishmash of opportunities. Is it all coexisting? Is it one tool to rule the world, or is it interchangeable? What's the open source angle? >> There are multiple angles, and I'll definitely let Satyam add to what I'm saying. This was definitely a big piece for Blinkit. So on one hand, you have the open formats. And what the open formats really enable is multiple compute engines working on that data. And that's very huge, 'cause it's open: you're not locked in. I think the other part of open that is important, and I think it was important to Blinkit, was the governance around that. So in particular, Presto is governed by the Linux Foundation. And so, as a customer of open source technology, they want some assurances on things like: how is it governed? Is the license going to change? So there's that aspect of openness that I think is very important. >> Yeah. Blinkit, what's the data strategy here with the lakehouse and you guys? Why are you adopting this type of architecture? >> Yeah, I think adding to what Wen said, right: when we are thinking in terms of all these open stacks, you have got these open table formats, everything, deployed over the cloud, and the primary reason there is modularity. It's as simple as that, right? You can plug and play so many different table formats, from one thing to another, based on the use case that you're trying to serve, so that you get the most value out of the data. Right? I'll give you a very simple example. For us, we don't even use one single table format. It's not that one thing solves for everything, right? We use both Hudi and Iceberg to solve for different use cases. One is good when you're working with a certain size of data; Iceberg works well when you're in the SQL kind of interface, right? Hudi's still trying to reach there; it's going to get there very soon. So having the ability to plug and play different formats based on the use case helps you to grow faster, helps you to take decisions faster, because now you're not stuck on one thing that you have to implement, right? So I think that's what's great about this data lake strategy: keeping yourself cost effective. Yeah, please.
So, it really allowed them to take the best-of-breed of what they were seeing in the community, right? So in the case of table formats, you've got Delta, you've got Hudi, you've got Iceberg, and they've all got their own roadmaps, and it's kind of organic how these different communities want to evolve, and I think that's great. But you have these end consumers, like Blinkit, who have different, maybe overlapping, use cases, and they're not forced to pick one. When you have an open architecture, they can really put together best-of-breed. And as these projects evolve, they can continue to monitor them, make decisions, and continue to remain agile based on the landscape and how it's evolving. >> So the agility is a key point. Flexibility and agility, and time to value with your data. >> Yeah. >> All right. Wen, I've got to get into why Presto is important here. Where does that fit in? Why is Presto important? >> Yeah. For me, it all comes down to the use cases and the needs. And reporting and dashboarding is not going to go away anytime soon. It's a very common use case. Many of our customers, like Blinkit, come to us for that use case. The difference now is that today, people want to do that particular use case on top of the modern data lake, on top of scalable, inexpensive, low cost storage. Right? In addition to that, there's a need for this low latency, interactive ability to engage with the data. This often arises when you need to do things on an ad hoc basis, or you're in the developmental phase of building things up. So if that's what your need is, and latency's important, and getting your arms around the problems is very important, and you have a certain SLA, I need to deliver something: that puts some requirements on the technology. And Presto is perfect for that use case. It's distributed, it's scalable, it's in memory, and so it's able to really provide that. I think the other benefit of Presto, and why we're betting on Presto, is that it works well on the data lakes, but you have to think about how these organizations are maturing with this technology. So it's not necessarily all or nothing. You have organizations that have maybe the data lake, and it's augmented with other analytical data stores like Snowflake or Redshift. So a core aspect of Presto is also its ability to federate, or connect and query across, different data sources. This can be a permanent thing. This could also be a transitionary thing. We have some customers that are slowly shifting their data portfolio from maybe all data warehouse into 80% data lake. But it gives that optionality, it gives that ability to transition over a timeframe. So for all those reasons, the latency, the scalability, the federation, that is why Presto for this particular use case. >> And you can connect with other databases. It can be a purpose-built database, could be whatever. Right? >> Sure. Yes, yes. Presto has a very pluggable architecture. >> Okay. Here's the question for the Blinkit team: why did you choose Presto, and what led you to Ahana? >> So I'll take that. Where Presto sits well is in how it is designed. Basically, Presto decouples your storage from the compute. People can use any storage, and Presto just works as a query engine for them.
So basically, it has a set of connectors where you can connect with real-time databases like Pinot or Druid, along with your warehouses like Redshift, along with your data lake based on Hudi or Iceberg. So it's a very broad landscape that you can use with Presto. And the consumers, like the analysts, don't need to learn different SQL dialects or different querying paradigms for different sources. They just need to learn a single interface, and they get a single place to consume from, and a single destination to write to as well. So it's a homogeneous architecture, which allows you to put central security in place, which Presto integrates with. It's also based on an open architecture, an open source engine. And it has certain innovative features around caching, which reduces a lot of the cost. And since you have further decoupled your storage from the compute, you can further reduce your cost, because the biggest cost in a traditional warehouse is the storage, and the cost goes massively upwards with the amount of data that you've added. Basically, each time you add more data, you require more storage, and warehouses ask you to write the data in their own format. Over here, since we have decoupled that, the storage costs have gone down. It's literally just the cost of what you're storing, and you pay for the compute, and you can scale in and scale out based on the requirements. If you have high traffic, you scale out. If you have low traffic, you scale in. So all of that. >> So huge cost savings. >> Yeah. >> Yeah. Cost effectiveness, for sure. >> Cost effectiveness, and you get very good price value out of it. Like for each query, you can estimate what the cost is for you, based on that tracking and all those things. >> I mean, if you think about the classic iceberg, and what's under the water you don't know: it's the hidden cost. You think about the tooling, right, and also the time it takes to do stuff. So you have flexibility and choice. When we were riffing on this last time we chatted with you guys, you brought it up earlier: you can have the open formats to serve different use cases with different tools or different platforms. Redshift, you can use Redshift here, or use something over there. You don't have to get locked in. >> Absolutely. >> Satyam & Akshay: Yeah. >> Lock-in is a huge problem. How do you guys see that? 'Cause it sounds like here there's not a lot of lock-in. You've got the open formats, and you've got choice. >> Yeah. So you get the best of both worlds. Like, with Ahana, with Presto, you can get the best of both worlds. Since it's cloud native, you can deploy your clusters very easily, within like five minutes. Your cluster is up, you can start working on it. You can deploy multiple clusters for multiple teams. You also get the flexibility of adding new connectors, since it's open. And further, it's also much more secure, since it's cloud native; you can control your security endpoints very well. So all those things come together with this architecture. So you can definitely go more toward the lakehouse architecture than warehousing when you want to deliver data value faster. And basically, you get much higher value out of your data in a shorter timeframe.
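To ground that single-interface idea, here is a hedged sketch using the open source presto-python-client. The host, catalogs, and table names are hypothetical stand-ins for a Blinkit-style setup, not details from the conversation; the point is that one connection and one SQL dialect can join a data lake table with a realtime store, with each backend addressed as catalog.schema.table.

```python
# Hedged sketch with the presto-python-client; table names are invented.
# One Presto connection federates a lake catalog and a realtime catalog.
import prestodb

conn = prestodb.dbapi.connect(
    host="localhost", port=8080, user="analyst",
    catalog="hive", schema="default",
)
cur = conn.cursor()

# Join historical lake data ("hive" catalog) with a realtime source
# ("pinot" catalog) in one query, with no per-source dialects to learn.
cur.execute("""
    SELECT o.city,
           COUNT(*)                AS order_count,
           AVG(m.delivery_seconds) AS avg_delivery
    FROM hive.analytics.orders AS o
    JOIN pinot.default.delivery_metrics AS m
      ON o.order_id = m.order_id
    WHERE o.order_date = CURRENT_DATE
    GROUP BY o.city
    ORDER BY order_count DESC
""")
for city, order_count, avg_delivery in cur.fetchall():
    print(city, order_count, avg_delivery)
```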
>> So Satyam, it sounds like the old warehousing was like the application person: not a lot of usage, a lot of latency, okay here and there. But now you've got more speed to deploy clusters, scale up, scale down. Application developers are everyone. It's not one person. It's not one group. It's whenever you want. So, you've got speed. You've got more diversity in the data opportunities, and your coding. >> Yeah. I think data warehouses are a way to start for every organization which is getting into data. I think data warehousing is still a solution, and will be a solution, for a lot of teams which are just getting into data. But as soon as you start scaling, as you start seeing the cost going up, as you start seeing the number of use cases adding up, having an open format definitely helps. So I would say that's where we are also heading, and that's how our journey with Presto started as well, and why we even thought about Ahana, right. >> (John chuckles) >> So, like you mentioned, one of the things that happened was, as we were moving to the lakehouse and the open table format: Ahana was one of the first ones in the market to have Hudi as a first class citizen, completely supported, with things which were not even present at the time even in Presto, right. So we see Ahana working behind the scenes, improving some of these things in the open source ecosystem. And that's where we get the most value out of Ahana as well. >> This is the convergence of open source magic and commercialization. Wen, when you think about Data as Code, it reminds me, I hear, "Data warehouse, it's not going to go away." But you've got cloud scale. It reminds me of the old, "Oh yeah, I have a data center." Well, here comes the cloud. It doesn't really kill the data center, although Amazon would say that the data center's going to be eliminated. No, you just use it for whatever you need it for. You use it for specific use cases, but all the action goes to the cloud for scale. The same thing happens with data, and look at the open source community. It's kind of coming together. Data as Code is coming together. >> Yeah, absolutely. >> Absolutely. >> I do want to connect another dot in terms of cost. We've been talking a little bit about price performance, but there's an implicit cost, and I think this was also very important to Blinkit, and also why we're offering a managed service. So, one piece of it, and it really revolves around the people, right? So outside of the technology, the performance: one thing that Akshay brought up, and it's another important piece that I should have highlighted a little bit more, is that Presto exposes the ability to interact with your data in a widely adopted way, which is basically ANSI SQL. So the ability for your practitioners to use this technology is huge. That's just regular Presto. In terms of a managed service, the guys at Blinkit are a great high performing team, but they have to be very efficient with their time and what they manage. And what we're trying to do is provide leverage for them. So, take a lot of the heavy lifting away, but at the same time figure out the right things to expose so that they have that same flexibility. And that's been the balancing point that we've been trying to strike at Ahana. But that goes back to cost: what is the total cost of ownership? And that doesn't include just the actual query processing time, but also the ability for the organization to go ahead and absorb the solution. And what does it cost in terms of the people involved? >> Yeah. Great conversation.
I mean, this brings up the question of, back in the data center and cloud days, you had the concept of an SRE, which is now popular: site reliability engineer. One person does all the clusters and manages all the scale. Is the data engineer the new SRE for data? Are we seeing a similar trajectory? Just want to get your reaction. What do you guys think? >> Yes, I would say definitely. It depends on the teams and their sizes. We are a high performing team, so we automate bits and pieces of the architecture based on where we want to invest. And it comes down to the value of the engineer's time: basically, how much they can invest, how much they need to configure the architecture, and how much time it takes to get to market. So basically, this is what I would also highlight as an engineer. I found Ahana, I would say, as the Presto in a cloud native environment; I think it's the one in the market that seamlessly scales in and scales out. And further, for a team like ours, with three to four engineers, managing clusters day in, day out, configuring, tuning, and all those things takes a lot of time. And Ahana came in, takes that off our plate, and hands us a solution which works out of the box. So that's where this comes in. Ahana is also based on the open source community. >> So the engineer's time is so valuable. >> Yeah. >> My take on it, really, in terms of the data engineer being the SRE: I think that can work. It depends on the actual person, and we definitely try to make the process as easy as possible. I think in Blinkit's case, you guys are... there are data platform owners, but they definitely are aware of the pipelines. >> John: Yeah. >> So they have very intimate knowledge of what data engineers do, but I think in their case, you guys, you're managing a ton of systems. So it's not even just Presto. They have a ton of systems, and surfacing that interface so they can cater to all the data engineers across their data systems, I think, is the big need for them. I know you guys want to chime in. I mean, we've seen the architecture and things like that. I think you guys did an amazing job there. >> So, adding to Wen's point, right: I generally think that what DevOps is to the tech team, the data engineers or the data teams are to the data organization, right? They play a very similar role: you have to act as a guardrail to ensure that everyone has access to the data, so the democratizing and everything is there, but that has to also come with security, right? And when you do that, there are (indistinct) a lot of points where someone can interact with data. And again, there's a mixed batch of open source tools that work well, and there are some paid tools as well. So for us, for visualization, we use Redash for our ad hoc analysis, and we use Tableau whenever we want to give very concise reporting. We have Jupyter notebooks in place, and we have EMRs as well. So we always have a mixed batch of things where people can interact with data. And most of our time is spent acting as that guardrail, to ensure that everyone should have access to data, but it shouldn't be exploited, right? And I think that's where we spend most of our time. >> Yeah. And I think the time is valuable, but to your point about the democratization aspect of it, there seems to be a bigger step function value that you're enabling, and it needs to be talked about.
The 10x engineer, it's more like 50x, right? If you get it done right, the enablement downstream, at the scale that we're seeing with this new trend, is significant. It's not just, oh yeah, visualization and getting some data quicker; there are actually real advantages, a multiplier, with that engineering. And we saw that with DevOps, right? You do this right, and then magic happens on the edges. So, yeah, it's interesting. You guys, congratulations. Great environment. Thanks for sharing the insight, Blinkit. Wen, great to see you. Ahana again with Presto, congratulations. Open source meets data engineering. Thanks so much. >> Thanks, John. >> Appreciate it. >> Okay. >> Thanks John. >> Thanks. >> Thanks for having us. >> This is season two, episode two of our ongoing series. This one is Data as Code. This is theCUBE. I'm John Furrier. Thanks for watching. (gentle music)
Fangjin Yang, Imply.io | CUBE Conversation
(bright upbeat music) >> Welcome, everyone, to this CUBE Conversation featuring Imply. I'm your host, Lisa Martin. Today, we are excited to be joined by FJ Yang, the co-founder and CEO of Imply. FJ, thanks so much for joining us today. >> Lisa, thank you so much for having me. >> Tell me a little bit about yourself and about Imply. >> Yeah, absolutely. So, I started Imply a couple years ago and before start the company, I was a technologist. So, I was a software engineer and software developer primarily specializing in distributed systems. And one of the projects I worked on, ultimately became kind of the centerpiece behind Imply. Imply, as a company is a database company. What we do is we provide developers a powerful tool in order to help them build various types of data analytic applications. We're also an open source company, where the company develops a popular open source project called Apache Druid. >> Got it, so database as a service for modern analytics applications. You're also one of the original authors of Apache Druid. Talk to me, gimme a timeline, Druid's 10-year history or so. What's the big picture? What's been the market evolution that you've seen? >> Yeah, absolutely. So, I moved out to Silicon Valley basically to try and work at a startup, 'cause I was enamored with startups and I thought they were the coolest thing ever. So, at one point, I basically joined the smallest startup I could find. It was a startup called Metamarkets, which actually doesn't exist anymore, it was ultimately acquired by Snapchat a couple years ago. But, I was one of the first employees there. And what we were trying to do at the time, was we were trying to build an analytics application, a user-facing application where people could slice and dice various types of data. At the time, the data sets we were working with were like online advertising, digital advertising data sets which were very large and complex. And, we really struggled to find a database that could basically power the kind of interactive and user experience that we know we want to provide our end customers. So, what ended up happening was we decided to build our own database and we were a three or five-person shop when we decided to build our own database, and that was Druid. And over time, we saw many other types of companies actually struggle with a similar set of problems, albeit with very different types of use cases and very different types of data sets. And, the Druid community kind of grew and evolved from that. And in my work in engaging with the community, what I saw was a market opportunity and a market gap and that's where Imply formed. >> Let's double click on that. You talked about why you built Druid, the problem you were looking to solve. But, talk to me about the role that Imply has. >> Right. So, Imply is a commercial company. What we do is we build kind of an end-to-end enterprise product around Druid as the core engine. Imply provides deployment, it deploys management, it provides security, and it also provides visualization and monitoring pieces around Druid as a core engine. What we aim to do at Imply is really enable developers to build various types of data applications with only the click of a few buttons and interacting with a simple set of APIs. So, the goal is, if you're a developer, you don't have to think about managing the database yourself, you don't have to think about the operational complexity at the database, but instead, what you do is just work with APIs and build your application. 
>> So, then what gives Druid its superpower? What makes Druid Druid? >> Yeah, so, Druid, the easiest way to think about it, is it's a really fast calculator and it's a very fast calculator for a whole lot of data. So, when you have a whole lot of data and you want to crunch numbers very, very quickly, Druid is very good at doing that. And, people always ask me this question, which is, what makes Druid special? And I always struggle with it, because it's never just one thing, it's actually layers, upon layers, upon layers of engineering. You start with fundamentals of how you maximally optimize the resources of any hardware. So, how do you maximize storage? How do you maximize compute? And then, there's a lot of optimizations around how do you store the data? How do you access that data in a very fast way once it's stored in order to run computations very quickly? So, unfortunately, there's no silver bullet about Druid, but maybe I can summarize in this way. Druid, it's like a search system, and a data warehouse, and a time series database all mixed together. And, that architecture enables it to be very, very quickly. And unfortunately, if you don't know what some of the components I'm talking about are, it's hard to describe where the secret sauce is (chuckling). >> Sometimes you want to keep that secret sauce secret. Talk to me about the overall data space, as we see these days, every company is a data company or if it's not, it needs to be to be successful. Where does Druid fit in the overall data space? Give us that picture of where it fits. >> Yeah, absolutely. So, it's pretty interesting that you see now in the public markets as well as the private markets, some of the hottest unicorns out there are actually data companies. And, I think what people are are understanding now for the first time, is just how vast and complex the data space is and also how large the market is as well. So for sure, there's many different components and pieces in the data space, and they oftentimes come together to form what's known as a data stack. So, data stack is basically kind of an architecture that has various systems and each of these systems are designed to do a certain set of things very, very well. For example, a company that recently went public is a company called Confluent, which mostly catered towards data transport, so getting data from one place to another. They're built around an open source engine called Apache Kafka. Databricks is another mega unicorn that's going to go public pretty soon. And they're built around an open source project called Spark, which is mainly used for data processing. Where we sit is on the data query side. So, what that means is we're a system in which people can store data and then access that data very, very quickly. And there's other systems that do that, but where our bread and butter is, is we're building some sort of application, where you have end users that are clicking buttons in order to get access to data, we're a platform that enables the best end user experience. We return queries very, very quickly with a consistent SLA, we immediately visualize data as soon as it's made available, and then we can support many, many, many concurrent end users to access the system at the same time. >> So, real time. One of the things I think that we learned during the pandemic, one of the many things is that access to real time data, it's no longer a nice to have, it is table stakes for, as I said, every company, these days is a data company. 
So with how you describe it, how should people think of Druid versus a data warehouse? >> Yeah. So, that's a great question. And obviously, data warehouses have been around since the 70s. In the B2B space, they're among the largest players that exist in enterprise software. So, it's only natural that when you come up with a new analytics database, people compare it with what they already know, which is the data warehouse. So, a lot of how we think about why we're different than a data warehouse goes back to how I answered the previous question, in that we're focused right now, really, on powering different types of data applications. Data applications are UIs in which people are really accessing and getting insights from data by clicking buttons, versus writing more complex SQL queries. And when you click buttons and you get access to data, what you want in terms of an end user experience is for answers to questions to come back almost immediately. So you don't want to click a button and then see a spinning dial that goes on for minutes and minutes before an answer comes back. You basically want results to come back immediately. You want that experience no matter what types of queries you're issuing or how many people are issuing those queries. If you have thousands, if not tens of thousands, of people that are trying to access data at the exact same time, you want to give a consistent user experience, like Google, which is one of my favorite products. There are millions of people that use Google and ask questions, and they get their answers back immediately. So we try to provide that same experience, but instead of a generic search engine, what we're providing is a system that basically answers questions on data, and users get a very interactive and fast experience when asking questions. And that's something that I think is very different than what data warehouses are primarily specialized in. Data warehouses are really designed to be systems in which people write very large, complex SQL queries that might take minutes or hours sometimes to run. But the experience of using a data warehouse to power an application is not a great one. >> So, I'm just curious, FJ, in the last couple of years, with, as I mentioned before, access to real time data no longer a nice to have but something business critical for so many industries, did you see any industries in particular in recent years that were really primed candidates for what Druid can deliver? >> Yeah, that's a great question. And you can imagine that the industries that really heavily rely on fast decision making are the ones that are earliest to adopt technologies like this. So, in the security space and the observability space, as well as working with networking and various forms of backend kind of metrics data, this system has been very popular, and it's been popular because people need to triage (indistinct) as they occur, they need to resolve problems, and they also need immediate visibility, as well as very fast queries on data. Another space is online advertising. Online advertising nowadays is almost entirely programmatic and digital, so response times are critical in order to make decisions. And that's where Druid was actually born. It was born for advertising before it kind of went everywhere else. We're seeing it more in fraud protection, fraud prevention, as well as fraud diagnostics nowadays. We're seeing it in retail as well, which is pretty interesting.
And the goal, of course, is, I believe, that every industry and every vertical needs the capabilities that we provide. So hopefully, we see a whole lot more use cases in the near future. >> Right, it's absolutely horizontal these days. So, 10-year history, you've got a community of thousands; what's the future of Druid? What do you see when you open the crystal ball and look down the road 12 months, 18 months out? >> Yeah. So, I think as a technologist, your goal, at least for me, is to try and create technology that has as much applicability as possible and solves problems for as many people as possible. That's always the way I think about it. So, I want to do good engineering and I want to build good systems. And I think the hallmark of a really good system is that you can solve all different types of problems and condense all these different problems into the same set of models and the same set of principles. And the thing that makes me most excited about Druid is the many, many different industries in which it's found value, and the many different use cases in which it's found value. So, if I were to give a 30,000-foot roadmap, that's what we're trying to do with the next generation of Druid. We're actually doing a pretty major engine upgrade right now, and a pretty major overhaul of the entire system. And the goal of that is to take all the learnings that we've had over the last decade and to create something new that can solve an expanded set of problems that we've heard about from the community and from other places as well. >> Excellent. FJ, exciting work that you've done the last 10 years. Congratulations on that. Looking forward to the roadmap that you talked about. Thanks for sharing what Druid is, the Imply connection, and all the different use cases where it applies. We appreciate your insights. >> Appreciate you having me on the show. Thank you very much. >> My pleasure. For FJ Yang, I'm Lisa Martin. You're watching this CUBE Conversation, the leader in live tech enterprise coverage. (bright upbeat music)
Bala Kuchibhotla and Greg Muscarella | Nutanix .NEXT EU 2018
>> Live from London, England, it's theCUBE covering .Next Conference Europe 2018. Brought to you by Nutanix. >> Welcome back to theCUBE's coverage of Nutanix .Next 2018 here in London, England. We're gonna be talking about developers in this segment. I'm Stu Miniman and my cohost is Joep Piscaer. Happy to welcome to the program two first-time guests: Bala Kuchibhotla, who is the General Manager of Nutanix Era, and sitting next to him, Greg Muscarella, who recently joined Nutanix and is Vice President of Products. Both of you have been up on stage; Greg was talking about Carbon and cloud native, and of course Era is the database as a service. Gentlemen, thanks so much for joining us. >> Both: Thanks, thanks for having us. >> Good to be here. >> Alright, so look, developers. You know, we were thinking back, you know, I love the old meme, developers, developers, developers! Ballmer had it right; the style might not have been there. Microsoft, a company that does quite well with developers. You know, my background is in the enterprise space. I'm an infrastructure guy that goes to cloud, and the struggle I've had a little bit is, you know, developers really work from the application down. It's like that's where they live, and as an infrastructure guy, it's a little uncomfortable for me. So maybe to set that stage, because you know I look at Nutanix, you know, at its core, infrastructure's a big piece of it, but it's distributed architectures, it's built from the architecture of, really, the hyper-scale type of environments. So help connect the dots as to where Nutanix plays with the developers, and then we'll get into your products and everything else after. Bala, you want to start? >> Cool, okay. So as you know, Nutanix is definitely addressing the IT ops market. We simplify storage, compute, networking, and build the infrastructure as a service. Obviously, if you look at the private cloud, the IT operators are becoming the cloud operators and then giving that cloud to the developers. We are basically trying to build a cloud for IT operators so they can present the cloud to developers. Now that we have had this infrastructure pretty much there for quite some time, we're now expanding the services to other things, the platform, the platform as a service. Now going back to the developer community, you will have the same kind of cloud-like consumption. These cloud operators, the IT operators, are providing the cloud for you, and as developers you get the same kind of public cloud consumption. The agility, the ability you are trying to get, EC2s, (mumbling), and S3s, that kind of stuff, EBS: you have the same kind of APIs from Nutanix, so you can spin up a VM, spin up a database, spin up storage, and then do what you want to do kind of stuff. So that's the natural journey for that kind of stuff. >> Yeah, Greg? >> Yeah, I have to agree. Look, the world has changed quite a bit for developers, and it's gotten a lot better. If you look at the tooling and what you can now do on your laptop, spinning up what would be a pretty complex environment, from a three tier application with a robust database, an app tier, anything else you might have on the storage side: spin it up, break it down, and with your CICD pipeline you can have it deployed to production pretty rapidly.
So what we look at doing is, you know, recreating that experience that the cloud has really brought to those developers, and having the same type of tooling for those enterprise-grade applications that are going to be deployed, you know, on that infrastructure that is needed in private data centers. >> So looking at, you know, one of the reasons why developers love cloud services so much: it's easy for them. They can just consume it, it's very low friction. They don't even really, you know, need to go through a purchasing process, other than a credit card, maybe paid for themselves in the beginning. So you know, low friction is really the key word here. So I'm wondering, you know, looking at the Nutanix, the IT ops perspective, how are you kind of bringing that low friction into the developer world? >> Yeah, so I'll take the question. So essentially what I am seeing is that the enterprise world is very fragmented. People doing silos kind of stuff. As you rightly said, developers really want to be liberated from all this bureaucracy, right? So they really need a service kind of world where they can go click on it, they get their compute kind of stuff. There's pressure on the IT ops to give that experience, otherwise people will flee to the public cloud. As simple as that, right? So to me, the way I see it is, the IT ops, the DB ops, the traditional DB ops people, they are understanding the need that, hey, we've got to be service-ified. We want to provide that kind of service-like interface to our teams who are consuming that kind of stuff. So this software, Nutanix as the enterprise cloud software, lets them create their own private cloud and then give those services to the developers kind of stuff. So it's a natural transition as a company for us. We had to start from the cloud operators; now we're exposing the cloud services from the cloud operators to the cloud consumers, essentially the developers. >> Greg, up on stage you talked about cloud native, and your premise is that cloud native is a term for a methodology, not necessarily that it's born in the cloud. Maybe help explain that a little bit, and you know, we think Nutanix is mostly in data centers today, so, you know, why isn't this just saying, "No, no, no, we can be cloud native, too." >> Fair point, and I think we're not alone in that as well, in being an enterprise infrastructure company that was looking at enabling cloud native applications, or cloud native architecture, within the private data center. Say look, really it's a form of doing distributed computing, right, and that's the core of it, right? So you have a stateless, ephemeral infrastructure. You're not upgrading things, you know, you're blowing it away and rebuilding it. There are some core things like that that will move across, whether it be in the cloud or on prem. And of course you need tooling for that, right, 'cause that's not the methodology most enterprise developers or operators are really going through, right; so everything's pets, not much cattle. We're really trying to change that quite a bit, and that's both enabling technology but it's also the practices that people will deploy. And what we're seeing is, it's not so much us trying to sell this; it's more like, hey, we're used to this in the cloud, why can't we do this on prem in our private data center, where we have all of our data and the other services that we need to interact with? Like, that's where the demand's really coming from.
So it's that mass of data they want to interact with, with the type of architecture that they've gotten used to for rapid development and deployment. >> So one other thing, you mentioned pets versus cattle. One of the things I've been seeing from, you know, an IT ops perspective is you need a good ecosystem of management products around your pets or your cattle to be able to make it cattle, right? If you don't have the tooling, you're gonna do manual interaction, and it's going to become pets. So I'm wondering, you know, in that cloud native space, how are you helping the IT ops to actually make it a cattle experience, you know, towards management or monitoring, or backup, stuff like that? >> So, you know, a lot of that is centered around Kubernetes, right, as a center of mass. So it's not just us doing it; it's us pulling in a lot of the support and ecosystem that is being built by the community for that, and leveraging that piece. And then we have other things we'll either add onto that as it integrates with our platform and some of the capabilities there, or things that we may do, again, as pure open source. I'll give you a couple examples of that. So I mentioned Epoch on stage, right; it's something that brings additional metrics to Prometheus. So in addition to CPU, memory, and storage consumption, you're actually getting latency and other more business-level metrics that you might be using to trigger things in Kubernetes, like auto-scaling. I don't necessarily always scale on CPU or memory; maybe it's a customer experience that's difficult to measure. The other thing is, because we have the storage layer underneath, you know, we look at doing things like, again it's early in Kubernetes, but snapshotting from within Kubernetes. Right, so if we have a CSI provider, why not let an application or a container trigger a snapshot from within Kubernetes. Underneath, our storage layer will take that snap, and then it becomes an object that's available from within Kubernetes. So there's a whole lot of things happening. >> I just want to add a couple of comments to that. This pets versus cattle is standardization, right, like we're talking about. In typical old legacy enterprises, let's take the example of databases: every application team has their own databases they are trying to patch, they're all trying to do management around it kind of stuff. When we did a couple of surveys, we looked at around 2,400 databases for a typical company; they had 400 different configurations of the software. And this is one of the biggest companies that we're talking about kind of stuff. With that kind of stuff, they cannot manage cloud, obviously. This is no more a cattle kind of stuff. But how do you bring that kind of standardization, right? That is where Era as a product is actually coming into this. We are trying to standardize, but when you try to standardize these database environments for on-premise enterprise cloud, you have to do it on their terms. What I mean to say is, when you try to go to public cloud, you have this catalog, say 11.2.0.4 patched to a particular PSU level, and you can only create databases with whatever software the public cloud guys are offering. But on-premise needs are slightly different. So that is where Nutanix, Era, and these products come in. We allow people to create the cloud, and then we allow them to create their own catalog of software that they can standardize.
So that is what I call standardization on the customer's own terms; that's what we're trying to do. >> And let me add to that, though. It also brings in this convenience, 'cause not only is it coming up with standardization, but we've made it even more convenient, right, because now a developer can go provision their own database, they're gonna get a standard configuration for what that is, and so you've made it easier for developers and you're getting something that is more cattle-like. >> Bala, I think you're in a good seat to be able to actually give us a little bit of independent commentary, you know. The movement of databases is one of the hottest topics in the industry. I haven't seen whether Andy Jassy was sparring with Larry Ellison, you know, at re:Invent this week, but you know, we've been watching the growth of things like Postgres, and a lot of these changes; you know, Era sits clearly in that space. So what are you seeing from customers? You know, the modernization of applications is, you know, what I call the long pole in the tent. It's the toughest thing to do. I've said we usually want to first, you know, modernize the platform (Nutanix helps with that, public cloud helps with that) and then I can modernize my application. You know, the database tends to be the stickiest application that we have in the industry. So what are you seeing? >> Yeah, so there are two classes of applications that we see. One space is completely greenfield, where we are starting off completely new. People love the cloud-like experience and cloud native databases; that's where the public cloud can kind of try to help them. But if you see, 70 to 80% of the money is still with all the traditional apps, and you're now trying to cloudify them. The cloud native stack that we talk about, the cloud native database, is not going to win that game. Like, you really need to think about how you kind of take these big, giant databases that are there, the Oracles and DB2s, that kind of stuff, but give the cloud-like experience, right? That's actually a very difficult game for any public cloud; that's why you don't see RAC provisioning, it's still not there in AWS, or even in GCP natively. Oracle does that, but it's a little bit difficult. Data gravity forces people to come to on-premise; that's my humble take on this, right. But how do you take this gray area, I call it a brownfield, and convert it into more of a consumer-centric kind of stuff? That's where Era actually tries to play. It has two roles: if you have existing databases, we try to kind of convert them into more cloud-like databases for you, or if you have a greenfield, then we can get you directly onto the cloud native experience. Or if you're trying to migrate from one technology to another technology, definitely we would like to help. These are the three things that we try to do through Era kind of stuff, yeah. >> So looking forward, you know, we're starting out with databases, you know, making that simple, making that small so that there's less friction in that. So maybe a question for Greg: what's the future for Nutanix in, you know, enabling other services, other cloud-like services, on a Nutanix platform going forward? >> In addition to databases. >> Exactly. >> Yeah, so we're a big proponent of standard APIs, as I talked about, right; we've had that in storage for a long time, and that makes things easy with databases. We have a standard client talking to standard database backends.
As we see other core building blocks, those are the kinds of things that we're gonna want to build and deliver as well. So S3 is a de facto standard for object storage, for instance, so people are following that. You'll get Pub/Sub with Kafka APIs; Druid; there's a whole bunch of things, especially from the Apache project, that have become sort of de facto standards. So really it's like, okay, well, which building blocks are needed by developers to build these applications that they want, and how do we really work with the community to establish those as open standards. 'Cause we really want, you know, I talked about the portability quite a bit, so we don't want anyone locked into our stack or anyone else's stack. It's like, hey, let's build with the best toolkits, let's use standard, open APIs, and then developers get what they need, which is portability, or running the application where they want to run it. So that's our strategy going forward. >> So in summary, we have an EC2 equivalent, which is AHV; we have an EBS equivalent, which we call Acropolis Block Services; we have an S3 equivalent, which is called Buckets; we have a database RDS equivalent, which is Era; and now we are going with containers, which we call Carbon. So we are trying to kind of look at those critical services for anyone, especially for developers, to say that, man, it's all an ecosystem. It's not like one piece, a single piece; it's not just compute, it's not just storage, but it is an ecosystem of services that we need to provide. >> Want to just come back to what we were talking about at the beginning, the relationship with developers. How much of what Nutanix does is really kind of the IT ops that then enables developers, and how much direct developer engagement is there? Like, you know, is there development activity here at the conference going on that we should know about? I know that Nutanix goes to a lot of the developer shows. But maybe if you could give us some commentary on that. >> Yeah, I can start that. It's a path, right? So currently, certainly the bulk of our interactions are gonna be on the IT operations side, and so it's only through them, because their customers are the developers, that we really interact primarily today. But you should see that changing quite a bit, and I think you'll see that with the tools that we're providing directly to developers to interact with, you know, through the APIs, like they have with Era. So for instance, if IT has deployed Era internally, then if I want a database, I can go straight to those APIs or the command line to grab those things. And you'll see that continuously be a trend as we let developers interact directly with our products. >> Just to give you an example, right: within the company, within Nutanix, we are drinking our own champagne, right. So we are operating a private cloud and we are exposing our APIs to all our developers. Today, if someone wants a database in Nutanix, they go to a control plane and say, I want a database. Right, that's the API. How the infrastructure gets there is a means to an end for them, right. That's where we are going with our customers, too: hey, here is how you build your private cloud, here is how you expose all your service endpoints for different services, and your developers just need to enjoy them. And then there's a billing aspect of it; that's the nuance that private clouds need to deal with. How do they charge the developers, how do they meter, that kind of stuff that people will talk about today.
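Since the whole point of those standard, open APIs is portability, an existing S3 client should work unchanged against any S3-compatible store. Here is a minimal sketch using boto3 against a hypothetical internal endpoint; the URL and credentials are placeholder assumptions, and whether a given store (such as the Buckets service mentioned above) exposes exactly this surface is something to verify against its documentation:

```python
import boto3

# Point a standard S3 client at an S3-compatible endpoint instead of AWS.
# The endpoint URL and credentials below are placeholders for illustration.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.internal",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The application code is identical to what would run against AWS S3,
# which is exactly the portability argument being made above.
s3.put_object(Bucket="app-data", Key="report.json", Body=b'{"ok": true}')
obj = s3.get_object(Bucket="app-data", Key="report.json")
print(obj["Body"].read())
```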
>> You know, I definitely heard, when I talked to all the product teams, especially everything in Xi Cloud, you know, extensibility with APIs is built into everything you're doing. So we're going to have to leave it there. Greg, we're gonna be catching up with you and the Nutanix team in two weeks at the KubeCon show in Seattle. So thanks so much for joining us. Bala, a pleasure; thanks for giving us all the updates. And thank you, we're gonna be back with more coverage here from Nutanix .Next 2018 in London. I'm Stu Miniman and Joep Piscaer is my cohost. We're going to do a Dutch session in a second, so be sure to stay with that. First foreign language interview on theCUBE, and thank you for watching. (electronic music)
Harjot Gill & Rajiv Mirani, Nutanix | Nutanix .NEXT 2018
>> Announcer: Live from New Orleans, Louisiana, it's theCUBE covering .Next Conference 2018. Brought to you by Nutanix. >> Welcome back, I'm Stu Miniman, here at theCUBE in New Orleans, the Nutanix .Next Conference. Joining me is Keith Townsend, going wall-to-wall with interviews for two days. And we're going to dig into some really geeky, techy stuff, micro segmentation and the like. Happy to welcome to the program two first-time guests: Harjot Gill, who is the Senior Director of product and engineering at Nutanix, and Rajiv Mirani, who's the CTO of Cloud Platform. Thank you both for joining us. >> Both: Thanks, thanks for having us. >> Alright, so Rajiv, you've been with Nutanix for a bit, so we're going to go to Harjot first. So we've seen four acquisitions that Nutanix has made in the software space in the last year or so. One of them was Netsil. >> Harjot: Yes. >> So bring us back. You were, and are, the CEO of the Netsil group. Tell us, kind of, the why of the company, the size of the team, things like that. >> That's good, yeah. So previously, I was co-founder and CEO of Netsil, which, I don't know whether you noticed, is listen spelled backwards. And essentially, it was a microservices analytics platform, and the core technology of Netsil was designed at the University of Pennsylvania, in the research group. That's where most of my team came from. It's a really small team, like just 10 engineers, who took on this very interesting challenge in the industry: as microservices were taking off, applications were, like, being ported to modern platforms like Kubernetes. We saw an opportunity to take, like, a network centric approach to doing performance analysis and reliability analysis. And the product that we built is very interesting. It can be thought of as, like, Google Maps for your cloud applications, just like Splunk, in the past, was Google search for the data center. So we came up with this concept where you can, like, visualize different abstractions and different virtualization layers of your application delivery. And that was our product. >> Alright, Rajiv, we've been talking about the, really, expansion of services that you're offering. You know, security and networking, obviously a big space. So first of all, not a Stanford team that you brought in, but University of Pennsylvania. Explain a little bit for us the justification, how Netsil fits in with the Nutanix portfolio. >> Yeah, the Netsil technology is unique in many different ways, and we actually see a lot of different applications for it. The core product that they have today, the way they do performance monitoring by staying just on the network, not installing any host agents, is pretty unusual. It's something that we really liked about the technology. The fact that they can do this at layer seven, can actually look at application data, do deep packet inspection at line speed, is even more impressive. And they really built a scale-out architecture based on Harjot's research work. We looked at that and we said, "Hey look, this can be used for performance monitoring, it can be used for application discovery, it can be used for security operations." There are just so many different directions we can take this in. And it's a great team that built it, with a relatively small number of people. We wanted these guys to be working with us, not as a separate company. And it moved very quickly. The acquisition happened quite quickly.
We talked a little bit this morning about how they're going to use it for micro segmentation, but there are many other use cases we see coming down the pike. >> So let's talk a little bit about the enterprise applicability. You know, when you guys looked at it, you mainly looked at containers and the challenges of micro, I'm sorry, of multi services and basically twelve-factor applications. >> Harjot: Yeah. >> How is that applicable to the typical enterprise, where 90% of their applications are monoliths? Same capability? What capabilities are you bringing to bear for traditional applications? >> It's pretty applicable everywhere, because the network is a very stable source of truth. Like, what remains constant in the legacy as well as in the new world is your TCP/IP stack. And it's a very stable source of truth to tap into. So one of the value propositions that Netsil offered the very early enterprise customers that we signed up was helping them migrate from these monolithic architectures to microservices. And the existing tools on the market, if you look at APM tools or even the logging tools, were inadequate in taking them on this journey. And you can think of Netsil as a very pervasive solution. I mean, the analogy that I usually give people is, like, drones versus troops on the ground, where Netsil can quickly set up, like, a breadth of coverage in any environment. Whether it's, like, legacy or microservices, you are covered. And then, once you find issues in your environment, security issues or performance issues, you can systematically drill in: either add more instrumentation, or add policies with micro segmentation. That was the whole idea. So there was a gap in the market for this kind of a tool. >> So let's talk about integration with Nutanix. One of the, what I'm calling, first principles for Nutanix is push button, one click easy. >> [Harjot And Rajiv] Yes. >> What does the Netsil application look like in a Nutanix environment, to the Nutanix administrator? >> So let's take the micro segmentation example again, right. So today, if you were to micro segment an existing application, it's pretty hard to know where to begin. So Netsil described it as a hairy problem, but we know he likes hair. But what Netsil does is take all the data it's gathering from the network and give you all this visibility into how every part of your application is interacting with every other part. You can group it in different ways, so it's not just about VMs talking to VMs. If you have a microservices based application, that's actually of very little value. What you really want is which services are talking to each service, or even more, which service tiers are talking to which service tiers. By gathering all that data, we can actually fully automate the creation of micro segmentation policies for existing applications. So today what we showed was more of a manual thing. We've set it up previously; it's just that we haven't had enough time to do the integration yet. We expect that to become completely automated. Similarly with the remediation stuff, the troubleshooting stuff: we have it integrated with the Netsil technology, with the machine learning things that we have been working on. Once we do that, we can enable a lot more automated insights into your applications, an integrated alerting system, integrated with our metrics and stats systems. So a lot of work to do, but a lot of potential for this technology, I think.
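The flow-to-policy idea described here can be sketched in a few lines. This is only a toy illustration of the concept (group observed traffic by service edge and propose an allowlist), with invented sample data; it is not Netsil's actual pipeline or algorithm:

```python
from collections import defaultdict

# Toy flow records of the kind a network tap might yield: (src, dst, port).
# Invented sample data; Netsil's real schema and collection path differ.
observed_flows = [
    ("web", "api", 443),
    ("web", "api", 443),
    ("api", "mysql", 3306),
    ("api", "cache", 6379),
]

# Baseline: which service-to-service edges actually occur in practice.
baseline = defaultdict(set)
for src, dst, port in observed_flows:
    baseline[(src, dst)].add(port)

# Suggested allowlist policies: permit only observed edges, deny the rest.
for (src, dst), ports in sorted(baseline.items()):
    print(f"ALLOW {src} -> {dst} on ports {sorted(ports)}")
```

The real product does this grouping at layer seven against live traffic; the point of the sketch is just that once every edge has been observed, candidate policies fall out mechanically.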
>> So yeah, it actually does solve this chicken and egg problem, as Rajiv said, by actually making micro segmentation operational: first discovering these brownfield apps and then suggesting policies, right? And all the goodness of Netsil will be brought to, like, products like Prism, where, out of the box, Netsil can provide visibility and metrics for workloads such as VDI and all the packaged applications, and all the MongoDB and all of the stuff that is hosted on top of the Nutanix platform, and selling it to the same IT ops. >> Harjot, the space you're playing in is really changing so, so fast. >> Harjot: Yes it is. >> Talk about micro segmentation and containers and serverless and the like. What, at its core, will allow your product to be able to stay up with the pace of change? >> So the core of the product, as I mentioned, I mean, it's network based, so one of the things, like, you get with that is, like, a very stable source of truth. So your languages keep evolving. If you look at, I mean, this mind-boggling introduction of, like, open source technologies into enterprise environments, you don't control what languages they are written in. And your developers are, like, picking up the latest and greatest tools. So in that world, the core of the technology, which is, like, network based, still works the same, and that allows us to, like, really future-proof this thing here. >> Languages and frameworks change. The network protocols are much more stable. >> Yet, to some people's chagrin, the protocols don't change. So let's talk a little bit about products and overlap of products. One of the, I think, confusing points, or can be confusing, is where Netsil fits in when it comes to Calm and overall to Xi. Where's the interaction and overlap, or what's the relationship? >> Yeah, so you can think of every workload in the cloud as an OODA loop: observe, orient, decide, and act. Now what Calm helps the customer do is, like, act faster, right. Whereas Netsil comes in and provides the observe and the orient pieces. So it's all part of the same workflow. If you are an IT ops person, you need tools to observe and help orient, so you can decide faster. And with tools like Calm and Kubernetes, in the future, with one click, just a few clicks, you can make massive changes to your cloud infrastructure. But without observability you are just flying blind. That's where Netsil comes in. So that's why, as Rajiv said, it's going to enhance a lot of areas within Nutanix and, possibly, like, even continue selling as a multi-cloud monitoring solution. >> Just as we do brownfield import for micro segmentation, you can imagine that it would be a great, great product for Calm as well. Being able to do brownfield import of applications and making them into Calm blueprints. >> Yeah, Rajiv, you've had some pent-up demand from customers for the micro segmentation piece, but give us a little bit... You said there are other applications; what should we be expecting to see from the Netsil product line? >> So as CTO I can talk futures, so let me tell you some stuff on the upcoming timelines. One great area for us to explore is around security operations. Since Netsil is already in the network looking at all traffic, it can easily establish a baseline of which VMs, which containers normally talk to each other, what kind of requests they make. And it's at layer seven, so it can even go and look into what kind of API endpoints are normally called.
And once it has baselined this, detecting variations, detecting violations, is going to be relatively simple. So we can alert on security violations, unusual behavior, services making calls to services they shouldn't be making calls to, all that kind of stuff. So that's one area for us to explore. We talked about Calm, so Calm can benefit greatly by being able to import brownfield applications into the Calm umbrella, making blueprints out of them. There are integrations with Prism Pro, which will take the kind of metrics that Netsil is collecting and integrate them with what Prism Pro already does: putting it all into one single framework, adding it to capacity planning, adding in all the Prism Pro features that we have. So there's a lot of stuff we can do. >> So that's an awful lot of data. Where's it stored, and what's the engine behind it? >> That's a great question. Actually, Netsil not only innovated in this unique way of collecting, we also innovated a lot in time series databases. So the back end of Netsil is powered by a database called Apache Druid, which is an OLAP time series database. It can ingest at that scale, and you can run complex queries at sub-second latencies. It can, like, summarize billions of data points at sub-second latencies. And the third thing that Netsil innovated on is the visualizations. We are talking about, like, visualizing this complex data that is coming from these modern, transforming environments. That's another area where Netsil innovated, with this Maps interface to summarize and build easy-to-understand visualizations of your complex infrastructure. >> Now I'm scared that my head would explode, but I would love to get you guys on with Satyam and talk through what additional data, and when it comes to IoT and machine learning, what additional insights. Quick question: are you guys working with Satyam at all at this point? >> We've started, like, understanding the lay of the land, so we're, like, still getting introduced to a lot of teams. As you guys know, Nutanix is now growing very rapidly; there are so many areas to, like, learn about. And we are primarily working with the micro segmentation team right now, but going forward, you will see Netsil's goodness being brought into other areas at Nutanix. >> Yeah, Rajiv, one question I have from a software standpoint in general: where does AI fit into, you know, what you're doing with Xi and Calm? >> Yes, so for all of them, you know, we're using machine learning fairly extensively today to do even basic things like capacity planning, the what-if modeling that we've been doing. But to go beyond machine learning, if we actually invest in building an AI platform, I feel we can do a lot more in terms of root cause analysis and remediation, troubleshooting of applications, finding performance bottlenecks automatically. Essentially, really making that invisible infrastructure dream come true. We're close; we're not quite there yet. >> Yeah, and it's really about, like, getting quality data in without friction. So you have, like, AI is now being commoditized in the industry; like, all the algorithms are now, like, mainstream. So the biggest challenge has always been, how do you go and capture the data at low friction? That's what Netsil brings on board. >> Yeah, I'm super excited for the micro segmentation. Let's talk about what... What has been the customer reaction to Netsil and just the new capability? >> We see a lot of excitement. This micro segmentation has barely been out, what, a couple of months at this point?
And we already have fairly large customers deploying it out there, and a lot of demand for proofs of concept and so on at this point. It was very clear to us from the beginning, when people were looking at other SDN solutions, that the number one use case they were using in the enterprise was micro segmentation. So we took that and we made it as simple as we could. In true Nutanix fashion we said, "Okay, let's make micro segmentation as one-click as we can." And it's been gratifying, I think, to see the initial reaction. In fact, some of the initial feedback we've gotten has been along the lines of, this is almost too simple. >> So one of the challenges that we've had in the enterprise is hybrid cloud. When you look at an EC2 instance, and you have an internal database and the two communicate, that EC2 instance is ephemeral; we don't know how to handle that. Does Netsil address that challenge at all? >> It does. In fact, it's been designed for the even faster moving world of containers. I'll give you an example with Kubernetes; it is, I mean, a similar example. So Netsil installs as a daemon set on Kubernetes; it's frictionless infrastructure insertion. You are, like, independently inserting without developers. And as soon as it is installed, it's not just looking at packets; it's also, like, tapping into the Docker socket for metadata. So as soon as containers go up and down, and new ones are brought up, it actually pulls the metadata, the container IDs, the service IDs, the Kubernetes pod names and whatnot, and then marries that to the metrics that we are collecting. So that in the UI, as you saw in the demo today, you're not so much slicing and dicing by IP addresses. You're slicing and dicing by the service tags, so your VMs can come and go, containers can come and go, but we are looking at the behavior of this group of cattle, and you know the cattle versus pets analogy. The whole idea in the new world is to, like, create these services as the new pets, and your cattle are ephemeral. And the whole idea is that Netsil can discover microservices, discover the boundary of microservices, by looking at layer 7 behavior and by smartly grouping things based on the behavior. So we know exactly what a MySQL database and different installations of MySQL look like based on the behavior, the query behavior, and we group them together. >> So enforcement. And is that at the pod level or is that at the container level? >> So on the enforcement side, Netsil is mostly on the visibility. So on the micro segmentation side there is... >> Today micro-segmentation is for VMs; as we build out our next version of container services, we are looking into building micro segmentation for Kubernetes as well, and that will be at the pod level. >> Alright Keith, I'm looking forward to the CTO Advisor podcast, digging a little bit more into micro-segmentation. Maybe Rajiv and... >> We'll have them on for sure. >> ...and Harjot can stop by some time. But thank you, gentlemen, so much for coming. Congratulations on the update. Looking forward to hearing more. Keith and I have a little bit more here left of day one of Nutanix .Next 2018. I'm Stu Miniman with Keith Townsend. Thank you for watching theCUBE. (Electronic Music)
Greg Fee, Lyft | Flink Forward 2018
>> Narrator: Live from San Francisco, it's theCUBE covering Flink Forward, brought to you by Data Artisans. >> This is George Gilbert. We are at Data Artisans' conference, Flink Forward. It is for the Apache Flink community, sponsored by Data Artisans, and all the work they're doing to move Flink forward and to surround it with additional value that makes building stream-processing applications accessible to mainstream companies. Right now, though, we are not talking to a mainstream company; we're talking to Greg Fee from Lyft. Not Uber. (laughs) And Greg, tell us a little bit about what you're doing with Flink. What's the first use case that comes to mind that really exercises its capabilities? >> Sure, yeah. So the process of adopting Flink at Lyft really started with a use case, which was, we're trying to make machine learning more accessible across all of Lyft. So we already use machine learning in quite a few applications, but we want to make sure that we use machine learning as much as possible; we really think that's the path forward. And one of the fundamental difficulties with that is having consistent feature generation between these offline, batch-y training scenarios and the online, real-time streaming scenarios. And the unified processing engine of Flink really helps us bridge that gap. >> When you say unified processing engine, are you saying that the fact that you can manage code and data as sort of an application version, and some of the, either code or data, is part of the model, and so you're versioning? >> That's even a step beyond what I'm talking about. >> Okay. >> Just the basic, fundamental ability to have one piece of business logic that you can apply at the batch, bulk layer, and in the real-time layer. >> George: Yeah. >> So that's sort of like the core of what Flink gives you. >> Are you running both batch and streaming on Flink? >> Yes, that's right. >> And using the, so, you're using the windows? Or just periodic execution on a stream to simulate batch? >> That's right. So, feature generation crosses a broad spectrum of possible use cases in Flink. >> George: Yeah. >> And this is where we sort of transition more into what dA Platform could give us. So, we're looking to have thousands of different features across all of our machine learning models. So having a platform that can help us host many of these little programs running, help with the application life-cycle of each of these features as we version them over time: we're very excited about what dA Platform can do for us. >> Can you tell us a little more about how the stream processing helps you with the feature selection and engineering, and is it that you're using streaming, or simulated batch, or batch using the same programming model to train these models, and you're picking up different derived data; is that how it's working? >> So, the typical life-cycle is, there's going to be a feature engineering stage. The data scientist is looking at their data, trying to figure out patterns in the data, and the way you apply Flink there is, as you come up with potential algorithms for how you generate your feature, you can run that through Flink, generate some data, apply a machine learning model on top of it, and sort of play around with that data, prototype things. >> So, what you're doing is offline, or out of the platform; you're doing the feature selection and the engineering. >> Man: Right.
Then you attach a stream to it that has just the relevant, perhaps, the relevant features. >> Man: Right. >> And then that model gets sort of, well, maybe not yet, but eventually versioned as part of the application, which includes the rest of the application logic and the data. >> Right. So, like some of the stuff that was touched on this morning at the keynotes, the versioning and maintaining of machine learning applications is a very complex ecosystem. So being able to say, okay, going from the prototype stage, doing stuff in batch, to doing stuff in production and real-time, then being able to version those over time, to move to better and better versions of the feature generation, is very important to us. >> I don't know if this is the most politically correct thing, but you just explained it better than everyone else we have talked to. >> Great. (laughs) >> About how it all fits together with the machine learning. So, once you've got that in place, it sounds like you're using the dA Platform, as well as, you know, perhaps some extensions for machine learning, to sort of add that as a separate life-cycle, besides the application code. Then, is that going to be the enterprise-wide platform for developing and deploying machine learning applications? >> Yes, certainly we think there's probably a broad ecosystem to do machine learning. It's a very, sort of, wide open area. Certainly my agenda is to push it across the company and get as many things running in this system as possible. I think the real-time aspects of it, the unifying aspect of what Flink can give us, and what the platform can give us in terms of the life-cycles. >> So, are you set up essentially as a shared resource, a shared service, which is the platform group? >> Man: Right. >> And then all the business units adopt that platform and build their apps on it. >> Right. So my initiative is part of a greater data science platform at Lyft. We have hundreds of data scientists who are going to be looking at this data, giving me little features that they want to build, and we're probably going to end up numbering in the thousands of features, so my goal is being able to generate all those, maintain all those little programs. >> And when you say generate all those little programs, that's the application logic, and the models specific to that application? >> That's right, well. >> Or is it this? >> There are features that are typically shared across many models. >> Okay. >> So there are, like, two layers of things happening. >> So you're managing features separately from the models. >> That's right. >> Interesting. Okay, haven't heard that. And is the application manager tooling going to help address that, or is that custom stuff that you have to do? >> So, I think there's a potential that that's the way we're going to manage the model stuff as well, but it's still a little new over there. >> That you put it on the application platform? >> Right. >> Then that's sort of at the boundary of what you're doing right now, or what you will be doing shortly. >> Right. It's all a matter of use-case, whether it's online or offline, and how it fits best in with the rest of the Lyft engineering system.
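A conceptual sketch of the shared-feature idea being described: one feature definition consumed by both the batch (training) path and the streaming (serving) path, so the two cannot drift apart. This is plain Python with invented event fields, purely to illustrate the pattern; at Lyft the equivalent logic runs as Flink jobs over bounded and unbounded inputs:

```python
# One feature definition, shared by the batch path and the streaming path.
# Event fields here are invented for illustration.

def ride_distance_km(event: dict) -> float:
    """Feature: straight-line trip distance in km (toy planar formula)."""
    dx = event["end_x"] - event["start_x"]
    dy = event["end_y"] - event["start_y"]
    return (dx ** 2 + dy ** 2) ** 0.5

# Batch path: backfill the feature over historical events for model training.
historical_events = [{"start_x": 0.0, "start_y": 0.0, "end_x": 3.0, "end_y": 4.0}]
training_features = [ride_distance_km(e) for e in historical_events]

# Streaming path: the identical function applied to live events for serving,
# so training-time and serving-time feature values stay consistent.
def live_events():
    yield {"start_x": 1.0, "start_y": 1.0, "end_x": 4.0, "end_y": 5.0}

serving_features = [ride_distance_km(e) for e in live_events()]

print(training_features, serving_features)  # [5.0] [5.0]
```

The design point is that versioning this one function versions the feature for both training and serving at once, which is what makes maintaining thousands of such little programs tractable.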
Or, are they sort of more discrete, you know, artifacts, discrete programs, and then when do you keep, stay within the streaming processors, and when do you have it in a shared database? >> That's a, that's a lot of questions, kind of a deep question. So, the goal is to have a central hub, where sort of all of our event data passes through it, and that allows us to decouple. >> So that's to be careful, that's not a database central hub, that's a, like a? >> An event hub. >> Event hub. >> Right. >> Yeah, okay. >> So, an event hub in the middle allows us to decompose the different, sort of smaller programs, which again are probably going to number in the thousands, so that being able to have different parts of the company maintain their own part of the overall system is very important to us. I think we'll probably see Flink as a major player, in terms of how those programs run, but we'll probably be shooting things off to other systems like Druid, like Hive, like Presto, like Elasticsearch. >> As derived data? >> As all derived data, from these Flink jobs. And then also, pushing data directly out into some of our production systems to feed into machine learning decisions. >> Okay, this is quite, sounds like the most ambitious infrastructure that we've heard, in that it sounds like pretty ubiquitous. >> We want to be a machine-learning first company. So, it's everywhere. >> So, now help me clarify for me, when? Because this is, you know, for mainstream companies who've programmed with, you know, DBMS, as a shared state manager for decades, help explain to them when you would still use a DBMS for shared state, and when you would start using the distributed state that's embedded in Flink, and the derived data, you know, at the endpoints, at the syncs. >> So I mean, I guess this kind of gets into your exact, your use cases and, you know, your opinions and thoughts about how to use these things best, but. >> George: Your opinion is what we're interested in. >> Right. From where I'm coming, I see basically databases as potential one sync for this data. They do things very well, right? They do structured queries very well. You can have indices built off that, aggregates, really feed into a lot of visualization stuff. >> George: Yeah. >> But, from where I am sitting, like we're really moving away from databases as something that feeds production data. We've got other stores to do that, that are sort of more tailored towards those scenarios. >> When you say to feed production data, this is transaction capture, or data capture. >> Right. So we don't have a lot of atomic transactions, outside the payments at Lyft, most of the stuff is eventually consistent. So we have stores, more like Dynamo or Cassandra HBase that feed a lot of our production data. >> And those databases, are they for like ambient information like influencing an interaction, it doesn't sound like automating a transaction. It would be, it sounds like, context that helps with analytics, but very separate from the OLTP apps. >> That's right. So we have, you can kind of bifurcate the company into the data that's used in production to make decisions that are like facing the user, and then our analytics back end, that really helps business analysts and like the executives make decisions about how we proceed. >> And so that second part, that backend, is more like operational efficiency. >> Man: Right. 
>> And so that second part, that backend, is more like operational efficiency. >> Man: Right. >> And coding new business processes to support new ways of doing business. But the customer-facing stuff, specifically like with payments, that still needs a traditional OLTP. >> Man: Right. >> But those use cases aren't growing that much. >> That's right. So, basically we have very specific use-cases for a traditional database, but in terms of capturing the type of scale and the type of growth we're looking for at Lyft, we think some of the other storage engines suit those better. >> So in that use-case, would the OLTP DBMS be at the front end, would it be a source, or a sink? It sounds like it's a source. >> So we actually do it both ways. Right, so, it's great to get our transactional data flowing through our streaming system, there's a lot of value in that, but also then pushing some of the aggregate results back out to a DBMS helps with our analytics pipeline. >> Okay, okay. Well this is actually really interesting. So, where do you see the dA platform helping, you know, going forward; is it something you don't really need because you've built all that scaffolding for application life-cycle management, or do you see it as something that'll help push Flink enterprise-wide? >> I think the dA platform really helps people adopt Flink at an enterprise level. Maintaining the applications is a core part of what it means to run it as a business. And so we're looking at the dA platform as a way of managing our applications, and I'm mostly talking about one application we have for Flink at Lyft. >> Yeah. >> We have many other Flink programs actually running that are unrelated to my project. >> What about managing non-Flink applications? Do you need an application manager? Is it okay that it's associated with one service or platform like Flink, or is there a desire, you know, among bleeding edge customers to have an overall sort of infrastructure management, application management kind of suite? >> Yes, for sure. You're touching on something I have started to push inside of Lyft, which is the need for an overall application life-cycle management product that's not technology specific. >> Would these sort of plug into the dA platform and whatever the Confluent, you know, equivalent is, or is it going to tie directly to the operational capabilities, or the functional capabilities, not the management capabilities? In other words, would it plug into like core Flink, core Kafka, core Spark, that sort of stuff? >> I think that's largely to be determined. If you go back to how a distributed system is typically designed, we have a user plane, which is going to be our data users. Then you end up with the thing we're probably most familiar with, which is our data plane, technologies like Flink and Kafka and Hive, all those guys. What's missing in the middle right now is a control plane. It's a map from the user desire, from the user intention, to what we do with all of that data plane stuff. So to launch a new program, maybe you need a new Kafka topic, maybe you need to provision capacity in Kafka, you need to get some Flink programs running, and whether that talks directly to Flink and goes against Kubernetes, or something like that, or whether it talks to a higher-level, more application-specific platform. >> Man: Yeah.
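For a sense of what that middle layer might do, here is a sketch of intent-to-provisioning translation. The spec fields and the provisioning steps are assumptions invented for illustration; this is not a real Lyft, dA platform, or Flink API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StreamingAppSpec:
    """User-plane intent: what the data user wants to run."""
    name: str
    input_topics: List[str] = field(default_factory=list)
    output_topic: str = ""
    parallelism: int = 1

def provision(spec: StreamingAppSpec) -> List[str]:
    """Control plane: translate intent into data-plane actions.

    Each step would really call Kafka admin APIs, a Flink deployer,
    Kubernetes, or a higher-level platform; here we just emit the plan.
    """
    plan = []
    for topic in [t for t in spec.input_topics + [spec.output_topic] if t]:
        plan.append(f"ensure kafka topic exists: {topic}")
    plan.append(f"deploy flink job '{spec.name}' with parallelism {spec.parallelism}")
    plan.append(f"register '{spec.name}' with the application life-cycle manager")
    return plan

if __name__ == "__main__":
    spec = StreamingAppSpec(
        name="rides-per-region",
        input_topics=["events"],
        output_topic="rides_per_region",
        parallelism=4,
    )
    for step in provision(spec):
        print(step)
```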
>> I think, you know, it's certainly a lot easier if we have some of these platforms in the way. >> Because they give you better abstractions. >> That's right. >> To talk to the platforms. >> That's right. >> That's interesting. Okay, geesh, we learn something really, really interesting with each interview. I'm curious though, if you look out a couple years, how much of your application landscape will be continuous processing, and is that something you can see mainstream enterprises adopting, or has decades of work with, you know, batch and interactive made it too difficult for people to learn something so radically new? >> I think it's all going to be driven by the business needs, and whether the value is there for people to make that transition, 'cause it is quite expensive to invest in new infrastructure. For companies like Lyft, where we're trying to make decisions very quickly, you know, getting down to two seconds makes a difference for the customer, so we're trying to be as, you know, real-time as possible. I used to work at Salesforce. Salespeople are a little less sensitive to these things, and you know, it's a very, very traditional world. >> That's interesting. (background applauding) >> But even Salesforce is moving towards that style. >> Even Salesforce is moving? >> Is moving toward stream processing. >> Really? >> Greg: So like, I think we're going to see it slowly adopted across the big enterprises. >> George: I imagine that's probably for their analytics. >> That's where they're starting, of course, yeah. >> Okay. So, this was a little more affirmation on how we're going to see the control plane evolve, and the interesting use-cases that you're up to. I hope we can see you back next year. And you can tell us how far you've proceeded. >> I certainly hope so, yeah. >> This was really interesting. So, Greg Fee from Lyft. We will hopefully see you again. And this is George Gilbert. We're at the Data Artisans Flink Forward conference in San Francisco. We'll be back after this break. (techno music)
Josh Klahr & Prashanthi Paty | DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. Day two of the DataWorks Summit, I'm Lisa Martin with my cohost, George Gilbert. We've had a great day and a half so far, learning a ton in this hyper-growth, big data world meets IoT, machine learning, data science. George and I are excited to welcome our next guests. We have Josh Klahr, the VP of Product Management from AtScale. Welcome Josh, welcome back. >> Thank you. >> And we have Prashanthi Paty, the Head of Data Engineering for GoDaddy. Welcome to theCUBE. >> Thank you. >> Great to have you guys here. So, wanted to kind of talk to you guys about, one, how you guys are working together, but two, also some of the trends that you guys are seeing. So as we talked about, in the tech industry, it's two degrees of Kevin Bacon, right. You guys worked together back in the day at Yahoo. Talk to us about what you both visualized and experienced in terms of the Hadoop adoption maturity cycle. >> Sure. >> You want to start, Josh? >> Yeah, I'll start, and you can chime in and correct me. But yeah, as you mentioned, Prashanthi and I worked together at Yahoo, in our central data group. It feels like a long time ago. And we had two main jobs. The first job was to collect all of the data from our ad systems and our audience systems, and stick that data into a Hadoop cluster. At the time, we were kind of doing it while Hadoop was still being developed. And the other thing that we did was, we had to support a bunch of BI consumers. So we built cubes, we built data marts, we used MicroStrategy, Tableau, and I would say the experience there was a great experience with Hadoop in terms of the ability to have low-cost storage and scale-out data processing of what were really billions and billions, tens of billions of events a day. But when it came to BI, it felt like we were doing stuff the old way. And we were moving data off cluster, and making it small. In fact, you did a lot of that. >> Well, yeah, at the end of the day, we were using Hadoop as a staging layer. So we would process a whole bunch of data there, and then we would scale it back and move it into, again, relational stores or cubes, because basically we couldn't afford to give any accessibility to BI tools or to our end users directly on Hadoop. So while we surely did large-scale data processing in the Hadoop layer, we failed to turn on the insights right there. >> Lisa: Okay. >> Maybe there's a lesson in there for folks who are getting slightly more mature versions of Hadoop now, but can also learn from some of the experiences you've had. Were there issues in terms of having cleaned and curated data? Were there issues for BI with performance and the lack of proper file formats like Parquet? Where was it that you hit the wall? >> It was both. You have to remember, we were probably one of the first teams to put a data warehouse on Hadoop. So we were dealing with Pig versions of, like, 0.5, 0.6, so we were putting a lot of demand on the tooling and the infrastructure. Hadoop was still in a very nascent stage at that time. That was one. And I think a lot of the focus was on, hey, now we have the ability to do clickstream analytics at scale, right. So we did a lot of the backend stuff. But the presentation is where I think we struggled.
>> So would that mean that the idea is that you could do full resolution without sampling on the backend, and then you would extract and presumably sort of denormalize, so that you could essentially run data marts for subject matter interests? >> Yeah, and that's exactly what we did. We took all of this big data, but to make it work for BI, there were two things. One was performance: it was really, can you get an interactive query response time. And the other thing was the interface: can a Tableau user connect and understand what they're looking at. You had to make the data small again. And that was actually the genesis of AtScale, which is where I am today. We were frustrated with this big data platform and having to then make the data small again in order to support BI. >> That's a great transition, Josh. Let's actually talk about AtScale. You guys saw BI on Hadoop as this big white space. How have you succeeded there, and then let's talk about what GoDaddy is doing with AtScale and big data. >> Yeah, I think we took the learnings from our experience at Yahoo, and we really thought about, if we were to start from scratch and solve the problem the way we wanted it to be solved, what would that system look like. And it was a few things. One was an interface that worked for BI. I don't want to date myself, but my experience in the software space started with OLAP. And I can tell you OLAP isn't dead. When you go and talk to an enterprise, a Fortune 1000 enterprise, and you talk about OLAP, that's how they think. They think in terms of measures and dimensions and hierarchies. So one important thing for us was to project an OLAP interface on top of data that's Hadoop native. It's Hive tables, Parquet, ORC, you kind of talk about all of the mess that may sit underneath the covers. So one thing was projecting that interface; the other thing was delivering performance. So we've invested a lot in using the Hadoop cluster natively to deliver performing queries. We do this by creating aggregate tables and summary tables and being smart about how we route queries. But we've done it in a way that makes a Hadoop admin very happy. You don't have to buy a bunch of AtScale servers in addition to your Hadoop cluster. We scale the way the Hadoop cluster scales. So we don't require separate technology. So we fit really nicely into that Hadoop ecosystem. >> So, making the Hadoop admin happy is a good thing. How do you make the business user happy, who needs now, as we heard here yesterday, to kind of merge more with the data science folks to be able to understand, or even have the chance to articulate, "These are the business outcomes we want to look for and we want to see." How do you guys, maybe, under the hood, if you will, at AtScale, make the business guys and gals happy? >> I'll share my opinion and then Prashanthi can comment on her experience, but as I've mentioned before, the business users want an interface that's simple to use. And so that's one thing we do: we give them the ability to just look at measures and dimensions. If I'm a business user, I grew up using Excel to do my analysis. The thing I like most as an analyst is a big fat wide table. And so that's what we do: we make an underlying Hadoop cluster, and what could be tens or hundreds of tables, look like a single big fat wide table for a data analyst. You talk to a data scientist, you talk to a business analyst, that's the way they want to view the world. So that's one thing we do. And then we give them response times that are fast. We give them interactivity, so that you can really quickly start to get a sense of the shape of the data.
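In the abstract, that aggregate-and-route idea works something like the toy sketch below. This is not AtScale's actual engine; the tables and columns are invented. The analyst still asks for measures and dimensions against one logical wide table, and the router decides which physical table pays for the query:

```python
# Toy query router in the spirit of OLAP-on-Hadoop engines: serve a query
# from a pre-aggregated summary table when one covers the requested
# dimensions, and fall back to the detail table otherwise. Invented schema.
DETAIL_TABLE = "clickstream_detail"              # billions of rows
SUMMARY_TABLES = {
    # covered dimensions -> much smaller, pre-aggregated table
    frozenset({"day", "country"}): "clicks_by_day_country",
    frozenset({"day", "page"}): "clicks_by_day_page",
}

def route(dimensions):
    """Pick the cheapest physical table that can answer the query."""
    dims = frozenset(dimensions)
    for covered, table in SUMMARY_TABLES.items():
        if dims <= covered:          # summary covers every requested dimension
            return table
    return DETAIL_TABLE              # full resolution, no sampling

def to_sql(measures, dimensions):
    table = route(dimensions)
    select = ", ".join(list(dimensions) + [f"SUM({m}) AS {m}" for m in measures])
    return f"SELECT {select} FROM {table} GROUP BY {', '.join(dimensions)}"

if __name__ == "__main__":
    # Served from the small summary table:
    print(to_sql(["clicks"], ["day", "country"]))
    # Needs a dimension no summary covers, so falls back to detail:
    print(to_sql(["clicks"], ["day", "user_id"]))
```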
>> And allowing them to get that time to value. >> Yes. >> I can imagine. >> Just a follow-up on that. When you have to prepare the aggregates, essentially like the cubes, instead of the old BI tools running on a data mart, what is the additional latency required from data coming fresh into the data lake and then transforming it into something that's consumption-ready for the business user? >> Yeah, I think I can take that. So again, if you look at the last 10 years, in the initial period, certainly at Yahoo, we just threw engineering resources at that problem, right. So we had teams dedicated to building these aggregates. But the whole premise of Hadoop was the ability to do unstructured optimizations. And by having a team find the new data coming in and then integrate that into the pipeline, we were adding a lot of latency. So we needed to figure out how we can do this in a more seamless, more real-time way, and get the real premise of Hadoop into the hands of our business users. I mean, I think that's where AtScale is doing a lot of the good work, in terms of dynamically being able to create aggregates based on the design that you put in the cube. So we are starting to work with them on our implementation. We're looking forward to the results. >> Tell us a little bit more about what you're looking to achieve. So GoDaddy is a customer of AtScale. Tell us a little bit more about that. What are you looking to build together, and kind of, where are you in your journey right now? >> Yeah, so the main goal for us is to move beyond predefined models, dashboards, and reports. So we want to be more agile with our schema changes. Time to market is one. And performance, right. The ability to put BI tools directly on top of Hadoop is one. And also to push as much of the semantics as possible down into the Hadoop layer. So those are the things that we're looking to do. >> So that sounds like a classic business intelligence component, but sort of rethought for a big data era. >> I love that quote, and I feel it. >> Prashanthi: Yes. >> Josh: Yes. (laughing) >> That's exactly what we're trying to do. >> But some of the things you mentioned are non-trivial. Time goes into the pre-processing of data so that it's consumable, but you also want it to be dynamic, which is sort of a trade-off, which means, you know, that takes time. So is that a set of requirements, a wishlist for AtScale, or is that something that you're building on your own? >> I think there's a lot happening in that space. They are one of the first people to come out with their product, which is solving a real problem that we tried to solve for a long time. And I think as we start using them more and more, we'll surely be pushing them to bring in more features. I think the algorithm that they have to dynamically generate aggregates is something that we're giving quite a lot of feedback to them on.
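To give a flavor of what dynamically generating aggregates from a cube design could mean, here is a toy generator. A real engine would choose dimension combinations from observed query patterns and cost estimates rather than enumerating them blindly, and the schema here is invented:

```python
from itertools import combinations

# Toy generator: given a cube design (dimensions + measures), emit the
# aggregate tables an engine might build ahead of queries. Invented schema.
CUBE = {
    "fact_table": "clickstream_detail",
    "dimensions": ["day", "country", "page"],
    "measures": {"clicks": "SUM", "visitors": "COUNT"},
}

def aggregate_ddl(cube, max_dims=2):
    """Emit CREATE TABLE AS statements for each small dimension combination."""
    stmts = []
    for n in range(1, max_dims + 1):
        for dims in combinations(cube["dimensions"], n):
            name = "agg_" + "_".join(dims)
            selects = list(dims) + [
                f"{fn}({m}) AS {m}" for m, fn in cube["measures"].items()
            ]
            stmts.append(
                f"CREATE TABLE {name} AS "
                f"SELECT {', '.join(selects)} "
                f"FROM {cube['fact_table']} "
                f"GROUP BY {', '.join(dims)}"
            )
    return stmts

if __name__ == "__main__":
    for ddl in aggregate_ddl(CUBE):
        print(ddl)
```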
>> Our last guest, from Pentaho, was talking about, there was, in her keynote today, a quote from I think a McKinsey report that said, "40% of machine learning data is either not fully exploited or not used at all." So tell us, kind of, where is GoDaddy regarding machine learning? What are you seeing at AtScale, and how are you guys going to work together to maybe venture into that frontier? >> Yeah, I mean, I think one of the key requirements we're placing on our data scientists is, not only do you have to be very good at your data science job, you have to be a very good programmer too, to make use of the big data technologies. And we're seeing some interesting developments, like very workload-specific engines coming into the market now, for search, for graph, for machine learning as well, which are supposed to put the tools right into the hands of data scientists. I personally haven't worked with them enough to be able to comment. But I do think that the next realm of big data is these workload-specific engines coming in on top of Hadoop, and realizing more of the insights for the end users. >> Curious, can you elaborate a little more on those workload-specific engines? That sounds rather intriguing. >> Well, for interacting with Hadoop on a real-time basis, we see search-based engines like Elasticsearch, Solr, and there is also Druid. At Yahoo, we were quite a big shop of Druid, actually. And we were using it as an interactive query layer directly between our applications, our JavaScript-based BI applications, and Hadoop. So I think there are quite a few means to realize insights from Hadoop now, and that's the space where I see workload-specific engines coming in. >> And you mentioned earlier, before we started, that you were using Mahout, presumably for machine learning. And I guess I thought the center of gravity for that type of analytics has moved to Spark, and you haven't mentioned Spark yet. >> We are not using Mahout, though. I mentioned it as something that's in that space. But yeah, I mean, Spark is pretty interesting. Spark SQL, doing ETL with Spark, as well as using Spark SQL for queries, is something that looks very, very promising lately.
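As a sketch of that Spark SQL pattern, one engine doing both the ETL and the queries, with invented paths and columns (nothing here is specific to GoDaddy's actual pipeline):

```python
from pyspark.sql import SparkSession, functions as F

# Sketch of the Spark SQL pattern mentioned above: one engine doing the
# ETL and then serving analytical queries. Paths and columns are invented.
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# ETL: read raw events, clean and reshape, write a columnar table.
raw = spark.read.json("/data/raw/clickstream/")       # illustrative path
clean = (
    raw.filter(F.col("user_id").isNotNull())
       .withColumn("day", F.to_date("timestamp"))
       .select("day", "user_id", "country", "page")
)
clean.write.mode("overwrite").parquet("/data/warehouse/clickstream/")

# Query: the cleaned data is immediately usable from Spark SQL.
clean.createOrReplaceTempView("clickstream")
spark.sql("""
    SELECT day, country, COUNT(*) AS clicks
    FROM clickstream
    GROUP BY day, country
    ORDER BY day
""").show()
```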
>> Quick question for you, from a business perspective. So you're the Head of Data Engineering at GoDaddy. How do you interact with your business users? The C-suite, for example; they're embracing big data more and more, and leveraging Hadoop as an enabler. What's the conversation like, or maybe even the influence of the GoDaddy business C-suite on engineering? How do you guys work collaboratively? >> So we do have very regular stakeholder meetings. And these are business stakeholders. So we have representatives from our marketing teams, finance, product teams, and the data science team. We consider data science as one of our customers. We take requirements from them. We give them a peek into the work we're doing. We also let them be part of our agile team, so that when we have something released, they're the first ones looking at it and testing it. So they're very much part of the process. I don't think we can afford to just sit back and work on this monolithic data warehouse and at the end of the day say, "Hey, here is what we have," and ask them to go get the insights from it. So it's a very agile process, and they're very much part of it. >> One last question for you, sorry George. You guys mentioned you are sort of early in your partnership, unless I misunderstood. What has AtScale helped GoDaddy achieve so far, and what are your expectations, say, in the next six months? >> We want the world. (laughing) >> Lisa: Just that. >> Yeah, but the premise is, I mean, Josh and I were part of the same team at Yahoo, where we faced the problems that AtScale is trying to solve. So the premise of being able to solve those problems, which is, like their name, basically delivering data at scale, that's what I'm very much looking forward to from them. >> Well, excellent. We want to thank you both for joining us on theCUBE. We wish you the best of luck in attaining the world. (all laughing) >> Josh: There we go, thank you. >> Excellent, guys. Josh Klahr, thank you so much. >> My pleasure. >> Prashanthi, thank you for being on theCUBE for the first time. >> No problem. >> You've been watching theCUBE live at day two of the DataWorks Summit. For my cohost George Gilbert, I am Lisa Martin. Stick around guys, we'll be right back. (jingle)
Carlo Vaiti | DataWorks Summit Europe 2017
>> Announcer: You are CUBE Alumni. Live from Munich, Germany, it's theCUBE. Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello, everyone, welcome back to live coverage at DataWorks 2017. I'm John Furrier with my cohost, Dave Vellante. Two days of coverage here in Munich, Germany, covering Hortonworks and Yahoo presenting Hadoop Summit, now called DataWorks 2017. Our next guest is Carlo Vaiti, who's the HPE chief technology strategist, EMEA Digital Solutions, Europe, Middle East, and Africa. Welcome to theCUBE. >> Thank you, John. >> So we were just chatting before we came on about your historic background at IBM, Oracle, and now HPE, and now back into the saddle there. >> Don't forget Sun Microsystems. >> Sun Microsystems, sorry, Sun, yeah. I mean, great, great run. >> It was a long run. >> You've seen the computer revolution happen. I worked at HP for nine years, from '88 to '97. Again, Dave was a premier analyst during that run of client-server. We've seen the computer revolution happen. Now we're seeing the digital revolution, where the iPhone is now 10 years old, Cloud is booming, data's at the center of the value proposition, so a completely new disruptive capability. So what are you doing as the CTO, chief technologist for HPE? How are you guys bringing this story together? 'Cause there's so much going on at HPE. You got the services split, you got the software split, and HP's focusing on the new style of IT, as Meg Whitman calls it. >> So, yeah. My role in EMEA is actually all about having a visionary kind of strategy role for what's going to be HP in the future, in terms of IT. And one of the things we are looking at specifically is, we split our strategy into three different aspects, three transformation areas. The first one, which we usually talk about, is what I call hybrid IT, which is basically making services around either on-premise or on Cloud for our customer base. The second one is actually powering the Intelligent Edge, which is looking after our collaboration business and the Aruba components we acquired. And the third one, which is in the middle, and that's why I'm here at the DataWorks Summit, is actually the data-analytics aspect. And we have a couple of solutions in there. One is the Enterprise-grade Hadoop, which is part of this. This is actually how we frame the overall strategy for HP. >> It's interesting, Dave and I were talking yesterday. Being in Europe, it's obviously a different show, it's smaller than the DataWorks or Hadoop Summit in North America in San Jose, but there's a ton of Internet of things, IoT or IIoT, 'cause here in Germany, obviously an industrial nation, but in Europe in general, a lot of smart cities initiatives, a lot of mobility, a ton of Internet of things opportunity, more than in the US. >> Absolutely. >> Can you comment on how you guys are tackling IoT? Because it's the Intelligent Edge, certainly, but it's also data; it's in your wheelhouse. >> Yes, sure. It's a good question, because I'm actually working on a couple of projects in Eastern Europe, where it's all about Industrial IoT Analytics, IIoTA. That's the new terminology we use. So what we do is, we analyze from a business perspective what the business pain points are, in an oil and gas company for example. And we understand, for example, what kinds of things they need and must have.
>> And what I'm saying here is, one of the aspects, for example, is the drilling opportunity. So how much oil you can extract from a specific rig in the middle of the North Sea, for example. This is one of the key questions, because the customer wants to understand, in the future, how much oil they can extract. The other one is, for example, the upstream business, on the retail side: when my customer stops in a gas station and goes into the shop, immediately giving, I don't know, my daughter a kind of campaign for a Barbie, because she likes Barbie. So IoT, Industrial IoT, helps us make a much better customer experience, and that's the case in the upstream business, but it also helps us get to much faster business outcomes. And that's what the customer wants, right? 'Cause, as I was saying to your colleague before, I'm talking to the business guys. I'm not talking to IT anymore in these kinds of places, and that's how IoT gives us a chance to change the conversation at the industry level. >> These are first-time conversations too. You're getting at the kinds of business conversations that weren't possible five years ago. >> Carlo: Yes, sure. >> I mean, and 10 years ago, they would have seemed fantasy. Now they're reality. >> The role of analytics, in my opinion, is becoming extremely key, and as I said this morning, for me the best sentence is that data is the foundation stone of the digital economy. I continue to repeat this, because it's actually where everything starts. So let's take a look at the analytics aspect. If I'm able to analyze the data close to the shop floor, okay, close to the shop manufacturing floor, if I'm able to analyze my data on the rig, in the oil and gas industry, if I'm able to do preprocessing analytics with Kafka, Druid, these kinds of open-source software, close to the Intelligent Edge, then my customers are going to be happy, because I give them a very fast response, and the decision-maker can get to a decision in a faster time. Today, it takes a long time to make these types of decisions. So that's why we want to move into powering the Intelligent Edge. >> So you're saying data's foundational, but if you get to the Intelligent Edge, it's dynamic. So you have a dynamic, reactive, realtime time series or presence of data, but you need the foundational data underneath. >> Perfect. >> Is that kind of what you're getting at? >> Yes, that's the first step. Preprocessing analytics is what we do. In the next generation, which we think is going to be Industrial IoT Analytics, we're going to actually put a massive amount of compute close to the shop manufacturing floor. We call this, internally and actually externally, Converged Plant Infrastructure. And that's the key point, right? >> John: Converged Plant? >> Converged Plant Infrastructure, CPI. If you look it up on Google, you will find it. It's a solution we brought to market a few months ago. We announced it in December last year. >> Yeah, Antonio's smart. He also had converged systems as well. One of the first ones. >> Yeah, so that's converged compute at the edge, basically. >> Correct, converged compute-- >> Very powerful. >> Very powerful, and we run analytics on the edge. That's the key point. >> Which we love, because that means you don't have to send everything back to the Cloud, because it's too expensive, it's going to take too long, it's not going to work. >> Carlo: The bandwidth on the network is much less. >> There's no way that's going to be successful, unless you go to the edge and-- >> It takes time. >> With a cost.
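A minimal sketch of that edge pattern, aggregate locally and ship only summaries upstream, using kafka-python. The broker addresses, topics, and the one-minute window are illustrative assumptions, not HPE's CPI software:

```python
import json, time
from collections import defaultdict
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# An edge node reads raw sensor events from a local broker and forwards
# only per-minute summaries upstream, so the wide-area link carries far
# less data than the raw stream.
consumer = KafkaConsumer(
    "sensors.raw",                               # illustrative local topic
    bootstrap_servers="edge-broker:9092",        # illustrative brokers
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="datacenter-broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

window_start = time.time()
totals, counts = defaultdict(float), defaultdict(int)

for msg in consumer:
    reading = msg.value   # e.g. {"sensor": "rig7.pressure", "value": 81.2}
    totals[reading["sensor"]] += reading["value"]
    counts[reading["sensor"]] += 1

    if time.time() - window_start >= 60:         # close the one-minute window
        for sensor, total in totals.items():
            producer.send("sensors.summary", {
                "sensor": sensor,
                "avg": total / counts[sensor],
                "n": counts[sensor],
            })
        totals.clear()
        counts.clear()
        window_start = time.time()
```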
>> Now the other thing is, of course, you've got the Aruba asset, to be able to, I always joke, connect the windmill. But, Carlo, can we go back to the IoTA example? >> Carlo: Correct, yeah. >> I want to help our audience understand, sort of, the new HP, post these spin merges. So previously you would say, okay, we have Vertica. You still have a partnership, or you still own Vertica, but after September 1st-- >> Absolutely, absolutely. It's part of the columnar side-- >> Right, yes, absolutely, but, so. But the new strategy is to be more of a platform for a variety of technologies. So how for instance would you solve, or did you solve, that problem that you described? What did you actually deliver? >> So again, as I said, especially in the Industrial IoT, we are an ecosystem, okay? So we're one element of the ecosystem solution. For oil and gas specifically, we're working with other system integrators. We're working with companies with oil and gas industry expertise, like DXC, right, the company that we just split off a few days ago, and we're working with them. They're providing the industry expertise. We provide the infrastructure around that, and the services around the infrastructure element. For the industry expertise, we try to have a little bit of knowledge, to start the conversation with the customer. But again, my role in the strategy is actually to be an ecosystem digital integrator. That's the new terminology we like to bring to the market, because we really believe that's what HP's role is going to be. And the relevance of HP totally depends on whether we are going to be successful in these types of things. >> Okay, now a couple other things you talked about in your keynote. I'm just going to list them, and then we can go wherever we want. There was Data Lake 3.0, Storage Disaggregation, which is kind of interesting, 'cause it's been a problem. Hadoop as a service, Realtime Everywhere, and then Analytics at the Edge, which we kind of just talked about. Let's pick one. Let's start with Data Lake 3.0. What is that? John doesn't like the term data lake. He likes data ocean. >> I like data ocean. >> Is Data Lake 3.0 becoming an ocean? >> It's becoming an ocean. So, Data Lake 3.0 for us is actually following what is going to be the future of HDFS 3.0. So we have three elements. The erasure coding feature, which is coming in HDFS. The second element is around having an HDFS data tier, a multi-data tier. So we're going to have faster SSD drives, we're going to have big memory nodes, we're going to have GPU nodes. And the reason why I say disaggregation is because some of the workloads will be only compute, and some of the workloads will be only storage, okay? And the customers require this, because they're getting more data, and they need to have, for example, YARN applications running on compute nodes, and at the same time, they want to have storage components, like HBase for example, running on the storage nodes, like HDFS 3.0 with the multi-tier option. So that's why the data disaggregation, or disaggregation between compute and storage, is the key point. We call this asymmetric, right? Hadoop is becoming asymmetric. That's what it means. >> And the problem you're solving there is, when I add a node to a cluster, I don't have to add compute and storage together; I can disaggregate and choose whatever I need-- >> Exactly. >> based on the workload. >> They are all multitenancy kinds of workloads, and they are independent and they scale out. Of course, it's much more complex, but we have actually proved that this is the way to go, because that's what the customer is demanding.
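For flavor, here is roughly how those HDFS 3.0 features get applied from the command line, wrapped in Python. The paths are invented, and the policy names and flags should be checked against the specific Hadoop 3.x release before use:

```python
import subprocess

# Sketch of applying the HDFS 3.0 features discussed above. The paths are
# invented; verify policy names and flags against your Hadoop 3.x release.
def hdfs(*args):
    subprocess.run(["hdfs", *args], check=True)

# Erasure coding on cold data: roughly 1.5x storage overhead instead of
# 3x replication (Reed-Solomon, 6 data blocks + 3 parity blocks).
hdfs("ec", "-setPolicy", "-path", "/data/cold", "-policy", "RS-6-3-1024k")

# Tiered storage: pin hot data to SSD, push archive data to dense disks.
hdfs("storagepolicies", "-setStoragePolicy", "-path", "/data/hot", "-policy", "ALL_SSD")
hdfs("storagepolicies", "-setStoragePolicy", "-path", "/data/archive", "-policy", "COLD")
```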
>> So, 3.0 is actually functional. It's erasure coding, you said. There's a data tier. You've got different memory levels. >> And I forgot to mention the containerization of the applications. Having Dockerized applications, for example, using Mesosphere, right? So having the containerization of the applications is part of what all of that means, because what we do in Hadoop is actually build the different clusters, and they need to talk to each other and exchange data in a faster way. And a product like SQL Manager, from Hortonworks, is actually helping us make this connection between the clusters faster and faster. And that's what the customer wants. >> And then Hadoop as a service, is that an on-premise solution, is that a hybrid solution, is it a Cloud solution, all three? >> I can offer all of them. Hadoop as a service could be run on-premise, could be run on a public Cloud, could be run on Azure, or could be a mix of them, partially on-premise and partially on public. >> And what are you seeing with regard to customer adoption of Cloud, and specifically around Hadoop and big data? >> I think the way I see that adoption is, all the customers want to start very small. The maturity is actually better from a technology standpoint. If you had asked me the same question maybe a year ago, I would have said it's difficult. Now I think they've got the point. Every large customer wants to build this big data ocean, not the lake, the ocean, whatever you want to call it.
>> I have a question, 'cause you, you've lived in the United States, you're obviously global, and spent a lot of time in Europe as well, and a lot of times, people want to discuss the differences between, let's make it specific here, the European continent and North America, and from a sophistication standpoint, same, we can agree on that, but there are still differences. Maybe, more greater privacy concerns. The whole thing with the Cloud and the NSA in the United States, created some concerns. What do you see as the differences today between North America and Europe? >> From my perspective, I think we are much more for example take IoT, Industrial IoT. I think in Europe we are much more advanced. I think in the manufacturing and the automotive space, the connected car kind of things, autonomous driving, this is something that we know already how to manage, how to do it. I mean, Tesla in the US is a good example that what I'm saying is not true, but if I look at for example, large German manufacturing car, they always implemented these type of things already today. >> Dave: For years, yeah. >> That's the difference, right? I think the second step is about the faster analytic approach. So what I mentioned before. The Power the Intelligent Edge, in my opinion at the moment, is much more advanced in the US compared to Europe. But I think Europe is starting to run back, and going on the same route. Because we believe that putting compute capacity on the edge is what actually the customer wants. But that's the two big differences I see. >> The other two big external factors that we like to look at, are Brexit and Trump. So (laughs) how 'about Brexit? Now that it's starting to sort of actually become, begin the process, how should we think about it? Is it overblown? It is critical? What's your take? >> Well, I think it's too early to say. UK just split a few days ago, right, officially. It's going to take another 18 months before it's going to be completed. From a commercial standpoint, we don't see any difference so far. We're actually working the same way. For me it's too early to say if there's going to be any implication on that. >> And we don't know about Trump. We don't have to talk about it, but the, but I saw some data recently that's, European sentiment, business sentiment is trending stronger than the US, which is different than it's been for the last many years. What do you see in terms of just sentiment, business conditions in Europe? Do you see a pick up? >> It's getting better, it is getting better. I mean, if I look at the major countries, the P&L is going positive, 1.5%. So I think from that perspective, we are getting better. Of course we are still suffering from the Chinese, and Japanese market sometimes. Especially in some of the big large deals. The inclusion of the Japanese market, I feel it, and the Chinese market, I feel that. But I think the economy is going to be okay, so it's going to be good. >> Carlo, I want to thank you for coming on and sharing your insight, final question for you. You're new to HPE, okay. We have a lot of history, obviously I was, spent a long part of my career there, early in my career. Dave and I have covered the transformation of HP for many, many years, with theCUBE certainly. What attracted you to HP and what would you say is going on at HP from your standpoint, that people should know about? >> So I think the number one thing is that for us the word is going to be hybrid. 
It means that some of the services that you can implement, either on-premise or on Cloud, could be done very well by the new Pointnext organization. I'm not part of Pointnext; I'm in the EG, Enterprise Group division. But I am a fan of Pointnext, because I believe this is the future of our company. It's on the services side; that's where it's going. >> I would just point out, Dave and I, our commentary on the spin merge has been: create these highly cohesive, very focused entities. Antonio now running EG, we're big fans; it's actually an efficient business model. >> Carlo: Absolutely. >> And Chris Hsu is running Micro Focus, a CUBE alumni. >> Carlo: It's a very efficient model, yes. >> Well, congratulations, and thanks for coming on and sharing your insights here in Europe. And certainly it is an IoT world, IIoT. I love the analytics story, foundational services. It's going to be great, open source powering it, and this is theCUBE, opening up our content and sharing that with you. I'm John Furrier, with Dave Vellante. Stay with us for more great coverage, here from Munich, after the short break.