Ben Sharma, Tony Fisher, Zaloni - BigData SV 2017 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. (rhythmic music)
>> Hey, welcome back, everyone. We're live in Silicon Valley for Big Data SV, Big Data Silicon Valley, in conjunction with Strata + Hadoop. This is the week where it all happens in Silicon Valley around the emergence of big data as it goes to the next level. The Cube is actually on the ground covering it like a blanket. I'm John Furrier. My cohost, George Gilbert, with Wikibon. And our next guests, we have two executives from Zaloni: Ben Sharma, who's the founder and CEO, and Tony Fisher, SVP of strategy. Guys, welcome back to The Cube. Good to see you.
>> Thank you for having us back.
>> You guys are great guests. You were in New York for Big Data NYC, and a lot is going on, certainly, here, and it's just getting kicked off with Strata + Hadoop. They've got the sessions today, but you guys have already got some news out there. Give us the update. What's the big discussion at the show?
>> So yeah, 2016 was a great year for us. A lot of growth. We tripled our customer base, and there's a lot of interest in data lakes, as customers are going from, say, pilots and POCs into production implementations. And in conjunction with that, this week we launched a solution named Data Lake in a Box, appropriately, right? So what that means is we're bringing the full stack together to customers, so that we can get a data lake up and running in an eight-week time frame, with enterprise-grade data ingestion from their source systems hydrated into the data lake and ready for analytics.
>> So is it a pretty big box, and is it waterproof? (all laughing) I mean, this is the big discussion now, pun intended. But the data lake is evolving, so I wanted to get your take on it. This has kind of been a theme that's been leading up and is now front and center here on The Cube. Already the data lake has changed; also we've heard, I think Dave Vellante in New York said, "data swamp." But using the data is critical on a data lake. So as it goes to a more mature model of leveraging the data, what are the key trends right now? What are you guys seeing? Because this is a hot topic that everyone is talking about.
>> Well, that's a good distinction that we like to make, is the difference between a data swamp and a data lake.
>> And a data lake is much more governed. It has the rigor, it has the automation, it has a lot of the concepts that people are used to from traditional architectures, only we apply them in a scale-out architecture. So we put together a maturity model that really maps out a customer's journey throughout the big data and the data lake experience. And at each phase of this, we can see what the customer's doing, what their trends are, and where they want to go, and we can advise them on the right way to move forward. And so a lot of the customers we see are kind of in what we call the ignore stage. I'd say most of the people we talk to are just ignoring. They don't have things active, but they're doing a lot of research. They're trying to figure out what's next. And we want to move them from there. The next stage up is called store. And store is basically just the sandbox environment: "I'm going to stick stuff in there. I'm going to hope something comes out of it." No collaboration. But then, moving forward, there's the managed phase, the automated phase, and the optimized phase. And our goal is to move them up into those phases as quickly as possible.
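As a purely illustrative aside, the maturity model described above can be thought of as an ordered progression. The phase names come from the conversation; the enum and helper below are assumptions added here for clarity, a minimal sketch rather than anything from Zaloni's product.

```python
from enum import IntEnum

class LakeMaturity(IntEnum):
    """Ordered phases of the data lake maturity model, as described in the interview."""
    IGNORE = 1     # researching only, nothing active yet
    STORE = 2      # sandbox / "data swamp": store it and hope something comes out
    MANAGED = 3    # governed ingestion, catalog, access control
    AUTOMATED = 4  # pipelines and lifecycle handled automatically
    OPTIMIZED = 5  # self-correcting, e.g. machine learning managing the lake itself

def next_phase(current: LakeMaturity) -> LakeMaturity:
    """Suggest the next step up the maturity curve, capping at OPTIMIZED."""
    return LakeMaturity(min(current + 1, LakeMaturity.OPTIMIZED))

assert next_phase(LakeMaturity.STORE) is LakeMaturity.MANAGED
```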
And Data Lake in a Box is an effort to do that, to leapfrog them into a managed data lake environment.
>> So that's kind of where the swamp analogy comes in, because the data lake, the swamp, is kind of dirty, where you can almost think, "Okay, the first step is store it." And then they get busy, or they try to figure out how to operationalize it, and then it's kind of like, "Uh ..." So to your point, they're trying to get to that. So you guys get 'em to that setup, and then move them quickly to value? Is that kind of the approach?
>> Yeah. So, time to value is critical, right? How do you reduce the time to insight from the time the data is produced by the data producer, till the time you can make the data available to the data consumer for analytics and downstream use cases? So that's kind of our core focus in bringing these solutions to the market.
>> Dave Vellante and I were talking, and George always talks about how the value of data at the right time in the right place is the critical linchpin for value, whether it's app-driven or whatever. So with the data lake, you never know what data in the data lake will need to be pulled out and put into either real time or an app. So you have to assume at any given moment there's going to be data value.
>> Sure.
>> So that, conceptually, people can get. But how do you make that happen? Because that's a really hard problem. How do you guys tackle that when a customer says, "Hey, I want to do the data lake. I've got to have the coverage. I got to know who's accessing stuff. But at the end of the day, I got to move the data to where it's valuable."
>> Sure. So the approach we have taken is an integrated platform with a common metadata layer. Metadata is the key. So, using this common metadata layer, being able to do managed ingestion from various different sources, being able to do data validation and data quality, being able to manage the life cycle of the data, being able to generate these insights about the data itself, so that you can use it effectively for data science or for downstream applications and use cases, is critical based on our experience of taking these applications from, say, a POC or pilot phase into a production phase.
>> And what's the next step, once you guys get to that point with the metadata? Because, like, I get that; it's like everyone's got the metadata focus. Now, I'm the data engineer, the geek, the supergeek, and then you've got the data scientists, then the analysts, then there will probably be a new category, a bot or something AI will do. But you can have a spectrum of applications on the data side. How do they get access to the metadata? Is it through the machine learning? Do you guys have anything unique there that makes that seamless, or is that the end goal?
>> Sure, do you want to take that?
>> Yes, sure. It's a multi-pronged answer, but I'll start and you can jump in. One of the things we provide as part of our overall platform is a product called Mica. And Mica is really kind of the on-ramp to the data. And all those people that you just named, we love them all, but their access to the data is through a self-service data preparation product, and key to that is the metadata repository.
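To make the common metadata layer and catalog idea concrete, here is a minimal, hypothetical sketch of what a single catalog entry might capture. The `CatalogEntry` class and its field names are illustrative assumptions, not Zaloni's actual schema; they only mirror the traits mentioned in the conversation: source, quality, freshness, and usage.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class CatalogEntry:
    """Illustrative catalog record for one data set in the lake."""
    name: str                      # business-friendly data set name
    source_system: str             # where managed ingestion pulled it from
    location: str                  # path or table in the lake or an external store
    owner: str                     # steward responsible for the data set
    quality_score: float           # 0.0-1.0 result of data validation rules
    last_refreshed: datetime       # freshness signal for data consumers
    tags: List[str] = field(default_factory=list)        # e.g. "PII", "finance"
    consumers: List[str] = field(default_factory=list)   # who uses it, and where

# A data scientist browsing the catalog sees form, function, freshness, and usage.
entry = CatalogEntry(
    name="customer_transactions",
    source_system="pos_oracle_prod",
    location="s3://lake/curated/customer_transactions/",
    owner="retail-data-team",
    quality_score=0.97,
    last_refreshed=datetime(2017, 3, 14),
    tags=["PII", "retail"],
    consumers=["churn_model_v2", "weekly_sales_dashboard"],
)
```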
So, all the metadata is out there; we call it a catalog at that point. And so they can go in, look at the catalog, get a sense for the data, get an understanding of the form and function of the data, see who uses it, see where it's used, and determine if that's the data that they want. And if it is, they have the ability to refine it further, or they can put it in a shopping cart. If they have access to it, they can get it immediately and refine it; if they don't have access to it, there's an automatic request so that they can get access to it. And so it's an on-ramp concept, of having a card catalog of all the information that's out there, how it's being used, how it's been refined, to allow the end user to make sure that they've got the right data and they can be positioned for their ultimate application.
>> And just to add to what Tony said, because we are using this common metadata layer, and capturing metadata at every instance, if you will, we are serving it up to the data consumers using a rich catalog, so that a lot of our enterprise customers are now starting to create what they consider a data marketplace or a data portal within their organization. They're able to catalog not just the data that's in the data lake, but also data that's in other data stores, and provide one single unified view of these data sets, so that your data scientists can come in and see: is this a data set that I can use for my model building? What are the different attributes of this data set? What is the quality of the data? How fresh is the data? And those kinds of traits, so that they are effective in their analytical journey.
>> I think that's the key thing that's interesting to me: you're seeing the big data explosion over the past ten years, eight years; we've been covering it on The Cube since the Hadoop world started. But now, it's the data set world, so it's a big data set market. The data sets are the key, because that's what data scientists want to wrangle around with, and sling data sets with whatever tooling they want to use. Is that kind of the same trend that you guys see?
>> That's correct. And also what we're seeing in the marketplace is that customers are moving from a single architecture to a distributed architecture, where they may have a hybrid environment with some things being instantiated in the cloud and some things being on-prem. So how do you now provide a unified interface across these multiple environments, in a governed way, so that the right people have access to the right data, and it's not the data swamp?
>> Okay, so let's go back to the maturity model, because I like that framework. So now you've just complicated the heck out of it, 'cause now you've got cloud, and then on-prem, and then now, how do you put that prism of the maturity model on hybrid? How does that cross-connect there? And a second follow-up to that is, where are the customers on this progress bar? I'm sure they're different by customer, but, so, maturity model to the hybrid, and then trends in the customer base that you're seeing?
>> Alright, I'll take the second one, and then you can take the first one, okay? So, the vast majority of the people that we work with, and the people, the prospects, customers, analysts we've talked to, other industry dignitaries, they put the vast majority of the customers in the ignore stage. Really just doing their research. So a good 50% plus of most organizations are still in that stage.
And then there's the data swamp environment: I'm using it to store stuff, and hopefully I'll get something good out of it. That's another 25% of the population. And so, most of the customers are there, and we're trying to move them kind of rapidly up and into a managed and automated data lake environment. The other trend along these lines that we're seeing, that's pretty interesting, is the emergence of IT in the big data world. It used to be a business user's world, and business users built these sandboxes, and business users did what they wanted to. But now, we see organizations that are really starting to bring IT into the fold, because they need the governance, they need the automation, they need the type of rigor that they're used to in other data environments, and that has been lacking in the big data environment.
>> And you've got IoT cracking the code on the IoT side, which has created another dimension of complexity. On the numbers, the 50% that ignore, is that profile more Fortune 1000?
>> It's larger companies; it's the Fortune and Global 2000.
>> Got it, okay. And in terms of the hybrid maturity model, how does that map? And add a third dimension, IoT. We've got a multi-dimensional chess game going here.
>> I think the way we think about it is that there are different patterns of data sets coming in. So they could be batch, they could be files or database extracts, or they could be streams, right? So as long as you think about a converged architecture that can handle these different patterns, then you can map different use cases, whether they are IoT and streaming use cases, versus what we are seeing, which is that a lot of companies are trying to replace their operational analytics platforms with a data lake environment, and they're building their operational analytics on top of the data lake, correct? So you need to think more from an abstraction layer: how do you abstract it out? Because one of the challenges that we see customers facing is that they don't want to get sticky with one cloud service provider, because they may have multiple cloud service providers.
>> John: It's a multi-cloud world right now.
>> So how do you leverage that, where you have one cloud service provider in one geo, another cloud service provider in another geo, and you're still able to have an abstraction layer on top of it, so that you're building applications?
>> So do you guys provide that data layer across that abstraction?
>> That is correct, yes. So we leverage the ecosystem, but what we do is add the data management and data governance layer. We provide that abstraction, so that you can be on-prem, you can be in cloud service provider one, or cloud service provider two. You still have the same controls and the same governance functions as you build your data lake environment.
>> And this is consistent with some of the Cube interviews we had all day today, and other Cube interviews, where, when you're in the cloud, you're renting, basically, but you own your data. You get to have a nice ... And that metadata seems to be the key. That's the key, right? For everything.
>> That's right. And now what we're seeing is that a lot of our enterprise customers are looking at bringing some of the public cloud infrastructure into their on-prem environment, as those are going to be available in appliances and things like that, right?
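As an illustration of the abstraction layer being described, one set of governance controls sitting in front of interchangeable on-prem and cloud storage back ends, here is a hypothetical sketch. The class names and the simple ACL check are assumptions for the sake of example, not an actual product API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set

class StorageBackend(ABC):
    """One implementation per environment: on-prem HDFS, cloud provider A, cloud provider B."""
    @abstractmethod
    def read(self, path: str) -> bytes: ...

class OnPremBackend(StorageBackend):
    def read(self, path: str) -> bytes:
        raise NotImplementedError("call the on-prem HDFS/object store client here")

class CloudABackend(StorageBackend):
    def read(self, path: str) -> bytes:
        raise NotImplementedError("call cloud provider A's object store client here")

class GovernedDataLayer:
    """Single entry point: the same access checks apply no matter where the data lives."""
    def __init__(self, backends: Dict[str, StorageBackend], acl: Dict[str, Set[str]]):
        self.backends = backends   # e.g. {"on_prem": OnPremBackend(), "cloud_a": CloudABackend()}
        self.acl = acl             # data set name -> users allowed to read it

    def read(self, user: str, dataset: str, environment: str, path: str) -> bytes:
        if user not in self.acl.get(dataset, set()):
            raise PermissionError(f"{user} is not authorized to read {dataset}")
        return self.backends[environment].read(path)
```

The point of the sketch is only that access control lives above the storage choice, which is what lets the same data lake governance span on-prem and multiple clouds.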
So how do you then make sure that whatever you're doing in a non-enterprise cloud environment, you are also able to extend it to the enterprise--
>> And the consequence to the enterprise is that the enterprise ends up doing multiple jobs, if they don't have a consistent data layer ...
>> Sure, yeah.
>> It's just more redundancy.
>> Exactly.
>> Not redundancy, duplication actually.
>> Yeah, duplication and the difficulty of rationalizing it together.
>> So let me drill down into a little more detail on the transition between these sorts of maturity phases, and then the movement into production apps. I'm curious to know, we've heard Tableau, Excel, Power BI, Qlik, I guess, being-- sort of adapting to being front ends to big data. But for their experience to work, they can't really handle big data sets. So you need the MPP SQL database on the data lake. And I guess the question there is, is there value to be gotten, or measurable value to be gotten, just from turning the data lake into, you know, an interactive BI kind of platform? And sort of as the first step along that maturity model.
>> One of the patterns we are seeing is that the serving layer is becoming more and more mature in the data lake. Earlier it used to be mainly batch types of workloads. Now, with MPP engines running on the data lake itself, you are able to connect your existing BI applications, whether it's Tableau, Qlik, Power BI, or others, to these engines, so that you are able to get low-latency query response times and are able to slice and dice your data sets in the data lake itself.
>> But you're essentially still, you have to sample the data. You can't handle the full data set unless you're working with something like Zoomdata.
>> Yeah, so there are physical limitations, obviously. And then there's also this next generation of BI tools which work in a converged manner in the data lake itself. So there's Zoomdata, Arcadia, and others that are able to kind of run inside the data lake itself, instead of you having to have an external environment like the other BI tools, so we see that as a pattern. But if you are already an enterprise that has onboarded a BI platform, how do you leverage that with the data lake as part of the next-generation architecture? That's a key trend that we are seeing.
>> So your metadata helps make that move from swamp to curated data lake.
>> That's right. And not only that, what we have done, as Tony was mentioning, in our Mica product we have a self-service catalog, and then we provide a shopping cart experience where you can actually source data sets into the shopping cart, and we let them provision a sandbox. And when they provision the sandbox, they can actually launch Tableau or whatever the BI tool of choice is on that sandbox, so that they can actually-- and that sandbox could exist in the data lake, or it could exist on a relational data store or an MPP data store that's outside of the data lake. That's part of your modern data architecture.
>> But further to your point, if people have to throw out all of their decision support applications and their BI applications in order to change their data infrastructure, they're not going to do it.
>> Understood.
>> So you have to make that environment work, and that's what Ben's referring to with a lot of the new accelerator tools and things that will sit on top of the data lake.
>> Guys, thanks so much for coming on The Cube. Really appreciate it. I'll give you guys the final word in the segment ... What do you expect this week?
I mean, obviously, we've been seeing the consolidation. You're starting to see the swim lanes with Spark and open source, and you see the cloud and IoT colliding. There's a huge intersection with deep learning. AI is certainly hyped up now beyond all recognition, but it's essentially deep learning. Neural networks meet machine learning. That's been around before, but now it's freely available with cloud and compute. And so it's kind of an interesting dynamic that's rockin' the big data world. Your thoughts on what we're going to see this week and how that relates to the industry?
>> I'll take a stab at it, and you may feel free to jump in. I think what we'll see is that a lot of customers that have been playing with big data for a couple of years are now getting to a point where what worked for one or two use cases now needs to be scaled out and provided at an enterprise scale. So they're looking at a management and governance layer to put on top of the platform, so they can enable machine learning and AI and all those use cases, because business is asking for them. Right? Business is asking for how they can bring in TensorFlow and run it on the data lake itself, right? So we see those kinds of requirements coming up more and more frequently.
>> Awesome. Tony?
>> What he said.
>> And enterprise readiness certainly has to be table-- there's a lot of table stakes in the enterprise. It's not, like, easy to get into. You can see Google kind of just putting their toe in the water with Google Cloud: TensorFlow, great highlight, they got Spanner. So all these other things, like latency, are rearing their heads again. So these are all kind of table stakes.
>> Yeah, and the other thing, moving forward with respect to machine learning and some of the advanced algorithms: what we're doing now, and some of the research we're doing, is actually using machine learning to manage the data lake, which is a new concept. So when we get to the optimized phase of our maturity model, a lot of that has to do with self-correcting and self-automating.
>> I need some machine learning and some AI, so does George, and we need machine learning to watch the machine learn, and then algorithmists for the algorithms. It's a crazy world, an exciting time for us.
>> Are we going to have a bot next time when we come here? (all laughing)
>> We're going to chat off of Messenger; we just came from South by Southwest. Guys, thanks for coming on The Cube. Great insight, and congratulations on the continued momentum. This is The Cube, breakin' it down with experts, CEOs, entrepreneurs, all here inside The Cube. Big Data SV, I'm John Furrier, with George Gilbert. We'll be back after this short break. Thanks! (upbeat electronic music)