Christian Rodatus, Datameer | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to by SiliconANGLE Media and its ecosystem sponsors. >> Coverage to theCUBE in New York City for Big Data NYC, the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry, we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously, one of the startups now evolving on the, I think, eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says show him the data. Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true, and actually a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. Then, he told me there were two open source projects with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help getting data in, create the data and doing something with it. And now, if you walk the show floor, it's a completely different landscape now. >> We've had you guys on before, the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously, the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is. Real solutions, not pie in the sky, we're now in our eighth year of this market, so there's not a lot of tolerance for hype even though there's a lot of AI watching going on. What's going on with you guys? >> I would say, interesting enough I met with a customer, prospective customer, this morning, and this was a very typical organization. So, this is a customer that was an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There's 27 open source projects, there's dozens and dozens of other different tools that try to basically, they try best of reach approaches and certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect from a customer meeting at a Canadian bank recently that has very successfully deployed applications on the data lake, like in fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service level agreements that they used from their old EDW that they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There's a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly. There is other solutions like OLAP on Hadoop that tries to emulate what they've been used to from the EDWs, and we believe these are the wrong approaches, so we want to stay true to the stack and be native to the stack and offer a platform that really operates end-to-end from interesting the data into the data lake to creation, preparation of the data, and ultimately, building the data pipelines for the business users, and this is certainly something-- >> Here's more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true? >> So, our primary user base as Datameer used to be like, until last week, we were the data engineers in the companies, or basically the people that built the data lake, that created the data and built these data pipelines for the business user community no matter what tool they were using. >> Jim, I want to get your thoughts on this for Christian's interest. Last year, so these guys can fix your microphone. I think you guys fix the microphone for us, his earpiece there, but I want to get a question to Chris, and I ask to redirect through you. Gartner, another analyst firm. >> Jim: I've heard of 'em. >> Not a big fan personally, but you know. >> Jim: They're still in business? >> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted through 2017, 60% of big data projects will fail. So, the question for both you guys is did that actually happen? I don't think it did, I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a dev ops mentality. So, thoughts on this 60% data projects fail. >> I don't know whether it's 60%, there was another statistic that said there's only 14% of Hadoop deployments, or production or something, >> They said 60, six zero. >> Or whatever. >> Define failure, I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application yet for it? I don't know, your thoughts. >> I agree with you, it's probably not a failure to that extent. It's more like how do they, so they dump the data into it, right, they build the infrastructure, now it's about the next step data lake 2.0 to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promotes the use of that data throughout the business community in a meaningful way. >> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest. >> Absolutely. I think we were very strong in data creation, data preparation and the entire data governance around it, and we are using, as a user interface, we are using this spreadsheet-like user interface called a workbook, it really looks like Excel, but it's not. It operates at completely different scale. It's basically an Excel spreadsheet on steroids. Our customers built a data pipeline, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that use that spreadsheet for deep data exploration. Now, we are lifting this to the next level, and we put up a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use, it's based on highly scalable distributed search in the backend engine of our product, number one. We have also adopted a columnar data store, Parquet, for our file system now. In this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there's literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get a response time of sub-second as we create indices on demand as we run this through the analytic process. >> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster? >> Yeah, absolutely. We, also there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is the EDW's not going away, the on prem is data lake, that prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables business. >> On the trends, we weren't on camera, we'll bring it up here, the impact of cloud to the data world. You've seen this movie before, you have extensive experience in this space going back to the origination, you'd say Teradata. When it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then tried to, it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward, so things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you got cloud. It seems to be changing the nature of what it will become in the future. What's your perspective on that evolution? What's different about now and what's same about now that's, from the old days? What's the similarities of the old-school, and what's different that people are missing? >> I think it's a lot related to cloud, just in general. It is extremely important to fast adoptions throughout the organization, to get performance, and service-level agreements without customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to a question to 60% fail, or why is it failing or working. I think there's a lot of really interesting projects out, and our customers are betting big time on the data lake projects whether it being on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these-- >> Not 32 data lakes, 32 projects that involve tapping into the data lake. >> 32 projects that involve various data lakes. >> Okay. (chuckling) >> And I spoke to one of the chief data officers there, and they said they are data center infrastructure just by having kick-started these projects will explode. And they're not in the business of operating all the hardware and things like this, and so, a major bank like them, they made an announcement recently, a public announcement, you can read about it, started moving the data assets into the cloud. This is clearly happening at rapid pace, and it will change the paradigm in terms of breathability and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quota end or something like this, so this will certainly help with adoption and creating business value for our customers. >> We talk about all the time real-time, and there's so many examples of how data science has changed the game. I mean, I was talking about, from a cyber perspective, how data science helped capture Bin Laden to how I can get increased sales to better user experience on devices. Having real-time access to data, and you put in some quick data science around things, really helps things in the edge. What's your view on real-time? Obviously, that's super important, you got to kind of get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, the real-time seems to be super hot right now. >> Real-time is a relative term, right, so there's certainly applications like IOT applications, or machine data that you analyze that require real-time access. I would call it right-time, so what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load data through Kafka and stream data through Kafka, but in general, we are still a batch-oriented platform. We can do. >> Which, by the way, is not going away any time soon. It's like super important. >> No, it's not going away at all, right. It can do many batches at relatively frequent increments, which is usually enough for what our customers demand from our platform today, but we're certainly looking at more streaming types of capability as we move this forward. >> What do the customer architectures look like? Because you brought up the good point, we talk about this all the time, batch versus real-time. They're not mutually exclusive, obviously, good architectures would argue that you decouple them, obviously will have a good software elements all through the life cycle of data. >> Through the stack. >> And have the stack, and the stack's only going to get more robust. Your customers, what's the main value that you guys provide them, the problem that you're solving today and the benefits to them? >> Absolutely, so our true value is that there's no breakages in the stack. We enter, and we can basically satisfy all requirements from interesting the data, from blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data. And all this we do in a highly secure and governed environment, so if you stitch it together, as a customer, the customer this morning asked me, "Whom do you compete with?" I keep getting this question all the time, and we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our things are really point and click and highly automated, and we compete with a combination of different products. You need to have at least three to four different products to be able to do what we do, but then you get security breaks, you get lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly now with visual exploration, we offer capability that literally nobody has in the marketplace, where we give power users the capability to explore with blazing fast response times, billion rows of data in a very free-form type of exploration process. >> Are there more power users now than there were when you started as a company? It seemed like tools like Datameer have brought people into the sort of power user camp, just simply by the virtue of having access to your tool. What are your thoughts there? >> Absolutely, it's definitely growing, and you see also different companies exploiting their capability in different ways. You might find insurance or financial services customers that have a very sophisticated capability building in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go. >> Christian, I got to ask you as the new CEO of Datameer, obviously going to the next level, you guys have been successful. We were commenting yesterday on theCUBE about, we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves come and go of hype, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stay to your knitting, you didn't overplay your hand. You've certainly rode the hype like everyone else did, but your solution is very specific on value, and so, you didn't overplay your hand, the company didn't really overplay their hand, in my opinion. But now, there's really the hand is value. >> Absolutely. >> As the new CEO, you got to kind of put a little shiny new toy on there, and you know, rub the, keep the car lookin' shiny and everything looking good with cutting edge stuff, the same time scaling up what's been working. The question is what are you doubling down on, and what are you investing in to keep that innovation going? >> There's really three things, and you're very much right, so this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers achieve, and now, the three investment areas that we are putting together and where we are doubling down is really visual exploration as I outlined before. Number two, hybrid cloud architectures, we don't believe the customers move their entire stack right into the cloud. There's a few that are going to do this and that are looking into these things, but we will, we believe in the idea that they will still have to EDW their on premise data lake and some workload capabilities in the cloud which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we've released a plug-in earlier in the year for TensorFlow where we can basically build data pipelines for machine learning applications. This is still very small. We see some interest from customers, but it's growing interest. >> It's a directionally correct kind of vector, you're looking and say, it's a good sign, let's kick the tires on that and play around. >> Absolutely. >> 'Cause machine learning's got to learn, too. You got to learn from somewhere. >> And quite frankly, deep learning, machine learning tools for the rest of us, there aren't really all that many for the rest of us power users, they're going to have to come along and get really super visual in terms of enabling visual modular development and tuning of these models. What are your thoughts there in terms of going forward about a visualization layer to make machine learning and deep learning developers more productive? >> That is an area where we will not engage in a way. We will stick with our platform play where we focus on building the data pipelines into those tools. >> Jim: Gotcha. >> In the last area where we invest is ecosystem integration, so we think with our visual explorer backend that is built on search and on a Parquet file format is, or columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BRE ecosystems and accelerating those as well. We've currently prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there. >> What are some the ecosystem partners do you guys have? I know partnering is a big part of what you guys have done. Can you name a few? >> I mean, the biggest one-- >> Everybody, Switzerland. >> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively and very important on our cloud strategy with Microsoft and Amazon AWS in evolving our cloud strategy. We've started working with various BI vendors throughout that you know about, right, and we definitely have a play also with some of the big SIs and IBM is a more popular one. >> So, BI guys mostly on the tool visualization side. You said you were a pipeline. >> On tool and visualization side, right. We have very effective integration for our data pipelines into the BI tools today we support TD for Tableau, we have a native integration. >> Why compete there, just be a service provider. >> Absolutely, and we have more and better technology come up to even accelerate those tools as well in our big data stuff. >> You're focused, you're scaling, final word I'll give to you for the segment. Share with the folks that are a Datameer customer or have not yet become a customer, what's the outlook, what's the new Datameer look like under your leadership? What should they expect? >> Yeah, absolutely, so I think they can expect utmost predictability, the way how we roll out the division and how we build our product in the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference. We're going to launch our native cloud solution probably middle of November to the customer base. So, these are the big milestones that will help us for our next fiscal year and provide really great value to our customers, and that's what they can expect, predictability, a very solid product, all the enterprise-grade features they need and require for what they do. And if you look at it, we are really enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create values from the data lakes. >> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing. >> Thanks so much. >> And we'll be following your progress. Datameer here inside theCUBE live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space. I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break. >> Christian: Thank you. (upbeat electronic music)
SUMMARY :
Brought to by SiliconANGLE Media and its ecosystem sponsors. I'm John Furrier, the co-host, with Jim Kobielus, So well established, I barely think of you create the data and doing something with it. You've come on as the CEO to kind of and the service level agreements that they used Here's more of a play for the business users now, that created the data and built these data pipelines and I ask to redirect through you. So, the question for both you guys is the killer application yet for it? the next step data lake 2.0 to figure out Okay, so what's going on with you guys and columns that you are looking at, and we understand ourselves as a platform play the impact of cloud to the data world. and that resembles what they were used to tapping into the data lake. and being able to satisfy peak workload requirements and you put in some quick data science around things, or machine data that you analyze Which, by the way, is not going away any time soon. more streaming types of capability as we move this forward. What do the customer architectures look like? and the stack's only going to get more robust. and analyzing the data. just simply by the virtue of having access to your tool. and you see also different companies and so, you didn't overplay your hand, the company and what are you investing in to keep that innovation going? and now, the three investment areas let's kick the tires on that and play around. You got to learn from somewhere. for the rest of us power users, We will stick with our platform play and depth of analytic capability to some of What are some the ecosystem partners do you guys have? and how we can provide value to our customers, on the tool visualization side. into the BI tools today we support TD for Tableau, Absolutely, and we have more and better technology Share with the folks that are a Datameer customer and the customer base that we have is Great to have you on theCUBE, here in conjunction with Strata Data, Christian: Thank you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
Chris | PERSON | 0.99+ |
HSBC | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Jim | PERSON | 0.99+ |
Christian Rodatus | PERSON | 0.99+ |
Stefan | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
60% | QUANTITY | 0.99+ |
2017 | DATE | 0.99+ |
Datameer | ORGANIZATION | 0.99+ |
2010 | DATE | 0.99+ |
32 projects | QUANTITY | 0.99+ |
Last year | DATE | 0.99+ |
United Kingdom | LOCATION | 0.99+ |
1,000 | QUANTITY | 0.99+ |
New York City | LOCATION | 0.99+ |
14% | QUANTITY | 0.99+ |
eight years | QUANTITY | 0.99+ |
fifth year | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Excel | TITLE | 0.99+ |
eighth year | QUANTITY | 0.99+ |
late 2009 | DATE | 0.99+ |
early 2010 | DATE | 0.99+ |
Mike Olson | PERSON | 0.99+ |
60 | QUANTITY | 0.99+ |
27 open source projects | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
thousands | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
Kafka | TITLE | 0.99+ |
seven | QUANTITY | 0.99+ |
second trend | QUANTITY | 0.99+ |
Midtown Manhattan | LOCATION | 0.99+ |
yesterday | DATE | 0.99+ |
Christian | PERSON | 0.99+ |
both | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.98+ |
two open source projects | QUANTITY | 0.98+ |
Gartner | ORGANIZATION | 0.98+ |
two platform technologies | QUANTITY | 0.98+ |
Wikibon | ORGANIZATION | 0.98+ |
Switzerland | LOCATION | 0.98+ |
billions of rows | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
MapReduce | ORGANIZATION | 0.98+ |
2,000 users | QUANTITY | 0.98+ |
Bin Laden | PERSON | 0.98+ |
NYC | LOCATION | 0.97+ |
Strata Data | ORGANIZATION | 0.97+ |
32 data lakes | QUANTITY | 0.97+ |
six | QUANTITY | 0.97+ |
Hadoop | TITLE | 0.97+ |
secondly | QUANTITY | 0.96+ |
next fiscal year | DATE | 0.96+ |
three things | QUANTITY | 0.96+ |
today | DATE | 0.95+ |
four different products | QUANTITY | 0.95+ |
Teradata | ORGANIZATION | 0.95+ |
Christian | ORGANIZATION | 0.95+ |
this morning | DATE | 0.95+ |
TD | ORGANIZATION | 0.94+ |
EDW | ORGANIZATION | 0.94+ |
BigData | EVENT | 0.92+ |