Jeff Bettencourt, DataTorrent & Nathan Trueblood, DataTorrent - DataWorks Summit 2017

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks.
>> Welcome back to The Cube. We are live on day two of the DataWorks Summit, from the heart of Silicon Valley. I am Lisa Martin; my co-host is George Gilbert. We're very excited to be joined by our next guests from DataTorrent. We've got Nathan Trueblood, VP of Product. Hey, Nathan.
>> Hi.
>> Lisa: And the man who gave me my start in high tech 12 years ago, the SVP of Marketing, Jeff Bettencourt. Welcome, Jeff.
>> Hi, Lisa, good to see ya.
>> Lisa: Great to see you, too. So tell us, as the SVP of Marketing: who is DataTorrent, what do you guys do, and what are you doing in the big data space?
>> Jeff: So, DataTorrent is all about realtime streaming. It's really taking a different paradigm to handling information as it comes from the different sources that are out there. So you think big IoT, you think all of these different new things that are creating pieces of information. It could be humans, it could be machines, sensors, whatever it is. And we're taking that in realtime, rather than traditionally just putting it in a data lake and then later on coming back and investigating the data that you stored. So, we started about 2011, started by some of the early founders, people that started at Yahoo and were pioneers in Hadoop with Hadoop YARN. This is one of the guys here, too. And so we're all about building realtime analytics for our customers, making sure that they can get business decisions done in realtime, as the information is created. And Nathan will talk a little bit about what we're doing on the application side of it as well, building these hard application pipelines for our customers to help them get started faster.
>> Lisa: Excellent.
>> So, alright, let's turn to those realtime applications.
Umm, my familiarity with DataTorrent started probably about five years ago, I think, when the positioning was, I don't think there was so much talk about streaming, but it was like, you know, realtime data feeds. But now, I mean, streaming is sort of the center of gravity, sort of a peer to big data.
>> Nathan: Yeah.
>> So, tell us how someone who's building apps should think about the two solution categories, how they complement each other, and what sort of applications we can build now that we couldn't build before?
>> So, I think the way I look at it is not so much two different things that complement each other; streaming analytics and realtime data processing and analytics is really just a natural progression of where big data has been going. So, you know, when we were at Yahoo and we were running Hadoop at scale, the first thing on the scene was simply the ability to produce insight out of a massive amount of data. But then there was this constant pressure: well, okay, now we've produced that insight in a day, can you do it in an hour? You know, can you do it in half an hour? And particularly at Yahoo at the time that Amol, our CTO, and I were there, there was just constant pressure of, can you produce insight from a huge volume of data more quickly? And so we kind of saw at that time two major trends. One was that we were kind of reaching a limit of where you could go with the Hadoop and batch architecture at that time, and so a new approach was required. And that's what really was the foundation of the Apache Apex project and of DataTorrent the company: simply realizing that a new approach was required, because the more that Yahoo or other businesses can take information from the world around them and take action on it as quickly as possible, the more competitive that's going to make them. So I'd look at streaming as really just a natural progression.
Where now it's possible to get insight and take action on data as close to the time of data creation as possible, and if you can do that, then you're going to be competitive. And so we see this coming across a whole bunch of different verticals. So that's how I kind of look at it: it's not so much complementary as a trend in where big data is going. Now, the kinds of things that weren't possible before this are, you know, the kinds of applications where now you can take insight, whether it's from IoT or from sensors or from retail, all the things that are going on. Whereas before, you would land this in a data lake, do a bunch of analysis, produce some insight, maybe change your behavior, but ultimately you weren't being as responsive as you could be to customers. So what we are seeing now, and why I think the center of mass has moved into realtime and streaming, is that now it's possible to, you know, give the customer an offer the second they walk into a store, based on what you know about them and their history. This was always something that the internet properties were trying to move towards, but now we see that same technology being made available across a whole bunch of different verticals, a whole bunch of different industries. And that's why, you know, when you look at Apex and DataTorrent, we're involved not only in things like adtech, but in industrial automation and IoT, and we're involved in, you know, retail and customer 360. Because in every one of these cases, insurance, finance, security and fraud prevention, it's a huge competitive advantage if you can get insight and make a decision close to the time of the data creation. So, I think that's really where the shift is coming from. And then the other thing I would mention here is a big thrust of our company and of Apache Apex. So we saw streaming was going to be something that everyone was going to need.
The other thing we saw from our experience at Yahoo was that getting something to work at a POC level, showing that something is possible with streaming analytics, is really only a small part of the problem. Being able to take something and put it into production at scale and run a business on it is a much bigger part of the problem. And so we put into both the Apache Apex project and our product the ability not only to get insight out of this data in motion, but to be able to put that into production at scale. And that's why we've had quite a few customers who have put our product in production at scale and have been running that way, you know, in some cases for years. And so that's another key area where we're forging a path, which is, it's not enough to do a POC and show that something is possible. You have to be able to run a business on it.
>> Lisa: So, talk to us about where DataTorrent sits within a modern data architecture. You guys are kind of playing in, integrated in, a couple of different areas. Walk us through what that looks like?
>> So, in terms of a modern data architecture, I mean, part of it is what I just covered, in that we're moving from a batch to a streaming world, where the notion of batch is not going away. But now when you have, you know, a streaming application, that's something that's running all the time, 24/7; there's no concept of batch. Batch is really more the concept of how you are processing data through that streaming application. So what we're seeing in the modern data architecture is that, you know, typically you have people taking data, extracting it, and eventually loading it into some kind of a data lake, right? What we're doing is shifting left of the data lake. You know, analyzing information when it's created.
Produce insight from it, take action on it, and then, yes, land it in the data lake. But once you land it in the data lake, all of the purposes of what you're doing with that data have shifted. You know, we're producing insight and taking action to the left of the data lake, and then we use that data lake to do things like train your, you know, your machine learning model that we're then going to use to the left of the data lake. Use the data lake to do slicing and dicing of your data to better understand what kinds of campaigns you want to run, things like that. But ultimately, you're using the realtime portion of this to be able to take those campaigns and then measure the impact you're having on your customers in realtime.
>> So, okay, 'cause that was going to be my followup question, which is, there does seem to be a role for a historical repository for richer context.
>> Nathan: Absolutely.
>> And you're acknowledging that. Like, the low-latency analytics happen first, then you store up for a richer model, you know, later?
>> Nathan: Correct.
>> Umm. So, there are a couple of things then that seem to be like requirements, next steps, which is, if you're doing the modeling, the research model, in the cloud, how do you orchestrate its distribution towards the sources of the realtime data? In other words, if you do training up in the cloud, where you have the biggest data or the richest data, is DataTorrent or Apex a part of the process of orchestrating the distribution and coherence of the models that should be at the edge, or closer to where the data sources are?
>> So, I guess there are a couple of different ways we can think about that problem. So, you know, we have customers today who are essentially providing into the streaming analytics application, you know, the models that have been trained on the data from the data lake. And part of the approach we take in Apex and DataTorrent is that you can reload and be changing those models all of the time.
So, our architecture is such that it's fault tolerant, it stays up all the time, so you can actually change the application and evolve it over time. So we have customers that are reloading models on a regular basis; whether it's machine learning or even just a rules engine, we're able to reload that on a regular basis. The other part of your question, if I understood you, was really about the distribution of data and the distribution of models, and where you train them. And I think that you're going to have data in the cloud, you're going to have data on premises, you're going to have data at the edge. Again, what we allow customers to do is to take and integrate that data and make decisions on it, regardless of where it lives. So we'll see streaming applications that get deployed into the cloud, but they may be synchronizing some portion of the data to on premises, or vice versa. So, certainly we can orchestrate all of that as part of an overall streaming application.
>> Lisa: I want to ask Jeff, now. Give us a cross section of your customers. You've got customers ranging from small businesses to Fortune 10.
>> Jeff: Yep.
>> Give us some use cases that really stick out to you, that really showcase the great potential that DataTorrent gives.
>> Jeff: So if you think about the heritage of our company, coming out of the early guys that were in Yahoo, adtech is obviously one that we hit hard, and it's something we know how to do really, really well. So, adtech is one of those things where they're constantly changing, so you can take that same model and say, if I'm looking at adtech and I applied that to a distribution of products in a manufacturing facility, it's kind of all the same type of activities, right? I'm managing a lot of inventory, I'm trying to get that inventory to the right place at the right time, and I'm trying to fill that aspect of it.
So that's kind of where we started, but we've got customers in the financial sector, right, that are really looking at instantaneous types of transactions that are happening, and then how do you apply knowledge and information to that while you're bringing that source data in, so that you can make decisions. Some of those decisions have people involved with them and some of them are just machine based, right, so you take the people out of the equation. We kind of have this funny thing that Guy Churchward, our CEO, talks about, called the do loop. The do loop is where the people come in, and how do we remove people from that do loop and really make it easier for companies to act, prevent. So then if you take that aspect of it, we've got companies in the publishing space. We've got companies in the IoT space, so they're doing inventory management, stuff like that. So we go from, you know, medium sized customers all the way up to very, very large enterprises.
>> Lisa: You're really turning a variety of industries into tech companies, because they have to be these days.
>> Nathan: Right, well, and one other thing I would mention there, which is important, especially as we look at big data and a lot of customer concern about complexity. You know, I mentioned earlier the challenge of not just coming up with an idea but being able to put it into production. So, one of the other big areas of focus for DataTorrent as a company is that not only have we developed a platform for streaming analytics and applications, but we're starting to deliver applications that you can download and run on our platform, that deliver an outcome to a customer immediately. So, increasingly, as we see different applications in different verticals, we turn those into applications we can make available to all of our customers, that solve business problems immediately.
One of the challenges for a long time in IT is simply how do you eliminate complexity, and there's no getting away from the fact that this is big data and it's complex systems. But to drive mass adoption, we're focused on how we can deliver outcomes for our customers as quickly as possible, and the way to do that is by making applications available across all these different verticals.
>> Well, you guys, this has been so educational. We wish you guys continued success here. It sounds like you're really being quite disruptive in and of yourselves, so if you haven't heard of them, DataTorrent.com, check them out. Nathan, Jeff, thanks so much for giving us your time this afternoon.
>> Great, thanks for the opportunity.
>> Lisa: We look forward to having you back. You've been watching The Cube, live from day two of the DataWorks Summit, from the heart of Silicon Valley. For my co-host George Gilbert, I'm Lisa Martin. Stick around, we'll be right back. (upbeat music)

Published Date : Jun 14 2017

ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
Nathan	PERSON	0.99+
George Gilbert	PERSON	0.99+
Jeff Bettencourt	PERSON	0.99+
Lisa	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
adtech	ORGANIZATION	0.99+
Nathan Trueblood	PERSON	0.99+
Apex	ORGANIZATION	0.99+
DataTorrent	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Guy Churchward	PERSON	0.99+
The Cube	TITLE	0.99+
half an hour	QUANTITY	0.99+
one	QUANTITY	0.99+
an hour	QUANTITY	0.98+
DataWorks Summit	EVENT	0.98+
two different things	QUANTITY	0.98+
One	QUANTITY	0.98+
Apache	ORGANIZATION	0.97+
today	DATE	0.97+
both	QUANTITY	0.97+
Ah-mol	ORGANIZATION	0.96+
first thing	QUANTITY	0.96+
DataTorrent.com	ORGANIZATION	0.96+
a day	QUANTITY	0.95+
Hortonworks	ORGANIZATION	0.95+
day two	QUANTITY	0.94+
12 years ago	DATE	0.93+
this afternoon	DATE	0.92+
DataWorks Summit 2017	EVENT	0.92+
2011	DATE	0.91+
first	QUANTITY	0.91+
two solution	QUANTITY	0.9+
about five years ago	DATE	0.88+
Apache Apex	ORGANIZATION	0.88+
SVP	PERSON	0.83+
Hadoop	ORGANIZATION	0.77+
two major trends	QUANTITY	0.77+
2017	DATE	0.74+
second	QUANTITY	0.68+
360	QUANTITY	0.66+
The Cube	ORGANIZATION	0.63+
