Guy Churchward, DataTorrent | CUBEConversations
(upbeat electronic music) >> Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're having a CUBE Conversation in the Palo Alto studio, a little bit of a break from the crazy conference season, so we can have a little more intimate conversation without the madness of some of the shows. So we're really excited to have many-time CUBE alumnus Guy Churchward on. He's the president and CEO of DataTorrent. Guy, great to see you. >> Thank you, Jeff, 'preciate it. >> So how have you been surviving the crazy conference season? >> It's been crazy. This is very unusual. It's just calm and quiet and relaxed, and there's not people buzzing around, so it's different. >> So you've been at DataTorrent for a while now, so give us kind of the quick update, where you guys are, how things are moving along for you. >> Yeah, I mean, I kicked in about five months ago, so I think I'm just coming up to sort of five and a half, six months, so it's enough time to get my feet wet, understand whether I made a massive mistake or whether it's exciting. I'm still--
So the idea is to say, instead of doing things like analytics, insight, and action on data at rest, you know, the traditional way of doing things is sucking data into a data store and then poking at it on sort of a near-real-time analytics basis. And what the company decided to do, and again, this is around the founders, is to say, if you could take the insight and action piece and shift it left of the data store, in memory, and then literally garner the insight and action when an event happens, then that's obviously faster and it's quicker. And it was interesting, a client said to us recently that batch, or stream, or near real-time, or micro-batch, is sort of like real-time for a person, 'cause a person can't think that fast. So the latency is a factor of that, but what we do is real-time for a computer. So the idea here is that you literally have sub-second latency in response and actions and insight. But anyway, they built a toolkit, and they built a development platform, and it's completely extensible, and we've got a dozen customers on board, and they're in high production, and people are running a billion events per second, so it's very cool. But there wasn't this repeatable business, and I think the deeper I got into it, you also look at it and you say, "Well, Hadoop isn't the easiest thing to deploy." >> Jeff: Right, right, consistently. >> And the company had this mantra, really, of going to solve total cost of ownership and time to value, so in other words, how fast can I get to an outcome and how cheap is it to run it. So can you create unique IP on top of open source that allows you to basically get up and running quickly, with a good budget constraint from a scale-up and scale-out perspective, but at the same time, you don't need these genius developers to work on it, because there's only a small portion of people who basically can deploy a Hadoop cluster at massive scale in a reliable way.
So we thought, well, the thing to do is to really bring it to the masses. But again, if you bring a toolkit down, you're really saying, here's a toolkit and an opportunity, now build the applications and see what you can do. What we figured is actually what you want to do is to say, no, let's just see if we can take Hadoop out of the picture and the complexity of it, and actually provide an end-to-end application. So we looked at each of the customers' current deployments and then figured out, can we actually industrialize that pipeline? In other words, take the open source components, ruggedize them, scale them, make sure that they stay up, they're fault tolerant, 24x7, and then provide them as an application. So we're actually shifting our focus, I think, from just what we call the Apex platform, the stream-based processing platform, to an application factory, and actually producing end-to-end applications. >> 'Cause it's so interesting to think of batch, and that batch is not real-time, compared to real-time streaming, right? We used to take action on a sample of old data, and now, you've got the opportunity to actually take action on all of the now data. Pretty significant difference. >> Yeah, I mean, it kills me. I've got to say, since the last time we met, I literally wrote a blog series, and one of them was called Analytics, Real-Time Analytics versus Real-Time Analytics. And I had this hilarious situation where I was talking to a client, and I asked them, and I said, "Do you do real-time analytics?" They go, "Yeah." And I said, "Do you work on real-time data?" And they said, "Yeah." And I said, "What's your latency between an event happening and you being able to take an action on the event?" And he said, "Well, 60 milliseconds." It's just amazing. I said, "Well, tell me what your architecture looks like." And he says, "Well, I take Kafka into Apex as a stream. I then import it, in essence, into Cassandra, and then I allow my customers to poke the data."
So I said, "Well, but that's not 60 milliseconds." And he goes, "No, no, it is." And I said, "What are you measuring?" He goes, "Well, the customer basically puts an inquiry onto the data store." And so literally, what he's doing is a real-time query against stale data that's sitting inside of a data lake. But he swore blind. >> But it's fast though, right? >> And that's the thing is he's looking, he says, "Hey, well, I can get a really quick response." Well, I can as well. I mean, I can look at Google Earth and I can look at my house, and I can get a quick response, but that picture of my house is not real-time. And that's really what it was. So you then say to yourself, well look, the whole security market is based around this technology. It's classic ETL, and it's basically get the data, suck it in, park it into a data store, and then poke at it. >> Jeff: Right >> But that means that that latency, by just the sheer fact that you're taking the data in and you're normalizing it and dropping it into a data store, your latency's already out there. And so one of the applications that we looked at is around fraud, and specifically payment fraud and credit card fraud. And everything out there in the market today is basically detection because of the latency. If you kind of think about it, credit card swipe, the transaction's happened, they catch the first one, they look at it and say, "Well, that's a bit weird." If another one of these comes up, then we know we've got fraud. Well, of course, what happens is they suck the data in, it sits inside a data store, they poke the data a little bit later, and they figure out, actually, it is fraud. But the second action has happened. So they detected fraud, but they couldn't prevent it, so everything out there is payment fraud detection, not payment fraud prevention, because it's basically got that latency. So what we've done is we said to ourselves, "No, we actually can prevent it."
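A minimal sketch of the difference being drawn here, deciding on the swipe before it completes rather than flagging it after a later query, might look like the following. Everything in it is invented for illustration (the impossible-travel rule, the field names, the in-memory state dict); it is not DataTorrent's or Apex's actual fraud logic:

```python
def process_swipe(swipe, recent, window_s=60.0):
    """Decide BEFORE the transaction completes (prevention), instead of
    flagging it later via a query over a data store (detection).

    Toy rule: two swipes on the same card from different cities inside
    the window are treated as impossible travel and blocked outright.
    `recent` is in-memory per-card state: card -> (city, timestamp).
    """
    card, city, ts = swipe["card"], swipe["city"], swipe["ts"]
    prev = recent.get(card)
    recent[card] = (city, ts)          # update state as the event flows through
    if prev is not None:
        prev_city, prev_ts = prev
        if prev_city != city and ts - prev_ts < window_s:
            return "BLOCK"             # second swipe never completes
    return "APPROVE"
```

The point of the sketch is only where the decision sits: the rule runs in the event path itself, so no second fraudulent action can complete while the data sits parked waiting for a query.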
Because if you can move the insight and actions to the left-hand side of the data store, then as the event is happening, you literally can grab that card swipe and say no, no, no, you don't do it anymore, you prevent it. So, it's literally taking that whole market from, in essence, detection to prevention. And this is, it's kind of fascinating because there's other angles to this. There's a marketplace inside the credit card side that talks about card-not-present, and there's a thing called OmniChannel, and OmniChannel's interesting, 'cause most retailers have gone out there and they've got their bricks-and-mortar infrastructure and architecture and data centers, and they've gone and acquired an online company. And so, now, they have these two different architectures, and if you imagine you've got to hop between the two, it kind of has gaps. And so, the fraudsters will exploit OmniChannel because there's multiple different architectures around, right? So if you think about it, there's one side of saying, hey, if we can prevent that, so taking in a huge amount of data, having it talk, having a life cycle around it, and literally being able to detect and then prevent fraud before the fraudsters can actually figure out what to do, that's fantastic, and then on the plus side, you could take that same pipeline and that same application, and you can actually provide it to the retailers and say, well, what you'd want to do is things like, again, I wrote another blog on it, loyalty brands. You know, on the retail side, for instance, take my wife, we shop like crazy, everybody does. I try not to, but let's say she's been on the Nordstrom site, and we've got a Nordstrom. So Nordstrom has a cookie on their system and they can figure out what's been done. And she's surfing around, and she finds a dress she kind of likes, but she doesn't buy it because she doesn't want to spend the money. Now, I'm in Nordstrom's about four weeks later, and I'm literally buying a pair of socks.
A card swipe, and what it does is, because you've got this OmniChannel and you can connect the two, what they want to do is to be able to turn around and say, "Oh, Guy, before we run this credit card, we noticed that your wife was looking at this dress. We know her birthday's coming up. And by the way, we've checked our store, and we've got the color and the size she wants it in, and if you want, we'll put it on the credit card." >> Don't tell her that, she already bought too much. She won't want you to get that dress. Nah, it's a great, it's a really interesting example, right? >> But it is that, and if you kind of think about it, and this is where, when they say every second counts, it's like every millisecond counts. And so it really is machine-to-machine, real-time, and that's what we're providing. >> Well, that's the interesting thing, you know, a couple things just jump to mind as you're talking. One is, by going the application route, right, you're reducing the overhead for just pure talent that we keep hearing about. It's such a shortage in some of these big data applications, Hadoop, specifically. So now, you're delivering a bunch of that, that's already packaged, to a degree, in an application, is that accurate? >> Yeah, I mean I kind of look at the engineering talent inside an organization as like a triangle. And at the very top, you have talented engineers that basically can hard-code, and that's really where our technology has sat traditionally. So, we go to a large organization. They have a hundred people dedicated to this sort of thing. The challenge is then it means the small organizations who don't have it can't take advantage. And then you've got, at the base end, you have technologies like Tableau, you know, a GUI that can be used by an IT guy. And in the middle you've got this massive swath of engineering talent that literally isn't, you know, the hardcore talent on the analytics stuff and really can't do the Hadoop cluster.
But they want to basically get dangerous on this technology, and if you can take your, you know, the top talent, and you bring that into that center and then provide it at cost economics that make sense, then you're away. And that's really what we've seen. So our client base is going to go from the tens, the twenties, the fifties, into the thousands as you bring it down, and that's really, if you think about it, that's where Splunk kind of got their roots, which is really: get an application, allow people to use it, execute against it, and then build that base up. >> It's ironic that you bring up Splunk, 'cause George Gilbert, one of our Wikibon analysts, loves to say that Splunk is the best imitation of Hadoop that was ever created. He thinks of it really as a Hadoop application, and they're super successful. They found a great application. They've been doing a terrific job. But the other piece that you brought up that triggered my mind was really the machine-to-machine. And real-time is always an interesting topic. What is real time? I always think real time means in time to do something about it. That can be a wide spectrum depending on what you're actually doing. And the machine-to-machine aspect is really important because they do operate at a completely different level of speed. And time is very different for a machine-to-machine interaction than trying to provide some insight to a human, so they can start to make a decision. >> Yeah, I mean, you know, it was, again, one of those moments over the last five months as I was looking at it. There's a very popular technology in our space called Spark, Apache Spark. And it's successful and it's great in batch, and there's actually a thing called Spark Streaming, which is micro-batch. But in essence, it's about a second of latency, and so you look at it and you go, but what's in a second? You know what I mean? I mean, surely that's good enough.
And absolutely, it's good enough for some stuff. But if you were, I mean, we joke about it with things like autonomous cars. If you have cruise control, adaptive cruise control, you don't want that running on batch, because that second is the difference between you slamming into a truck or not. If you have DHL doing delivery drops to you, and you're actually measuring weather patterns against it, and correlating where you're going to drive and how high and where, there's no way that you're going to run on a batch process. And then batch is just so slow in comparison. We actually built an application and it's a demo up on our website. And it's a live app, and when I sat down with the engineering team, I said, "Look, I need people to understand what real real-time does and the benefits of it." And what it's simply doing is shifting the analytics and actions from the right-hand side of where the data store is, to the left-hand side. So you take out all of the latency of parking the data and then going to find the data. And what we did is we said, look, well, I want to do this really fairly, and, when you were a kid, there used to be games like Snap, you know, where you'd turn cards over and you'd go "snap" and it's mine. So we're just looking and saying, "Okay, why don't we do something like that?" It's like fishing, you know, tickling fish, and whoever sees the first fish grabs it, it's yours. So we created an application that basically creates random numbers at a very, very huge speed, and whichever process, we have three processes running, whichever one sees it first puts its hand up and says, "I got that." And if somebody else says, "I've got that," but there's an earlier timestamp on the other one, they can't claim it. One wins, and the other two lose.
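The rules of that snap-style race can be modeled with a toy latency calculation. This is a sketch with assumed numbers (a 1-second micro-batch window and a 1 ms per-event processing cost), not the actual demo or its benchmark:

```python
import math

def claim_times(event_times, batch_window=1.0, per_event_cost=0.001):
    """Toy model of the race: a stream processor claims each event
    per_event_cost seconds after it arrives; a micro-batch processor
    can only claim once the batch window containing the event closes.
    The earlier claim timestamp wins, as in the demo."""
    winners = []
    for t in event_times:
        stream_claim = t + per_event_cost
        batch_claim = math.ceil(t / batch_window) * batch_window + per_event_cost
        winners.append("stream" if stream_claim < batch_claim else "micro-batch")
    return winners
```

With these assumed numbers, any event that lands inside a batch window is claimed by the stream processor first; the micro-batch processor can only tie on events that arrive exactly as a window closes.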
And I did it, and we optimized, basically, the Apache Apex code, which is ours, in stream mode; Apache Apex, believe it or not, in a micro-batch mode; and Spark Streaming, as fast as they can go, and we literally engineered the hell out of them to get them as fast as possible. And if you look at the results, it literally is a win every time for stream, and a loss every time for the other two. So that's from a speed perspective. Now the reality is, like I said, if I'm showing a dashboard to you, by the time you blink, all three have gotten you the data. It's immaterial, and this isn't knocking on Spark. Our largest deployments all run on what we call, like, a cask-type architecture, which is basically Kafka, Apache Spark, and we see this in Hadoop, and it's always in there. So it's kind of this cask thing. So we like it for what it is, but where customers come unstuck is where they try and force-fit a technology into the wrong space. And so again, you mentioned Splunk, these sort of waves of innovation. We find every client sitting there, going, "I want to get insight quicker." The amount of meetings that we're all in, where you sit there and go, "If I'd only known that then, or before, I would've made a decision." And, you know, in the good old days, we worked on at-rest data. At-rest was really the kingdom of Splunk. If you think about it, we're now in the tail end of batch, which is really Spark's domain. So Splunk and Spark are kind of there, and now you're into this real-time. So again, it's running at a fair pace, but the learnings that we've had over the last few months is: toolkits are great, platforms are great, but to bring this out into mass adoption, you really need to make sure that you've provided hardened applications. So we see ourselves now as, you know, a real-time big data applications company, not just Apache Apex.
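The shift-left idea that runs through the whole conversation, acting at event time instead of querying the store later, can be sketched as two code paths. The function names and the simulated store are hypothetical, invented for this illustration; the point is only that the query path's latency is dominated by however long the data sat parked:

```python
import time

def store_then_query(events):
    """Classic ETL path: park every (timestamp, value) event in a store
    first, then act only when a query runs. Latency is measured from
    when each event happened to when the query finally looks at it."""
    store = list(events)                    # ingest / normalize / park
    query_time = time.time()                # a 'real-time query'... over stale data
    return [(query_time - ts, value) for ts, value in store]

def act_in_stream(events, action):
    """Shift-left path: insight and action fire per event, before any
    data store is involved, so latency is just the processing cost."""
    latencies = []
    for ts, value in events:
        action(value)                       # act at event time
        latencies.append(time.time() - ts)
    return latencies
```

Feeding both paths the same events and delaying the query by even 50 ms makes the gap obvious: the in-stream latencies stay down near the per-event processing cost, while the query path's latency grows with every moment the data waits in the store.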
>> And when you look at the application space that you're going to attack, do you look at it kind of vertically, do you look at it functionally, kind of, you mentioned fraud as one of the earlier ones. How are you kind of organizing yourself around the application space? >> Yeah, and so, the best way for me to describe it, and I want to spin it in a better way than this, but I'll tell you exactly how we've done it, which is, I've looked at what the customers have currently got, and we have deployments in about a dozen big customers and they're all different use cases, and then I've looked at it and said, "What you really want to do is you want to go to a market where people have a current problem, and also in a vertical where they're prepared to pay for something, and solve a problem where, if they give you money, they either make money quickly or they save money quickly." So it's actually-- >> So simple. (laughs) >> But it would be much better if I said it in a pure way and I made some magical thing up, but in reality I'm looking and going, "You've got to go where the hardest problems are." And right now, with things like card-not-present, roaming abuse, and OmniChannel payment fraud, everybody is looking for something. Now, the challenge is the market's noisy there, and so what happens is everybody's saying, "But I've got it." >> That's what strikes me about the fraud thing, is you would think that that's a pretty sophisticated marketplace in which to compete. So you clearly have to have an advantage to even get a meeting, I would imagine. >> Yeah, and again, we've tested the market. The market's pretty hard on the back of it. We've got an application coming out shortly, and we're actually doing design partnerships with a couple of big banks. But we don't want to be seen as just a fraud, now, just a fraud prevention company. (chuckles) I'll stay away from "just a fraud" myself.
But you kind of look and you say, look, there'll be a set of fraud applications, because there's only about half a dozen to be done, retail, like I mentioned, on things like the loyalty brand stuff. We have a number of companies that are using us for ad tech. So again, I can't mention the names. Actually, we've just published one, Publix, no, PubMatic is one of the ad tech organizations that's using our products. But we'll literally come out and harden that pipeline as well. So we're going to strut along, but instead of just saying, "Hey, we've solved absolutely everything," what I want to do is solve a problem for someone and then just move forward. You know, most of our customers have somewhere between three and five different applications that are up and running and in production. So once the platform's in, you know, then they see the value of it. But we really want to make sure that we're closer to the end result and to an outcome, because that's the du jour way that customers want to buy things now. >> Well, and they always have, right? Like you said, they've got a burning issue. You've either got to make money or save money. And if it's not a burning issue, it falls to the bottom of the pile, 'cause there's something that's burning that they need to fix quickly. >> And the other thing, Jeff, is, and again, it's dirty laundry, but if you think about it, I go to an account and the account's got a fraud solution, and it's all right but it's not doing what they want, but we come along with a platform and say, "We can do absolutely anything." And then they go, "Well, I've got this really difficult problem that no one's solved for me, but I'm not even sure if I've got a budget for it. Let's spend two years messing around with it." And that's no good, you know?
From a small company's perspective, you really want that traction, so my thing is to just say, "No, what we want to do is go talk to John about John's problem," and say, "I can solve it better than the current one." And there is nothing in the market today, on the payment fraud side, that will provide prevention. It is all detection. So, there's a unique value. The question is whether we can cut through the noise. >> All right, well, we look forward to watching the progress, and we'll check in again in five months or so. >> Thank you, Jeff, 'preciate it. >> Guy Churchward, he's from DataTorrent, President and CEO. Took over about five months ago and kind of changed the course a little bit. Exciting to watch, thanks for stopping by. >> Guy: Thank you. >> All right, Jeff Frick, you're watching theCUBE. See you next time. Thanks for watching. (upbeat electronic music)