
Dean Wampler, Ph.D. | Flink Forward 2017


 

>> Welcome everyone to the first-ever U.S. user conference of Apache Flink, sponsored by data Artisans, the creators of Flink. The conference kicked off this morning with some very high-profile customer use cases, including Netflix and Uber, which were quite impressive. We're on the ground at the Kabuki Hotel in San Francisco, and our first guest is Dean Wampler, VP of fast data engineering at Lightbend. Welcome, Dean.

>> Thank you. Good to see you again, George.

>> So, big-picture context setting: Spark exploded on the scene, blew away expectations, even those of its creators, with its speed and deeply integrated libraries, and essentially replaced MapReduce really quickly.

>> Yeah.

>> So what is behind Flink's rapid adoption?

>> Right, I think it's an interesting story, and if you'd asked me a year ago, I probably would have said, well, I'm not sure we really need Flink; Spark seems to meet all our needs. But I pretty quickly changed my mind as I got to know Flink, because it is a broad ecosystem, there is a wide variety of problems people are trying to solve, and what Flink does very well is low-latency streaming, but still at scale, like Spark, whereas Spark is still primarily a micro-batch model, so it has longer latency. And Flink has been on the cutting edge of embracing some of the more advanced streaming scenarios, like proper handling of late-arriving data, windowing semantics, things like this. So it's really filling an important niche, but a fairly broad niche that people have. And also, not everybody needs the full-featured capabilities of Spark, like batch analytics, so having one tool that's focused just on processing streams is often a good idea.

>> So would that relate to a smaller surface area to learn and to administer?

>> I think that's a big part of it, yeah. I mean, Spark is incredibly well engineered and it works very well, but it's a bigger system, so there's going to be more to run. And there is something very attractive about having a more focused tool: fewer things to break, basically.

>> You mentioned lower latency and a few fewer bells and whistles. Can you give us some examples of use cases where you wouldn't need all of the integrated libraries of Spark, or the big footprint that gives you all that resilience and the functional programming that lets you recreate lineage? Tell us how a customer approaching this should pick the trade-offs.

>> Right. Well, normally when you have a low-latency problem, it means you have less time to do work, so you tend to do simpler things in that time frame. But just to give you a really interesting example, I was talking recently with a development team at a bank that does credit card authorizations. You click "buy" on a website, and there's maybe a few hundred milliseconds in which the user is expecting a reply. But it turns out there are so many things going on in that loop, from browser to servers and back, that they only have about ten milliseconds, once they get the data, to decide whether this looks fraudulent or legitimate, and they make a decision. Ten milliseconds is fairly narrow; that means you have to have your models already built and ready to go, and a quick way to apply them: take this data, ask the model whether it's okay, and get a response. So a lot of it boils down to one of two things: either I'm doing basic filtering and transforming of data, like raw data coming into my environment, or I have more sophisticated analytics running behind the scenes, and then in real time, as data comes in, I'm asking questions of those models about that data, like authorizing credit cards.
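The ten-millisecond authorization budget is why the model has to be trained ahead of time and only applied in the hot path. Here is a minimal sketch of what that in-path scoring step can look like, in plain Scala; the Txn type, its features, and the logistic-regression weights are all invented for illustration, standing in for a model produced by offline training.

```scala
// Hypothetical transaction record; in practice the features would be looked up
// or computed from the incoming authorization request.
case class Txn(cardId: String, amount: Double, merchantRisk: Double)

object FraudScoreSketch {
  // Pretend these weights came from offline training; the values are made up.
  // Scoring is then a handful of arithmetic operations, well inside 10 ms.
  val weights: Array[Double] = Array(-4.0, 0.002, 3.5)

  def score(t: Txn): Double = {
    val z = weights(0) + weights(1) * t.amount + weights(2) * t.merchantRisk
    1.0 / (1.0 + math.exp(-z)) // logistic score in [0, 1]
  }

  def main(args: Array[String]): Unit = {
    val txn = Txn("4111-0000-0000-0000", 250.0, 0.7)
    val p = score(txn)
    println(if (p > 0.9) f"decline ($p%.3f)" else f"approve ($p%.3f)")
  }
}
```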
>> Okay, so to recap: the low latency means you have to have trained and scored your models already, in the background, and then with this low-latency solution you can do a key-based lookup, I guess, against an external store. So how is Lightbend making it simple to put together, seamlessly, what for any pipeline essentially has to be multiple products?

>> That is the challenge. I mean, it would be great if you could just deploy Flink and that was the only thing you needed, or Kafka, or pick any one of them. But of course the reality is that we always have to integrate a bunch of tools together, and it's that integration that's usually the hard part: how do I know why this thing is misbehaving, when maybe it's something upstream that's misbehaving? That sort of thing. So we've been surveying the landscape to understand, first of all, which tools seem to be the most mature and have the most vibrant communities, and which address the variety of scenarios people are trying to deal with, some of which we just discussed; and second, what kinds of integration problems you have to solve to make these reliable systems. And we've been building a platform, called the Fast Data Platform, now approaching its first beta, that is designed to solve a lot of those problems for you, so you can focus on your actual business problems.

>> And from a customer point of view, would you take end-to-end ownership of that solution, so that, if they chose, you could manage it On-Prem or in the Cloud and handle level-three support across the stack?

>> That's an interesting question. We think eventually we'll get to that point with more of a service offering, but right now most of the customers we're talking to are still more interested in managing things themselves, just without as much of the hassle of doing it all on their own. So what we're trying to balance is tooling that makes it easier to get started quickly and build applications, but that also leverages some of the modern machine-learning and artificial-intelligence techniques to automatically detect and correct a lot of common problems and handle other management scenarios. So at least it's not quite as "you're on your own" as it would be if you were just trying to glue everything together yourself.

>> So if I understand, it sounds like the first stage in the journey is: help me rationalize what I'm trying to get to work together On-Prem, and part of that is using machine learning now as part of management. And then, over time, this management gets better and better at root-cause analysis and auto-remediation, and then it can move into the Cloud, and these disparate components become part of a single SaaS solution under that management.

>> That's the long-term goal, definitely, yeah.
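Earlier, Wampler singled out late-arrival handling and windowing semantics as where Flink leads. A minimal sketch of those semantics, using Flink's Scala DataStream API from the 1.x era of this conference; the Click event type, the sample timestamps, and the window and lateness durations are invented for illustration.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor

// Hypothetical event type; eventTime is milliseconds of event time.
case class Click(userId: String, eventTime: Long)

object LateEventsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val clicks: DataStream[Click] = env.fromElements(
      Click("alice", 1000L), Click("bob", 65000L), Click("alice", 3000L))

    clicks
      // Tolerate events whose timestamps arrive up to 5 seconds out of order.
      .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[Click](Time.seconds(5)) {
          override def extractTimestamp(c: Click): Long = c.eventTime
        })
      .map(c => (c.userId, 1))
      .keyBy(_._1)
      // One-minute event-time windows; events that arrive after the watermark
      // but within the lateness allowance update the emitted result instead
      // of being silently dropped.
      .timeWindow(Time.minutes(1))
      .allowedLateness(Time.seconds(30))
      .sum(1)
      .print()

    env.execute("late-events-sketch")
  }
}
```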
>> Looking at where all this intense interest is right now in IoT applications: we can't really send all the data back to the Cloud, get an immediate answer, and then drive an action. How do you see that shaping up, in terms of what's on the edge and what's in the Cloud?

>> Yeah, that's a really interesting question, and there are some particular challenges, because a lot of companies will migrate to the Cloud in a piecemeal fashion, so they've got a sort of hybrid deployment scenario, with things On-Premise and in the Cloud, and so forth. One of the things you mentioned that's pretty important is: I've got all this data coming in, how do I capture it reliably? Tools like Kafka are really good for that, and Pravega, which Strachan from EMC mentioned, fills the same need: capture data reliably, serve downstream consumers, and make it easy to do analytics over a stream that looks a lot different from a traditional database, where the data is at rest: not static, exactly, but not moving. So that's one of the things you have to do well, and then you have to figure out how to get that data to the right consumer and account for all the latencies. If I needed that ten-millisecond credit card authorization but had data split between my On-Premise and my Cloud environments, that would not work very well. So that kind of architecture of the data flow becomes really important.

>> Do you see Lightbend offering that management solution that enforces SLAs, or do you see sourcing that technology from others and integrating it tightly with the particular software building blocks that make up the pipeline?

>> It's a little of both. We're in the early stages of building services along those lines. Some of the technology we've had for a while, our Akka middleware system and the streaming API on top of it, would be a really good base for that kind of platform, where you can think about SLA requirements and trade off performance against getting answers in a reasonable time, good recovery in error scenarios, things like that. So it's all early days, but we are thinking very hard about that problem, because ultimately, at the end of the day, that's what customers care about. They don't care about Kafka versus Spark, or whatever. They just care that data is coming in and they need an answer in ten milliseconds or they lose money, and that's the kind of thing they want you to solve for them. So that's really what we have to focus on.

>> So, last question before we have to go: do you see potentially a scenario where there's one type of technology on the edge, or many types, and then something more dominant in the Cloud, where basically you do the model training, and out on the edge you do the low-latency predictions or prescriptions?

>> That's pretty much the architecture that has emerged. I'm going to talk a little bit about this today in my talk: like we said earlier, I may have a very short window in which I have to make a decision, but it's based on a model that I have been building for a while and can build in the background, where I have more tolerance for the time it takes.

>> Up in the Cloud?

>> Up in the Cloud. Actually, this is kind of independent of the deployment scenario, but it could be both, so you could have something that is closer to the consumer of the data, maybe in the Cloud and deployed in Europe for European customers, but working with systems back in the U.S.A. that are doing the heavy lifting of building these models and so forth. We live in a world where you can put things where you want, move things around, and glue things together, and a lot of the time it's just a matter of knowing the right combination of stuff.
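The "capture it reliably" role Wampler assigns to Kafka typically shows up in a Flink job as a durable, replayable source. A minimal sketch using the Kafka 0.10 connector class names from the Flink 1.2/1.3 era; the broker address, topic name, and group id are placeholders.

```scala
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
    props.setProperty("group.id", "ingest-sketch")           // placeholder group

    // Kafka retains the raw stream durably; on failure Flink rewinds and
    // replays from its checkpointed offsets, which is what makes the
    // capture-then-serve-downstream pattern reliable.
    val raw: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer010[String]("events", new SimpleStringSchema(), props))

    // The basic filter/transform stage described for raw incoming data.
    raw.filter(_.nonEmpty).map(_.trim).print()

    env.execute("kafka-ingest-sketch")
  }
}
```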
>> Alright, Dean, it was great to see you and to hear the story. It sounds compelling.

>> Thank you very much.

>> So, this is George Gilbert. We are on the ground at Flink Forward, data Artisans' user conference for the Flink product, and we will be back after this short break.

Published Date : Apr 14 2017

SUMMARY :

George Gilbert interviews Dean Wampler, VP of fast data engineering at Lightbend, at Flink Forward 2017, the first U.S. Apache Flink user conference, at the Kabuki Hotel in San Francisco. Wampler explains why Flink is gaining adoption alongside Spark: it delivers low-latency streaming at scale, with strong event-time windowing and late-data handling, and its narrower focus means a smaller surface area to learn and operate. They discuss a bank's ten-millisecond credit card authorization window as a motivating use case, Lightbend's forthcoming Fast Data Platform for integrating streaming tools, and an emerging architecture in which models are trained in the background, often in the Cloud, and applied at low latency closer to the edge.
