
Tom Kaitchuck, Dell EMC - Flink Forward - #FFSF17 - #theCUBE


 

>> Hi everyone, welcome back. We are at the Flink Forward user conference sponsored by data Artisans, the creators of Flink. We're here at the Kabuki Hotel in San Francisco. We're on the ground and we're with Tom Kaitchuck, who is senior consulting engineer at Dell EMC. >> Yes. >> You had a fairly exciting announcement to make this morning. Why don't you give us a little background on that? >> Yes, so we're announcing Pravega, which is a new open-source product that Dell's been working on for the last year. We're opening the floodgates on May 10th, and it's going to act as a streaming storage system. >> Okay, so help us square a couple of circles. We all learned over the last couple of years, as Kafka took off and swept large and medium-sized enterprises alike by storm, that it was thought of as the way to communicate data between applications. But as you were telling me, it still makes assumptions about the conventional hardware it runs on that might be, perhaps, suboptimal. >> Yeah, so I think the difference between what we're doing and what Kafka does fundamentally comes down to the model. Kafka is a messaging system and its model is built around messages. Ours is a streaming system and we operate fundamentally on a stream. So, when a client sends bytes over the wire, the server does not interpret them at all. It is opaque. It is analogous to a Unix pipe or an HTTP socket. What goes over isn't interpreted, and that gives us the ability to channel that data in. We ended up piping it into a long-term archival system, which gives us advantages in terms of storage. Whereas in a system like Kafka, where you need performance and you need to get high throughput, you're going to basically run on machines that are built for IOPS; they're built to get data in and get data out, and that works and it's fast, but what it doesn't give you is cheap long-term storage. So, usually what people do is have a separate system for cheap, long-term storage, usually something like HDFS. So, you end up running a Kafka job that reads out of your Kafka topic and writes to HDFS. What we're doing is building a streaming system that directly takes the stream coming in from the user, holds it locally, gives you the ability to stream off of it, to connect to it and listen to it in real time, and gives you strong consistency. But at the same time, the ultimate place where this is stored durably is in your long-term storage, it's in your HDFS, and the advantage of that is that your storage becomes the cheap, dense storage you're used to configuring for HDFS, so you can configure very long-term storage. You can use the same interface to back up, go to last year, and stream forward, and the advantage of that is that you don't end up in what I refer to as this sort of accidental lambda architecture, where you built something like a Flink cluster and you say, oh well, this is great: it connects to the streaming connector for Kafka, we can stream data, we get real-time analytics, and we can do all this nice stuff. But then, if we have a bug in our code and we need to go back, you actually need to flip to a different connector and deploy a different job to do a backfill from a different storage system. So, we want Pravega to solve that problem.
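To make the opaque-byte, append-only stream model concrete, here is a minimal sketch of appending events to a Pravega stream with the open-source Pravega Java client. It is written against the current client API, which may differ from what shipped at the time of this announcement; the scope "examples", the stream "sensor-readings", and the controller URI are illustrative assumptions, and the scope and stream are assumed to already exist.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;

import java.net.URI;

public class PravegaAppendSketch {
    public static void main(String[] args) {
        // The controller coordinates streams; the data path goes to the segment stores.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))
                .build();

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "sensor-readings",
                     new UTF8StringSerializer(),      // the server treats the bytes as opaque
                     EventWriterConfig.builder().build())) {

            // Small appends are acknowledged with low latency; Pravega aggregates
            // them internally before the data migrates to long-term storage tiers.
            writer.writeEvent("sensor-42", "{\"temp\": 21.5}").join();
        }
    }
}
```

The routing key ("sensor-42" here) determines ordering: events written with the same key are read back in the order they were appended.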
>> Okay, so let's frame that. A mainstream customer today that's been working with Hadoop would have their data lake, HDFS, and their data, which is sort of a big, old archive almost. >> Yes. >> And then they would be using Kafka either to ingest additional data into the data lake or, perhaps, to extract it for an application that wants to process it with continuous processing or low latency. Now, your solution comes in where you want an emphasis on speed and scale, and you're not reformatting the data to hit the disk in a format that's understandable by the file system. Your data moves along in the format it has in memory, if I'm understanding correctly. So there's a lot less translation going on, and partly because of that, and partly because you have, as I understand it, higher-capacity storage, you don't have to spill to disk and exercise all that I/O that you would get from expensive disks. >> Right. >> So: HDFS and big data on one side, and the Dell EMC solution giving much faster data than Kafka, which makes it a good citizen in a world where you want to build more and more continuous applications, where every last bit of latency is the enemy. >> Yes, yes. So, our goal is to get very low append latency, and that's important because, right now, you can't reasonably do something even analogous to streaming off of HDFS, because the write latency is just too high. You end up calling write with a small bit of data and you're talking 100-plus milliseconds, and then you need to turn around and read, and your read performance will be very low if you do lots of tiny appends. So, what we give you is a system that lets you do lots of tiny appends very fast, at very low latency, but at the same time the data is ultimately stored in HDFS. So, you still get the nice bulk storage capacity of HDFS, but without incurring the penalty of all those tiny appends. >> And, just to be clear, with those tiny appends, your system is absorbing whatever volume or velocity is thrown at it. So it handles the back pressure, and then, rather than HDFS sort of backing things up because of its high-latency write path, you're absorbing all that, because you're optimized for speed and capacity. And then you can put it back into the long-term store, HDFS. >> Yes, we can aggregate all these tiny writes into one or two big writes and put them in, yeah. >> So, tell us some of the use cases that you're working on with design partners or... >> Right, so the big one we're working on with data Artisans is that we want to get exactly-once semantics in Flink jobs that are derived from one another. So, for example, if you have a job that takes in, say, an order or something, processes it, and generates some derivative data. Today, if you want to have exactly-once semantics on a job that's running on that derivative data, it has to be co-located and run with the first job. And that's problematic for a number of reasons. Namely, because in a lot of companies, you don't want to have some secondary job impact the primary one. So, you want something in between that can operate as a buffer there. But right now, there's no way to do that with a streaming pipeline without giving up exactly-once semantics.
And exactly-once semantics is a really big deal for a lot of Flink applications, and so what we can let you do is have one Flink job that runs, produces some output that goes into Pravega as a sink, and then Pravega turns around and is a source for another Flink job, and you can still have exactly-once semantics end to end. >> Okay, so it sounds like, just the way Kafka was sort of the source and sink, consumer and producer, through a hub, once data was handed off to another system, it lost that exactly-once guarantee and, as we said, wasn't necessarily optimized for throughput and capacity, so that's the problem you guys solved. Okay, so if you were to pick some common applications that have been attacked by, or served by, Kafka and Flink, which ones have characteristics that would be most amenable to the Dell EMC solution? >> Anything that requires strong consistency. The real difference is that we have a strongly consistent abstraction. We don't just have this one API that's dealing with events and so on. We actually have this low-level primitive, and we're building a lot of APIs on top of it. So, let me give you an example. We have an API that lets you have what we call a state synchronizer. And what that is, is an object that you can hold in memory across a number of machines, and you can perform updates on that object. But it's guaranteed that every process that's performing an update is performing it on the latest version of that object. So that object is coordinated across a fleet, and everyone sees the same sequence of updates and sees the same object at any given time. And that's a real advantage anywhere you're trying to do something that requires strong consistency. So you can do those sorts of applications, and you can also do things that require transactional semantics. One thing we allow is that when you write data to our output, you can do it transactionally. So, you can have one Pravega stream and coordinate a transaction, potentially across different areas of the keyspace that would end up on multiple Pravega hosts, and have that atomic consistency where you call commit and all of the writes across all of them go in simultaneously. And that's a big deal for a lot of applications. And you can sort of combine these two primitives, where you have a state object and you have a transaction object, to interlink transactionality with that of an external system. So, you could, for example, say: I have a Flink sink that's going to have a couple of different outputs, but one of them is, say, a SQL database, right? And then you could say, I want this output to go to Pravega if, and only if, my transaction to SQL commits. >> Oh, it sounds like you get a freebie of sort of distributed transactions. >> Yes. >> That's very, very interesting, because that's a handoff that you would expect from a single-vendor solution. Very, very impressive. Alright, Tom, on that note, we're going to have to cut it off, because we are ending our coverage at Flink Forward, the data Artisans user conference, the first one held in the U.S., and we are at the Kabuki Hotel in San Francisco. I'm George Gilbert and we're signing off for this afternoon. Thanks for watching. (bright music)
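To make the transactional write pattern Tom describes more concrete, here is a hedged sketch using the Pravega Java client's transactional writer. The scope, stream, writer ID, and event payloads are illustrative assumptions, the interlinking with an external SQL commit is only indicated by a comment, and this is a sketch of the primitive against the current open-source API, not Dell EMC's reference implementation.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.impl.UTF8StringSerializer;

import java.net.URI;

public class PravegaTxnSketch {
    public static void main(String[] args) throws Exception {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))
                .build();

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             TransactionalEventStreamWriter<String> writer =
                     factory.createTransactionalEventWriter(
                             "writer-1", "orders",
                             new UTF8StringSerializer(),
                             EventWriterConfig.builder().build())) {

            Transaction<String> txn = writer.beginTxn();
            try {
                // Different routing keys may land on different Pravega hosts,
                // yet all events become visible atomically on commit.
                txn.writeEvent("customer-7", "{\"order\": 123, \"state\": \"confirmed\"}");
                txn.writeEvent("customer-9", "{\"order\": 124, \"state\": \"confirmed\"}");

                // An external transaction (e.g. the SQL commit mentioned above)
                // could be interlinked here before committing the Pravega side.
                txn.commit();
            } catch (Exception e) {
                txn.abort();  // none of the buffered events become visible
                throw e;
            }
        }
    }
}
```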

Published Date : Apr 15 2017


Kenneth Knowles, Google - Flink Forward - #FFSF17 - #theCUBE


 

>> Welcome everybody, we're at the Flink Forward conference in San Francisco, at the Kabuki Hotel. Flink Forward U.S. is the first U.S. user conference for the Flink community, sponsored by data Artisans, the creators of Flink, and we're here with special guest Kenneth Knowles-- >> Hi. >> Who works for Google and who heads up the Apache Beam team where, just to set context, Beam is the API, or SDK, on which developers can build stream processing apps that can be supported by Google's Dataflow, Apache Flink, Spark, Apex, among other future products that'll come along. Ken, why don't you tell us, what was the genesis of Beam, and why did Google open up sort of the API to it? >> So, I can speak as an Apache Beam PMC member: the genesis came from a combined code donation to Apache from the Google Cloud Dataflow SDK, and there was also a Flink runner for that, already written by data Artisans, which already included some portability hooks, and then there was also a runner for Spark that was written by some folks at PayPal. And so, sort of, those three efforts pointed out that it was a good time to have a unified model for these DAG-based computational... I guess it's a DAG-based computational model. >> Okay, so I want to pause you for a moment. >> Yeah. >> And generally, we try to avoid being rude and cutting off our guests but, in this case, help us understand what a DAG is, and why it's so important. >> Okay, so a DAG is a directed acyclic graph, and, in some sense, if you draw a boxes-and-arrows diagram of your computation where you say "I read some data from here," and it goes through some filters, and then I do a join, and then I write it somewhere, these all end up looking like what they call a DAG, just because of the fact that that is the structure, and all computation sort of can be modeled this way, and in particular, these massively parallel computations profit a lot from being modeled this way as opposed to MapReduce, because the fact that you have access to the entire DAG means you can perform transformations and optimizations and you have more opportunities for executing it in different ways. >> Oh, in other words, because you can see the big picture you can find, like, the shortest path, as opposed to I've got to do this step, I've got to do this step and this step. >> Yeah, it's exactly like that. You're not constrained, sort of: the person writing the program knows what it is that they want to compute, and then, you know, you have very smart people writing the optimizer and the execution engine. So it may execute in an entirely different way, so for example, if you're doing a summation, right, rather than shuffling all your data to one place and summing there, maybe you do some partial summations, and then you just shuffle accumulators to one place, and finish the summation, right? >> Okay, now let me bump you up a couple levels >> Yeah. >> And tell us, so, MapReduce was a trees-within-the-forest approach, you know, seeing just what's a couple feet ahead of you. And now we have the big picture that allows you to find the best path, perhaps, is one way of saying it. Tell us, though, with Google or with others who are using Beam-compatible applications, what new class of solutions can they build that you wouldn't have done with MapReduce before? >> Well, I guess there's... There's two main aspects to Beam that I would emphasize. There's the portability, so you can write this application without having to commit to which backend you're going to run it on. And there's...
There's also the unification of streaming and batch, which is not present in a number of backends, and Beam as this layer sort of makes it very easy to use sort of batch-style computation and streaming-style computation in the same pipeline. And actually, I said there were two things; the third thing that actually really opens things up is that Beam is not just a portability layer across backends, it's also a portability layer across languages. So, something that really only has preliminary support on a lot of systems is Python, so, for example, Beam has a Python SDK where you write a DAG description of your computation in Python, and via Beam's portability APIs, one of these sort of usually Java-centric engines would be able to run that Python pipeline. >> Okay, so-- >> So, did I answer your question? >> Yes, yes, but let's go one level deeper, which is, if MapReduce's sweet spot was web crawl indexing in batch mode, what are some of the things that are now possible with a Beam-style platform that supports this directed-acyclic-graph processing underneath it? >> I guess I'm still learning all the different things that you can do with this style of computation, and the truth is it's just extremely general, right? You can set up a DAG, and there are a lot of talks here at Flink Forward about using a stream processor to do high-frequency trading or fraud detection. And those are completely different, even though they're in the same model of computation as, you know, things like crawling the web and doing PageRank over it. Actually, at the moment we don't have iterative computations, so we wouldn't do PageRank today. >> So, is it considered a complete replacement, plus new use cases, for older-style frameworks like MapReduce, or is it a complement for things where you want to do more with data in motion or lower latency? >> It is absolutely intended as a full replacement for MapReduce, yes. Like, if you're thinking about writing a MapReduce pipeline, instead you should write a Beam pipeline, and then you should benchmark it on different Beam backends, right? >> And, so, working with Spark, working with Flink, how are they, in terms of implementing the full richness of the Beam interface, relative to the Google product Dataflow, from which I assume Beam was derived? >> So, all of the different backends exist in sort of different states as far as implementing the full model. One thing I really want to emphasize is that Beam is not trying to take the intersection of all of these, right? And I think that your question already shows that you know this. We keep sort of a matrix on our website where we say, "Okay, there's all these different features you might want, and then there's all these backends you might want to run it on," and it's sort of: can you do it, can you do it sometimes, and notes about that. We want this whole matrix to be: yes, you can use all of the model on Flink, all of it on Spark, all of it on Google Cloud Dataflow. But they all have some gaps, and I guess, yeah, we're really welcoming contributors in that space. >> So, for someone who's been around for a long time, you might think of it as an ODBC driver, where the capabilities of the databases behind it are different, and so the drivers can only support some subset of the full capability.
>> Yeah, I think that there's, so, I'm not familiar enough with ODBC to say absolutely yes, absolutely no, but yes, it's that sort of a thing. It's like the JVM has many languages on it and ODBC provides this generic database abstraction. >> Is Google's goal with the Beam API to make it so that customers demand a level of portability that goes not just for the on-prem products but for products that are in other public clouds, and sort of pry open the API lock-in? >> So, I can't say what Google's goals are, but I can certainly say that Beam's goals are that nobody's going to be locked into a particular backend. >> Okay. >> I mean, I can't even say what Beam's goals are, sorry; those are my goals, I can speak for myself. >> Is Beam seeing so far adoption by the sort of big consumer internet companies, or has it started to spread to mainstream enterprises, or is it still a little immature? >> I think Beam's still a little bit less mature than that. We're heading into our first stable release: we began incubating it as an Apache project about a year ago, and then, around the beginning of the new year, actually right at the end of 2016, we graduated to be an Apache top-level project. So right now we're sort of on the road from having become a top-level project; we're seeing contributions ramp up dramatically, and we're aiming for a stable release as soon as possible. Our next release we expect to be a stable API that we would encourage users and enterprises to adopt, I think. >> Okay, and that's when we would see it in production form on the Google Cloud platform? >> Well, so the thing is that the code and the backends behind it are all very mature, but, right now, we're still sort of, I don't know how to say it, polishing the edges, right? It's still got a lot of rough edges and you might encounter them if you're trying it out right now, and things might change out from under you before we make our stable release. >> Understood. >> Yep. >> All right. Kenneth, thank you for joining us, and for the update on the Beam project, and we'll be looking for that and seeing its progress over the next few months. >> Great. Thanks for having me. >> With that, I'm George Gilbert, I'm with Kenneth Knowles, we're at the data Artisans Flink Forward user conference in San Francisco at the Kabuki Hotel, and we'll be back after a few minutes.
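As a concrete illustration of the portability Kenneth describes, here is a hedged sketch of a Beam pipeline in Java, written against a later stable Beam Java SDK (the input and output file names are placeholder assumptions). Note that the pipeline code never names a backend; the runner is chosen at launch time via options such as --runner=FlinkRunner, --runner=SparkRunner, or --runner=DataflowRunner.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class PortableWordCount {
    public static void main(String[] args) {
        // The runner (Direct, Flink, Spark, Dataflow, ...) comes from the options.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        // The DAG: read -> split into words -> count -> format -> write.
        p.apply(TextIO.read().from("input.txt"))
         .apply(FlatMapElements.into(TypeDescriptors.strings())
                 .via(line -> Arrays.asList(line.split("\\s+"))))
         .apply(Count.perElement())
         .apply(MapElements.into(TypeDescriptors.strings())
                 .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
         .apply(TextIO.write().to("word-counts"));

        p.run().waitUntilFinish();
    }
}
```

Because the pipeline only describes the DAG, the backend's optimizer is free to execute it differently, for example with the partial-summation strategy Kenneth mentions for Count.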

Published Date : Apr 15 2017


Stephan Ewen | Flink Forward 2017


 

(click) >> Welcome, everyone, we're back at the Flink Forward user conference sponsored by the data Artisans folks. This is the first U.S.-based Flink user conference, and we are on the ground at the Kabuki Hotel in San Francisco. We have a special guest, Stephan Ewen, who is one of the founders of data Artisans, and one of the creators of Flink. He is CTO, and he is in a position to shed some unique light on the direction of the company and the product. Welcome, Stephan. >> Yeah, so you were asking about how stream processing, or how Flink and data Artisans, can help enterprises that want to adopt these kinds of technologies actually do that, despite what we've seen when we look at the big internet companies that first adopted these technologies: they had to go through this big process of productionizing these things by integrating them with so many other systems, making sure everything fits together, everything kind of works as one piece. What can we do there? So I think there are a few interesting points to that. Let's maybe start with stream processing in general. So, stream processing by itself actually has the potential to simplify many of these setups and infrastructures, per se. There are multiple dimensions to that. First of all, the ability to just more naturally fit what you're doing to what is actually happening. Let me qualify that a little bit. All these companies that are dealing with big data are dealing with data that is typically continuously produced, from sensors, from user devices, from server logs, from all these things, right? Which is quite naturally a stream. And processing this with systems that give you the abstraction of a stream is a much more natural fit, so you eliminate chunks of the pipeline that, for example, try to do periodic ingestion, grooming that into files and data sets, and periodic processing of that; you can, for example, get rid of a lot of these things. You kind of get a paradigm that unifies the processing of real-time data and also historic data. So this by itself is an interesting development that I think many have recognized, and that's why they're excited about stream processing, because it helps reduce a lot of that complexity. So that is one side to it. The other side to it is that there was always kind of an interplay between the processing of the data and then wanting to do something with the insights, right? You don't process the data just for the fun of processing, right? Usually the outcome feeds into something. Sometimes it's just a report, but sometimes it's something that immediately affects how certain services react. For example, how they apply their decisions in classifying transactions as fraud, how they send out alerts, how they trigger certain actions. The interesting thing is, and we're going to see actually a little more of that later in this conference, that in this stream processing paradigm there's a very natural way for these online live applications and the analytical applications to merge together, again reducing a bunch of this complexity. Another thing that is happening, which I think is very, very powerful in helping bring these kinds of technologies to a broader ecosystem, is how the whole deployment stack is growing. So we see actually more and more users converging onto resource management infrastructures.
YARN was an interesting first step to make it really easy to productionize that part of these systems, but even beyond that, like the uptake of Mesos, the uptake of container engines like (mumbles), and the ability to just prepare more functionality bundled together out of the box: you can pack into a container what you need, put it into a repository, and then various people can bring up these services without having to go through all of the setup and integration work; you can get much better templated integration with systems with this kind of technology. So those seem to be helping a lot toward much broader adoption of these kinds of technologies: both stream processing as an easier paradigm with fewer moving parts, and developments in (mumbles) technologies. >> So let me see if I can repeat back just a summary version, which is: stream processing is more natural to how the data is generated, and so we want to match the processing to how it originates and flows. At the same time, if we do more of that, that becomes a workload or an application pattern that then becomes more familiar to more people who didn't grow up in a continuous processing environment. But also, it has a third capability of reducing the latency between originating or ingesting the data and getting an analysis that informs a decision, whether by a person or a machine. Would that be a... >> Yeah, you can even go one step further; it's not just about reducing the latency from the analysis to the decision. In many cases you can actually see that the part that does the analysis and the decision just merge and become one thing, which makes for much fewer moving parts, less integration work, less, yeah, less maintenance and complexity. >> Okay, and this would be like, for example, how application databases are taking on the capabilities of analytic databases to some extent, or how stream processors can have machine learning, whether they're doing online learning or calling a model that they're going to score in real time, or even a pre-scored model; is that another example? >> You can think of those as examples, yeah. A nice way to think about it is to look at what a lot of the analytical applications do versus, let's say, just online services that serve offers and trades or generate alerts. A lot of those are, in some sense, different ways of just reacting to events, right? You are receiving some real-time data and want to process it, interacting with some form of knowledge that you accumulated over the past, or some form of knowledge that you've accumulated from some other inputs, and then react to that. That kind of paradigm, which is at the core of stream processing for (mumbles), is so generic that it covers many of these use cases, both building applications directly, as we have actually seen (we have seen users that directly build a social network on Flink, where the events that they receive are, you know, a user being created, a user joining a group, and so on), and also the analytics of just saying, you know, I have a stream of sensor data and on certain outliers I want to raise alerts. It's so similar, once you start thinking about both of them as just handling streams of events in this flexible fashion, that it helps to just bring together many things.
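Stephan's sensor-alert example maps directly onto Flink's DataStream API. Here is a minimal sketch (the host, port, and "sensorId,value" line format are illustrative assumptions): swapping the socket source for a file source replays historic data through exactly the same logic, which is the unification of live and historic processing he describes.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SensorAlerts {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Live input, one "sensorId,value" line per event. Replacing this with
        // env.readTextFile("history.csv") runs the same logic over stored data.
        env.socketTextStream("localhost", 9999)
           .filter(line -> Double.parseDouble(line.split(",")[1]) > 100.0)
           .map(line -> "ALERT: " + line)   // react to the outlier event
           .print();

        env.execute("sensor-alerts");
    }
}
```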
>> So, that sounds like it would play into the notion of microservices, where the service is responsible for its own state, and they communicate with each other asynchronously, so you have a cooperating collection of components. Now, there are a lot of people who grew up with databases sharing the state among modules of applications. What might drive the growth of this new pattern, the microservices one, considering that there are millions of people who just know how to use databases to build apps? >> The interesting part that I think drives this new adoption is that it's such a natural fit for the microservice world. So how do you deploy microservices with state, right? You can have a central database with which you work, and every time you create a new service you have to make sure that it fits with the capacities and capabilities of the database; you have to make sure that the group that runs this database is okay with the additional load. Or you can go to the different model where each microservice comes up with its own database, but then, every time you deploy one, and that may be a new service or it may just be an experiment with a different variation of the service being tested, you'd have to bring up a completely new thing. In this interesting world of stateful stream processing as done by Flink, state is embedded directly in the processing application. So, you actually don't worry about this thing separately; you just deploy that one thing, and it brings both together, tightly integrated, and it's a natural fit, right? The working set of your application goes with your application. If it's deployed, if it's (mumbles), if you bring it down, these things go away. The central part in this thing is nothing more than, if you wish, a backup store where we take these snapshots of the microservices and store them, in order to recover them from catastrophic failures, in order to just have a historic version to look into if you figure out later that, you know, something happened (was this introduced in the last week? let me look at what it looked like the week before), or to just migrate it to a different cluster.
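A small sketch of what "state embedded directly in the processing application" looks like in Flink's DataStream API: the per-key working set lives inside the job and is included in Flink's checkpoints, so no external database holds it. The counting logic and names here are illustrative assumptions, not a prescribed pattern.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts events per key; the count is Flink-managed keyed state, snapshotted
// automatically once checkpointing is enabled (env.enableCheckpointing(...)).
public class EventCounter extends RichFlatMapFunction<String, String> {
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String event, Collector<String> out) throws Exception {
        long updated = (count.value() == null ? 0L : count.value()) + 1;
        count.update(updated);
        out.collect("events for this key so far: " + updated);
    }
}

// Usage, assuming "sensorId,value" events:
//   events.keyBy(e -> e.split(",")[0]).flatMap(new EventCounter());
```

The snapshots Stephan mentions are exactly these checkpoints and savepoints: they back up the embedded state, and restoring one recovers, rolls back, or migrates the service together with its working set.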
>> So, we're going to have to cut things short in a moment, but I wanted to ask you one last question: if microservices are a sweet spot, and sort of near-real-time decisions are also a sweet spot, for Kafka, what might we expect to see in terms of a roadmap that either generalizes those cases or opens up new use cases? >> Yes, so, what we're immediately working on in Flink right now is definitely extending the support in this area: the ability to keep much larger state in these applications, state that really goes into the multiple terabytes per service; functionality that allows us to manage this and makes it even easier to evolve this, you know. If the application actually starts owning the state and it's not in a centralized database anymore, you start needing a little bit of tooling around this state, similar to the tooling you need in databases, a (mumbles) and all of that, so things that actually make that part easier. Handling (mumbles), and we're actually looking into what are the APIs that users actually want in this area. So Flink has, I think, pretty stellar stream processing APIs, and as you've seen in the last release, we've actually started adding more low-level APIs; one could even think of APIs in which you don't think of streams as distributed collections and windows, but just think about the very basic ingredients: events, state, time and snapshots. So, more control and more flexibility, by taking the basic building blocks directly rather than more high-level abstractions. I think you can expect more evolution on that layer, definitely, in the near future. >> Alright, Stephan, we have to leave it at that, and hopefully we'll pick up the conversation not too long in the future. We are at the Flink Forward Conference at the Kabuki Hotel in San Francisco, and we will be back with more just after a few moments. (funky music)

Published Date : Apr 15 2017
