Kostas Tzoumas, data Artisans | Flink Forward 2018


 

(techno music) >> Announcer: Live, from San Francisco, it's theCUBE. Covering Flink Forward, brought to you by data Artisans. (techno music) >> Hello again everybody, this is George Gilbert, we're at the Flink Forward Conference, sponsored by data Artisans, the provider of both Apache Flink and the commercial distribution, the dA Platform, that supports the productionization and operationalization of Flink and makes it more accessible to mainstream enterprises. We're privileged to have Kostas Tzoumas, CEO of data Artisans, with us today. Welcome Kostas. >> Thank you. Thank you George. >> So, tell us, let's start with sort of an idealized application use case that is in the sweet spot of Flink, and then let's talk about how that's going to broaden over time. >> Yeah, so just a little bit of an umbrella above that. So what we see very, very consistently, in modern tech companies, and in traditional enterprises that are trying to move there, is a move towards a business that runs in real time. Runs 24/7, is data-driven, so decisions are made based on data, and is software operated. So increasingly decisions are made by AI, by software, rather than someone looking at something and making a decision, yeah. So for example, some of the largest users of Apache Flink are companies like Uber, Netflix, Alibaba, Lyft; they are all working in this way. >> Can you tell us about the size of their, you know, something in terms of records per day, or cluster size, or, >> Yeah, sure. So, latest I heard, Alibaba is powering Alibaba search with more than a thousand nodes, terabytes of state; I'm pretty sure they will give us bigger numbers today. Netflix has reported doing about one trillion events per day. >> George: Wow. >> On Flink. So pretty big sizes. >> So, and Netflix, I think I read, is powering their real-time recommendation updates. >> They are powering a bunch of things, a bunch of applications, there's a lot of routing of events internally. I think they have a talk, they had a talk definitely at the last conference, where they talk about this. And it's really a variety of use cases. It's really about building a platform, internally. And offering it to all sorts of departments in the company, be that for recommendations, be that for BI, be that for running stateful microservices, you know, all sorts of things. And we also see the more traditional enterprise moving to this modus operandi. For example, ING is also one of our biggest partners, it's a global consumer bank based in the Netherlands, and their CEO is saying that ING is not a bank, it's a tech company that happens to have a banking license. It's a tech company that inherited a banking license. So that's how they want to operate. So what we see is, stream processing is really the enabler for this kind of business, for this kind of modern business where they interact with the consumer in real time, they push notifications, they can change the pricing, et cetera, et cetera. So this is really the crux of stateful stream processing, for me. >> So okay, so tell us, for those who, you know, have a passing understanding of how Kafka's evolving, how Apache Spark and Structured Streaming's evolving, as distinct from, but also, Databricks. What is it about having state management that's sort of integrated, that, for example, might make it easy to elastically change a cluster size by repartitioning?
What can you assume about managing state internally, that makes things easier? >> Yeah, so I think really the, the sweet spot of Flink is that if you are looking for stream processing, for a stream processing engine, and for a stateful stream processing engine for that matter, Flink is the definition of this. It's the definitive solution to this problem. It was created from scratch with this in mind, it was not sort of a bolt-on on top of something else, so it's streaming from the get-go. And we have done a lot of work to make state a first-class citizen. What this means is that in Flink programs, you can keep state that scales to terabytes, we have seen that, and you can manage this state together with your application. So Flink has this model based on checkpoints, where you take a checkpoint of your application and state together, and you can restart at any time from there. So it's really, the core of Flink is around state management. >> And you manage exactly-once semantics across the checkpointing? >> It's exactly once, it's application-level exactly once. We have also introduced end-to-end exactly once with Kafka. So Kafka-Flink-Kafka exactly once. So fully consistent. >> Okay so, let's drill down a little bit. What are some of the things that customers would do with an application running on, let's say, a big cluster or a couple clusters, where they want to operate both on the application logic and on the state, where having it integrated, you know, makes things much easier? >> Yeah, so it is a lot about a flipped architecture and about making operations and DevOps much, much easier. So traditionally what you would do is create, let's say, a containerized stateless application and have a centralized data store to keep all your state. What you do now is, the state becomes part of the application. So this has several benefits. It has performance benefits, it has organizational benefits in the company. >> Autonomy. >> Autonomy between teams. It has, you know, it gives you a lot of flexibility on what you can do with the applications, like, for example, right, scaling an application. What you can do with Flink is that you have an application running with a parallelism of 100, and you are getting a higher volume and you want to scale it to 500, right; so you can simply, with Flink, take a snapshot of the state and the application together, and then restart it at 500, and Flink is going to redistribute the state. So no need to do anything on the database side. >> And then it'll reshard, and Flink will reshard it. >> It will reshard and it will restart. And then one step further, with the product that we have introduced, dA Platform, which includes Flink, you can simply do this with one click or with one REST command. >> So, the resharding was possible with core Flink, Apache Flink, and the dA Platform just makes it that much easier, along with other operations. >> Yeah, so what the dA Platform does is it gives you an API for common operational tasks that we observed everybody that was deploying Flink at a decent scale needs to do. It abstracts, it is based on Kubernetes, but it gives you a higher-level API than Kubernetes. You can manage the application and the state together, and it gives that to you in a REST API, in a UI, et cetera. >> Okay, so in other words it's sort of like, by abstracting even up from Kubernetes, you might have a cluster as a first-class citizen, but you're treating it almost like a single entity, and then under the covers you're managing the things that happen across the cluster.
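To make the mechanics concrete, here is a minimal sketch in Java of the pattern Kostas describes: keyed state living inside the application, snapshotted together with it via checkpoints. The job, class, and source names are illustrative, not from the interview; the CLI commands in the trailing comments follow the standard Flink client.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StatefulCountJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Snapshot application state every 10 seconds, exactly-once.
            env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

            env.socketTextStream("localhost", 9999)    // stand-in for a real source
               .keyBy(new KeySelector<String, String>() {
                   @Override
                   public String getKey(String line) {
                       return line.split(",")[0];      // state is partitioned by this key
                   }
               })
               .map(new CountPerKey())
               .print();

            env.execute("stateful-count");
        }

        // Keyed state lives inside the operator, not in an external store.
        static class CountPerKey extends RichMapFunction<String, String> {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration cfg) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public String map(String value) throws Exception {
                Long c = count.value();
                c = (c == null) ? 1L : c + 1;
                count.update(c);
                return value + " -> " + c;
            }
        }
    }

    // Rescaling as described in the interview, via the Flink CLI:
    //   bin/flink savepoint <jobId>                       (snapshot state + application)
    //   bin/flink run -s <savepointPath> -p 500 job.jar   (restart at parallelism 500)

Taking the savepoint and restarting with a different -p is the 100-to-500 rescaling step described above; the dA Platform wraps the same sequence in one REST call.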
>> So what we have in the dA Platform is a notion of a deployment, which is, think of it as, I think of it as a cluster, but it's basically based on containers. So you have this notion of deployments that you can manage, (coughs) sorry, and then you have a notion of an application. And an application is a Flink job that evolves over time. And then you have a very, you know, bird's-eye view on this. You can, when you update the code, this is the same application with updated code. You can travel through its history, you can visit the logs, and you can do common operational tasks, like, as I said, rescaling, updating the code, rollbacks, replays, migrating to a new deployment target, et cetera. >> Let me ask you, outside of the big tech companies who have built much of the application management scaffolding themselves, you can democratize access to stream processing, because the capabilities, you know, are not in the skill set of traditional, mainstream developers. So question, the first thing I hear from a lot of sort of newbies, or people who want to experiment, is, "Well, it's so easy to manage the state "in a shared database, even if I'm processing, "you know, continuously." Where should they make the trade-off? When is it appropriate to use a shared database? Maybe, you know, for real OLTP work, and then when can you sort of scale it out and manage it integrally with the rest of the application? >> So when should we use a database and when should we use streaming, right? >> Yeah, and even if it's streaming with the embedded state. >> Yeah, that's a very good question. I think it really depends on the use case. So what we see in the market is, many enterprises start with a use case that either doesn't scale, or is not developer-friendly enough, with this traditional database/application level separation. And then it quickly spreads out in the whole company and other teams start using it. So for example, in the work we did with ING, they started with a fraud detection application, where the idea was to load models dynamically in the application, as the data scientists are creating new models, and have a scalable fraud detection system that can handle their load. And then we have seen other teams in the company adopting stream processing after that. >> Okay, so that sounds like where the model becomes part of the application logic and it's a version of the application logic and then, >> The version of the model >> Is associated with the checkpoint. >> Correct. >> So let me ask you then, what happens when you're managing, let's say, terabytes of state across a cluster, and someone wants to query across that distributed state. Is there in Flink a query manager that, you know, knows about where all the shards are and the statistics around the shards to do a cost-based query? >> So there is a feature in Flink called queryable state that gives you the ability to do, very simple for now, queries on the state. This feature is evolving, it's in progress. And it will get more sophisticated and more production-ready over time. >> And that enables a different class of users. >> Exactly. I wouldn't, like, to be frank, I wouldn't use it for complex data warehousing scenarios. That still needs a data warehouse, but you can do point queries and a few, you know, slightly more sophisticated queries. >> So this is different. This type of state would be different from, like, in Kafka, where you can store, you know, the commit log for X amount of time and then replay it.
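The queryable state feature Kostas mentions looked roughly like the following at the time, in Java; the API was explicitly evolving, so treat this as a sketch of the shape rather than a stable contract. The host, port, job ID, and state names are placeholders.

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.queryablestate.client.QueryableStateClient;

    public class PointQuery {
        public static void main(String[] args) throws Exception {
            // Inside the job, a state is exposed for external reads with:
            //   ValueStateDescriptor<Long> d = new ValueStateDescriptor<>("count", Long.class);
            //   d.setQueryable("count-query");

            // Outside the job, a client can point-query live keyed state:
            QueryableStateClient client = new QueryableStateClient("proxy-host", 9069);
            ValueStateDescriptor<Long> descriptor =
                    new ValueStateDescriptor<>("count", Long.class);

            Long count = client.getKvState(
                            JobID.fromHexString(args[0]),  // the running job's ID
                            "count-query",                 // name registered in the job
                            args[1],                       // the key to look up
                            BasicTypeInfo.STRING_TYPE_INFO,
                            descriptor)
                    .join()                                // CompletableFuture of the state
                    .value();

            System.out.println(args[1] + " -> " + count);
            client.shutdownAndWait();  // shutdown call names varied across releases
        }
    }

As Kostas says, this is for point lookups on live state, not for warehouse-style queries.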
This, it's in a database, I assume, not in log form, and so you have faster access. >> Exactly, and it's placed together with the log, so you can think of the state in Flink as the materialized view of the log, at any given point in time, with various versions. >> Okay. >> And really, the way replay works is, roll back the state to a prior version and roll back the log, the input log, to that same logical time. >> Okay, so how do you see Flink spreading out, now that it's been proven with the most demanding customers, and now we have to accommodate skills, you know, where the developers and DevOps don't have quite the same distributed systems knowledge? >> Yeah, I mean we do a lot of work at data Artisans with financial services, insurance, very traditional companies, but it's definitely something that is work in progress, in the sense that our product, the dA Platform, makes operations much easier. This was a common problem everywhere; this was something that tech companies solved for themselves, and we wanted to solve it for everyone else. Application development is yet another thing, and as we saw today in the last keynote, we are working together with Google and the Beam community to bring Python, Go, all sorts of languages into Flink. >> Okay, so that'll help at the developer level, and you're also doing work at the operations level with the platform. >> And of course there's SQL, right? So Flink has Stream SQL, which is standard SQL. >> And would you see, at some point, actually sort of managing the platform for customers, either on-prem or in the cloud? >> Yeah, so right now the platform is running on Kubernetes, which means that typically the customer installs it in their clusters, in their Kubernetes clusters. Which can be either their own machines, or it can be a Kubernetes service from a cloud vendor. Moving forward, I think it will be very interesting, yes, to move to more hosted solutions. Make it even easier for people. >> Do you see a breakpoint or a transition between the most sophisticated customers, who either are comfortable on their own premises, or who were cloud-native, sort of, from the beginning, and then sort of the rest of the mainstream? You know, what sort of applications might they move to the cloud, or might coexist between on-prem and the cloud? >> Well I think it's clear that the cloud is, you know, every new business starts on the cloud, that's clear. There's a lot of enterprise that is not yet there, but there's big willingness to move there. And there's a lot of hybrid cloud solutions as well. >> Do you see mainstream customers rewriting applications because they would be so much more powerful in stream processing, or do you see them doing just new applications? >> Both, we see both. It's always easier to start with a new application, but we do see a lot of legacy applications in big companies that are not working anymore. And we see those rewritten. And very core applications, very core to the business. >> So could that be, could you be sort of the source, and the analytic processing, for the continuous data, and then that sort of feeds a transaction and some parameters that then feed a model? >> Yeah. >> Is that, is that a, >> Yeah. >> so in other words you could augment existing OLTP applications with analytics that then inform them in real time, essentially. >> Absolutely. >> Okay, 'cause that sounds like then something that people would build around what exists. >> Yeah, I mean you can do, you can think of stream processing, in a way, as transaction processing.
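The replay mechanism described here, rolling the state and the input log back to the same logical time, works because Flink's Kafka source stores its partition offsets inside Flink's own snapshots. A minimal sketch, using the Kafka 0.11 connector of that era, with broker, topic, and job names invented for illustration:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class ReplayableJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");
            props.setProperty("group.id", "replayable-job");

            // The consumer checkpoints its Kafka offsets as part of the job's
            // snapshot, so restoring an older savepoint rewinds both the state
            // and the read position in the log to the same logical time.
            env.addSource(new FlinkKafkaConsumer011<>(
                        "events", new SimpleStringSchema(), props))
               .print();

            env.execute("replayable-job");
        }
    }

    // Replay from a prior version:  bin/flink run -s <old-savepoint-path> job.jar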
It's not a dedicated OLTP store, but you can think of it in this flipped architecture, right? Like the log is essentially the redo log, you know, and then you create the materialized views; that's the write path, and then you have the read path, which is queryable state. This is this whole CQRS idea, right? >> Yeah, Command Query Responsibility Segregation. >> Exactly. >> So, this is actually interesting, and I guess this is critical; it's sort of like a new way of doing distributed databases. I know that's not the word you would choose, but it's like the derived data, managed by the stream processor, sort of coming off of the state changes that go through a single, sort of append-only log. And then on the reading side, how do you manage consistency on the materialized views, that derived data? >> Yeah, so we have seen Flink users implement that. So we have seen, you know, companies really base their complete product on the CQRS pattern. I think this is a little bit further out. Consistency-wise, Flink gives you the exactly-once consistency on the write path, yeah. What we see a lot more is an architecture where there's a lot of transactional stores in the front end that are running, and then there needs to be some kind of global, single source of truth between all of them. And a very typical way to do that is to get these logs into a stream, and then have a Flink application that can actually scale to that. Create a single source of truth from all of these transactional stores. >> And by having, by feeding the transactional stores into this sort of hub, I presume, some cluster as a hub, and even if it's in the form of sort of a log, how can you replay it with sufficient throughput, I guess not to be a data warehouse, but to, you know, have low latency for updating the derived data? And is that derived data, I assume, in non-Flink products? >> Yeah, so the way it works is that, you know, you can get the change logs from the databases, you can use something like Kafka to buffer them up, and then you can use Flink for all the processing; and doing the reprocessing with Flink, this is really one of the core strengths of Flink. Basically what you do is, you replay the Flink program together with the state, and you can get really, really high-throughput reprocessing there. >> Where does the super high throughput come from? Is that because of the integration of state and logic? >> Yeah, that is because Flink is a true streaming engine. It is a high-performance streaming engine. And it manages the state, there's no tier, >> Crossing a boundary? >> No tier crossing, and there's no boundary crossing when you access state. It's embedded in the Flink application. >> Okay, so that you can optimize the I/O path? >> Correct. >> Okay, very, very interesting. So, it sounds like the Kafka guys, the Confluent folks, their aspirations, from the last time we talked to 'em, don't extend to analytics, you know; I don't know whether they want partners to do that, but it sounds like they have a similar topology, but I'm not clear how much of a first-class citizen state is, other than the log. How would you characterize the trade-offs between the two? >> Yeah, so, I mean obviously I cannot comment on Confluent, but what I think is that the state and the log are two very different things. You can think of the log as storage; it's a kind of hot storage, because it's the most recent data, but you know, you cannot query it, it's not a materialized view, right. So for me the separation is between processing state and storage.
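A sketch of the pattern Kostas outlines, with change events from front-end stores buffered in Kafka and folded into a per-key running aggregate, the materialized view on the write path. The topic name and the "accountId,delta" record format are assumptions made for the example:

    import java.util.Properties;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class MaterializedView {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");
            props.setProperty("group.id", "materializer");

            env.addSource(new FlinkKafkaConsumer011<>(
                        "account-changes", new SimpleStringSchema(), props))
               // Parse "accountId,delta" change events off the log.
               .map(new MapFunction<String, Tuple2<String, Long>>() {
                   @Override
                   public Tuple2<String, Long> map(String line) {
                       String[] f = line.split(",");
                       return Tuple2.of(f[0], Long.parseLong(f[1]));
                   }
               })
               .keyBy(0)     // one view partition per account
               .sum(1)       // the running sum per key is the materialized view
               .print();     // read path: queryable state, or a sink to a serving store

            env.execute("materialized-view");
        }
    }

The exactly-once write path Kostas mentions is what keeps a view like this consistent with the upstream stores.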
The log is a kind of storage, a kind of message queue. State is really the active data, the real-time active data that needs to have consistency guarantees, and that's a completely different thing. >> Okay, and that's the, you're managing, it's almost like you're managing under the covers a distributed database. >> Yes, kind of. Yeah, a distributed key-value store, if you wish. >> Okay, okay, and then that's exposed through multiple interfaces, DataStream, Table. >> DataStream, Table API, SQL, other languages in the future, et cetera. >> Okay, so going further down the line, how do you see the sort of use cases that are going to get you across the chasm from the big tech companies into the mainstream? >> Yeah, so we are already seeing that a lot. So we're doing a lot of work with financial services, insurance companies, a lot of very traditional businesses. And it's really a lot about maintaining a single source of truth, becoming more real-time in the way they interact with the outside world, and the customer; like, they do see the need to transform. If we take financial services and investment banks, for example, there is a big push in this industry to modernize the IT infrastructure, to get rid of legacy, to adopt modern solutions, become more real-time, et cetera. >> And so they really needed this, like the application platform, the dA Platform, because operationalizing what Netflix did is going to be very difficult for non-tech companies. >> Yeah, I mean, you know, it's always a trade-off, right; and you know, for some, some companies build, some companies buy, and for many companies it's much more sensible to buy. That's why we have software products. And really, our motivation was that we worked in the open-source Flink community with all the big tech companies. We saw their successes, we saw what they built, we saw, you know, their failures. We saw everything, and we decided to build this for everybody else, for everyone that, you know, is not Netflix, is not Uber, cannot hire software developers so easily, or with such good quality. >> Okay, alright, on that note, Kostas, we're going to have to end it, and to be continued, one with Stephan next, apparently. >> Nice. >> And then hopefully next year as well. >> Nice. Thank you. >> Alright, thanks Kostas. >> Thank you George. >> Alright, we're with Kostas Tzoumas, CEO of data Artisans, the company behind Apache Flink and now the application platform that makes Flink run for mainstream enterprises. We will be back after this short break. (techno music)

Published Date : Apr 11 2018


Stephan Ewen | Flink Forward 2017


 

(click) >> Welcome, everyone, we're back at the Flink Forward user conference sponsored by the data Artisans folks. This is the first U.S.-based Flink user conference, and we are on the ground at the Kabuki Hotel in San Francisco. We have a special guest, Stephan Ewen, who is one of the founders of data Artisans, and one of the creators of Flink. He is CTO, and he is in a position to shed some unique light on the direction of the company and the product. Welcome, Stephan. >> Yeah, so you were asking about how can stream processing, or how can Flink and data Artisans, help enterprises that want to adopt these kinds of technologies actually do that, despite the fact that we've been seeing, if we look at the big internet companies that first adopted these technologies, what they had to do: they had to go through all this big process of productionizing these things by integrating them with so many other systems, making sure everything fits together, everything kind of works as one piece. What can we do there? So I think there are a few interesting points to that. Let's maybe start with stream processing in general. So, stream processing by itself has actually the potential to simplify many of these setups and infrastructures, per se. There's multiple dimensions to that. First of all, the ability to just more naturally fit what you're doing to what is actually happening. Let me qualify that a little bit. All these companies that are dealing with big data are dealing with data that is typically continuously produced, from sensors, from user devices, from server logs, from all these things, right? Which is quite naturally a stream. And processing this with systems that give you the abstraction of a stream is a much more natural fit, so you eliminate bunches of the pipeline that, for example, do periodic ingestion, grooming that into files and data sets, and periodic processing of that; you can, for example, get rid of a lot of these things. You kind of get a paradigm that unifies the processing of real-time data and also historic data. So this by itself is an interesting development that I think many have recognized, and that's why they're excited about stream processing, because it helps reduce a lot of that complexity. So that is one side to it. The other side to it is that there was always kind of an interplay between the processing of the data and then wanting to do something with these insights, right; you don't process the data just for the fun of processing, right? Usually the outcome feeds into something. Sometimes it's just a report, but sometimes it's something that immediately affects how certain services react. For example, how they apply their decisions in classifying transactions as fraud, or how to send out alerts, how to trigger certain actions. The interesting thing then, and we're going to see actually a little more of that later in this conference also, is that in this stream processing paradigm there's this very natural way for these online live applications and the analytical applications to merge together, again reducing a bunch of this complexity. Another thing that is happening that I think is very, very powerful and helping (mumbles) in bringing these kinds of technologies to a broader ecosystem is actually how the whole deployment stack is evolving. So we see actually more and more users converging onto resource management infrastructures.
YARN was an interesting first step to make it really easy to productionize these systems, but even beyond that, like the uptake of Mesos, the uptake of container engines like (mumbles), the ability to just prepare more functionality bundled together out of the box: you just pack into a container what you need, put it into a repository, and then various people can bring up these services without having to go through all of the setup and integration work. You can get way better templated integration with systems with this kind of technology. So those seem to be helping a lot for much broader adoption of these kinds of technologies. Both stream processing as an easier paradigm, fewer moving parts, and developments in (mumbles) technologies. >> So let me see if I can repeat back just a summary version, which is: stream processing is more natural to how the data is generated, and so we want to match the processing to how it originates, how it flows. At the same time, if we do more of that, that becomes a workload or an application pattern that then becomes more familiar to more people who didn't grow up in a continuous processing environment. But also, it has a third capability of reducing the latency between originating or ingesting the data and getting an analysis that informs a decision, whether by a person or a machine. Would that be a
>> So, that sounds like it would play into the notion of microservices, where the service is responsible for its own state, and they communicate with each other asynchronously, so you have a cooperating collection of components. Now, there are a lot of people who grew up with databases out here, sharing the state among modules of applications. What might drive the growth of this new pattern, the microservices, you know, considering that there's millions of people who just know how to use databases to build apps? >> The interesting part that I think drives this new adoption is that it's such a natural fit for the microservice world. So how do you deploy microservices with state, right? You can have a central database with which you work, and every time you create a new service you have to make sure that it fits with the capacities and capabilities of the database; you have to make sure that the group that runs this database is okay with the additional load. Or you can go to the different model, where each microservice comes up with its own database, but then, every time you deploy one, and that may be a new service, or it may just be experimenting with a different variation of the service you'd be testing, you'd have to bring up a completely new thing. In this interesting world of stateful stream processing, as done by Flink, state is embedded directly in the processing application. So, you actually don't worry about this thing separately, you just deploy that one thing, and it brings both together, tightly integrated, and it's a natural fit, right: the working set of your application goes with your application. If it's deployed, if it's (mumbles), if you bring it down, these things go away. The central part in this thing is nothing more than, if you wish, a backup store, where it would take these snapshots of microservices and store them, in order to recover them from catastrophic failures, in order to just have a historic version to look into if you figure out later, you know, something happened, and was this introduced in the last week, let me look at what it looked like the week before, or to just migrate it to a different cluster. >> So, we're going to have to cut things short in a moment, but I wanted to ask you one last question: if, like, microservices are a sweet spot, and sort of near real-time decisions are also a sweet spot for Kafka, what might we expect to see in terms of a roadmap that helps make those, either that generalizes those cases, or that opens up new use cases? >> Yes, so, what we're immediately working on in Flink right now is definitely extending the support in this area for the ability to keep much larger state in these applications, so state that really goes into the multiple terabytes per service; functionality that allows us to manage this, even easier to evolve this, you know. If the application actually starts owning the state, and it's not in a centralized database anymore, you start needing a little bit of tooling around this state, similar to the tooling you need in databases, a (mumbles) and all of that, so things that actually make that part easier.
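A minimal sketch, in Java, of the pattern Stephan describes: a small "service" whose working set lives in Flink keyed state and travels with the deployed application. It borrows the social-network example from earlier in the conversation (a user's group memberships); the class and field names are invented for illustration.

    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Applied on a stream keyed by userId:
    //   events.keyBy(e -> e.userId).process(new MembershipService())
    public class MembershipService
            extends ProcessFunction<MembershipService.Event, String> {

        public static class Event {
            public String userId;    // the key
            public String groupId;
            public boolean joined;   // true = join, false = leave
        }

        // The service's working set: per-user group memberships, held inside
        // the application itself and included in every snapshot.
        private transient MapState<String, Boolean> groups;

        @Override
        public void open(Configuration cfg) {
            groups = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>("groups", String.class, Boolean.class));
        }

        @Override
        public void processElement(Event e, Context ctx, Collector<String> out)
                throws Exception {
            if (e.joined) {
                groups.put(e.groupId, Boolean.TRUE);
            } else {
                groups.remove(e.groupId);
            }
            out.collect(e.userId + (e.joined ? " joined " : " left ") + e.groupId);
        }
    }

Deploying, stopping, or snapshotting this job carries the memberships with it; the backup store Stephan mentions is simply wherever those snapshots land.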
Handling (mumbles), and we're actually looking into what are the APIs that users actually want in this area. So Flink has, I think, pretty stellar stream processing APIs, and if you've seen, in the last release we've actually started adding more low-level APIs; one could even think, APIs in which you don't think of streams as distributed collections and windows, but just think about the very basic ingredients: events, state, time and snapshots. So more control and more flexibility, by just taking directly the basic building blocks rather than more high-level abstractions. I think you can expect more evolution on that layer, definitely in the near future. >> Alright, Stephan, we have to leave it at that, and hopefully to pick up the conversation not too long in the future. We are at the Flink Forward Conference at the Kabuki Hotel in San Francisco, and we will be back with more just after a few moments. (funky music)
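The low-level building blocks Stephan lists, events, state, and time, compose directly in the ProcessFunction-style APIs he is alluding to. A sketch, with the 60-second silence threshold and all names chosen for illustration; it assumes event-time timestamps were assigned upstream:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Applied per key (e.g. readings.keyBy(r -> r.sensorId).process(...)):
    // raise an alert if a sensor goes silent for 60 seconds of event time.
    public class SilenceAlert extends ProcessFunction<SilenceAlert.SensorReading, String> {

        public static class SensorReading {
            public String sensorId;
            public double value;
        }

        private transient ValueState<Long> lastSeen;   // state

        @Override
        public void open(Configuration cfg) {
            lastSeen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("lastSeen", Long.class));
        }

        @Override
        public void processElement(SensorReading r, Context ctx, Collector<String> out)
                throws Exception {
            long now = ctx.timestamp();                // event: this record's event time
            lastSeen.update(now);
            ctx.timerService().registerEventTimeTimer(now + 60_000);   // time
        }

        @Override
        public void onTimer(long ts, OnTimerContext ctx, Collector<String> out)
                throws Exception {
            Long last = lastSeen.value();
            if (last != null && ts >= last + 60_000) {
                out.collect("sensor silent since event time " + last);
            }
        }
    }

The fourth ingredient, snapshots, comes for free: both lastSeen and the pending timers are part of the job's checkpoints.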

Published Date : Apr 15 2017


Xiaowei Jiang | Flink Forward 2017


 

>> Welcome everyone, we're back at the first Flink Forward Conference in the U.S. It's the Flink user conference sponsored by data Artisans, the creators of Apache Flink. We're on the ground at the Kabuki Hotel, and we've heard some very high-impact customer presentations this morning, including Uber and Netflix. And we have the great honor to have Xiaowei Jiang from Alibaba with us. He's Senior Director of Research, and what's so special about having him as our guest is that they have the largest Flink cluster in operation in the world that we know of, and that the Flink folks know of as well. So welcome, Xiaowei. >> Thanks for having me. >> So we gather you have a 1,500-node cluster running Flink. Let's sort of unpack how you got there. What were some of the use cases that drove you in the direction of Flink and complementary technologies to build with? >> Okay, I'll explain a few use cases. The first use case that prompted us to look into Flink is the classic search ETL case, where basically it needs to process all the data that's necessary for search. So we looked into Flink about two years ago. The next case is the A/B testing framework, which is used to evaluate how your machine learning models work. So today we are using it in a few other very interesting cases; for example, we are using machine learning to adjust the ranking of search results, to personalize search results in real time, to deliver the best search results for our users. We are also using it to do real-time anti-fraud detection for ads. So these are the typical use cases we are doing. >> Okay, this is very interesting, because with the ads, and the one before that, was it fraud? >> Ads is anti-fraud. Before that is machine learning, real-time machine learning. >> So for those, low latency is very important. Now, help unpack that. Are you doing the training for these models like in a central location and then pushing the models out close to where they're going to be used for, like, the near real-time decisions? Or is that all run in the same cluster? >> Yeah, so basically we are doing two things. We use Flink to do real-time feature updates, which change the features in real time, like in a few seconds. So for example, when a user buys a product, the inventory needs to be updated. Such features get reflected in the ranking of search results in real time. We also use it to do real-time training of the model itself. This becomes important in some special events. For example, on China's Singles Day, which is the largest shopping holiday in China, it generates more revenue than Black Friday in the United States already. On such a day, because things go on sale for almost 50 percent off, the users' behavior changes a lot. So whatever model you trained before does not work reliably. So it's really nice to have a way to adjust the model in real time to deliver the best experience to our users. All this is actually running in the same cluster. >> OK, that's really interesting. So, it's like you have a multi-tenant solution; that sounds like it's rather resource-intensive. >> Yes. >> When you're changing a feature, or features, in the models, how do you go through the process of evaluating them and finding out their efficacy before you put them into production? >> Yeah, so this is exactly the A/B testing framework I just mentioned earlier. >> George: Okay. >> So, we also use Flink to track the metrics, the performance of these models, in real time.
Once these data are processed, we upload them into our OLAP system, so we can see the performance of the models in real time. >> Okay. Very, very impressive. So, explain perhaps why Flink was appropriate for those use cases. Is it because you really needed super low latency, or that you wanted a less resource-intensive sort of streaming engine to support these? What made it fit that right sweet spot? >> Yeah, so search has lots of different products. They have lots of different data processing needs, so when we looked into all these needs, we quickly realized we actually need a compute engine that can do both batch processing and streaming processing. And in terms of streaming processing, we have a few needs. For example, we really need super low latency. So in some cases, for example, if a product is sold out and is still displaying in your search results, when users click and try to buy, they cannot buy it. It's a bad experience. So, the sooner you can get the data processed, the better. So with- >> So near real-time for you means, how many milliseconds does the- >> It's usually like a second. One second, something like that. >> But that's one second end to end, talking to inventory. >> That's right. >> How much time would the model itself have to- >> Oh, it's very short. Yeah. >> George: In the single-digit milliseconds? >> It's probably around that. There are some scenarios that require single-digit milliseconds. Like a security scenario; that's something we are currently looking into. So when you do transactions on our site, we need to detect if it's a fraudulent transaction. We want to be able to block such transactions in real time. For that to happen, we really need a latency that's below 10 milliseconds. So when we're looking at compute engines, this is also one of the requirements we think about. So we really need a compute engine which is able to deliver sub-second latency if necessary, and at the same time can also do batch efficiently. So we are looking for solutions that can cover all our computation needs. >> So one way of looking at it is, many vendors and customers talk about elasticity as in the size of the cluster, but you're talking about elasticity or scaling in terms of latency. >> Yes, latency and the way of doing computation. So you can view the security scenario as super strict on the latency requirement, but view batch as the most relaxed version of the latency requirement. We want the full spectrum; it's all part of the full spectrum. It's possible that you can use different engines for each scenario, but that means you are required to maintain more code bases, which can be a headache. And we believe it's possible to have a single solution that works for all these use cases. >> So, okay, last question. Help us understand, for mainstream customers who don't hire the top Ph.D.s out of the Chinese universities, but who have skilled data scientists, just not an unending supply, and who aspire to build solutions like this: tell us some of the trade-offs they should consider, given that, you know, the skill set and the bench strength is very deep at Alibaba, and it's perhaps not as widely disseminated or dispersed within a mainstream enterprise. How should they think about the trade-offs in terms of the building blocks for this type of system? >> Yeah, that's a very good question. So we actually thought about this. So, initially what we did is, we were using the DataSet and DataStream APIs, which are relatively lower-level APIs.
So to develop an application with these is reasonable, but it still requires some skill. So we want a way to make it even simpler, for example, to make it possible for data scientists to do this. And so in the last half a year, we spent a lot of time working on the Table API and SQL support, which basically lets you describe your computation logic, or data processing logic, using SQL. SQL is used widely, so a lot of people have experience in it. So we are hoping that with this approach, it will greatly lower the barrier for people to use Flink. At the same time, SQL is also a nice way to unify streaming processing and batch processing. With SQL, you only need to write your processing logic once. You can run it in different modes. >> So, okay, this is interesting, because some of the Flink folks say, you know, structured streaming, which is a table construct with dataframes in Spark, is not a natural way to think about streaming. And yet the Spark guys say, hey, that's what everyone's comfortable with. We'll live with probabilistic answers instead of deterministic answers, because we might have late arrivals in the data. But it sounds like there's a feeling in the Flink community that you really do want to work with tables despite their shortcomings, because so many people understand them. >> So ease of use is definitely one of the strengths of SQL, and the other strength of SQL is it's very descriptive. The user doesn't need to say exactly how to do the computation; they just tell you what they want to get. This gives the framework a lot of freedom in optimization. So users don't need to worry about hard details to optimize their code. It lets the system do its work. At the same time, I think that deterministic things can be achieved in SQL. It just means the framework needs to handle such things correctly in its implementation of SQL. >> Okay. >> When using SQL, you are not really sacrificing such determinism. >> Okay. This is, we'll have to save this for a follow-up conversation, because there's more to unpack there. But Xiaowei Jiang, thank you very much for joining us and imparting some of the wisdom from Alibaba. We are on the ground at Flink Forward, the data Artisans conference for the Flink community, at the Kabuki Hotel in San Francisco; and we'll be right back.
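To illustrate the unification Xiaowei describes, here is a small sketch, in Java, of a windowed query expressed in Flink SQL over a stream. API names shifted across releases around this time, so this follows roughly the Flink 1.4-era Table API; the A/B-test click data and all names are invented for the example.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public class AbTestSql {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
            StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

            // Hypothetical click events: (modelVariant, epochMillis).
            DataStream<Tuple2<String, Long>> clicks = env
                    .fromElements(Tuple2.of("modelA", 1_000L),
                                  Tuple2.of("modelB", 2_000L),
                                  Tuple2.of("modelA", 61_000L))
                    .assignTimestampsAndWatermarks(
                            new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                                @Override
                                public long extractAscendingTimestamp(Tuple2<String, Long> e) {
                                    return e.f1;
                                }
                            });

            tEnv.registerDataStream("Clicks", clicks, "variant, ts, rowtime.rowtime");

            // The same SQL runs over a bounded table in batch mode or over this
            // unbounded stream: clicks per model variant, per minute.
            Table perMinute = tEnv.sqlQuery(
                    "SELECT variant, TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS w_end, " +
                    "       COUNT(*) AS clicks " +
                    "FROM Clicks " +
                    "GROUP BY variant, TUMBLE(rowtime, INTERVAL '1' MINUTE)");

            tEnv.toAppendStream(perMinute, Row.class).print();
            env.execute("ab-test-sql");
        }
    }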

Published Date : Apr 14 2017
