Kostas Tzoumas, data Artisans | Flink Forward 2018
(techno music) >> Announcer: Live, from San Francisco, it's theCUBE. Covering Flink Forward, brought to you by data Artisans. (techno music) >> Hello again everybody, this is George Gilbert, we're at the Flink Forward Conference, sponsored by data Artisans, the provider of both Apache Flink and the commercial distribution, the dA Platform, that supports the productionization and operationalization of Flink, and makes it more accessible to mainstream enterprises. We're privileged to have Kostas Tzoumas, CEO of data Artisans, with us today. Welcome Kostas. >> Thank you. Thank you George. >> So, tell us, let's start with sort of an idealized application use case, that is in the sweet spot of Flink, and then let's talk about how that's going to broaden over time. >> Yeah, so just a little bit of an umbrella above that. So what we see very, very consistently, we see it in modern tech companies, and we see it in traditional enterprises that are trying to move there, is a move towards a business that runs in real time. Runs 24/7, is data-driven, so decisions are made based on data, and is software operated. So increasingly decisions are made by AI, by software, rather than someone looking at something and making a decision, yeah. So for example, some of the largest users of Apache Flink are companies like Uber, Netflix, Alibaba, Lyft, they are all working in this way. >> Can you tell us about the size of their, you know, something in terms of records per day, or cluster size, or, >> Yeah, sure. So, latest I heard, Alibaba is powering Alibaba Search, more than a thousand nodes, terabytes of state, I'm pretty sure they will give us bigger numbers today. Netflix has reported doing about one trillion events per day. >> George: Wow. >> On Flink. So pretty big sizes. >> So and is Netflix, I think I read, is powering their real-time recommendation updates. >> They are powering a bunch of things, a bunch of applications, there's a lot of routing events internally. I think they have a talk, they had a talk definitely at the last conference, where they talk about this. And it's really a variety of use cases. It's really about building a platform, internally. And offering it to all sorts of departments in the company, be that for recommendations, be that for BI, be that for running stateful microservices, you know, all sorts of things. And we also see the more traditional enterprise moving to this modus operandi. For example, ING is also one of our biggest partners, it's a global consumer bank based in the Netherlands, and their CEO is saying that ING is not a bank, it's a tech company that happens to have a banking license. It's a tech company that inherited a banking license. So that's how they want to operate. So what we see is stream processing is really the enabler for this kind of business, for this kind of modern business where they interact with the consumer in real time, they push notifications, they can change the pricing, et cetera, et cetera. So this is really the crux of stateful stream processing, for me. >> So okay, so tell us, for those who, you know, have a passing understanding of how Kafka's evolving, how Apache Spark and Structured Streaming are evolving, as distinct from, but also, Databricks. What is it about having state management that's sort of integrated, that for example, might make it easy to elastically change a cluster size by repartitioning?
What can you assume about managing state internally, that makes things easier? >> Yeah, so I think really the sweet spot of Flink is that if you are looking for a stream processing engine, and for a stateful stream processing engine for that matter, Flink is the definition of this. It's the definitive solution to this problem. It was created from scratch with this in mind, it was not sort of a bolt-on on top of something else, so it's streaming from the get-go. And we have done a lot of work to make state a first-class citizen. What this means is that in Flink programs, you can keep state that scales to terabytes, we have seen that, and you can manage this state together with your application. So Flink has this model based on checkpoints, where you take a checkpoint of your application and state together, and you can restart at any time from there. So it's really, the core of Flink is around state management. >> And you manage exactly-once semantics across the checkpointing? >> It's exactly once, it's application-level exactly once. We have also introduced end-to-end exactly once with Kafka. So Kafka-Flink-Kafka exactly once. So fully consistent. >> Okay so, let's drill down a little bit. What are some of the things that customers would do with an application running on, let's say, a big cluster or a couple clusters, where they want to operate both on the application logic and on the state, which having it integrated, you know, makes much easier? >> Yeah, so it is a lot about a flipped architecture and about making operations and DevOps much, much easier. So traditionally what you would do is create, let's say, a containerized stateless application and have a centralized data store to keep all your state. What you do now is the state becomes part of the application. So this has several benefits. It has performance benefits, it has organizational benefits in the company. >> Autonomy >> Autonomy between teams. It has, you know, it gives you a lot of flexibility on what you can do with the applications, like, for example, right, scaling an application. What you can do with Flink is that you have an application running with parallelism over 100, and you are getting a higher volume and you want to scale it to 500, right, so you can simply with Flink take a snapshot of the state and the application together, and then restart it at 500, and Flink is going to redistribute the state. So no need to do anything on a database. >> And then it'll reshard, and Flink will reshard it. >> Will reshard and it will restart. And then one step further, with the product that we have introduced, dA Platform, which includes Flink, you can simply do this with one click or with one REST command. >> So, the resharding was possible with core Apache Flink, and the dA Platform just makes it that much easier, along with other operations. >> Yeah, so what the dA Platform does is it gives you an API for common operational tasks that we observed everybody that was deploying Flink at a decent scale needs to do. It abstracts over Kubernetes: it is based on Kubernetes, but it gives you a higher-level API than Kubernetes. You can manage the application and the state together, and it gives that to you in a REST API, in a UI, et cetera. >> Okay, so in other words it's sort of like, by abstracting even up from Kubernetes, you might have a cluster as a first-class citizen, but you're treating it almost like a single entity, and then under the covers you're managing the things that happen across the cluster.
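To make the checkpoint-and-state model described above concrete, here is a minimal sketch of a Flink job with exactly-once checkpointing and keyed state, assuming the Java DataStream API of roughly that era; the source data, class names, and checkpoint interval are illustrative, not details from the interview.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StatefulCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Application and state are snapshotted together: a checkpoint every 10 seconds,
        // with application-level exactly-once guarantees.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        // Illustrative source; in practice this would be Kafka or another connector.
        DataStream<String> userIds = env.fromElements("alice", "bob", "alice");

        userIds
            .keyBy(new KeySelector<String, String>() {
                @Override
                public String getKey(String id) {
                    return id; // partition the stream by user ID
                }
            })
            .map(new RichMapFunction<String, String>() {
                // Keyed state managed by Flink and included in every checkpoint.
                private transient ValueState<Long> count;

                @Override
                public void open(Configuration parameters) {
                    count = getRuntimeContext().getState(
                            new ValueStateDescriptor<>("count", Long.class));
                }

                @Override
                public String map(String id) throws Exception {
                    Long c = count.value();
                    c = (c == null) ? 1L : c + 1;
                    count.update(c);
                    return id + " seen " + c + " times";
                }
            })
            .print();

        env.execute("stateful-count");
    }
}
```

Rescaling from a parallelism of 100 to 500, as described above, amounts to taking a savepoint of exactly this application-plus-state snapshot and restarting the job at the higher parallelism, with Flink redistributing the keyed state across the new workers.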
>> So what we have in the dA Platform is a notion of a deployment, which is, think of it as, I think of it as a cluster, but it's basically based on containers. So you have this notion of deployments that you can manage, (coughs) sorry, and then you have a notion of an application. And an application is a Flink job that evolves over time. And then you have a very, you know, bird's-eye view on this. You can, when you update the code, this is the same application with updated code. You can travel through the history, you can visit the logs, and you can do common operational tasks, like as I said, rescaling, updating the code, rollbacks, replays, migrating to a new deployment target, et cetera. >> Let me ask you, outside of the big tech companies who have built much of the application management scaffolding themselves, you can democratize access to stream processing, because the capabilities, you know, are not in the skill set of traditional, mainstream developers. So question, the first thing I hear from a lot of sort of newbies, or people who want to experiment, is, "Well, it's so easy to manage the state in a shared database, even if I'm processing, you know, continuously." Where should they make the trade-off? When is it appropriate to use a shared database? Maybe, you know, for real OLTP work, and then when can you sort of scale it out and manage it integrally with the rest of the application? >> So when should we use a database and when should we use streaming, right? >> Yeah, and even if it's streaming, with the embedded state. >> Yeah, that's a very good question. I think it really depends on the use case. So what we see in the market is, many enterprises start with a use case that either doesn't scale, or it's not developer-friendly enough to have this database-application level of separation. And then it quickly spreads out in the whole company and other teams start using it. So for example, in the work we did with ING, they started with a fraud detection application, where the idea was to load models dynamically in the application, as the data scientists are creating new models, and have a scalable fraud detection system that can handle their load. And then we have seen other teams in the company adopting stream processing after that. >> Okay, so that sounds like where the model becomes part of the application logic, and it's a version of the application logic, and then, >> The version of the model >> Is associated with the checkpoint >> Correct. >> So let me ask you then, what happens when you're managing, let's say, terabytes of state across a cluster, and someone wants to query across that distributed state. Is there in Flink a query manager that, you know, knows about where all the shards are and the statistics around the shards to do a cost-based query? >> So there is a feature in Flink called queryable state that gives you the ability to do, very simple for now, queries on the state. This feature is evolving, it's in progress. And it will get more sophisticated and more production-ready over time. >> And that enables a different class of users. >> Exactly, I wouldn't, like to be frank, I wouldn't use it for complex data warehousing scenarios. That still needs a data warehouse, but you can do point queries and a few, you know, slightly more sophisticated queries.
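The queryable state feature mentioned above can be sketched roughly as follows, assuming the queryable state client available in Flink around that time; the host, port, job ID, and state name are placeholders rather than anything from the interview, and the exact client API has changed across Flink versions.

```java
import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryStateExample {
    public static void main(String[] args) throws Exception {
        // Inside the Flink job, the keyed state would be marked queryable, e.g.:
        //   ValueStateDescriptor<Long> descriptor =
        //       new ValueStateDescriptor<>("count", Long.class);
        //   descriptor.setQueryable("user-count");

        // Point query from outside the job; host, port, and job ID are placeholders.
        QueryableStateClient client =
                new QueryableStateClient("flink-taskmanager-host", 9069);

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("count", Long.class);

        CompletableFuture<ValueState<Long>> result = client.getKvState(
                JobID.fromHexString("00000000000000000000000000000000"),
                "user-count",                    // queryable state name set in the job
                "alice",                         // key to look up
                BasicTypeInfo.STRING_TYPE_INFO,  // key type
                descriptor);

        System.out.println("count for alice = " + result.get().value());
    }
}
```

As Kostas notes, this is suited to point lookups against the live state, not to data-warehouse-style queries.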
>> So this is different. This type of state would be different from, like, in Kafka where you can store, you know, the commit log for X amount of time and then replay it. This, it's in a database I assume, not in log form, and so you have faster access. >> Exactly, and it's placed together with the log, so you can think of the state in Flink as the materialized view of the log, at any given point in time, with various versions. >> Okay. >> And really, the way replay works is, roll back the state to a prior version and roll back the log, the input log, to that same logical time. >> Okay, so how do you see Flink spreading out, now that it's been proven with the most demanding customers, and now we have to accommodate skills, you know, where the developers and DevOps don't have quite the same distributed systems knowledge? >> Yeah, I mean we do a lot of work at data Artisans with financial services, insurance, very traditional companies, but it's definitely something that is work in progress, in the sense that our product, the dA Platform, makes operations much easier. This was a common problem everywhere, this was something that tech companies solved for themselves, and we wanted to solve it for everyone else. Application development is yet another thing, and as we saw today in the last keynote, we are working together with Google and the Beam community to bring Python, Go, all sorts of languages into Flink. >> Okay, so that'll help at the developer level, and you're also doing work at the operations level with the platform. >> And of course there's SQL, right? So Flink has Stream SQL, which is standard SQL. >> And would you see, at some point, actually sort of managing the platform for customers, either on-prem or in the cloud? >> Yeah, so right now the platform is running on Kubernetes, which means that typically the customer installs it in their clusters, in their Kubernetes clusters. Which can be either their own machines, or it can be a Kubernetes service from a cloud vendor. Moving forward, I think it will be very interesting, yes, to move to more hosted solutions. Make it even easier for people. >> Do you see a breakpoint or a transition between the most sophisticated customers, who either are comfortable on their own premises, or who were cloud-native, sort of, from the beginning, and then sort of the rest of the mainstream? You know, what sort of applications might they move to the cloud, or might coexist between on-prem and the cloud? >> Well, I think it's clear that the cloud is, you know, every new business starts on the cloud, that's clear. There's a lot of enterprise that is not yet there, but there's a big willingness to move there. And there's a lot of hybrid cloud solutions as well. >> Do you see mainstream customers rewriting applications because they would be so much more powerful in stream processing, or do you see them doing just new applications? >> Both, we see both. It's always easier to start with a new application, but we do see a lot of legacy applications in big companies that are not working anymore. And we see those rewritten. And very core applications, very core to the business. >> So could that be, could you be sort of the source, and the analytic processing, for the continuous data, and then that sort of feeds a transaction and some parameters that then feed a model? >> Yeah. >> Is that, is that a, >> Yeah. >> So in other words, you could augment existing OLTP applications with analytics that then inform them in real time, essentially. >> Absolutely. >> Okay, 'cause that sounds like then something that people would build around what exists. >> Yeah, I mean you can do, you can think of stream processing, in a way, as transaction processing.
It's not a dedicated OLTP store, but you can think of it in this flipped architecture, right? Like the log is essentially the redo log, you know, and then you create the materialized views, that's the write path, and then you have the read path, which is queryable state. This is this whole CQRS idea, right? >> Yeah, Command Query Responsibility Segregation. >> Exactly. >> So, this is actually interesting, and I guess this is critical, it's sort of like a new way of doing distributed databases. I know that's not the word you would choose, but it's like the derived data, managed by, sort of coming off of the state changes, then in the stream processor that goes through a single sort of append-only log, and then reading, and how do you manage consistency on the materialized views, that derived data? >> Yeah, so we have seen Flink users implement that. So we have seen, you know, companies really base their complete product on the CQRS pattern. I think this is a little bit further out. Consistency-wise, Flink gives you the exactly-once consistency on the write path, yeah. What we see a lot more is an architecture where there's a lot of transactional stores in the front end that are running, and then there needs to be some kind of global, single source of truth between all of them. And a very typical way to do that is to get these logs into a stream, and then have a Flink application that can actually scale to that. Create a single source of truth from all of these transactional stores. >> And by having, by feeding the transactional stores into this sort of hub, I presume, some cluster as a hub, and even if it's in the form of sort of a log, how can you replay it with sufficient throughput, I guess not to be a data warehouse but to, you know, have low latency for updating the derived data? And is that derived data, I assume, in non-Flink products? >> Yeah, so the way it works is that, you know, you can get the change logs from the databases, you can use something like Kafka to buffer them up, and then you can use Flink for all the processing. And to do the reprocessing with Flink, this is really one of the core strengths of Flink. Basically what you do is, you replay the Flink program together with the state, and you can get really, really high-throughput reprocessing there. >> Where does the super high throughput come from? Is that because of the integration of state and logic? >> Yeah, that is because Flink is a true streaming engine. It is a high-performance streaming engine. And it manages the state, there's no tier, >> Crossing a boundary? >> No tier crossing, and there's no boundary crossing when you access state. It's embedded in the Flink application. >> Okay, so that you can optimize the IO path? >> Correct. >> Okay, very, very interesting. So, it sounds like the Kafka guys, the Confluent folks, their aspirations, from the last time we talked to 'em, don't extend to analytics, you know, I don't know whether they want partners to do that, but it sounds like they have a similar topology, but they're, but I'm not clear how much of a first-class citizen state is, other than the log. How would you characterize the trade-offs between the two? >> Yeah, so, I mean obviously I cannot comment on Confluent, but like, what I think is that the state and the log are two very different things. You can think of the log as storage, it's a kind of hot storage because it's the most recent data, but you know, you cannot query it, it's not a materialized view, right. So for me the separation is between processing state and storage. The log is a kind of storage, a kind of message queue. State is really the active data, the real-time active data that needs to have consistency guarantees, and that's a completely different thing.
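The pattern Kostas describes a little earlier, buffering database change logs in Kafka and letting a Flink job maintain the consistent, materialized single source of truth, might look roughly like the following sketch; the topic name, record format, and Kafka connector version are assumptions for illustration.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class ChangelogConsolidationJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "changelog-consolidation");

        // Database change-log records buffered in Kafka, e.g. "accountId,balance" per record.
        DataStream<String> changelog = env.addSource(
                new FlinkKafkaConsumer011<>("accounts-changelog", new SimpleStringSchema(), props));

        changelog
                // Partition by the record's primary key.
                .keyBy(new KeySelector<String, String>() {
                    @Override
                    public String getKey(String record) {
                        return record.split(",")[0];
                    }
                })
                // Keep the most recent record per key: the materialized single source of truth.
                .reduce((current, next) -> next)
                .print();
        // An exactly-once Kafka sink would close the Kafka-Flink-Kafka loop mentioned earlier.

        env.execute("changelog-consolidation");
    }
}
```

Reprocessing is then, as described above, a matter of rolling the state back to an earlier snapshot and rewinding the input log to the same logical point in time.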
>> Okay, and that's the, you're managing, it's almost like you're managing, under the covers, a distributed database. >> Yes, kind of. Yeah, a distributed key-value store if you wish. >> Okay, okay, and then that's exposed through multiple interfaces, data stream, table. >> Data stream, Table API, SQL, other languages in the future, et cetera. >> Okay, so going further down the line, how do you see the sort of use cases that are going to get you across the chasm from the big tech companies into the mainstream? >> Yeah, so we are already seeing that a lot. So we're doing a lot of work with financial services, insurance companies, a lot of very traditional businesses. And it's really a lot about maintaining a single source of truth, becoming more real-time in the way they interact with the outside world, and the customer, like they do see the need to transform. If we take financial services and investment banks for example, there is a big push in this industry to modernize the IT infrastructure, to get rid of legacy, to adopt modern solutions, become more real-time, et cetera. >> And so they really needed this, like the application platform, the dA Platform, because operationalizing what Netflix did is going to be very difficult, maybe, for non-tech companies. >> Yeah, I mean, you know, it's always a trade-off right, and you know for some, some companies build, some companies buy, and for many companies it's much more sensible to buy. That's why we have software products. And really, our motivation was that we worked in the open-source Flink community with all the big tech companies. We saw their successes, we saw what they built, we saw, you know, their failures. We saw everything and we decided to build this for everybody else, for everyone that, you know, is not Netflix, is not Uber, cannot hire software developers so easily, or with such good quality. >> Okay, alright, on that note, Kostas, we're going to have to end it, and to be continued, one with Stefan next, apparently. >> Nice. >> And then hopefully next year as well. >> Nice. Thank you. >> Alright, thanks Kostas. >> Thank you George. Alright, we're with Kostas Tzoumas, CEO of data Artisans, the company behind Apache Flink and now the application platform that makes Flink run for mainstream enterprises. We will be back after this short break. (techno music)
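For reference, the Stream SQL and Table API that come up in the conversation can be sketched along these lines; this assumes the Java Table API of roughly that era (the registration and conversion calls have since changed across Flink versions), and the table and field names are made up for illustration.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class StreamSqlExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // Illustrative payment events: (userId, amount).
        DataStream<Tuple2<String, Double>> payments = env.fromElements(
                Tuple2.of("alice", 10.0),
                Tuple2.of("bob", 20.0),
                Tuple2.of("alice", 5.0));

        // Register the stream as a table and run a standard SQL query over it.
        tableEnv.registerDataStream("Payments", payments, "userId, amount");
        Table totals = tableEnv.sqlQuery(
                "SELECT userId, SUM(amount) AS total FROM Payments GROUP BY userId");

        // A grouped aggregate over an unbounded stream is continuously updated,
        // so the result is consumed as a retract stream rather than an append-only one.
        tableEnv.toRetractStream(totals, Row.class).print();

        env.execute("stream-sql-totals");
    }
}
```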