Enrico Canzonieri, Yelp, Inc. | Flink Forward 2018

>> Narrator: Live from San Francisco, it's The Cube, covering Flink Forward, brought to you by Data Artisans. (upbeat music)
>> Hi, this is George Gilbert. We are at Flink Forward, the conference for the Apache Flink community, sponsored by Data Artisans, which is the company commercializing Flink, and we have with us now Enrico Canzonieri, wait a minute, I didn't get that right, Canzonieri, sorry, Enrico from Yelp. And he's going to tell us how Flink has taken Yelp by storm over the past year. Why don't we start off with where you were last year in terms of your data pipeline and what challenges you were facing.
>> Yeah, sure. So we are a Python company, in the sense that we develop most of our software in Python, so until last year most of our stream processing depended on Python. We had developed an in-house framework that was doing Python processing, and that was really what we had; there was no Flink running. Most of the applications were built around a very simple interface, a process message function. That was what we expected developers to use, so no real abstraction there.
>> Okay, so in other words, it sounds like you had a discrete task, request-response or a batch, and then handed it off to the next function. Is that what the pipeline looked like?
>> The pipeline was more of a streaming pipeline, where we had a Kafka topic as input and developers would write this process message function, in which each message would be processed individually; that was kind of the semantics of the pipeline. Then we would get the result of that processing task into another Kafka topic and put another processing function on top of that. So we could very easily have two or three processing tasks, all connected by Kafka topics. Obviously there were big limitations to that kind of architecture, especially when you want to do something more advanced that can include windowing, aggregation, or especially state management.
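Editor's illustration: the per-message pipeline described above can be sketched in a few lines of Python. This is a minimal sketch, not Yelp's framework: the in-memory `topics` dictionary stands in for Kafka, and `run_task`, `parse`, and `keep_high_ratings` are hypothetical names chosen for the example.

```python
from collections import defaultdict, deque

# In-memory stand-in for Kafka: topic name -> queue of messages.
# (Assumption for illustration; a real pipeline would use Kafka clients.)
topics = defaultdict(deque)


def run_task(input_topic, output_topic, process_message):
    """Drain input_topic, calling process_message once per message."""
    while topics[input_topic]:
        message = topics[input_topic].popleft()
        result = process_message(message)
        if result is not None:  # returning None drops the message
            topics[output_topic].append(result)


def parse(message):
    """First task: turn a raw 'user,rating' string into a dict."""
    user, rating = message.split(",")
    return {"user": user, "rating": int(rating)}


def keep_high_ratings(message):
    """Second task: forward only ratings of 4 or more."""
    return message if message["rating"] >= 4 else None


# Two processing tasks connected by intermediate topics.
topics["raw"].extend(["alice,5", "bob,2", "carol,4"])
run_task("raw", "parsed", parse)
run_task("parsed", "high", keep_high_ratings)
high_users = [m["user"] for m in topics["high"]]
```

Each function sees exactly one message at a time and keeps nothing between calls, which is why windowing, aggregation, and state management have to be built by hand in this style.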
>> Well, Kafka has several layers of abstraction, and I guess you'd have to go pretty low level to get all the windowing and state management capabilities.
>> Yeah, it becomes really hard. You basically have to implement it yourself, unless you're using (mumbles) on the Confluent platform or you're using what they call Kafka Streams, but we are not using that.
>> Oh, not? Okay.
>> Obviously we were trying to implement that on top of our simple Python framework, from zero.
>> So tell us about the choice of Flink. Where did it hit your awareness, and where did you start mapping it into this pipeline that was Python based?
>> Yeah, so we had, I think, two main use cases, this was last year, that we were struggling to get right and to really get working. The first one was a connector, and the challenge there was to aggregate data locally, scale it to hundreds of streams, and then, once we had aggregated it locally, upload it to S3. So that was one application. We were really struggling to get it to work because the framework we had offered no real abstraction for windowing, so we had this process message function in which we were trying to implement all of that. And because we were using very low-level Kafka primitives, getting scalability was not as straightforward. So that was one application that was pretty challenging. The other one was really a pure stateful application, where we needed to retain the state forever. It was doing a windowed join across streams, so obviously the challenges in that case are even harder, because you have to implement state management from the ground up.
>> And all the time semantics.
>> Yeah, we had basically no event-time semantics. We were not supporting that.
>> Okay.
>> So we looked at Flink because of its event-time support, so then we could actually do event-time processing. State management support is already implemented, which is very different from implementing it from the ground up.
And then, obviously, the abstractions: the streaming primitives. You have windows, and you have a nice interface that makes things easier for the developers who are writing code.
>> So let's start with state management. Help walk us through what capabilities in state management Flink has relative to the lowest-level abstraction you were using in Kafka, or perhaps what Spark Structured Streaming might provide.
>> Yeah, so I think the nice features for streams are really around the fact that state management is already implemented and fully supports Flink's clustered approach. So, for example, if you're using Kafka, Flink already provides, in the Kafka connector, a way to represent the state of a Kafka consumer. Also, for operators, if you have a flat map or a window, state for windows is already fully supported, so if you are accumulating events in your window, you don't really need to do anything special; that state will be automatically maintained by the Flink framework. That means that if Flink takes a snapshot, so a checkpoint or a savepoint, all the state that was there will get stored in the checkpoint, and then you will be able to recover.
>> For the full window.
>> Yeah.
>> It's because it understands the concept of the window when it does a checkpoint.
>> Yeah, because there's native support in Flink for that.
>> And what's the advantage of having state be integrated with the compute, as opposed to compute and then some sort of API to a separate state manager?
>> It's definitely, like, code clarity. It's a big simplification of how you implement your code, your streaming application. Because in the end, if for every single streaming application you need to go ahead and define and implement the way your state gets stored, that really makes for a very complex application, especially for maintenance.
So in Flink you kind of focus on the business logic. We actually did some tuning on the state manager that was necessary, but the tuning that we did applies in the same way across all the applications we built. When users want to build an application, they focus on the business logic that they want, and the state is, I would say, more declarative: you say you need this map or this list as part of the state, and Flink will take care of actually making sure that it gets into the checkpoint.
>> So the semantics of state management are built in at the compute layer, as opposed to going down to an API for a separate service in other implementations.
>> Yeah, yeah.
>> Okay. All right, we have just a minute left. Tell us about some of the things you're looking forward to doing with Flink. Are they similar to what the DA Platform that's coming out from Data Artisans offers, or do you still have a whole bunch of things on the data pipeline that you want to accomplish with just the core functionality?
>> Yeah, definitely. I would say one of the features that we are really excited about is stream SQL. I see a lot of potential there for new applications. We actually use stream SQL at Yelp, and we deploy it as a service, so it makes it easier for users to develop and deploy stream processing applications. We are definitely planning to expand our Flink deployment to new apps. One of the things we especially try to do is build reusable components, and the reusable components we deploy are closely coupled with the way we think about our data pipeline.
>> Okay, so would it be fair to say that you can look at the DA Platform and say that, for companies that are not quite as sophisticated as you, this is going to make it easier for, you know, mainstream companies to build, deploy, and operate?
>> I see good potential there. I was looking at the presentation in the morning, and I like the integration with Kubernetes for sure, since that's kind of where the current trend for application deployment is going. So yeah, I definitely see potential. I think for Yelp, we clearly have a complex enough deployment and service integration that it probably won't be a good fit for us. But companies that are approaching Flink now, and that probably won't have an already existing deployment, may give it a try.
>> Okay, all right, Enrico, we've got to end it there, but that was very helpful, and thanks for stopping by.
>> Thanks for having me here.
>> Okay. And this is George Gilbert. We are at Flink Forward, the Data Artisans conference for the Apache Flink community, and we will be right back after this short break. (upbeat music)
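
Editor's illustration: the interview describes window state that is assigned by event time and captured whenever a checkpoint is taken. The idea can be sketched in plain Python under stated assumptions. This is not the Flink API; `TumblingWindowCounter`, `snapshot`, and `restore` are hypothetical names, and a deep copy of a dictionary stands in for a real checkpoint.

```python
import copy
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per key in fixed-size event-time windows."""

    def __init__(self, window_size):
        self.window_size = window_size
        # State: (key, window_start) -> count of events seen so far.
        self.counts = defaultdict(int)

    def process(self, key, event_time):
        # Assign by the event's own timestamp, not by its arrival order.
        window_start = event_time - (event_time % self.window_size)
        self.counts[(key, window_start)] += 1

    def snapshot(self):
        """Return a copy of the state, playing the role of a checkpoint."""
        return copy.deepcopy(dict(self.counts))

    def restore(self, checkpoint):
        """Replace the current state with a previously taken snapshot."""
        self.counts = defaultdict(int, copy.deepcopy(checkpoint))


counter = TumblingWindowCounter(window_size=60)
counter.process("biz-1", event_time=5)
counter.process("biz-1", event_time=130)
counter.process("biz-1", event_time=42)  # arrives out of order, lands in window 0
checkpoint = counter.snapshot()

counter.process("biz-1", event_time=50)  # processed after the checkpoint

# Simulated recovery: a fresh instance resumes from the checkpointed state.
recovered = TumblingWindowCounter(window_size=60)
recovered.restore(checkpoint)
```

The business logic lives entirely in `process`; the snapshot and restore steps are generic, which mirrors the point made in the interview that the framework, rather than each application, owns how state is stored and recovered.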

Published Date : Apr 11 2018
