Kostas Roungeris & Matt Ferguson, Cisco | Cisco Live EU Barcelona 2020
>> Live from Barcelona, Spain, it's theCube, covering Cisco Live 2020, brought to you by Cisco and its ecosystem partners. >> Welcome back. This is theCube's coverage >> of Cisco Live 2020 here in Barcelona, doing about three and a half days of wall-to-wall coverage. I'm Stu Miniman, and my co-host for this segment is Dave Vellante. John Furrier is also here, scouring the floor, and I'm really happy to welcome to the program two first-time guests, I believe. Kostas is the product manager of product marketing for cloud computing with Cisco, and sitting to his left is Matt Ferguson, who's director of product development, also with the Cisco Cloud Group. Dave's here from Boston, Matt is also from the Boston area, and Kostas has come over from London. So thanks so much for joining us. >> Thank you. >> All right, so obviously cloud computing is something we've been talking about for many years. We've found really fascinating the relationship Cisco has had with its customers, as well as through the partner ecosystem, and we've had many good discussions about some of the announcements this week. Maybe start a little bit with, you know, Cisco's software journey and positioning in the cloud space right now. >> So it's a really interesting dynamic as we transition to multi-cloud, where we actually deal with cloud and compute coming together. Whether you're looking at the infrastructure ops organization, or the app ops organization, or, you know, your dev environment, your security operations, each organization has to deal with the angle from which they view multi-cloud, or how they actually operate within that cloud computing context. And so whether you're on the infrastructure side, you're looking at compute, you're looking at storage, you're looking at resources. If you're an app operator, you're looking at performance, you're looking at visibility, at assurance.
If you are in security operations, you're looking at maybe governance, you're looking at policy. And then when you're a developer, you're really thinking about CI/CD, you're talking about agility. And there are very few organizations like Cisco that are actually looking at it from a product perspective, from all those various angles of multi-cloud. >> It is definitely a lot of pieces. Maybe up-level it for us a little bit. There are so many pieces we've talked about for so long. You know, you don't talk to any company that doesn't have a cloud strategy. That doesn't mean it's not going to change over time, and it means every company's got its own positioning. But talk about the relationship Cisco has with its customers, and really the advisory position that you want to have with them. >> It's actually a very relevant question, relevant to what Matt is talking about, because we talk a lot about multi-cloud as a trend, and hybrid clouds, and this kind of relationship between the traditional view of looking at computing and data centers, and then expanding to different clouds. You know, public cloud providers now have amazing platform capabilities. And if you think about it, it goes back to what Matt said about IT ops and development kinds of efforts. Why is this happening? Really, there's a study that we did with an analyst, and there was an amazing, shocking stat around how, within the next three years, organizations will have to support 50% more applications than they do now. And we have been testing this at our events, at customer meetings, etcetera. That is a lot of change for organizations. So if you think about it, why do they need to basically let go and expand to those clouds? It's because IT ops teams want to serve their developers faster with capabilities, right? And this is where, within the IT ops kind of organization:
You have the security kind of frame, the compute frame, the networking, where, you know, Cisco has a traditional footprint. How do you blend all this? How do you bring all this together in a coherent way to support individual, unique application modernization efforts? I think that's what we're hearing from customers in terms of feedback, and this is what influences our strategy to converge the different business units. And it's a cross-organization engineering effort, right? >> I want to poke at that a little bit. I mean, a couple of years ago, I have to admit, I was kind of a multi-cloud skeptic. I always said that I thought it was more of a symptom than actually a strategy, a symptom of shadow IT and different workloads and so forth. But now I'm kind of buying in, because I think IT in particular has been brought in to clean up the crime scene, as I often say. So I think it is becoming a strategy. So if you could help us understand what you're hearing from customers in terms of their strategy towards multi-cloud, and how Cisco is mapping into that. >> Yeah, so when we talk to customers, it comes back to the angle at which they're approaching the problem. And like you said, shadow IT has probably been around for longer than anybody cares to admit, because people want to move faster, and organizations want to get their product out to market sooner. So really, we're having conversations now about, you know, how do I get the visibility, how do I get the policies and the governance, so that I can actually understand either how much I'm spending in the cloud, or whether I'm getting the actual performance that I'm looking for, and whether I have the connectivity I need so I get the bandwidth. And so these are the kinds of conversations that we have with customers: I realize that this is going on, now I actually have to put some governance and controls around it. Are there products, are there solutions?
You know, they're looking to Cisco to help them through this journey, because it is a journey. Because as much as we talk about cloud and, you know, companies that were born in the cloud, cloud native, there is a tremendous number of IT organizations that are just starting that journey, that are just entering this phase where they have to solve these problems. >> Yeah, I agree. And they're starting the journey with a deliberate strategy, as opposed to, okay, we got this thing. But if you think about the competitive landscape, it's kind of interesting, and I want to try to understand where Cisco fits. Because again, you initially had companies that didn't own a public cloud sort of pushing multi-cloud, and you'd say, well, I guess they have to do that. But now you see Anthos come out with Google, you see Microsoft leaning in, and we think eventually AWS is going to lean in. And then you see customers saying, I'm kind of interested in working with some of these cloud-agnostic players. Now, Cisco: a few years ago, you didn't really think about Cisco as a player, and now Cisco goes right in the middle. I have said often, John Furrier as well, that Cisco's in a great position to connect businesses from a source of networking strength, making a strong argument that "we have the most cost-effective, most secure, highest-performance networks to connect clouds." That seems to be a pretty fundamental strength of yours. Does that essentially summarize your strategy? And how does that map into the actions that you're taking in terms of products and services that you're bringing to market? >> I would say I can take that. Yeah, for sure. It's a question we could chew on for hours. So, I was thinking about shadow IT, which you mentioned before. Like, okay, you know, the world has turned around completely: we used to talk about shadow IT as something bad happening,
And now, suddenly, we've completely forgotten about it: let's free up the developers and let them do whatever they want. And basically, that is what I think is happening out there in the market. So all of the solutions you mentioned, the go-to-market approaches and the architectures that the public cloud providers are offering out there, certainly the big three, have differences and have their strengths, and I think those things are closer to the developer environment. Basically, you know, if you're looking into something like AI/ML, there's one provider that you go with. If you're looking for a mobile development framework, you're going to go somewhere else. If you're looking for DR, you're going to go somewhere else, maybe not a big cloud, but the service provider you've been dealing with all this time, because you know they have the accreditations that you're looking for. So where does Cisco come in? You know, we're not a public cloud provider. We offer products as a service from our data centers and our partners' data centers, but in the way that the industry sees a public cloud provider, like AWS, Azure, Google, Oracle, IBM, etcetera, we're not that, we don't do that. Our mission is to enable organizations, with software, hardware, and SaaS products, to facilitate their connectivity, security, visibility, and observability in doing business, and in leveraging the best benefits from those clouds. So we've kind of moved to a point where we flip the question around, and the first question is: who is your cloud provider? How many? Tell us the clouds you work with, and we can give you the modular pieces we can put together for you, so that you can make the best out of >> your cloud.
Being able to do that across clouds, in an environment that is consistent, with policies that are consistent, that represent the edicts of your organization no matter where your data lives, that's sort of the vision, and the way this is translated into Cisco's products. You naturally think about Cisco as the connectivity provider, networking; that's really sort of our, you know, go-to. But we also have a significant computing portfolio as well. So connectivity is not only the connectivity of the actual wire between geographies, point A to point B in the natural routing and switching world; there's connectivity between applications, between compute. And so this week, you know, the announcements were significant in that space, when you talk about the compute and the cloud coming together on a single platform. That gives you not only the ability to look at your applications from an experience journey map, so you can actually know where problems might occur in the application domain; you can then go that next level down, into the infrastructure level, and say, okay, maybe I'm running out of some sort of resource, whether it's compute, whether it's memory, whether it's on the private cloud that you have enabled on-prem, or whether it's in the public cloud where that application resides. And then, quite candidly, you have the actual hardware itself. So Intersight has the ability to control that entire stack, so you can have that visibility all the way down to the hardware layer. >> I'm glad you brought up some of the applications. I wonder if we could stay there for a moment and talk about some of the changing patterns for customers. A lot of talk in the industry about cloud native often gets conflated with microservices, containerization, and lots of the individual pieces there.
But one of our favorite things we've been talking about this week is software that really sits at the application layer, and how that connects down through some of the infrastructure pieces. So help us understand what you're hearing from customers and how you're helping them through this transition. Because, as you're saying, absolutely, there are going to be lots of new applications, more applications, and they still have the old stuff that they need to continue to manage, because we know in IT nothing ever goes away. >> Yeah, that's definitely... I was thinking, you know, there's a vacuum at the moment, and there are things that Cisco is doing from a technology perspective to fill that gap between the application, what you see when it comes to monitoring, making sure your services are observable, and how that fits within the infrastructure stack, you know, everything upwards of the network layer. Basically, that is changing dramatically, some of the things that Matt touched upon with regard to, you know, being able to connect the networking, the security, and the compute infrastructure that the developers are deploying on top of. So there's a lot of discussion around containerization; in fact, it's one part of the Intersight stack that you mentioned, and one of the big announcements. You know, there's a lot of discussion in the industry around, okay, how does that abstract further the conversation on networking, for example? Because what we're seeing now is that you have huge monolithic enterprise applications that are being carved down into microservices. Okay, there's a big misunderstanding around what cloud native is: is it related to containers, different kinds of things, right? But containers are naturally the de facto infrastructure currency for developers to deploy with, because of many, many benefits.
But then, what happens between the Kubernetes layer, which seems to be the standard, and the application? Who's going to be managing the services talking to each other, which are multiplying? You know, things like service mesh, network service mesh. How is the network evolving to be able to create this immutable infrastructure for developers to deploy applications on? So there are so many things happening at the same time, where Cisco is actually taking a lot of the front seat, leading that conversation. >> This is where it gets really interesting, sort of hard to squint through, because you mentioned Kubernetes is the de facto standard, but it's a de facto standard that's open, that everybody's playing with. Historically, this industry has been defined by a leader that comes out with a de facto standard; Kubernetes is not a company, it's an open standard. But there are so many other components than containers, and so history would suggest that there's going to be another de facto standard, or multiple standards, that emerge. And to your point earlier, you've got to have the full stack; you can't just do networking, you can't just do certain pieces. So you guys are attacking that whole pie. So how do you think this thing will evolve? I mean, you guys obviously intend to cast as wide a net as possible, capture not only your existing install base but attract others, and you're going aggressively at it, as are others. How do you see it shaking out? Do you see, you know, four or five pockets? Do you see one leader emerging? I mean, customers would love for all you guys to get together and come up with standards; that's not going to happen. So is it a jump ball right now? >> Well, yeah. You think about, you know, to your point regarding Kubernetes: it's not a company, right? It is community-driven. I mean, it was open-sourced by a large company, but it's community-driven now, and that's the pace at which open source is evolving.
There is so much coming at IT organizations: a new paradigm, new software, something that's, you know, the new shiny object that everybody sort of has to jump onto and say, that is the way we're going to function. So IT organizations have to struggle with this influx of everything coming at them from every angle. And I think what's starting to happen is around the management of that stack: who controls it, or who is helping IT organizations manage it for them. So really, what we're trying to say is, there are elements that have to be put together, that have to function, and Kubernetes is just one example; there's Docker, the operating system associated with it that runs all that stuff, and then you have the application that resides on top of it. So now what you have to have is things like what we just announced this week, HXAP, the application platform for HyperFlex. So you have the compute cluster, but then you have the stack on top of that, managed by an organization that's looking at the security, that's making opinions about what should go in the stack, and managing that for you, so you don't have to deal with it; you just focus on the application development. >> Yeah, I mean, Cisco's in a strong position to do it, there's no question about it. To me, it comes down to execution. If you guys execute and deliver on the products and services that you say, that you announced, for instance, this week and previously, and you continue on the roadmap, you're going to get a fair share of this marketplace. I think there's no question. >> So, last topic before we let you go: would love your viewpoint on customers. What's separating the leaders from, you know, the followers in this space? You know, there's so much data out there, and I'm a big fan of the State of DevOps report, which helps separate, you know, not so much
here's the technology or the piece, but the organizational, you know, dynamics that you should adopt. So it sounds like you like that report also. What do you hear from customers? How do you help guide them towards becoming leaders in the cloud space? >> Yeah, the State of DevOps report was fascinating. I mean, they've been doing that for a number of years now. And really, what it's highlighting is two main factors in this revolution, or this third paradigm shift, this journey we're going through. There's the technology side, for sure, and that's getting more complex: you have microservices, you have application explosion, you have a lot of things occurring just in technology that you're trying to keep up with. But then it's really about the human aspect, the human element, the people part of it. And that's really, I think, what separates, you know, the elites that are really charging forward and ahead, because they've been able to break down the silos. Because really, what you're talking about in cloud-native DevOps is how you take the journey of the experience of the service from end to end, from development all the way to production. And how do you not have organizations that look at their domain, their data set, their operations, and then have to translate that, or have to have another conversation with another organization that doesn't look at that, that has no experience of it? So that is what we're talking about, that end-to-end view. >> And in addition to all the things we've been talking about, I think security's a linchpin here. You guys are executing on security, you've got a big portfolio, and you've seen a lot of M&A, a lot of companies trying to get in, and it's going to be interesting to see how that plays out.
But that's going to be key, because organizations are going to start there from a strategy standpoint, and then they build out. >> Yeah, absolutely. If you follow DevOps methodologies, security gets baked in along the way, so that you're not having to bolt it on after the fact. Kostas, just to give you the final word. >> I was just going to follow up on what Matt was saying. There's so much happening out there. There is this democracy around standards, which is driven by communities, and we love that; in fact, Cisco is involved in many open source community projects. But you asked about customers, and just before, you were asking about, you know, who is going to be the winner. There are so many use cases, and there's so much depth in terms of, you know, what customers want to do on top of Kubernetes. Take AI/ML, for example, something where we have some offerings and services. A customer that wants to AI/ML-enable their container stack, their infrastructure, will be so much different from someone else who is doing something like just hosting. And there's always going to be a SaaS provider that is niche, servicing, say, some oil and gas company, you know, which means that a company in that industry will go and follow that, instead of just going to a public cloud provider that is more agnostic. Does that make sense? >> Yeah, there are relationships that exist that add value today, and they're not just going to get blown away; customers are not going to just throw them out. >> Exactly. >> Well, thank you so much for helping us understand the updates, and where your customers are driving this super exciting space. We look forward to keeping an eye on it. >> Thanks so much. >> All right, there's still lots more coming here from Cisco Live 2020 in Barcelona. People are standing watching all the developer events, there's lots going on on the floor, and we still have more. So thank you for watching theCube.
Kostas Tzoumas, data Artisans | Flink Forward 2018
(techno music) >> Announcer: Live from San Francisco, it's theCUBE, covering Flink Forward, brought to you by data Artisans. (techno music) >> Hello again everybody, this is George Gilbert. We're at the Flink Forward conference, sponsored by data Artisans, the company behind both Apache Flink and the commercial distribution, the dA Platform, which supports the productionization and operationalization of Flink and makes it more accessible to mainstream enterprises. We're privileged to have Kostas Tzoumas, CEO of data Artisans, with us today. Welcome, Kostas. >> Thank you. Thank you, George. >> So tell us, let's start with sort of an idealized application use case that is in the sweet spot of Flink, and then let's talk about how that's going to broaden over time. >> Yeah, so just a little bit of an umbrella above that. What we see very, very consistently, in modern tech companies and in traditional enterprises that are trying to move there, is a move towards a business that runs in real time: it runs 24/7, it is data-driven, so decisions are made based on data, and it is software-operated, so increasingly decisions are made by AI, by software, rather than someone looking at something and making a decision. So for example, some of the largest users of Apache Flink are companies like Uber, Netflix, Alibaba, and Lyft; they are all working in this way. >> Can you tell us about the size of their deployments, you know, something in terms of records per day, or cluster size? >> Yeah, sure. The latest I heard, Alibaba is powering Alibaba Search with more than a thousand nodes and terabytes of state; I'm pretty sure they will give us bigger numbers today. Netflix has reported doing about one trillion events per day >> George: Wow. >> on Flink. So pretty big sizes. >> And Netflix, I think I read, is powering their real-time recommendation updates?
They are powering a bunch of things, a bunch of applications; there's a lot of routing of events internally. I think they had a talk, definitely at the last conference, where they talked about this. And it's really a variety of use cases. It's really about building a platform internally and offering it to all sorts of departments in the company, be that for recommendations, be that for BI, be that for running the state of microservices, you know, all sorts of things. And we also see the more traditional enterprise moving to this modus operandi. For example, ING is also one of our biggest partners; it's a global consumer bank based in the Netherlands, and their CEO is saying that ING is not a bank, it's a tech company that happens to have a banking license, a tech company that inherited a banking license. So that's how they want to operate. So what we see is that stream processing is really the enabler for this kind of business, this kind of modern business where they interact with the consumer in real time, they push notifications, they can change the pricing, et cetera, et cetera. So this is really the crux of stateful stream processing, for me. >> So okay, tell us, for those who, you know, have a passing understanding of how Kafka's evolving, how Apache Spark and Structured Streaming are evolving, as distinct from, but also, Databricks: what is it about having state management that's integrated, that, for example, might make it easy to elastically change a cluster size by repartitioning? What can you assume about managing state internally that makes things easier? >> Yeah, so I think really the sweet spot of Flink is that if you are looking for a stream processing engine, and for a stateful stream processing engine for that matter, Flink is the definition of this. It's the definitive solution to this problem.
It was created from scratch with this in mind; it was not sort of a bolt-on on top of something else, so it's streaming from the get-go. And we have done a lot of work to make state a first-class citizen. What this means is that in Flink programs, you can keep state that scales to terabytes, we have seen that, and you can manage this state together with your application. So Flink has this model based on checkpoints, where you take a checkpoint of your application and state together, and you can restart at any time from there. So really, the core of Flink is around state management. >> And you manage exactly-once semantics across the checkpointing? >> It's exactly once, it's application-level exactly once. We have also introduced end-to-end exactly once with Kafka, so Kafka-Flink-Kafka exactly once. So fully consistent. >> Okay, so let's drill down a little bit. What are some of the things that customers would do with an application running on, let's say, a big cluster or a couple of clusters, where they want to operate both on the application logic and on the state, that having it integrated, you know, makes much easier? >> Yeah, so it is a lot about a flipped architecture, and about making operations and DevOps much, much easier. Traditionally, what you would do is create, let's say, a containerized stateless application, and have a centralized data store to keep all your state. What you do now is that the state becomes part of the application. So this has several benefits. It has performance benefits, it has organizational benefits in the company, >> autonomy >> autonomy between teams. It gives you a lot of flexibility in what you can do with the applications, like, for example, scaling an application.
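The checkpoint model Kostas describes, where a snapshot captures the application state together with the input position, can be sketched in miniature. What follows is a toy Python illustration of the idea, not the actual Flink API (Flink programs are typically written in Java or Scala, and real checkpoints are asynchronous and distributed): rolling back state and offset together, then replaying, is what yields application-level exactly-once results.

```python
# Toy illustration of Flink-style checkpointing (NOT the real Flink API):
# a checkpoint atomically captures the operator state *and* the input
# offset, so after a failure both are rolled back together and the log
# is replayed, which gives application-level exactly-once results.

class CheckpointedCounter:
    def __init__(self):
        self.state = {}   # keyed state: word -> count
        self.offset = 0   # position in the input log

    def process(self, log, upto):
        """Consume log entries [offset, upto) and fold them into state."""
        for word in log[self.offset:upto]:
            self.state[word] = self.state.get(word, 0) + 1
        self.offset = upto

    def checkpoint(self):
        # state and offset are snapshotted together, as one unit
        return dict(self.state), self.offset

    def restore(self, snapshot):
        self.state, self.offset = dict(snapshot[0]), snapshot[1]

log = ["a", "b", "a", "c", "a"]
job = CheckpointedCounter()
job.process(log, 3)
snap = job.checkpoint()   # state == {"a": 2, "b": 1}, offset == 3
job.process(log, 5)       # keep processing past the checkpoint
job.restore(snap)         # simulated failure: roll back state AND offset
job.process(log, 5)       # replay from offset 3: nothing is double-counted
assert job.state == {"a": 3, "b": 1, "c": 1}
```

The point of the sketch: because state and offset move as one unit, a replay after restore produces exactly the same result as a failure-free run.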
What you can do with Flink is, say you have an application running with a parallelism of 100, and you are getting higher volume and want to scale it to 500. You can simply, with Flink, take a snapshot of the state and the application together, and then restart it at 500, and Flink is going to redistribute the state. So no need to do anything on a database. >> And then Flink will reshard it. >> It will reshard and it will restart. And then, one step further, with the product that we have introduced, dA Platform, which includes Flink, you can simply do this with one click or with one REST command. >> So the resharding was possible with core Flink, the Apache Flink, and the dA Platform just makes it that much easier, along with other operations. >> Yeah, so what the dA Platform does is give you an API for common operational tasks that we observed everybody deploying Flink at a decent scale needs to do. It is based on Kubernetes, but it gives you a higher-level API than Kubernetes. You can manage the application and the state together, and it gives that to you in a REST API, in a UI, et cetera. >> Okay, so in other words, it's sort of like, by abstracting even up from Kubernetes, you might have a cluster as a first-class citizen, but you're treating it almost like a single entity, and then under the covers you're managing the things that happen across the cluster. >> So what we have in the dA Platform is a notion of a deployment, which I think of as a cluster, but it's basically based on containers. So you have this notion of deployments that you can manage, and then you have a notion of an application. And an application is a Flink job that evolves over time. And then you have a very, you know, bird's-eye view on this: when you update the code, this is the same application with updated code.
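The parallelism-100-to-500 rescale described above works because keyed state can be repartitioned when the job restarts from a snapshot. Below is a toy sketch of one way this can be organized, loosely modeled on Flink's scheme of hashing keys into a fixed number of key groups (set by the job's maximum parallelism) and reassigning whole groups on rescale; the hash function and numbers here are illustrative stand-ins, not the real Flink internals.

```python
# Toy sketch of rescaling keyed state (NOT the real Flink API). Keys hash
# into a fixed number of "key groups"; a rescale from 100 to 500 reassigns
# whole key groups to the new operator instances, so state is redistributed
# without touching any external database.

MAX_PARALLELISM = 1024  # number of key groups, fixed for the job's lifetime

def key_group(key: str) -> int:
    # stand-in hash for illustration only
    return sum(key.encode()) % MAX_PARALLELISM

def operator_index(group: int, parallelism: int) -> int:
    # contiguous ranges of key groups map to each parallel instance
    return group * parallelism // MAX_PARALLELISM

def assignment(keys, parallelism):
    shards = {}
    for k in keys:
        shards.setdefault(operator_index(key_group(k), parallelism), []).append(k)
    return shards

keys = [f"user-{i}" for i in range(10_000)]
before = assignment(keys, 100)   # snapshot taken while running at 100
after = assignment(keys, 500)    # job restored from that snapshot at 500

assert sum(len(v) for v in before.values()) == len(keys)  # no key lost
assert sum(len(v) for v in after.values()) == len(keys)   # no key duplicated
assert len(after) > len(before)  # state now spread over more instances
```

Because groups, not individual keys, are the unit of movement, each instance can fetch a few contiguous ranges of state from the snapshot rather than filtering every key.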
You can travel through the history, you can visit the logs, and you can do common operational tasks, like, as I said, rescaling, updating the code, rollbacks, replays, migrating to a new deployment target, et cetera. >> Let me ask you, outside of the big tech companies who have built much of the application management scaffolding themselves, it sounds like you can democratize access to stream processing, because the capabilities, you know, are not in the skill set of traditional, mainstream developers. So question, the first thing I hear from a lot of sort of newbies, or people who want to experiment, is, "Well, it's so easy to manage the state in a shared database, even if I'm processing, you know, continuously." Where should they make the trade-off? When is it appropriate to use a shared database, maybe, you know, for real OLTP work, and then when can you sort of scale it out and manage it integrally with the rest of the application? >> So when should we use a database and when should we use streaming, right? >> Yeah, and even if it's streaming with the embedded state. >> Yeah, that's a very good question. I think it really depends on the use case. So what we see in the market is, many enterprises start with a use case that either doesn't scale, or where it's not developer-friendly enough to have this database/application level separation. And then it quickly spreads out in the whole company and other teams start using it. So for example, in the work we did with ING, they started with a fraud detection application, where the idea was to load models dynamically in the application, as the data scientists are creating new models, and have a scalable fraud detection system that can handle their load. And then we have seen other teams in the company adopting stream processing after that.
>> Okay, so that sounds like where the model becomes part of the application logic and it's a version of the application logic and then, >> The version of the model >> Is associated with the checkpoint >> Correct. >> So let me ask you then, what happens when you you're managing let's say terabytes of state across a cluster, and someone wants to query across that distributed state. Is there in Flink a query manager that, you know, knows about where all the shards are and the statistics around the shards to do a cost-based query? >> So there is a feature in Flink called queryable state that gives you the ability to do, very simple for now, queries on the state. This feature is evolving, it's in progress. And it will get more sophisticated and more production-ready over time. >> And that enables a different class of users. >> Exactly, I wouldn't, like to be frank, I wouldn't use it for complex data warehousing scenarios. That still needs a data warehouse, but you can do point queries and a few, you know, slightly more sophisticated queries. >> So this is different. This type of state would be different from like in Kafka where you can store you know the commit log for X amount of time and then replay it. This, it's in a database I assume, not in a log form and so, you have faster access. >> Exactly, and it's placed together with a log, so, you can think of the state in Flink as the materialized view of the log, at any given point in time, with various versions. >> Okay. >> And really, the way replay works is, roll back the state to a prior version and roll back the log, the input log, to that same logical time. >> Okay, so how do you see Flink spreading out, now that it's been proven in the most demanding customers, and now we have to accommodate skills, you know, where the developers and DevOps don't have quite the same distributed systems knowledge? 
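The "materialized view of the log" framing at the end of this exchange can be made concrete with a small Python sketch (word counts stand in for arbitrary keyed state; this is an illustration, not Flink code):

```python
# State as a materialized view of the log: each checkpoint is the view
# "as of" one position in the log, so replay = take an older view and
# re-read the log from that same logical time.
from collections import Counter

def materialize(log, upto):
    return {"position": upto, "view": Counter(log[:upto])}

log = ["a", "b", "a", "c"]
versions = [materialize(log, 2), materialize(log, 4)]   # two checkpoints

old = versions[0]                        # roll back to the older version
replayed = old["view"] + Counter(log[old["position"]:])
print(replayed == versions[1]["view"])   # -> True: replay reconverges
```

Rolling back the view without also rolling back the log position (or vice versa) would break this equality, which is why the state and the input position are snapshotted together.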
>> Yeah, I mean we do a lot of work at data Artisans with financial services, insurance, very traditional companies, but it's definitely something that is work in progress, in the sense that our product, the dA Platform, makes operations much easier. This was a common problem everywhere, this was something that tech companies solved for themselves, and we wanted to solve it for everyone else. Application development is yet another thing, and as we saw today in the last keynote, we are working together with Google and the Beam community to bring Python, Go, all sorts of languages into Flink. >> Okay, so that'll help at the developer level, and you're also doing work at the operations level with the platform. >> And of course there's SQL, right? So Flink has Stream SQL, which is standard SQL. >> And would you see, at some point, actually sort of managing the platform for customers, either on-prem or in the cloud? >> Yeah, so right now the platform is running on Kubernetes, which means that typically the customer installs it in their clusters, in their Kubernetes clusters. Which can be either their own machines, or it can be a Kubernetes service from a cloud vendor. Moving forward, I think it will be very interesting, yes, to move to more hosted solutions. Make it even easier for people. >> Do you see a breakpoint or a transition between the most sophisticated customers, who either are comfortable on their own premises, or who were cloud-native from the beginning, and then sort of the rest of the mainstream? You know, what sort of applications might they move to the cloud, or might coexist between on-prem and the cloud? >> Well, I think it's clear that the cloud is, you know, every new business starts on the cloud, that's clear. There's a lot of enterprise that is not yet there, but there's big willingness to move there. And there's a lot of hybrid cloud solutions as well.
>> Do you see mainstream customers rewriting applications because they would be so much more powerful in stream processing, or do you see them doing just new applications? >> Both, we see both. It's always easier to start with a new application, but we do see a lot of legacy applications in big companies that are not working anymore. And we see those rewritten. And very core applications, very core to the business. >> So could you be sort of the source, and an analytic processor for the continuous data, and then that sort of feeds a transaction and some parameters that then feed a model? >> Yeah. >> Is that, >> Yeah. >> so in other words you could augment existing OLTP applications with analytics that then inform them in real time, essentially. >> Absolutely. >> Okay, 'cause that sounds like something that people would build around what exists. >> Yeah, I mean, you can think of stream processing, in a way, as transaction processing. It's not a dedicated OLTP store, but you can think of it in this flipped architecture, right? Like the log is essentially the redo log, you know, and then you create the materialized views, that's the write path, and then you have the read path, which is queryable state. This is this whole CQRS idea, right? >> Yeah, Command Query Responsibility Segregation. >> Exactly. >> So, this is actually interesting, and I guess this is critical, it's sort of like a new way of doing distributed databases. I know that's not the word you would choose, but it's like the derived data, sort of coming off of the state changes, managed in the stream processor, that goes through a single sort of append-only log, and then reading, and how do you manage consistency on the materialized views, the derived data? >> Yeah, so we have seen Flink users implement that. So we have seen, you know, companies really base the complete product on the CQRS pattern. I think this is a little bit further out.
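The CQRS shape Kostas sketches here, the log as redo log on the write path and a materialized view on the read path, looks roughly like this in plain Python (illustrative, with a toy key-value command set):

```python
log = []       # append-only command log: the ground truth (write path)
view = {}      # materialized view that queries go against (read path)

def handle(command):
    log.append(command)            # write path: append, never update
    op, key, value = command
    if op == "put":
        view[key] = value          # fold the command into the view
    elif op == "delete":
        view.pop(key, None)

handle(("put", "balance:alice", 100))
handle(("put", "balance:bob", 50))
handle(("delete", "balance:bob", None))
print(view)                        # -> {'balance:alice': 100}
```

Because the log is the source of truth, the view can always be rebuilt by replaying the log, which is exactly the replay and rollback mechanism discussed earlier in the interview.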
Consistency-wise, Flink gives you exactly-once consistency on the write path, yeah. What we see a lot more is an architecture where there's a lot of transactional stores in the front end that are running, and then there needs to be some kind of global, single source of truth between all of them. And a very typical way to do that is to get these logs into a stream, and then have a Flink application that can actually scale to that. Create a single source of truth from all of these transactional stores. >> And by feeding the transactional stores into this sort of hub, I presume some cluster as a hub, and even if it's in the form of sort of a log, how can you replay it with sufficient throughput, I guess not to be a data warehouse, but to, you know, have low latency for updating the derived data? And is that derived data, I assume, in non-Flink products? >> Yeah, so the way it works is that, you know, you can get the change logs from the databases, you can use something like Kafka to buffer them up, and then you can use Flink for all the processing, and to do the reprocessing with Flink, this is really one of the core strengths of Flink. Basically what you do is you replay the Flink program together with the state, and you can get really, really high-throughput reprocessing there. >> Where does the super high throughput come from? Is that because of the integration of state and logic? >> Yeah, that is because Flink is a true streaming engine. It is a high-performance streaming engine. And it manages the state, there's no tier, >> Crossing a boundary? >> no tier crossing, and there's no boundary crossing when you access state. It's embedded in the Flink application. >> Okay, so you can optimize the IO path? >> Correct. >> Okay, very, very interesting.
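The "single source of truth" pattern described here, several transactional change logs merged and folded into one consistent view, can be sketched like this (illustrative Python; the timestamps stand in for whatever ordering the change-data-capture stream provides):

```python
import heapq

def merge_changelogs(*logs):
    # each log: (timestamp, key, value) tuples, already time-ordered
    return heapq.merge(*logs)       # one globally time-ordered stream

def fold(changes):
    truth = {}
    for _, key, value in changes:
        truth[key] = value          # last write per key wins
    return truth

orders_db = [(1, "order:1", "created"), (4, "order:1", "shipped")]
billing_db = [(2, "order:1", "paid"), (3, "order:2", "created")]

truth = fold(merge_changelogs(orders_db, billing_db))
print(truth)    # -> {'order:1': 'shipped', 'order:2': 'created'}
```

In the architecture described, Kafka plays the role of the buffered change logs and a Flink job plays the role of the merge-and-fold, with the advantage that reprocessing is just replaying the logs through the same program.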
>> So, it sounds like the Kafka guys, the Confluent folks, their aspirations, from the last time we talked to 'em, don't extend to analytics, you know, I don't know whether they want partners to do that, but it sounds like they have a similar topology, but I'm not clear how much of a first-class citizen state is, other than the log. How would you characterize the trade-offs between the two? >> Yeah, so, I mean obviously I cannot comment on Confluent, but what I think is that the state and the log are two very different things. You can think of the log as storage, it's a kind of hot storage, because it's the most recent data, but you know, you cannot query it, it's not a materialized view, right? So for me the separation is between processing state and storage. The log is a kind of storage, a kind of message queue. State is really the active data, the real-time active data that needs to have consistency guarantees, and that's a completely different thing. >> Okay, and you're managing, it's almost like you're managing under the covers a distributed database. >> Yes, kind of. Yeah, a distributed key-value store if you wish. >> Okay, okay, and then that's exposed through multiple interfaces, data stream, table. >> Data stream, table API, SQL, other languages in the future, et cetera. >> Okay, so going further down the line, how do you see the sort of use cases that are going to get you across the chasm from the big tech companies into the mainstream? >> Yeah, so we are already seeing that a lot. So we're doing a lot of work with financial services, insurance companies, a lot of very traditional businesses. And it's really a lot about maintaining a single source of truth, becoming more real-time in the way they interact with the outside world and the customer, like they do see the need to transform.
If we take financial services and investment banks for example, there is a big push in this industry to modernize the IT infrastructure, to get rid of legacy, to adopt modern solutions, become more real-time, et cetera. >> And so they really needed this, like the application platform, the dA Platform, because operationalizing what Netflix did is going to be very difficult for non-tech companies. >> Yeah, I mean, you know, it's always a trade-off, right, and you know, some companies build, some companies buy, and for many companies it's much more sensible to buy. That's why we have software products. And really, our motivation was that we worked in the open-source Flink community with all the big tech companies. We saw their successes, we saw what they built, we saw, you know, their failures. We saw everything and we decided to build this for everybody else, for everyone that, you know, is not Netflix, is not Uber, cannot hire software developers so easily, or with such good quality. >> Okay, alright, on that note, Kostas, we're going to have to end it, and to be continued, with Stephan next, apparently. >> Nice. >> And then hopefully next year as well. >> Nice. Thank you. >> Alright, thanks Kostas. >> Thank you, George. Alright, we're with Kostas Tzoumas, CEO of data Artisans, the company behind Apache Flink and now the application platform that makes Flink run for mainstream enterprises. We will be back after this short break. (techno music)
Stephan Ewen, data Artisans | Flink Forward 2018
>> Narrator: Live from San Francisco. It's the CUBE covering Flink Forward brought to you by data Artisans. >> Hi, this is George Gilbert. We are at Flink Forward, the conference put on by data Artisans for the Apache Flink community. This is the second Flink Forward in San Francisco and we are honored to have with us Stephan Ewen, co-founder of data Artisans, co-creator of Apache Flink, and CTO of data Artisans. Stephan, welcome. >> Thank you, George. >> Okay, so with others we were talking about the use cases they were trying to solve, but you put together sort of all the pieces in your head first and are building out, you know, something that ultimately gets broader and broader in its applicability. Help us, now maybe from the bottom up, help us think through the problems you were trying to solve, and let's start, you know, with the ones that you saw first and then how the platform grows so that you can solve a broader scale of problems. >> Yes, yeah, happy to do that. So, I think we have to take a bunch of steps back and kind of look at, let's say, the breadth of use cases that we're looking at. How did that, you know, influence some of the inherent decisions in how we've built Flink? How does that relate to what we presented earlier today, the stream processing platform and so on? So, starting to work on Flink and stream processing. Stream processing is an extremely general and broad paradigm, right? We've actually started to say what Flink is underneath the hood: it's an engine to do stateful computations over data streams. It's a system that can process data streams as a batch processor processes, you know, bounded data. It can process data streams as a real-time stream processor processes real-time streams of events.
It can handle, you know, data streams as in sophisticated event-by-event, stateful, timely logic, as, you know, many applications that are implemented as data-driven microservices or so implement their logic. And the basic idea behind how Flink takes its approach to that is to start with the basic ingredients that you need and try not to impose any form of constraints around the use of that. So, when I give the presentations, I very often say the basic building blocks for Flink are just flowing streams of data, streams being, you know, received from systems like Kafka, file systems, databases. So, you route them, you may want to repartition them, organize them by key, broadcast them, depending on what you need to do. You implement computation on these streams, a computation that can keep state almost as if it was, you know, a standalone Java application. You don't necessarily think in terms of writing state to a database. Think more in terms of maintaining your own variables or so. Sophisticated access to tracking time and progress of data, completeness of data. That's in some sense what is behind the event-time streaming notion. You're tracking completeness of data as of a certain point of time. And then to round this all up, give this a really nice operational tool by introducing this concept of distributed consistent snapshots. And just sticking with these basic primitives, you have streams that just flow, no transactional barriers necessarily there between operations, no microbatches, just streams that flow, state variables that get updated, and then fault tolerance happening as an asynchronous background process. Now that is in some sense, I would say, kind of the core idea and what helps Flink generalize from batch processing to, you know, real-time stream processing to event-driven applications.
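The event-time idea Stephan mentions, "tracking completeness of data as of a certain point of time", can be illustrated with a toy watermark in plain Python (not Flink's API; the window size and lateness bound are arbitrary choices for the example):

```python
def windowed_counts(events, window, max_out_of_orderness):
    """events: (event_time, key) pairs in arrival order (may be out of order)."""
    windows, watermark, fired = {}, 0, []
    for ts, key in events:
        start = ts - ts % window
        counts = windows.setdefault(start, {})
        counts[key] = counts.get(key, 0) + 1
        # the watermark asserts: no events older than this are expected
        watermark = max(watermark, ts - max_out_of_orderness)
        for s in sorted(list(windows)):
            if s + window <= watermark:          # window believed complete
                fired.append((s, windows.pop(s)))
    return fired, windows

fired, pending = windowed_counts(
    [(1, "a"), (2, "a"), (12, "b"), (3, "a"), (25, "a")],
    window=10, max_out_of_orderness=5)
print(fired)     # -> [(0, {'a': 3}), (10, {'b': 1})]
```

Note how the late event at time 3 still lands in the first window, because the watermark had only reached 7 when it arrived; that deferred firing is the "completeness tracking" being described.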
And what we saw today, in the presentation that I gave earlier, is how we use that to build a platform for stream processing and event-driven applications. That's taking some of these things, in that case most prominently the fourth aspect, the ability to take application snapshots at any point in time, and use this as an extremely powerful operational tool. You can think of it as being a tool to archive applications, migrate applications, fork applications, modify them independently. >> And these snapshots are essentially your individual snapshots at the node level, and then you're sort of organizing them into one big logical snapshot. >> Yeah, each node is its own snapshot, but they're consistently organized into a globally consistent snapshot, yes. That has a few very interesting and important implications. So just to give you one example where this really makes things much easier. If you have an application that you want to upgrade and you don't have a mechanism like that, right, what is the default way that many folks do these updates? Try to do a rolling upgrade of all your individual nodes. You replace one, then the next, then the next, but that has this interesting situation where at some point in time there are actually two versions of the application running at the same time. >> And operating on the same sort of data stream. >> Potentially, yeah, or on some partitions of the data stream you have one version and on some partitions you have another version. You may be at the point where you have to maintain two wire formats, like all pieces of your logic have to be written understanding both versions, or you try to, you know, use a data format that makes this a little easier, but it's just inherently a thing that you don't even have to worry about if you have these consistent distributed snapshots.
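The globally consistent snapshots described here work by flowing a barrier through the stream (an asynchronous variant of the Chandy-Lamport algorithm); each operator records its state at the moment the barrier passes, so the pieces line up without stopping the pipeline. A single-threaded Python caricature of the idea (not Flink's implementation):

```python
BARRIER = object()    # injected at the source, travels with the data

def count_by_key(state, item):
    state[item] = state.get(item, 0) + 1
    return item

def count_all(state, item):
    state["total"] = state.get("total", 0) + 1
    return item

def run_chain(stream, operators):
    """operators: (name, state, fn) triples; returns one consistent snapshot."""
    snapshot = {}
    for item in stream:
        for name, state, fn in operators:
            if item is BARRIER:
                snapshot[name] = dict(state)   # record state as barrier passes
            else:
                item = fn(state, item)
    return snapshot

ops = [("by_key", {}, count_by_key), ("total", {}, count_all)]
snap = run_chain(["a", "b", BARRIER, "a"], ops)
print(snap)   # -> {'by_key': {'a': 1, 'b': 1}, 'total': {'total': 2}}
```

Both operators' snapshots reflect exactly the two events before the barrier, even though processing continued afterwards; that per-barrier alignment is what makes the global snapshot consistent.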
It just gets many of these problems out of the way. >> Okay, and that snapshot applies to code and data? >> So in Flink's architecture itself, the snapshot applies first of all only to data. And that is very important. >> George: Yeah. >> Because what it actually allows you to do is decouple the snapshot from the code if you want to. >> George: Okay. >> That allows you to do things like we showed earlier this morning. If you actually have an earlier snapshot where the data is correct, and then you change the code but you introduce a bug, you can just say, "Okay, let me actually change the code and apply different code to a different snapshot." So you can actually roll back or roll forward different versions of code and different versions of state independently, or you can go and say, when I'm forking this application I'm actually modifying it. That is a level of flexibility that's incredible, yeah, once you actually start to make use of it and practice it, it's incredibly useful. It's been one of the maybe least obvious things when you start to look into stream processing, but once you actually take stream processing to production, this operational flexibility is, I would say, very high up for a lot of users when they said, "Okay, this is why we took Flink to streaming production and not others." The ability to do, for example, that. >> But this sounds then like, with some stream processors, the idea of unbundling the database, you have derived data, you know, at different sync points, and that derived data is, you know, for analysis, views, whatever, but it sounds like what you're doing is taking derived data of sort of what the application is working on in progress and creating essentially a logically consistent view that's not really derived data for some other application's use, but for operational use. >> Yeah, so. >> Is that a fair way to explain? >> Yeah, let me try to rephrase it a bit. >> Okay.
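The code/state decoupling Stephan describes earlier in this exchange, re-running the same snapshot under different code versions, can be sketched as follows (illustrative Python, with a deliberately buggy "new" version):

```python
def run(code, snapshot, events):
    state = dict(snapshot)         # the snapshot carries data only
    for e in events:
        code(state, e)
    return state

def v1(state, e):                  # old code: plain count
    state[e] = state.get(e, 0) + 1

def v2_buggy(state, e):            # new code: double-counts by mistake
    state[e] = state.get(e, 0) + 2

snapshot = {"a": 5}                # taken while v1 was in production
events = ["a", "b"]

bad = run(v2_buggy, snapshot, events)   # roll forward with buggy code
good = run(v1, snapshot, events)        # same snapshot, rolled-back code
print(bad["a"], good["a"])              # -> 7 6
```

Because the snapshot carries no code, rolling the code back is independent of rolling the state back, which is the operational flexibility being described.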
>> When you start to take this streaming-style approach to things, which, you know, has been called turning the database inside out, unbundling the database, your input sequence of events is arguably the ground truth, and what the stream processor computes is a view of the state of the world. So while this sounds at first super easy, and you know, views, you can always recompute a view, right? In practice this view of the world is not just something lightweight that's only derived from the sequence of events. It's actually the state of the world that you want to use. It might not be fully reproducible, just because either the sequence of events has been truncated, or because the sequence of events is just plain too long to feasibly recompute in a reasonable time. So having a way to work with this in a way that just complements this whole idea of, you know, event-driven, log-driven architecture very cleanly is kind of what this snapshot tool also gives you. >> Okay, so that sounds like that was part of core Flink. >> That is part of core Flink's inherent design, yes. >> Okay, so then take us to the next level of abstraction. The scaffolding that you're building around it with the dA Platform, and how that makes stream processing more accessible, how it, you know, empowers a whole other generation. >> Yeah, so there's different angles to what the dA Platform does. So one angle is just very pragmatically easing rollout of applications by having one way to integrate the platform with your metrics, alerting, logging, the CI/CD pipeline, and then every application that you deploy over there just inherits all of that, like every single application developer doesn't have to worry about anything. They just say, this is my piece of code, I'm putting it there, and it's just going to be hooked in with everything else.
That's not rocket science, but it's extremely valuable, because there are a lot of tedious bits here and there that, you know, otherwise eat up a significant amount of the development time. The technologically maybe more challenging part that this solves is where we're really integrating the application snapshots, the compute resources, the configuration management and everything into this model, where you don't think about: I'm running a Flink job here, that Flink job has created a snapshot that is lying around here, there's also a snapshot here which probably may come from that Flink application that was running, and that's actually just a new version of that Flink application, which is the, let's say, testing or acceptance run for the version that we're about to deploy here, and so like tying all of these things together. >> So, it's not just the artifacts from one program, it's how they all interrelate? >> It gives you the idea of exactly how they all interrelate, because an application over its lifetime will correspond to different configurations, different code versions, different deployments, production A/B testing and so on, and like how all of these things work together, how they interplay, right? Like I said before, Flink deliberately couples checkpoints and code in a rather loose way, to allow you to evolve the code and still be able to match a previous snapshot into a newer code version and so on. We make heavy use of that, but we can now give you a good way of, first of all, tracking all of these things together: how do they relate, when was which version running, what code version was that; having snapshots, we can always go back and reinstate earlier versions; having the ability to always move a deployment from here to there, like fork it, drop it, and so on.
That is one part of it, and the other part of it is the tight integration with Kubernetes. Containers' initial sweet spot was stateless compute, and the way stream processing architecture works, the nodes are inherently not stateless, they have a view of the state of the world. This is always recoverable. You can also change the number of containers, and with Flink and other frameworks you have the ability to adjust this, and so on, >> Including repartitioning the-- >> Including repartitioning the state, but it's a thing where you have to be often quite careful how to do that so that it all integrates exactly consistently, like the right containers are running at the right point in time with the exact right version, and there's not a split-brain situation where this happens to be still running some other partitions at the same time, or a container goes down and is this a situation where you're supposed to recover or rescale? Figuring all of these things out together, this is what the idea of integrating these things in a very tight way gives you. So think of it the following way, right? Initially you just start with Docker. Docker is a way to say, I'm packaging up everything that a process needs, all of its environment, to make sure that I can deploy it here and here and here and it just always works. It's not like, "Oh, I'm missing the correct version of the library here," or "I'm interfering with that other process on a port."
On top of Docker, people added things like Kubernetes to orchestrate many containers together forming an application, and then on top of Kubernetes there are things like Helm, or for certain frameworks there are Kubernetes Operators and so on, which try to raise the abstraction to say, "Okay, we're taking care of these aspects that this needs in addition to container orchestration." We're doing exactly that kind of thing: we're raising the abstraction one level up to say, okay, we're not just thinking about the containers, the compute and maybe the local persistent storage, but we're looking at the entire stateful application, with its compute, with its state, with its archival storage, with all of it together. >> Okay, let me sort of peel off with a question about more conventionally trained developers and admins. They're used to databases for batch and request/response type jobs or applications. Do you see them becoming potential developers of continuous stream processing apps, or do you see it mainly for a new generation of developers? >> No, I would actually say that a lot of the classic... call it request/response, or call it create, read, update, delete, or so, applications working against a database, there's this huge potential for stream processing, or that kind of event-driven architecture, to help change this view. There's actually a fascinating talk here by the folks from (mumbles) who implemented an entire social network in this stream processing architecture, so not against a database but against a log and a stream processor instead. It comes with some really cool properties, like a very unique way of having operational flexibility to, at the same time, test and evolve and run and do very rapid iterations over your-- >> Because of the decoupling?
>> Exactly, because of the decoupling, because you don't have to always worry about, okay, I'm experimenting here with something, let me first create a copy of the database, and then once I actually think that this is working out well, okay, how do I either migrate those changes back, or make sure that the copy of the database that I made is brought up to speed with the production database again before I switch over to the new version. So many of these things, the pieces just fall together easily in the streaming world. >> I think I asked this of Kostas, but if a business analyst wants to query the current state of what's in the cluster, do they go through some sort of head node that knows where the partitions lay, and then some sort of query optimizer figures out how to execute that with a cost model or something? In other words, if you want it to do some sort of batch or interactive type... >> So there are different answers to that, I think. First of all, there's the ability to look into the state of Flink, as in, you have the individual nodes that maintain state while they're doing the computation, and you can look into this, but it's more like a lookup thing. You're not running a query, as in a SQL query, against that particular state. If you would like to do something like that, what Flink gives you is the ability, as always... there's a wide variety of connectors, so you can for example say, I'm describing my streaming computation here, you can describe it in SQL, and you can say the result of this thing, I'm writing it to a neatly queryable data store, an in-memory database or so, and then you would actually run the dashboard-style exploratory queries against that particular database. So Flink's sweet spot at this point is not to run many small, fast, short-lived SQL queries against something that is in Flink running at the moment. That's not what it is yet built and optimized for.
>> A more batch-oriented one would run against the derived data that's in the form of a materialized view.

>> Exactly. So these two sides play together very well, right? You have the more exploratory, batch-style queries that go against the view, and then you have the stream processor and streaming SQL used to continuously compute the view that you then explore.

>> Do you see scenarios where you have traditional OLTP databases that are capturing business transactions, but now you want to inform those transactions, or potentially automate them, with machine learning? So you capture a transaction, and then there's sort of ambient data, whether it's about the user interaction or about the machine data flowing in. Maybe you don't capture the transaction right away, but you're capturing data for the transaction along with the ambient data, and from the ambient data you calculate some sort of analytic result. It could be a model score, and that informs the transaction running at the front end of this pipeline. Is that a model that you see in the future?

>> So that sounds like a use case that has actually been run; it's not uncommon, yeah. In some sense, a model like that is behind many fraud detection applications. You have the transaction that you capture. You have a lot of contextual data that you receive, from which you either build a model in the stream processor, or you build a model offline and push it into the stream processor as, let's say, a stream of model updates, and then you're using that stream of model updates.
You derive your classifiers, or your rule engines, or your predictor state from that stream of updates and from the history of the previous transactions, and then you use that to attach a classification to the transaction. Once this is returned, this stream is fed back to the part of the computation that actually processes the transaction itself, to trigger the decision whether to, for example, hold it back or let it go forward.

>> So this is an application where people who have built traditional architectures would add this capability on for low-latency analytics?

>> Yeah, that's one way to look at it, yeah.

>> As opposed to a rip-and-replace, like we're going to take out our request/response and our batch and put in stream processing.

>> Yeah, so that is definitely a way that stream processing is used: you basically capture a change log, or so, of whatever is happening in a database, or you just immediately capture the events, the interactions from users and devices, and then you let the stream processor run side by side with the old infrastructure, and just compute additional information that even a mainframe database might, in the end, use to decide what to do with a certain transaction. So it's a way to complement legacy infrastructure with new infrastructure without having to break off or break away the legacy infrastructure.

>> So let me ask in a different direction, more about the complexity that forms a tax on developers and administrators. Many of the open source community products slash projects solve narrow functions within a broader landscape, and there's a tax on developers and admins in trying to make those work together because of the different security models, data models, all of that.

>> There is a zoo of systems and technologies out there, and also of different paradigms for doing things.
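The fraud-detection pattern described above can be sketched as follows. This is a hedged toy, not a real fraud system: the classifier "state" is simplified to a single threshold, and the merged stream of model updates and transactions is just a Python list.

```python
# Sketch of the pattern: a stream of model updates refreshes the classifier
# state in the processor, and each transaction is scored against whatever
# model is current at the moment it arrives.

def score_transactions(merged_stream):
    """merged_stream yields ('model', threshold) or ('txn', amount) tuples."""
    threshold = float("inf")   # no model received yet: let everything through
    decisions = []
    for kind, value in merged_stream:
        if kind == "model":
            threshold = value  # model update: replace the classifier state
        else:
            decisions.append((value, "hold" if value > threshold else "ok"))
    return decisions

stream = [("model", 100), ("txn", 40), ("txn", 250), ("model", 300), ("txn", 250)]
decisions = score_transactions(stream)
# the first 250 exceeds threshold 100 and is held; after the model update
# to 300, the same amount passes
```

The feedback loop from the interview corresponds to routing each `"hold"`/`"ok"` decision back to the system that executes the transaction.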
Once systems have a similar paradigm, or tier, in mind, they usually work together well, but there are different philosophical takes--

>> Give me some examples of the different paradigms that don't fit together well.

>> For example... maybe one good example was initially, when streaming was a rather new thing. At that point in time, stream processors were very much thought of as a bit of an addition to, let's say, the batch stack or whatever other stack you currently had: look at it as an auxiliary piece for doing some approximate computation. A big reason why that was the case is that the way these stream processors thought of state was with a different consistency model, and the way they thought of time was actually different from the batch processors and the databases, which used timestamp fields, while the early stream processors--

>> They can't handle event time.

>> Exactly, they just used processing time. That's why you could maybe complement the stack with them, but it didn't really go well together; you couldn't just say, okay, I can actually take this batch job and interpret it also as a streaming job. Once the stream processors got a better interpretation--

>> The Lambda architecture.

>> Exactly. So once the stream processors adopted a stronger consistency model, and a time model that is more compatible with reprocessing and so on, all of these things all of a sudden fit together much better.

>> Okay, so do you see vendors who are oriented around a single, unified paradigm continuing to broaden their footprint, so that they can essentially take some of the complexity off the developer and the admin by providing one throat to choke, with pieces that were designed to work together out of the box, unlike some of the zoos of the former Hadoop community?
In other words, a lot of vendors seem to be trying to build a broader footprint, so that it's something that's simpler to develop against and to operate?

>> There are a few good efforts happening in that space right now. One that I really like is the idea of standardizing on some APIs. APIs are hard to standardize on, but you can at least standardize on semantics, which is something that, for example, Flink and Beam have been very keen on: trying to have an open discussion and a roadmap that is very compatible in thinking about streaming semantics. This has been taken to the next level, I would say, with the whole streaming SQL design. Beam is adding streaming SQL and Flink is adding streaming SQL, both in collaboration with the Apache Calcite project, so you get very similar standardized semantics and SQL compliance, and you start to get common interfaces, which is a very important first step, I would say. Standardizing on things like--

>> So SQL semantics across products that would be within a stream processing architecture?

>> Yes, and I think this will become really powerful once other vendors start to adopt the same interpretation of streaming SQL and think of it as: yes, it's a way to take a changing data table here and project a view of this changing data table, a changing materialized view, into another system, and then use that as a starting point to maybe compute another derived view, you see. You can actually start to think more high-level about things: think really relational queries, dynamic tables, across different pieces of infrastructure.
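The "changing data table" idea above can be sketched concretely. This is an illustrative toy with invented names, not Flink or Calcite code: a changelog stream of inserts, updates, and deletes is projected into a materialized view that a downstream system could then query or derive further views from.

```python
# Sketch: a changelog stream interpreted as a changing table, projected
# into a materialized view (here just a dict keyed by primary key).

def materialize(changelog):
    """Fold a stream of (op, key[, value]) records into the current view."""
    view = {}
    for op, key, *val in changelog:
        if op in ("insert", "update"):
            view[key] = val[0]      # upsert the row for this key
        elif op == "delete":
            view.pop(key, None)     # retract the row if present
    return view

changes = [
    ("insert", "k1", 1),
    ("insert", "k2", 2),
    ("update", "k1", 10),
    ("delete", "k2"),
]
snapshot = materialize(changes)  # the view after consuming the changelog
```

A streaming SQL query over the changing table is, in this picture, just another fold like `materialize` whose output is itself a changelog, which is what lets views be chained across systems.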
Once you can do something like that, interplay between architectures becomes easier to handle, because even if things behave a bit differently at the runtime level, you at least start to establish a standardized model for thinking about how to compose your architecture. And even if you decide to change along the way, you frequently avoid the problem of having to rip everything out and redesign everything because the next system you bring in follows a completely different paradigm.

>> Okay, this is helpful. To be continued offline, or back online on theCUBE. This is George Gilbert. We were having a very interesting and extended conversation with Stephan Ewen, CTO and co-founder of data Artisans and one of the creators of Apache Flink, and we are at Flink Forward in San Francisco. We will be back after this short break.