Greg Benson, SnapLogic | Flink Forward 2018

>> Announcer: Live from San Francisco, it's theCUBE covering Flink Forward brought to you by Data Artisans. >> Hi this is George Gilbert. We are at Flink Forward on the ground in San Francisco. This is the user conference for the Apache Flink Community. It's the second one in the US and this is sponsored by Data Artisans. We have with us Greg Benson, who's Chief Scientist at Snap Logic and also professor of computer science at University of San Francisco. >> Yeah that's great, thanks for havin' me. >> Good to have you. So, Greg, tell us a little bit about how Snap Logic currently sets up its, well how it builds its current technology to connect different applications. And then talk about, a little bit, where you're headed and what you're trying to do. >> Sure, sure, so Snap Logic is a data and app integration Cloud platform. We provide a graphical interface that lets you drag and drop. You can open components that we call Snaps and you kind of put them together like Lego pieces to define relatively sophisticated tasks so that you don't have to write Java code. We use machine learning to help you build out these pipelines quickly so we can anticipate based on your data sources, what you are going to need next, and that lends itself to rapid building of these pipelines. We have a couple of different ways to execute these pipelines. You can think of it as sort of this specification of what the pipeline's supposed to do. We have a proprietary engine that we can execute on single notes, either in the Cloud or behind your firewall in your data center. We also have a mode which can translate these pipelines into Spark code and then execute those pipelines at scale. So, you can do sort of small, low latency processing to sort of larger, batch processing on very large data sets. >> Okay, and so you were telling me before that you're evaluating Flink or doing research with Flink as another option. Tell us what use cases that would address that the first two don't. >> Yeah, good question. I'd love to just back up a little bit. So, because I have this dual role of Chief Scientist and as a professor of Computer Science, I'm able to get graduate students to work on research projects for credit, and then eventually as interns at SnapLogic. A recent project that we've been working on since we started last fall so working on about six or seven months now is investigating Flink as a possible new back end for the SnapLogic platform. So this allows us to you know, to explore and prototype and just sort of figure out if there's going to be a good match between an emerging technology and our platform. So, to go back to your question. What would this address? Well, so, without going into too much of the technical differences between Flink and Spark which I imagine has come up in some of your conversations or it comes up here because they can solve similar use cases our experience with Flink is the code base is easy to work with both from taking our specification of pipelines and then converting them into Flink code that can run. But there's another benefit that we see from Flink and that is, whenever any product, whether it's our product or anybody else's product, that uses something like Spark or Flink as a back end, there's this challenge because you're converting something that your users understand into this target, right, this Spark API code or Flink API code. And the challenge there is if something goes wrong, how do you propagate that back to the users so the user doesn't have to read log files or get into the nuts and bolts of how Spark really works. >> It's almost like you've compiled the code, and now if something doesn't work right, you need to work at the source level. >> That's exactly right, and that's what we don't want our users to do, right? >> Right. >> So one promising thing about Flink is that we're able to integrate the code base in such a way that we have a better understanding of what's happening in the failure conditions that occur. And we're working on ways to propagate those back to the user so they can take actionable steps to remedy those without having to understand the Flink API code iself. >> And what is it, then, about Flink or its API that gives you that feedback about errors or you know, operational status that gives you better visibility than you would get in something else like Spark. >> Yeah, so without getting too too deep on the subject, what we have found is, one thing nice about the Flink code base is the core is written in Scala, but there's a lot of, all the IO and memory handling is written in Java and that's where we need to do our primary interfacing and the building blocks, sort of the core building blocks to get to, for example, something that you build with a dataset API to execution. We have found it easier to follow the transformation steps that Flink takes to end up with the resulting sort of optimized, optimized Flink pipeline. Now by understanding that transformation, like you were saying, the compilation step, by understanding it, then we can work backwards, and understand how, when something happens, how to trace it back to what the user was originally trying to specify. >> The GUI specification. >> Yeah. Right. >> So, help me understand though it sounds like you're the one essentially building a compiler from a graphical specification language down to Spark as the, you know, sort of, pseudo, you know, psuedo compile code, >> Yep. >> Or Flink. And, but if you're the one doing that compilation, I'm still struggling to understand why you would have better reverse engineering capabilities with one. >> It just is a matter of getting visibility into the steps that the underlying frameworks are taking and so, I'm not saying this is impossible to do in Spark, but we have found that we've had, it's been easier for us to get into the transformation steps that Flink is taking. >> Almost like, for someone who's had as much programming as a one semester in night school, like a variable and specter that's already there, >> Yeah, that's a good, there you go, yeah, yeah, yeah. >> Okay, so you don't have to go try and you can't actually add it, and you don't have to then infer it from all this log data. >> Now, I should add, there's another potential Flink. You were asking about use cases and what does Flink address. As you know, Flink is a streaming platform, in addition to being a batch platform, and Flink does streaming differently than how Spark does. Spark takes a microbatch approach. What we're also looking at in my research effort is how to take advantage of Flink's streaming approach to allow the SnapLogic GUI to be used to specify streaming Flink applications. Initially we're just focused on the batch mode but now we're also looking at the potential to convert these graphical pipelines into streaming Flink applications, which would be a great benefit to customers who want-- >> George: Real time integration. >> Want to do what Alibaba and all the other companies are doing but take advantage of it without having to get to the nuts and bolts of the programming. Do it through the GUI. >> Wow, so it's almost like, it's like, Flink, Beam, in terms of obstruction layers, >> Sure. >> And then SnapLogic. >> Greg: Sure, yes. >> Not that you would compile the beam, but the idea that you would have perv and processing and a real-time pipeline. >> Yes. >> Okay. So that's actually interesting, so that would open up a whole new set of capabilities. >> Yeah and, you know, it follows our you know, company's vision in allowing lots of users to do very sophisticated things without being, you know, Hadoop developers or Spark developers, or even Flink developers, we do a lot of the hard work of trying to give you a representation that's easier to work with, right but, also allow you to sort of evolve that and de-bug it and also eventually get the performance out of these systems One of the challenges of course of Spark and Flink is that they have to be tuned, and you have to, and so what we're trying to do is, using some of our machine learning, is eventually gather information that can help us identify how to tune different types of work flows in different environments. And that, if we're able to do that in it's entirety, then we, you know, we take out a lot of the really hard work that goes into making a lot of these streaming applications both scalable and performing. >> Performimg. So this would be, but you would have, to do that, you would probably have to collect well, what's the term? I guess data from the operations of many customers, >> Right. >> Because, as training data, just as the developer alone, you won't really have enough. >> Absolutely, and that's, so that you have to bootstrap that. For our machine learning that we currently use today, we leverage, you know, the thousands of pipelines, the trillions of documents that we now process on a monthly basis, and that allows us to provide good recommendations when you're building pipelines, because we have a lot of information. >> Oh, so you are serving the runtime, these runtime compilations. >> Yes. >> Oh, they're not all hosted on the customer premises. >> Oh, no no no, we do both. So it's interesting, we do both. So you can, you can deploy completely in the cloud, we're a complete SASS provider for you. Most of our customers though, you know, Banks Healthcare, want to run our engine behind their firewalls. Even when we do that though, we still have metadata that we can get introspection, sort of anonymized, but we can get introspection into how things are behaving. >> Okay. That's very interesting. Alright, Greg we're going to have to end it on that note, but uh you know, I guess everyone stay tuned. That sounds like a big step forward in sort of specification of real time pipelines at a graphical level. >> Yeah, well, it's, I hope to be talking to you again soon with more results. >> Looking forward to it. With that, this is George Gilbert, we are at Flink Forward, the user conference for the Apache Flink conference, sorry for the Apache Flink user community, sponsored by Data Artisans, we will be back shortly. (upbeat music)

Published Date : Apr 11 2018

SUMMARY :

brought to you by Data Artisans. We are at Flink Forward on the ground in San Francisco. and what you're trying to do. so that you don't have to write Java code. Okay, and so you were telling me before So this allows us to you know, to explore and prototype you need to work at the source level. so they can take actionable steps to remedy those that gives you that feedback something that you build with a dataset API to execution. you would have better and so, I'm not saying this is impossible to do in Spark, and you don't have to then infer it from all this log data. As you know, Flink is a streaming platform, Want to do what Alibaba and all the other companies the idea that you would have perv and processing so that would open up a whole new is that they have to be tuned, and you have to, So this would be, but you would have, to do that, just as the developer alone, you won't really have enough. we leverage, you know, the thousands of pipelines, Oh, so you are serving the runtime, Most of our customers though, you know, Banks Healthcare, you know, I guess everyone stay tuned. Yeah, well, it's, I hope to be talking to you again soon Looking forward to it.

ENTITIES

Entity	Category	Confidence
Greg Benson	PERSON	0.99+
George Gilbert	PERSON	0.99+
Greg	PERSON	0.99+
US	LOCATION	0.99+
Alibaba	ORGANIZATION	0.99+
Java	TITLE	0.99+
San Francisco	LOCATION	0.99+
Snap Logic	ORGANIZATION	0.99+
George	PERSON	0.99+
Data Artisans	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Spark	TITLE	0.99+
Scala	TITLE	0.99+
Flink	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Banks Healthcare	ORGANIZATION	0.99+
second one	QUANTITY	0.99+
Lego	ORGANIZATION	0.99+
last fall	DATE	0.98+
one semester	QUANTITY	0.98+
SnapLogic	ORGANIZATION	0.98+
SnapLogic	TITLE	0.97+
first two	QUANTITY	0.97+
today	DATE	0.97+
Flink	TITLE	0.96+
single notes	QUANTITY	0.96+
about six	QUANTITY	0.96+
trillions of documents	QUANTITY	0.95+
Flink Forward	ORGANIZATION	0.95+
seven months	QUANTITY	0.94+
One	QUANTITY	0.94+
University of San Francisco	ORGANIZATION	0.93+
one	QUANTITY	0.92+
one thing	QUANTITY	0.91+
Apache Flink Community	ORGANIZATION	0.89+
Spark	ORGANIZATION	0.85+
Apache Flink	ORGANIZATION	0.82+
Flink Forward	EVENT	0.82+
2018	DATE	0.81+
pipelines	QUANTITY	0.81+
Flink	EVENT	0.76+
SASS	ORGANIZATION	0.73+
dual	QUANTITY	0.72+
Logic	TITLE	0.72+
Apache	ORGANIZATION	0.57+
Hadoop	ORGANIZATION	0.54+
thing	QUANTITY	0.54+
Beam	TITLE	0.51+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Snap Logic: