Day One Wrap - #SparkSummit - #theCUBE

>> Announcer: Live from San Francisco, it's the CUBE covering Spark Summit 2017, brought to you by Databricks. (energetic music plays)
>> And what an exciting day we've had here at the CUBE. We've been at Spark Summit 2017, talking to partners, to customers, to founders, technologists, data scientists. It's been a load of information, right?
>> Yeah, an overload of information.
>> Well, George, you've been here in the studio with me talking with a lot of the guests. I'm going to ask you to maybe recap some of the top things you've heard today for our guests.
>> Okay so, well, Databricks laid down, sort of, three themes that they wanted folks to take away: deep learning, Structured Streaming, and serverless. Now, deep learning is not entirely new to Spark, but they've dramatically improved their support for it, I think, going beyond the frameworks that were written specifically for Spark, like Deeplearning4j and BigDL by Intel. And now TensorFlow, which is the open-source framework from Google, has gotten much better support. Structured Streaming, it was not clear how much more news we were going to get, because it's been talked about for 18 months. And they really, really surprised a lot of people, including me, where they took, essentially, the processing time for an event or a small batch of events down to 1 millisecond, whereas before it was in the hundreds of milliseconds, if not higher. And that changes the type of apps you can build. And also, the Databricks guys had coined the term continuous apps, which means they operate on a never-ending stream of data, which is different from what we've had in the past, where it's batch or, with a user interface, request-response. So they definitely turned up the volume on what they can do with continuous apps. And serverless, they'll talk about more tomorrow. And Jim, I think, is going to weigh in. But it, basically, greatly simplifies the ability to run this infrastructure, because you don't think of it as a cluster of resources.
You just know that it's sort of out there, and you ask requests of it, and it figures out how to fulfill them. I will say, the other big surprise for me was when we had Matei, who's the creator of Spark and the chief technologist at Databricks, come on the show, and we asked him about how Spark was going to deal with, essentially, more advanced storage of data, so that you could update things, so that you could get queries back, so that you could do analytics, and not just of stuff that's stored in Spark but stuff that Spark stores essentially below it. And he said, "You know, you can expect to see Databricks come out with, or partner with, a database to do these advanced scenarios." And I got the distinct impression, after listening to the tape again, that for Apache Spark, which is separate from Databricks, he was talking about some sort of key-value store. So in other words, when you look at competitors or quasi-competitors like Confluent with Kafka or data Artisans with Flink, they're not perfect competitors. They overlap some. Now Spark is pushing its way more into overlapping with some of those solutions.
>> Alright. Well, Jim Kobielus. And thank you for that, George. You've been mingling with the masses today. (laughs) And you've been here all day as well.
>> Educated masses, yeah, (David laughs) who are really engaged in this stuff, yes.
>> Well, great, maybe give us some of your top takeaways after all the conversations you've had today.
>> They're not all that dissimilar from George's. Databricks, of course being the center, the developer, the primary committer in the Spark open-source community, has done a number of very important things in terms of the announcements today at this event that push Spark, the Spark ecosystem, where it needs to go to expand the range of capabilities and their deployability into production environments.
I feel that on the deep-learning side, the announcement of the deep-learning pipeline API is very, very important. Now, as George indicated, Spark has been used in a fair number of deep-learning development environments. But not as a modeling tool so much as a training tool, a tool for in-memory distributed training of deep-learning models that were developed in TensorFlow, in Caffe, and other frameworks. Now this announcement is essentially bringing support for deep learning directly into the Spark modeling pipeline, the machine-learning modeling pipeline, being able to call out to deep learning, you know, TensorFlow and so forth, from within MLlib. That's very important. That means that Spark developers, of which there are many, far more than there are TensorFlow developers, will now have an easy path to bring more deep learning into their projects. That's critically important to democratize deep learning. From what I've seen of what Databricks has indicated, they currently have support in the API reaching out to both TensorFlow and Keras, and I hope they have plans to bring in API support for access to other leading DL toolkits such as Caffe and Caffe 2, which is Facebook-developed, MXNet, which is Amazon-developed, and so forth. That's very encouraging. Structured Streaming is very important in terms of what they announced, which is an API to enable access to faster, or higher-throughput, Structured Streaming in their cloud environment. And they also announced that they have gone beyond, in terms of the code that they've built, the micro-batch architecture of Structured Streaming, to enable it to evolve into a more true streaming environment, to be able to contend credibly with the likes of Flink. 'Cause I think that the Spark community has, sort of, had their back against the wall with Structured Streaming, in that they couldn't fully provide a true sub-millisecond end-to-end latency environment heretofore.
But it sounds like with this R&D Databricks is addressing that, and that's critically important for the Spark community to continue to evolve in terms of continuous computation. And then the serverless-apps announcement is also very important, 'cause I see it, a fully managed, multi-tenant Spark development environment, as really being an enabler for continuous build, deploy, and test DevOps within a Spark machine-learning and now deep-learning context. The Spark community, as it evolves and matures, needs robust DevOps tools to productionize these machine-learning and deep-learning models. Because really, in many ways, many customers, many developers are now developing Spark applications that are real 24-by-7 enterprise application artifacts that need a robust DevOps environment. And I think that Databricks has indicated they know where this market needs to go and they're pushing it with R&D. And I'm encouraged by all those signs.
>> So, great. Well, thank you, Jim. I hope both you gentlemen are looking forward to tomorrow. I certainly am.
>> Oh yeah.
>> And to you out there, tune in again around 10:00 a.m. Pacific Time. We're going to be broadcasting live here. From Spark Summit 2017, I'm David Goad with Jim and George, saying goodbye for now. And we'll see you in the morning. (sparse percussion music playing) (wind humming and waves crashing)
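
The latency gap between micro-batch and per-event processing that comes up in the conversation can be sketched with a toy model. This is only an illustration under stated assumptions, not how Spark schedules work: the function names, the 100 ms trigger interval, and the 1 ms processing cost are all hypothetical. The idea is simply that in a micro-batch engine an event waits for the next batch boundary before it is processed, while in a continuous engine it is processed as it arrives.

```python
def micro_batch_latencies(arrivals_ms, trigger_interval_ms, process_ms):
    """Toy model: each event waits for the next batch boundary, then is processed.

    All parameters are hypothetical integers in milliseconds.
    """
    latencies = []
    for t in arrivals_ms:
        # Round the arrival time up to the next multiple of the trigger interval.
        boundary = -(-t // trigger_interval_ms) * trigger_interval_ms
        latencies.append(boundary - t + process_ms)
    return latencies


def continuous_latencies(arrivals_ms, process_ms):
    """Toy model: each event is processed immediately on arrival."""
    return [process_ms for _ in arrivals_ms]


if __name__ == "__main__":
    arrivals = [5, 120, 250, 499]  # hypothetical arrival times in ms
    print("micro-batch:", micro_batch_latencies(arrivals, 100, 1))
    print("continuous: ", continuous_latencies(arrivals, 1))
```

In this model the micro-batch latency is dominated by how long an event waits for its batch, which is why shrinking or removing the batch boundary matters more than speeding up the processing step itself.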

Published Date : Jun 7 2017

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
George	PERSON	0.99+
David	PERSON	0.99+
David Goad	PERSON	0.99+
San Francisco	LOCATION	0.99+
Matei	PERSON	0.99+
tomorrow	DATE	0.99+
Amazon	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
Spark	TITLE	0.99+
both	QUANTITY	0.98+
Google	ORGANIZATION	0.98+
Intel	ORGANIZATION	0.98+
Spark Summit 2017	EVENT	0.98+
18 months	QUANTITY	0.98+
Flink	ORGANIZATION	0.97+
Facebook	ORGANIZATION	0.97+
Confluent Kafka	ORGANIZATION	0.97+
Caffe	ORGANIZATION	0.96+
today	DATE	0.96+
TensorFlow	TITLE	0.94+
three themes	QUANTITY	0.94+
10:00 a.m. Pacific Time	DATE	0.94+
CUBE	ORGANIZATION	0.94+
Deeplearning4j	TITLE	0.94+
Spark	ORGANIZATION	0.93+
1 millisecond	QUANTITY	0.93+
Keras	ORGANIZATION	0.91+
Day One	QUANTITY	0.81+
BigDL	TITLE	0.79+
TensorFlow	ORGANIZATION	0.79+
7	QUANTITY	0.77+
MLlib	TITLE	0.73+
Caffe 2	ORGANIZATION	0.7+
Caffe	TITLE	0.7+
24-	QUANTITY	0.68+
MXNet	ORGANIZATION	0.67+
Apache Spark	ORGANIZATION	0.54+
