Day Two Kickoff - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert.
>> Welcome back to day two in Boston, where it is snowing sideways. We're all here at Spark Summit, #SparkSummit, Spark Summit East. This is theCUBE, SiliconANGLE's flagship product. We go out to the events, we program for our audience, we extract the signal from the noise. I'm here with George Gilbert, day two at Spark Summit. George, we're seeing the evolution of so-called big data. Spark was a key part of that, designed to both simplify and speed up big data oriented transactions and really help fulfill the dream of big data, which is to be able to affect outcomes in near real time. A lot of those outcomes, of course, are related to ad tech and selling and retail oriented use cases, but we're hearing more and more around education and deep learning and affecting consumers and human life in different ways. We're now 10 years into the whole big data trend. What's your take, George, on what's going on here?
>> Even if we started off with ad tech, which is what most of the big internet companies did, we always start off in any new paradigm with one application that kind of defines that era. And then we copy and extend that pattern. For me, on rethinking your business, the McGraw-Hill interview we did yesterday was the most amazing thing, because what they had was a textbook business for their education unit, and they're rethinking the business, as in, what does it mean to be an education company? They take cognitive science about how people learn, and then they take essentially digital assets and help people through a curriculum. Not the centuries-old teacher, lecture, homework kind of thing, but individualized education where the patterns of reinforcement are consistent with how each student learns.
And it's not just a matter of breaking up the lecture into little bits, it's more a question of how do you learn most effectively? How do you internalize information?
>> I think that is a great example, George, and there are many, many examples of companies that are transforming digitally. Years and years ago people started to think about, okay, how can I instrument or digitize certain assets that I have, certain physical assets? I remember a story when we did the MIT event in London with Andy McAfee and Erik Brynjolfsson. They were giving the example of McCormick Spice, the spice company, who digitized by turning what they were doing into recipes, driving demand for their product and actually building new communities. That was kind of an interesting example, but sort of mundane. The McGraw-Hill Education one is massive. Their chief data scientist, chief data scientist? I don't know, the head of engineering, I guess, is who he was.
>> VP of Analytics and Data Science.
>> VP of Analytics and Data Science, yeah. He spoke today and got a big round of applause when he led off about the importance of education at the keynote. He's right on, and I think that's a classic example of a company that was built around printing presses and distributing dead trees that has completely transformed, and it's quite successful. Only in the last two years they brought in a new CEO. So that's good, but let's bring it back to Spark specifically. When Spark first came out, George, you were very enthusiastic. You're technical, you love the deep tech. And you saw the potential for Spark to really address some of the problems that we faced with Hadoop, particularly the complexity, the batch orientation. Even some of the costs --
>> The hidden costs.
>> Associated with that, those hidden costs. So you were very enthusiastic. In your mind, has Spark lived up to your initial expectations?
>> That's a really good question, and I guess techies like me are often a little more enthusiastic than the current maturity of the technology warrants. Spark doesn't replace Hadoop, but it carves out a big chunk of what Hadoop would do. Spark doesn't address storage, and it doesn't really have any sort of management bits. So you could sort of hollow out Hadoop and put Spark in. But it's still got a little ways to go in terms of becoming really, really fast to respond in near real time. Not just human real time, but machine real time. It doesn't work deeply with databases yet. It's still teething, and with every release, which is approximately every 12 to 18 months, it gets broader in its applicability. So there's no question everyone is piling on, which means that'll help it mature faster.
>> When Hadoop was first introduced to the early masses, not the mainstream masses, but the early masses, the profundity of Hadoop was that you could leave data in place and bring compute to the data. And people got very excited about that because they knew there was so much data and you just couldn't keep moving it around. But the early insiders of Hadoop, I remember, they would come to theCUBE and everybody was, of course, enthusiastic, and there was a lot of cheerleading going on. But in the hallway conversations about Hadoop with the real insiders, you would hear that people are going to realize how much this sucks some day and how hard this is, and it's going to hit a wall. Some of the cheerleaders would say, no way, Hadoop forever. Now you've started to see that in practice. And the number of real hardcore transformations as a result of Hadoop in and of itself has been quite limited. The same is true for virtually any technology. Most, anyway, not every technology.
I'd say the smartphone was pretty transformative in and of itself, but nonetheless, we are seeing that sort of progression, and we're starting to see a lot of the same use cases that you hear about, like fraud detection and retargeting, coming up again. I think what we're seeing is that those are improving. Take fraud detection: I talked yesterday about how it used to be six months before you'd even detect fraud, if you ever did. Now it's minutes or seconds. But you still get a lot of false positives. So we're going to just keep turning that crank. Mike Gualtieri today talked about the efficacy of today's AI, and he gave some examples from Google. He showed a plane crash, and the API said plane and accurately identified that, but it also said it could be wind sports or something like that. So you can see it's still not there yet. At the same time, you see things like Siri and Amazon Alexa getting better and better and better. So my question to you, kind of long-winded here, is, is that what Spark is all about? Just making the initial initiatives around big data better, or is it more transformative than that?
>> Interesting question, and I would come at it with a couple of different answers. Spark was a reaction to the idea that you can't have multiple different engines to attack all the different data problems, because you would do a part of the analysis here, push it onto a disk, pull it off the disk into another engine, and all of that would take too long or be too complex a pipeline to go from one end to the other. Spark was like, we'll do it all in our unified engine, and you can come at it from SQL, you can come at it from streaming, so it's all in one place. That changes the sophistication of what you can do, the simplicity, and therefore how many people can access it and apply it to these problems. And the fact that it's so much faster means you can attack a qualitatively different set of problems.
>> I think as well it really underscores the importance of open source and the ability of the open source community to launch projects that both stick and can attract serious investment. Not only from IBM, though that's a good example, but from entire ecosystems that collectively can really move the needle. Big day today, George, we've got a number of guests. We'll give you the last word at the open.
>> Okay, this is going to sound a little bit abstract, but there were a couple of takeaways from some of our most technical speakers yesterday. One was with Ion Stoica, who co-headed the lab that was the genesis of Spark at Berkeley.
>> AMPLab.
>> The AMPLab at Berkeley.
>> And now RISELab.
>> And then also with the IBM Chief Data Officer for the Analytics Unit.
>> Seth Dobrin.
>> Dobrin, yes. When we look at what's the core value add ultimately, it's not these infrastructure analytic frameworks and that sort of thing. It's the machine learning model in its flywheel feedback state, where it's getting trained and retrained on the data that comes in from the app, and you continually improve it. That was the whole rationale for data lakes, but not with models. It was, put all the data there because you're going to ask questions you couldn't anticipate. So here it's, collect all the data from the app because you're going to improve the model in ways you didn't expect. And that beating heart, that living model that's always getting better, that's the core value add. And that's going to belong to end customers and to application companies.
>> One of the speakers today said AI was kind of invented in the 50s, there was a lot of excitement in the 70s, it kind of died in the 80s, and it's coming back. It's almost like it's being reborn. And it's still in its infant stages, but the potential is enormous. All right, George, that's a wrap for the open. Big day today, keep it right there, everybody.
We've got a number of guests today, and don't forget, at the end of the day today George and I will be introducing part two of our Wikibon Big Data forecast. This is where we'll release a lot of our numbers, and George will give a first look at that. So keep it right there, everybody, this is theCUBE. We're live from Spark Summit East, #SparkSummit. We'll be right back. (techno music)

Published Date : Feb 9 2017
