Mike Gualtieri, Forrester Research - Spark Summit East 2017 - #sparksummit - #theCUBE
>> Narrator: Live from Boston, Massachusetts, this is the Cube, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert.
>> Welcome back to Boston, everybody, where the town is still euphoric. Mike Gualtieri is here, he's the principal analyst at Forrester Research, attended the parade yesterday. How great was that, Mike?
>> Yes. Yes. It was awesome.
>> Nothing like we've ever seen before. All right, the first question is what was the bigger shocking surprise, upset, greatest win, was it the Red Sox over the Yankees or was it the Super Bowl this weekend?
>> That's the question, I think it's the Super Bowl.
>> Yeah, who knows, right? Who knows. It was a lot of fun. So how was the parade yesterday?
>> It was magnificent. I mean, it was freezing. No one cared. I mean-- but it was, yeah, it was great. Great to see that team in person.
>> That's good. Wish we could talk-- we can, but we'll get into it. So, we're here at Spark Summit, and, you know, the show's getting bigger, you're seeing more sponsors, still heavily a technical audience, but what's your take these days? We were talking off-camera about the whole big data thing. It used to be the hottest thing in the world, and now nobody wants to have big data in their title. What's Forrester's take on that?
>> I mean, I think big data-- I think it's just become mainstream, so we're just back to data. You know, because all data is potentially big. So, I don't think it's-- it's not the thing anymore. I mean, what do you do with big data? You analyze it, right? And part of what this whole Spark Summit is about-- look at all the sessions. Data science, machine learning, streaming analytics, so it's all about sort of using that data now, so big data is still important, but the value of big data comes from all this advanced analytics.
>> Yeah, and we talked earlier, I mean, a lot of the value of, you know, Hadoop was cutting costs.
You know, you've mentioned commodity components and reduction in denominator, and breaking the need for some kind of big storage container. OK, so that-- we got there. Now, shifting to new sources of value, what are you spending your time on these days in terms of research?
>> Artificial intelligence, machine learning, so those are really forms of advanced analytics, so that's been-- that's been very hot. We did a survey last year, an AI survey, and we asked a large group of people, we said, oh, you know, what are you doing with AI? 58% said they're researching it. 19% said they're training a model. Right, so that's interesting. 58% are researching it, and far fewer are actually, you know, actually doing something with it. Now, the reality is, if you phrase that a little bit differently, and you said, oh, what are you doing with machine learning? Many more would say yes, we're doing machine learning. So it begs the question, what do enterprises think of AI? And what do they think it is? So, a lot of my inquiries are spent helping enterprises understand what AI is, what they should focus on, and the other part of it is what are the technologies used for AI, and deep learning is the hottest.
>> So, you wrote a piece late last year, what's possible today in AI. What's possible today in AI?
>> Well, you know, before understanding what's possible, it's important to understand what's not possible, right? And so we sort of characterize it as there's pure AI, and there's pragmatic AI. So it's real simple. Pure AI is the sci-fi stuff, we've all seen it, Ex Machina, Star Wars, whatever, right? That's not what we're talking about. That's not what enterprises can do today. We're talking about pragmatic AI, and pragmatic AI is about building predictive models. It's about conversational APIs, to interact in a natural way with humans, it's about image analysis, which is something very hot because of deep learning.
So, AI is really about the building blocks that companies have been using, but then using them in combination to create even more intelligent solutions. And they have more options on the market, both from open source and from cloud services-- from Google, Microsoft, IBM, and now Amazon, at their re-- Were you guys at their re:Invent conference?
>> I wasn't, personally, but we were certainly there.
>> Yeah, they announced the Amazon AI, which is a set of three services that developers can use without knowing anything about AI or being a data scientist. But, I mean, I think the way to think about AI is that it is data science. It requires the expertise of a data scientist to do AI.
>> Following up on that comment, which was really interesting, is we try and-- whereas vendors try and democratize access to machine learning and AI, and I say that with two terms because usually the machine learning is the stuff that's sort of widely accessible and AI is a little further out, but there's a spectrum when you can just access an API, which is like a pre-trained model--
>> Pre-trained model, yep.
>> It's developer-accessible, you don't need to be a data scientist, and then at the other end, you know, you need to pick your algorithms, you need to pick your features, you need to find the right data, so how do you see that horizon moving over time?
>> Yeah, no, I-- So, these machine learning services, as you say, they're pre-trained models, totally accessible by anyone, anyone who can call an API or a RESTful service can access these. But their scope is limited, right? So, if, for example, you take the image API, you know, the imaging API that you can get from Google or now Amazon, you can drop an image in there and it will say, oh, there's a wine bottle on a picnic table on the beach. Right? It can identify that. So that's pretty cool, there might be a lot of use cases for that, but think of an enterprise use case. No. You can't do it, and let me give you this example.
Say you're an insurance company, and you have a picture of a steel roof that's caved in. If you give that to one of these APIs, it might say steel roof, it may say damage, but what it's not going to do is it's not going to be able to estimate the damage, it's not going to be able to create a bill of materials on how to repair it, because Google hasn't trained it at that level. OK, so, enterprises are going to have to do this themselves, or an ISV is going to have to do it, because think about it, you've got 10 years' worth of all these pictures taken of damage. And with all of those pictures, you've got tons of write-ups from an adjuster. Whoa, if you could shove that into a deep learning algorithm, you could potentially have consumers take pictures, or someone untrained, and have this thing say here's what the estimated damage is, this is the situation.
>> And I've read about like insurance use cases like that, where the customer could, after they sort of have a crack-up, take pictures all around the car, and then the insurance company could provide an estimate, tell them where the nearest repair shops are--
>> Yeah, but right now it's like the early days of e-commerce, where you could send an order in and then it would fax it and they'd type it in. So, I think, yes, insurance coverage is taking those pictures, and the question is can we automate it, and--
>> Well, let me actually iterate on that question, which is so who can build a more end-to-end solution, assuming, you know, there's a lot of heavy lifting that's got to go on for each enterprise trying to build a use case like that. Is it internal development and only at big companies that have a few of these data science gurus? Would it be like an IBM Global Services or an Accenture, or would it be like a vertical ISV where it's semi-custom, semi-patent?
>> I think it's both, but I also think it's two or three people walking around this conference, right, understanding Spark, maybe understanding how to use TensorFlow in conjunction with Spark that will start to come up with these ideas as well. So I think-- I think we'll see all of those solutions. Certainly, like IBM with their cognitive computing-- oh, and by the way, so we think that cognitive computing equals pragmatic AI, right, because it has similar characteristics. So, we're already seeing the big ISVs and the big application developers, SAP, Oracle, creating AI-infused applications or modules, but yeah, we're going to see small ISVs do it. There's one in Austin, Texas, called InteractiveTel. It's like 10 people. What they do is they use the Google-- so they sell to large car dealerships, like Ernie Boch. And they record every conversation, phone conversation with customers. They use the Google pre-trained model to convert the speech to text, and then they use their own machine learning to analyze that text to find out if there's a customer service problem or if there's a selling opportunity, and then they alert managers or other people in the organization. So, small company, very narrowly focused on something like car buying.
>> So, I wonder if we could come back to something you said about pragmatic AI. We love to have someone like you on the Cube, because we like to talk about the horses on the track. So, if Watson is pragmatic AI, and we all-- well, I think you saw the 60 Minutes show, I don't know, whenever it was, three or four months ago, and IBM Watson got all the love. They barely mentioned Amazon and Google and Facebook, and Microsoft didn't get any mention. So, and there seems to be sentiment that, OK, all the real action is in Silicon Valley. But you've got IBM doing pragmatic AI. Do those two worlds come together in your view? How does that whole market shake up?
>> I don't think they come together in the way I think you're suggesting.
I think what Google, Microsoft, Facebook, what they're doing is they're churning out fundamental technology, like one of the most popular deep learning frameworks, TensorFlow, is a Google thing that they open sourced. And as I pointed out, those image APIs that Amazon has, that's not going to work for insurance, that's not going to work for radiology. So, I don't think they're in--
>> George Gilbert: Facebook's going to apply it differently--
>> Yeah, I think what they're trying to do is they're trying to apply it to the millions of consumers that use their platforms, and then I think they throw off some of the technology for the rest of the world to use, fundamentally.
>> And then the rest of the world has to apply those.
>> Yeah, but I don't think they're in the business of building insurance solutions or building logistical solutions.
>> Right.
>> But you said something that was really, really potentially intriguing, which was you could take the horizontal Google speech-to-text API, and then--
>> Mike Gualtieri: And recombine it.
>> --put your own model on top of that. And that's, techies call that like ensemble modeling, but essentially you're taking, almost like an OS-level service, and you're putting in a more vertical application on top of it, to relate it to our old ways of looking at software, and that's interesting.
>> Yeah, because what we're talking about right now, but this conversation is now about applications. Right, we're talking about applications, which need lots of different services recombined, whereas mostly the data science conversation has been narrowly about building one customer lifetime value model or one churn model. Now the conversation, when we talk about AI, is becoming about combining many different services and many different models.
>> Dave Vellante: And the platform for building applications is really--
>> Yeah, yeah.
>> And that platform, the richest platform, or the platform that is-- that is most attractive has the most building blocks to work with, or the broadest ones?
>> The best ones, I would say, right now. The reason why I say it that way is because this technology is still moving very rapidly. So for an image analysis, deep learning, very good for image, nothing's better than deep learning for image analysis. But if you're doing business process models or like churn models, well, deep learning hasn't played out there yet. So, right now I think there's some fragmentation. There's so much innovation. Ultimately it may come together. What we're seeing is, many of these companies are saying, OK, look, we're going to bring in the open source. It's pretty difficult to create a deep learning library. And so, you know, a lot of the vendors in the machine learning space, instead of creating their own, they're just bringing in MXNet or TensorFlow.
>> I might be thinking of something from a different angle, which is not what underlying implementation they're using, whether it's deep learning or whether it's just random forest, or whatever the terminology is, you know, the traditional statistical stuff. The idea, though, is you want a platform-- like way, way back, Windows, with the Win32 API had essentially more widgets for helping you build graphical applications than any other platform.
>> Mike Gualtieri: Yeah, I see where you're going.
>> And I guess I'm thinking it doesn't matter what the underlying implementation is, but how many widgets can you string together?
>> I'm totally with you there, yeah. And so I think what you're saying is look, a platform that has the most capabilities, but abstracts the implementations, and can, you know, can be somewhat pluggable-- right, good, to keep up with the innovation, yeah. And there's a lot of new companies out there, too, that are tackling this.
One of them's called Bonsai AI, you know, small startup, they're trying to abstract deep learning, because deep learning right now, like TensorFlow and MXNet, that's a little bit of a challenge to learn, so they're abstracting it. But so are a lot of the-- so is SAS, IBM, et cetera.
>> So, Mike, we're out of time, but I want to talk about your talk tomorrow. So, AI meets Spark, give us a little preview.
>> AI meets Spark. Basically, the prerequisite to AI is a very sophisticated and fast data pipeline, because just because we're talking about AI doesn't mean we don't need data to build these models. So, I think Spark gives you the best of both worlds, right? It's designed for these sort of complex data pipelines that you need to prep data, but now, with MLlib for more traditional machine learning, and now with their announcement of TensorFrames, which is going to be an interface for TensorFlow, now you've got deep learning, too. And you've got it in a cluster architecture, so it can scale. So, pretty cool.
>> All right, Mike, thanks very much for coming on the Cube. You know, way to go Pats, awesome. Really a pleasure having you back.
>> Thanks.
>> All right, keep right there, buddy. We'll be back with our next guest right after this short break. This is the Cube. (peppy music)