Luis Ceze, OctoML | Amazon re:MARS 2022
(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here live on the floor at AWS re:MARS 2022. I'm John Furrier, host for theCUBE. Great event, machine learning, automation, robotics, space, that's MARS. It's part of the re-series of events, re:Invent's the big event at the end of the year, re:Inforce, security, re:MARS, really the intersection of the future of space, industrial automation, which is very heavily DevOps and machine learning, of course, machine learning, which is AI. We have Luis Ceze here, who's the CEO and co-founder of OctoML. Welcome to theCUBE. >> Thank you very much for having me on the show, John. >> So we've been following you guys. You guys are a growing startup funded by Madrona Venture Capital, one of your backers. You guys are here at the show. This is, I would say, a small show relative to what it's going to be, but a lot of robotics, a lot of space, a lot of industrial kind of edge, but machine learning is the centerpiece of this trend. You guys are in the middle of it. Tell us your story. >> Absolutely, yeah. So our mission is to make machine learning sustainable and accessible to everyone. I say sustainable because it means we're going to make it faster and more efficient, you know, use less human effort, and accessible to everyone, accessible to as many developers as possible, and also accessible on any device. So, we started from an open source project that began at the University of Washington, where I'm a professor. And several of the co-founders were PhD students there. We started with this open source project called Apache TVM that actually had contributions and collaborations from Amazon and a bunch of other big tech companies. And that allows you to get a machine learning model and run it on any hardware, like run on CPUs, GPUs, various GPUs, accelerators, and so on. It was the kernel of our company and the project's been around for about six years or so. The company is about three years old. And we grew from Apache TVM into a whole platform that essentially supports any model on any hardware, cloud and edge. >> So is the thesis that, when it first started, that you want to be agnostic on platform? >> Agnostic on hardware, that's right. >> Hardware, hardware. >> Yeah. >> What was it like back then? What kind of hardware were you talking about back then? 'Cause a lot's changed, certainly on the silicon side. >> Luis: Absolutely, yeah. >> So take me through the journey, 'cause I could see the progression. I'm connecting the dots here. >> So once upon a time, yeah, no... (both chuckling) >> I walked in the snow with my bare feet. >> You have to be careful because if you wake up the professor in me, then you're going to be here for two hours, you know. >> Fast forward. >> The abridged version here is that, clearly, machine learning has been shown to actually solve real, interesting, high-value problems. And where machine learning runs in the end, it becomes code that runs on different hardware, right? And when we started Apache TVM, which stands for tensor virtual machine, at that time it was just beginning to start using GPUs for machine learning. We already saw that, with a bunch of machine learning models popping up and CPUs and GPUs starting to be used for machine learning, it was clear that there was an opportunity to run everywhere. >> And GPUs were coming fast. >> GPUs were coming, and there's a huge diversity of CPUs, GPUs and accelerators now, and the ecosystem and the system software that maps models to hardware is still very fragmented today. 
So hardware vendors have their own specific stacks. So Nvidia has its own software stack, and so does Intel, AMD. And honestly, I mean, I hope I'm not being, you know, too controversial here to say that it kind of looks like the mainframe era. We had tight coupling between hardware and software. You know, if you bought IBM hardware, you had to buy the IBM OS and IBM database, IBM applications, it was all tightly coupled. And if you wanted to use IBM software, you had to buy IBM hardware. So that's kind of like what machine learning systems look like today. If you buy a certain big-name GPU, you've got to use their software. Even if you use their software, which is pretty good, you have to buy their GPUs, right? So, but you know, we wanted to help peel away the model and the software infrastructure from the hardware to give people choice, the ability to run the models where they best suit them. Right? So that includes picking the best instance in the cloud that's going to give you the right, you know, cost properties, performance properties, or you might want to run it on the edge. You might run it on an accelerator. >> What year was that roughly, when you were doing this? >> We started that project in 2015, 2016. >> Yeah. So that was pre-conventional wisdom. I think TensorFlow wasn't even around yet. >> Luis: No, it wasn't. >> It was, I'm thinking like 2017 or so. >> Luis: Right. >> So that was the beginning of, okay, this is an opportunity. AWS, I don't think they had released some of the Nitro stuff that Hamilton was working on. So, they were already kind of going that way. It's kind of like converging. >> Luis: Yeah. >> The space was happening, exploding. >> Right. And the way that was dealt with, and to this day, you know, to a large extent as well, is by backing machine learning models with a bunch of hardware-specific libraries. And we were some of the first ones to say, like, you know what, let's take a compilation approach: take a model and compile it to very efficient code for that specific hardware. And what underpins all of that is using machine learning for machine learning code optimization. Right? But it was way back when. We can talk about where we are today. >> No, let's fast forward. >> That's the beginning of the open source project. >> But that was a fundamental belief, worldview there. I mean, you had a real worldview that was logical when you compared it to the mainframe, but not obvious to the machine learning community. Okay, good call, check. Now let's fast forward, okay. Evolution, we'll go through the speed of the years. More chips are coming, you got GPUs, and seeing what's going on in AWS. Wow! Now it's booming. Now I got unlimited processors, I got silicon on chips, I got, everywhere. >> Yeah. And what's interesting is that the ecosystem got even more complex, in fact. Because now you have, there's a cross product between machine learning models, frameworks like TensorFlow, PyTorch, Keras, and so on, and then hardware targets. So how do you navigate that? What we want here, our vision is to say, folks should focus, people should focus on making the machine learning models do what they want them to do, that solves a problem of high value to them. Right? So model deployment should be completely automatic. Today, it's very, very manual to a large extent. 
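That compilation approach is easy to picture in code. Below is a minimal sketch of what compiling one model for a specific hardware target looks like with the open-source Apache TVM Python API; the ONNX file name and input shape are illustrative placeholders, and the ML-driven autotuning Luis mentions would be a separate AutoTVM/auto-scheduler step omitted here.

```python
# Minimal sketch, assuming a recent Apache TVM build with ONNX support.
# The model file and input shape are illustrative placeholders.
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("resnet50.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"input": (1, 3, 224, 224)}
)

# Pick a hardware target: "llvm" for CPU, "cuda" for an Nvidia GPU, etc.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# The compiled artifact runs through TVM's lightweight runtime on the device.
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
```

The point of the design is that, roughly, only the `target` string changes when the same model moves from a CPU instance to a GPU or an edge board.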
So once you're serious about deploying a machine learning model, you've got a good understanding of where you're going to deploy it, how you're going to deploy it, and then, you know, pick out the right libraries and compilers, and we automated the whole thing in our platform. This is why you see the tagline, the booth is right there, like bringing DevOps agility for machine learning, because our mission is to make that fully transparent. >> Well, I think that, first of all, I used that line here, 'cause I'm looking at it here, live on camera. People can't see, but I use it on a couple of my interviews because the word agility is very interesting, because that's kind of the test on any kind of approach these days. Agility could be, and I talked to the robotics guys, just having their product be more agile. I talked to Pepsi here just before you came on, they had this large-scale data environment because they built an architecture, but that fostered agility. So again, this is an architectural concept, it's a systems view of agility being the output, and removing dependencies, which I think is what you guys are trying to do. >> Only part of what we do. Right? So agility means a bunch of things. First, you know-- >> Yeah, explain. >> Today it takes a couple of months to get a model from, when the model's ready, to production. Why not turn that into two hours? Agile, literally, physically agile, in terms of wall-clock time. Right? And then the other thing is giving you flexibility to choose where your model should run. So, in our deployment, between the demo and the platform expansion that we announced yesterday, you know, we give the ability of getting your model and, you know, getting it compiled, getting it optimized for any instance in the cloud, and automatically moving it around. Today, that's not the case. You have to pick one instance and that's what you do. And then you might auto-scale with that one instance. So we give the agility of actually running and scaling the model the way you want, and the way that gives you the right SLAs. >> Yeah, I think Swami was mentioning that, not specifically that use case for you, but that use case generally, that scale being about moving things around, making them faster, not having to do that integration work. >> Scale, and run the models where they need to run. Like some day you want to have a large-scale deployment in the cloud. You're going to have models on the edge for various reasons, because the speed of light is limited. We cannot make light faster. So, you know, you've got to have some, that's physics there you cannot change. There are privacy reasons. You want to keep data locally, not send it around, to run the model locally. So anyways, and giving the flexibility. >> Let me jump in real quick. I want to ask this specific question because you made me think of something. So we're just having a data mesh conversation. And one of the comments that's come out of a few of these data-as-code conversations is data's the product now. So if you can move data to the edge, which everyone's talking about, you know, why move data if you don't have to, but I can move a machine learning algorithm to the edge. 'Cause it's costly to move data. I can move compute, everyone knows that. But now I can move machine learning anywhere else and not worry about integrating on the fly. So the model is the code. >> It is the product. >> Yeah. And since you said the model is the code, okay, now we're talking even more here. So machine learning models today are not treated as code, by the way. 
So they do not have any of the typical properties of code. Whenever you write a piece of code and you run it, you don't even think about what CPU it runs on, where it runs, what kind of instance it runs on. But with a machine learning model, you do. So what we are doing is we created this fully transparent, automated way of allowing you to treat your machine learning models as if they were a regular function that you call, and then that function could run anywhere. >> Yeah. >> Right. >> That's why-- >> That's better. >> Bringing DevOps agility-- >> That's better. >> Yeah. And you can use existing-- >> That's better, because I can run it on the Artemis too, in space. >> You could, yeah. >> If they have the hardware. (both laugh) >> And that allows you to run your existing, continue to use your existing DevOps infrastructure and your existing people. >> So I have to ask you, 'cause since you're a professor, this is like a masterclass on theCUBE. Thank you for coming on. Professor. (Luis laughing) I'm a hardware guy. I'm building hardware for Boston Dynamics, Spot, the dog, that's the diversity in hardware, it tends to be purpose-driven. I got a spaceship, I'm going to have hardware on there. >> Luis: Right. >> It's generally viewed in the community here, that everyone I talk to and other communities, open source is going to drive all software. That's a check. But the scale and integration is super important. And they're also recognizing that hardware is really about the software. And they even said on stage, here: hardware is not about the hardware, it's about the software. So if you believe that to be true, then your model checks all the boxes. Are people getting this? >> I think they're starting to. Here is why, right. A lot of companies that were hardware first, that thought about software too late, aren't making it. Right? There's a large number of hardware companies, AI chip companies, that aren't making it. Probably some of them that won't make it, unfortunately, just because they started thinking about software too late. I'm so glad to see a lot of the early, I hope I'm not just tooting our own horn here, but Apache TVM, the infrastructure that we built to map models to different hardware, it's very flexible. So we see a lot of emerging chip companies, like SiMa.ai's been doing fantastic work, and they use Apache TVM to map algorithms to their hardware. And there's a bunch of others that are also using Apache TVM. That's because you have, you know, an open infrastructure that keeps it up to date with all the machine learning frameworks and models and allows you to extend to the chips that you want. So these companies paying attention that early gives them a much higher fighting chance, I'd say. >> Well, first of all, not only are you backable by the VCs 'cause you have pedigree, you're a professor, you're smart, and you get good recruiting-- >> Luis: I don't know about the smart part. >> And you get good recruiting for PhDs out of the University of Washington, which is not too shabby a computer science department. But they want to make money. The VCs want to make money. >> Right. >> So you have to make money. So what's the pitch? What's the business model? >> Yeah. Absolutely. >> Share us what you're thinking there. >> Yeah. The value of using our solution is shorter time to value for your model, from months to hours. Second, you shrink opex, because you don't need a specialized, expensive team. 
Talk about expensive: engineers who can understand machine learning, hardware, and software engineering to deploy models are expensive. You don't need those teams if you use this automated solution, right? So you reduce that. And also, in the process of actually getting a model and getting it specialized to the hardware, making it hardware-aware, we're talking about a very significant performance improvement that leads to lower cost of deployment in the cloud. We're talking about very significant reductions in cloud deployment costs. And also enabling new applications on the edge that weren't possible before. It creates, you know, latent value opportunities. Right? So, that's the high-level value pitch. But how do we make money? Well, we charge for access to the platform. Right? >> Usage. Consumption. >> Yeah, and value-based. Yeah, so it's consumption and value-based. So it depends on the scale of the deployment. If you're going to deploy a machine learning model at a larger scale, chances are that it produces a lot of value. So then we'll capture some of that value in our pricing scale. >> So, you have a direct sales force then to work those deals. >> Exactly. >> Got it. How many customers do you have? Just curious. >> So we started, the SaaS platform just launched now. So we started onboarding customers. We've been building this for a while. We have a bunch of, you know, partners that we can talk about openly, like, you know, revenue-generating partners, that's fair to say. We work closely with Qualcomm to enable Snapdragon on TVM and hence our platform. We're close with AMD as well, enabling AMD hardware on the platform. We've been working closely with two hyperscaler cloud providers that-- >> I wonder who they are. >> I don't know who they are, right. >> Both start with the letter A. >> And they're both here, right. What is that? >> They both start with the letter A. >> Oh, that's right. >> I won't give it away. (laughing) >> Don't give it away. >> One has three, one has four. (both laugh) >> I'm guessing, by the way. >> Then we have customers in the, actually, early customers have been using the platform from the beginning, in the consumer electronics space, in Japan, you know, self-driving car technology as well, as well as some AI-first companies whose core value, the core business, comes from AI models. >> So, serious, serious customers. They got deep tech chops. They're integrating, they see this as a strategic part of their architecture. >> That's what I call AI-native, exactly. But now we have several enterprise customers in line that we've been talking to. Of course, because now we launched the platform, now we started onboarding and exploring how we're going to serve it to these customers. But it's pretty clear that our technology can solve a lot of other pain points right now. And we're going to work with them as early customers to go and refine them. >> So, do you sell to the little guys, like us? Will we be customers if we wanted to be? >> You could, absolutely, yeah. >> What do we have to do? Have machine learning folks on staff? >> So, here's what you're going to have to do. Since you can see the booth, others can't. No, but they can certainly, you can try our demo. >> OctoML. >> And you should look at the transparent AI app that's compiled and optimized with our flow, and deployed and built with our flow. That allows you to take your image and do style transfer. You know, you can take you and a pineapple and see what you look like with a pineapple texture. 
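That demo app is a deployed model sitting behind a generated endpoint, which is worth sketching because it is the "model as a function you call" idea in miniature. Everything below, the URL, the auth header, the payload shape, is a hypothetical illustration, not OctoML's actual API.

```python
# Hypothetical sketch of calling a model deployed behind a generated API.
# The endpoint URL, header, and payload shape are illustrative only.
import requests

resp = requests.post(
    "https://models.example.com/v1/style-transfer/predict",
    headers={"Authorization": "Bearer <token>"},
    json={"image_url": "https://example.com/me.jpg", "style": "pineapple"},
    timeout=30,
)
resp.raise_for_status()
stylized = resp.json()  # e.g. a URL or base64 payload for the stylized image
```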
>> We got a lot of transcript and video data. >> Right. Yeah. Right, exactly. So, you can use that. Then there's a very clear-- >> But I could use it. You're not blocking me from using it. Everyone's, it's pretty much democratized. >> You can try the demo, and then you can request access to the platform. >> But you've got a lot of more serious, deeper customers. But you can serve anybody, is what you're saying. >> Luis: We can serve anybody, yeah. >> All right, so what's the vision going forward? Let me ask this. When did people start getting the epiphany of removing the machine learning from the hardware? Was it recently, a couple years ago? >> Well, on the research side, we helped start that trend a while ago. I don't need to repeat that. But I think the vision that's important here, that I want the audience to take away, is that there's a lot of progress being made in creating machine learning models. So, there's fantastic tools to deal with training data, and creating the models, and so on. And now there's a bunch of models that can solve real problems there. The question is, how do you very easily integrate that into your intelligent applications? Madrona Venture Group has been very vocal and investing heavily in intelligent applications, both end-user applications as well as enablers. So we serve as an enabler of that, because it's so easy to use our flow to get a model integrated into your application. Now, any regular software developer can integrate that. And that's just the beginning, right? Because, you know, now we have CI/CD integration to keep your models up to date, to continue to integrate, and then there's more downstream support for other features that you normally have in regular software development. >> I've been thinking about this for a long, long time. And I think this whole code, no one thinks about code. Like, I write code, I'm deploying it. I think this idea of machine learning as code, independent of other dependencies, is really amazing. It's so obvious now that you say it. What are the choices now? Let's just say that, I buy it, I love it, I'm using it. Now what do I got to do if I want to deploy it? Do I have to pick processors? Are there verified platforms that you support? Is there a short list? Is there every piece of hardware? >> We actually can help you. I hope we're not saying we can do everything in the world here, but we can help you with that. So, here's how. When you have the model in the platform, you can actually see how this model runs on any instance of any cloud, by the way. So we support all three major cloud providers. And then you can make decisions. For example, if you care about latency, your model has to run in, at most, 50 milliseconds, because you're going to have interactivity. And then, after that, you don't care if it's faster. All you care about is, is it going to run cheaply enough? So we can help you navigate. And we're also going to make it automatic. >> It's like tire kicking in the dealer showroom. >> Right. >> You can test everything out, you can see the simulation. Are they simulations, or are they real tests? >> Oh, no, we run all on real hardware. So, we have, as I said, we support any instances of any of the major clouds. We actually run on the cloud. But we also support a select number of edge devices today, like ARMs and Nvidia Jetsons. 
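That navigation step is essentially a constrained search, and a toy version of it fits in a few lines. Here's a hypothetical sketch: given per-instance benchmark results (all names and numbers below are made up), pick the cheapest instance whose measured latency meets the 50-millisecond SLA.

```python
# Hypothetical sketch: choose the cheapest cloud instance whose measured
# latency meets the SLA. Instance names, latencies and prices are made up.
benchmarks = [
    {"instance": "c5.xlarge",   "latency_ms": 42.0, "usd_per_hour": 0.17},
    {"instance": "g4dn.xlarge", "latency_ms": 11.0, "usd_per_hour": 0.53},
    {"instance": "m5.2xlarge",  "latency_ms": 67.0, "usd_per_hour": 0.38},
]

SLA_MS = 50.0  # interactivity budget: the model must answer within 50 ms

eligible = [b for b in benchmarks if b["latency_ms"] <= SLA_MS]
best = min(eligible, key=lambda b: b["usd_per_hour"])
print(best["instance"])  # -> "c5.xlarge": cheapest one that meets the SLA
```

The difference in the real platform, as Luis notes next, is that those benchmark numbers come from runs on actual hardware rather than estimates.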
And we have the OctoML cloud, which is a bunch of racks with a bunch of Raspberry Pis and Nvidia Jetsons, and very soon a bunch of mobile phones there too, that can actually run on the real hardware, and validate it, and test it out, so you can see that your model runs performantly and economically enough in the cloud. And it can run on the edge devices-- >> You're a machine learning as a service. Would that be accurate? >> That's part of it, because we're not doing the machine learning model itself. You come with a model and we make it deployable and make it ready to deploy. So, here's why it's important. Let me try. There's a large number of really interesting companies that do API models, as in API as a service. You have an NLP model, you have computer vision models, where you call an API at an endpoint in the cloud. You send an image and you get a description, for example. But it is using a third party. Now, if you want to have your model on your infrastructure, but have the same convenience as an API, you can use our service. So, today, chances are that, if you have a model that you know you want to use, there might not be an API for it; we actually automatically create the API for you. >> Okay, so that's why I get the DevOps agility for machine learning is a better description. 'Cause it's not, you're not providing the service. You're providing the service of deploying it, like DevOps infrastructure as code. You're now ML as code. >> It's your model, your API, your infrastructure, but all of the convenience of having it ready to go, fully automatic, hands off. >> 'Cause I think what's interesting about this is that it brings the craftsmanship back to machine learning. 'Cause it's a craft. I mean, let's face it. >> Yeah. I want human brains, which are very precious resources, to focus on building those models that are going to solve business problems. I don't want these very smart human brains figuring out how to get this to actually run the right way. This should be automatic. That's why we use machine learning, for machine learning, to solve that. >> Here's an idea for you. We should write a book called The Lean Machine Learning. 'Cause the lean startup was all about DevOps. >> Luis: We'd call it machine leaning. No, that's not going to work. (laughs) >> Remember when iteration was the big mantra? Oh, yeah, iterate. You know, that was from DevOps. >> Yeah, that's right. >> This code allowed for standing up stuff fast, double down, we all know the history, how it turned out. That was a good value for developers. >> I completely agree. If you don't mind me building on that point. You know, something we see at OctoML, but we also see at Madrona as well: there's a trend towards best-in-breed for each one of the stages of getting a model deployed. From the data aspect of creating the data, and then to the model creation aspect, to the model deployment, and even model monitoring. Right? We develop integrations with all the major pieces of the ecosystem, such that you can integrate, say, with model monitoring to go and monitor how a model is doing. Just like you monitor how code is doing in deployment in the cloud. >> It's evolution. I think it's a great step. And again, I love the analogy to the mainframe. I lived during those days. I remember the monolithic, proprietary days, and then, you know, the OSI model kind of blew it open. But that OSI stack never went full stack, and it only stopped at TCP/IP. So, I think the same thing's going on here. 
You see some scalability around it, to try to uncouple it, free it. >> Absolutely. And sustainability and accessibility, to make it run faster and make it run on any device that you want, by any developer. So, that's the tagline. >> Luis Ceze, thanks for coming on. Professor. >> Thank you. >> I didn't know you were a professor. That's great to have you on. It was a masterclass in DevOps agility for machine learning. Thanks for coming on. Appreciate it. >> Thank you very much. Thank you. >> Congratulations, again. All right. OctoML here on theCUBE. Really important: uncoupling the machine learning from the hardware specifically. That's only going to make space faster and safer, and more reliable. And that's where the whole theme of re:MARS is. Let's see how they fit in. I'm John Furrier for theCUBE. Thanks for watching. More coverage after this short break. >> Luis: Thank you. (gentle music)
Piotr Mierzejewski, IBM | Dataworks Summit EU 2018
>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments. So Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well, there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two, three days' time. This is because of the explosion of open source in that space. You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open source offerings are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial, when you have to expose this in a uniform fashion to various business units. Now, all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now, when a data scientist is going to deploy a model, he needs to be able to actually explain how the model was created. He needs to be able to explain what data was used. He needs to ensure-- >> Explainable AI, or explainable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one, industry-leading big data platform from Hortonworks. 
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. It's the merger between the two, the ability to integrate them tightly together, that gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that DSX and Hortonworks are integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, and you can actually work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe, you know, you can pull it in, but truly, when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but not much materializes after such an announcement. This is not true in the case of DSX and HDP. Just recently we have had a release of DSX 1.2, which I'm super excited about. Now, let's talk about those open source toolings and the various platforms. Now, you don't want to force your data scientists to work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin or, say, Jupyter notebooks. Now, how about TensorFlow? What are you going to do when, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, train your deep learning workloads, and actually remove the sidecar out. So you can attach it to the cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists, who actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. 
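As a rough illustration of that earlier point about pushing the analytics to where the data resides, here is a minimal PySpark sketch of the pattern Piotr describes: the job runs on the cluster via YARN, and the data never leaves HDFS. The paths, column names, and model choice are all illustrative, not taken from DSX itself.

```python
# A minimal sketch, assuming PySpark is installed and a YARN-backed HDP
# cluster is reachable. Paths and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("train-where-the-data-lives")
         .master("yarn")  # YARN distributes the training across the cluster
         .getOrCreate())

# The data stays on HDFS; only the analytics travel to it.
df = spark.read.parquet("hdfs:///data/transactions.parquet")

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
model = LogisticRegression(labelCol="label").fit(assembler.transform(df))
print(model.coefficients)
```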
>> As of DSX 1.2, we have embedded, integrated, an SPSS Modeler, redesigned, rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now, with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper, to enable the business analysts of the world to build and do data mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can do, in this case, machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now, what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and are streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now, with DSX, you can actually do prescriptive analytics as well? With 1.2, again, I'm going to be coming back to this 1.2, with the most recent release of DSX we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now, if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months: DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh, Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable, collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure. 
On Google Cloud, definitely. We can deploy on SoftLayer, and we're very good at that. However, in the majority of cases, we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that, yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and the equivalents, as they became industry standards? Bring in RStudio, the Jupyter, the Zeppelin notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them, and make the deployments of web services, applications, the models. And those are actually full releases. I'm not only talking about the model, I'm talking about the scripts that can go with that, the ability to actually pull the data in and allow the models to be retrained, evaluated, and actually redeployed without taking them down. Now, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis, for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientists but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a program manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts, including Piotr. We want to thank everyone. We want to thank the host of this event, Hortonworks, for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations; the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and using tools like DSX. I'm James Kobielus, for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day, and thank you for watching theCUBE. (upbeat music)
Ritika Gunnar, IBM | IBM Think 2018
>> Narrator: Live from Las Vegas, it's theCUBE! Covering IBM Think 2018. Brought to you by IBM. >> Hello, I'm John Furrier. We're here in theCUBE studios at Think 2018, IBM Think 2018 in Mandalay Bay, in Las Vegas. We're extracting the signal from the noise, talking to all the executives, customers, thought leaders, inside the community of IBM and theCUBE. Our next guest is Ritika Gunnar, who is the VP of Product for Watson and AI, cloud data platforms, all the goodness of the product side. Welcome to theCUBE. >> Thank you, great to be here again. >> So, we love talking to the product people because we want to know what the product strategy is. What's available, what's the hottest features. Obviously, we've been talking about, these are our words, Ginni introduced the innovation sandwich. >> Ritika: She did. >> The data's in the middle, and you have blockchain and AI on both sides of it. This is really the future. This is where they're going to see automation. This is where you're going to see efficiencies being created, inefficiencies being abstracted away. Obviously blockchain's got more of an infrastructure, futuristic piece to it. AI is in play now, machine learning. You got Cloud underneath it all. How has the product morphed? What is the product today? We've heard of World of Watson in the past. You got Watson for this, you got Watson for IoT, you got Watson for that. What is the current offering? What's the product? Can you take a minute, just to explain what, semantically, it is? >> Sure. I'll start off by saying, what is Watson? Watson is AI for smarter business. I want to start there. Because Watson is equal to how do we really get AI infused in our enterprise organizations, and that is the core foundation of what Watson is. You heard a couple of announcements at the conference this week about what we're doing with Watson Studio, which is about providing that framework for what it means to infuse AI in our clients' applications. And you talked about machine learning. It's not just about machine learning anymore. It really is about how do we pair what machine learning is, which is about tweaking and tuning single algorithms, with what we're doing with deep learning. And that's one of the core components of what we're doing with Watson Studio: how do we make AI truly accessible, not just machine learning but deep learning, to be able to infuse those in our client environments really seamlessly. And so the deep learning as a service piece of what we're doing in the Studio was a big part of the announcements this week, because deep learning as a service allows our clients to really have it in a very accessible way. And there were a few things we announced with deep learning as a service. We said, look, just like with predictive analytics, we have capabilities that easily allow you to democratize that to knowledge workers and to business analysts by adding drag-and-drop capabilities. We can do the same thing with deep learning and deep learning capabilities. So we have taken a lot of things that have come from our research area and started putting those into the product to really bring about enterprise capabilities for deep learning, but in a really de-skilled way. >> Yeah, and also to remind the folks, there's a platform involved here. Maybe you can say it's been re-platformed, I don't know. Maybe you can answer that. Has it been re-platformed or is it just the platformization of existing stuff? Because there's certainly demand. 
TensorFlow at Google showed that there's a demand for machine learning libraries, with deep learning behind them. You got Amazon Web Services with SageMaker touting an as-a-service model for AI; it's definitely in demand. So talk about the platform piece underneath. What is it? How does it get rendered? And then we'll come back and talk about the user consumption side. >> So it definitely is not a re-platformization. You recall what we have done, with a focus initially on what we did on data science and what we did on machine learning. And the number one thing that we did was, we were about supporting open source and open frameworks. So it's not just one framework, like a TensorFlow framework, but it's about what we can do with TensorFlow, Keras, PyTorch, Caffe, being able to use all of our builders' favorite open source frameworks, and being able to use that in a way where then we can add additional value on top of that and help them accelerate what it means to actually have that in the enterprise and what it means to actually de-skill that for the organization. So we started there. But really, if you look at where Watson has focused on the APIs and the API services, it's bringing together those capabilities of what we're doing with unstructured, pre-trained services, and then allowing clients to be able to bring the structured and unstructured together on one platform, and adding the deep learning as a service capabilities, which is truly differentiating. >> Well, I think the important point there, just to amplify, and for the people to know, is it's not just your version of the tools for the data; you're looking at bringing data in from anywhere the customer, your customer, wants it. And that's super critical. You don't want to ignore data. You can't. You've got to have access to the data that matters. >> Yeah, you know, I think one of the other critical pieces that we're talking about here is, data without AI is meaningless, and AI without data is really not useful or very accurate. So, having both of them in a yin and yang, and then bringing them together as we're doing in the Watson Studio, is extremely important. >> The other thing I want to get to now is the user side, the consumption side. You mentioned making it easier, but one of the things we've been hearing, that's been a theme in the hallways and certainly in theCUBE here, is: bad data equals bad AI. >> Bad data equals bad AI. >> It's not just about bolting AI on, you really got to take a holistic approach and a hygiene approach to the data, and understanding where the data is contextually relevant to the application. Talk about, that means kind of nuance, but break that down. What's your reaction to that, and how do you talk to customers saying, okay, look, you want to do AI, here's the playbook? How do you explain that in a very simple way? >> Well, you've heard of the AI ladder: making your data ready for AI. This is a really important concept, because you need to be able to have trust in the data that you have, relevancy in the data that you have, and so it is about not just the connectivity to that data, but can you start having curated and rich data that is really valuable, that's accurate, that you can trust, that you can leverage. It becomes not just about the data, but about the governance and the self-service capabilities that you can have around that data, and then it is about the machine learning and the deep learning characteristics that you can put on there. But all three of those components are absolutely essential. 
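The "bad data equals bad AI" point and the AI ladder's trust-and-relevancy rungs are easy to make concrete. Here's a generic sketch, not tied to any IBM tooling, of profiling a training set for gaps and label skew before a model ever sees it; the file name, column names, and thresholds are all illustrative.

```python
# Generic sketch of a pre-training data check; file and column names
# are illustrative, and the thresholds are arbitrary examples.
import pandas as pd

df = pd.read_csv("training_data.csv")

# Which columns have the most missing values?
missing = df.isna().mean().sort_values(ascending=False)
print(missing.head())

# A heavily skewed label distribution is a bias risk for any model.
label_share = df["label"].value_counts(normalize=True)
print(label_share)

# A crude quality gate: refuse to train until the data clears thresholds.
assert missing.max() < 0.20, "too many missing values; fix upstream"
assert label_share.min() > 0.05, "a class is badly underrepresented"
```

Checks like these are the unglamorous first rung of the ladder: governance and curation before any machine learning or deep learning is applied.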
What we're seeing is that it's not even about the data that you have within the firewall of your organization, it's about what you're doing to really augment that with external data. That's another area where having pre-trained, enriched data sets, with what we're doing with the Watson data kits, is extremely important; industry-specific data. >> Well, you know, my pet peeve is, I love data. I'm a data geek, I love innovation, I love data-driven, but you can't have data without good human interaction. The human component is critical, and certainly we're seeing trends where startups like Alation, that we've interviewed, are taking this social approach to data, where they're looking at it like you don't need to be a data geek or data scientist. The average business person's creating the value, and especially in blockchain, we were just talking in theCUBE, it's the business model innovations, it's intellectual property, and the technology can be enabled and managed appropriately. This is where the value is. What's the human component? Is there like... You want to know who's using the data? >> Well-- >> Why are they using data? It's like, do I share the data? Can you leverage other people's data? This is kind of a melting pot. >> It is. >> What's the human piece of it? >> It truly is about enabling more people access to what it means to infuse AI into their organization. When I said it's not about re-platforming, but it's about expanding: we started with the data scientists, and we're adding to that the application developer. The third piece of that is, how do you get the knowledge worker? The subject matter expert? The person who understands the actual machine, or equipment, that needs to be inspected? How do you get them to start customizing models without having to know anything about the data science element? That's extremely important, because I can auto-tag and auto-classify stuff and use AI to get them started, but there is that human element of not needing to be a data scientist but still having input into that AI, and that's a very beautiful thing. >> You know, it's interesting, in the security industry you've seen groups, birds of a feather flock together, where they share hats, and it's a super important community aspect of it. Data has now, and now with AI, you get the AI ladder, but this points to AI literacy within organizations. >> Exactly. >> So you're seeing people saying, hey, we need AI literacy. Not coding per se, but how do we manage data? But it's also understanding who within your peer group is evolving. So you're seeing now a whole formation of a user base out there, users who want to know, the birds of the other feather flocking together. This is now a social gamification opportunity, because they're growing together. >> There're-- >> What's your thought on that? >> There are two things there I would say. First is, we often go to the technology, and as a product person I just spoke to you a lot about the technology. But what we find in talking to our clients is that it really is about helping them with the skills, the culture, the process transformation that needs to happen within the organization to break down the boundaries and the silos that exist, to truly get AI into an organization. That's the first thing. The second is, when you think about AI and what it means to actually infuse AI into an enterprise organization, there's an ethics component of this. 
There are ethics and bias components which you need to mitigate and detect, and those are real problems, and by the way, IBM, especially with the work that we're doing within Watson, with the work that we're doing in research, we're taking this on front and center, and it's extremely important to what we do. >> You guys used to talk about that as cognitive, but I think you're so right on. I think this is such a progressive topic, I'd love to do a deeper dive on it, but really, you nailed it. Data has to have a consensus algorithm built into it. Meaning, you need to have, that's why I brought up this social dynamic, because I'm seeing people within organizations address regulatory issues, legal issues, ethical, societal issues all together, and it requires a group. >> That's right. >> Not just an algorithm; people to synthesize. >> Exactly. >> And that's either diversity, diverse groups from different places and experiences, whether it's an expert here, a user there, all coming together. This is not really talked about much. How are you guys-- >> I think it will be more. >> John: It will, you think so? >> Absolutely, it will be more. >> What do you see from customers? You've done a lot of client meetings. Are they talking about this? Or are they still more in the how-do-I-stand-up-AI literacy phase?
First self-driving car fatality. You're seeing Facebook really get handed huge negative press on the fact that they mismanaged the data, that it was optimized for advertising, not user experience. You're starting to see a shift, an evolution, where people are starting to recognize the role of the human and their data and other people's data. This is a big topic. >> It's a huge topic and I think we'll see a lot more from it in the weeks, and months, and years ahead on this. I think it becomes a really important point as to how we start to really innovate in and around not just the data, but the AI we apply to it, and then the implications of it and what it means in terms of if the data's not right, if the algorithms aren't right, if the bias is there. There are big implications for society and for the environment as a whole. >> I really appreciate you taking the time to speak with us. I know you're super busy. My final question is much more to share some color commentary on IBM Think this week, the event, your reaction to it, obviously it's massive, and also the customer conversations you've had. You've told me that you're in client briefings and meetings. What are they talking about? What are they asking for? What are some of the things that are low-hanging fruit use cases? Where's the starting point? Where are people jumping in? Can you just share any data you have on-- >> Oh I can share. That's a fully loaded question; that's like 10 questions all in one. But the Think conference has been great in terms of, when you think about the problems that we're trying to solve with AI, it's not AI alone, right? It actually is integrated in with things like data, with the systems, with how we actually integrate that in terms of a hybrid way of what we're doing on premises and what we're doing in private cloud, what we're doing in public cloud. So, actually having a forum where we're talking about all of that together in a unified manner has actually been great feedback that I've heard from many customers, many analysts, and in general from an IBM perspective, I believe has been extremely valuable. I think the types of questions that I'm hearing and the types of inputs and conversations we're having are ones where clients want to be able to innovate and really do things that are Horizon three type things. What are the things they should be doing in Horizon one, Horizon two, and Horizon three when it comes to AI and when it comes to how they treat their data. This is really important because-- >> What's Horizon one, two and three? >> You think about Horizon one, those are things you should be doing immediately to get immediate value in your business. Horizon two are kind of mid-term, 18 to 24 months. 24-plus months out is Horizon three. So when you think about an AI journey, what does your AI journey really look like in terms of what you should be doing in the immediate term. Small, quick wins. >> Foundational. >> What are the things that you can do, kind of projects that will pan out in a year, and what are the two to three year projects that we should be doing. These are the most frequent conversations that I've been having with a lot of our clients in terms of what is that AI journey we should be thinking about, what are the projects right now, how do we work with you on the projects right now on H1 and H2. What are the things we can start incubating that are longer term. And these are extremely transformational in nature.
It's kind of like, what do we do to really automate self-driving, not just cars, but what we do for trains, and what we do to really revolutionize certain industries and professions. >> How does your product roadmap map to your Horizons? Can you share a little bit about the priorities on the roadmap? I know you don't want to share a lot of data, competitive information. But can you give an anecdotal or at least a trajectory of what the priorities are and some guiding principles? >> I hinted at some of it, but I've only talked about the Studio during this discussion, and the Studio is just one part of a three-pronged approach that we have in Watson. The Studio really is about laying the foundation for how we get AI into our enterprises for the builders, and it's a place where builders go to be able to create, build, deploy those models, machine learning, deep learning models, and be able to do so in a de-skilled way. Well, on top of that, as you know, we've done thousands of engagements and we know the most comprehensive ways that clients are trying to use Watson and AI in their organizations. So taking our learnings from that, we're starting to harden those in applications so that clients can easily infuse that into their businesses. We have capabilities for things like Watson Assistant, which was announced this week at the conference, that really helps clients with pre-existing skills, like how do you have a customer care solution, but then how can you extend it to other industries like automotive, or hospitality, or retail. So, we're working not just within Watson but within broader IBM to bring solutions like that. We also have talked about compliance. Every organization has a regulatory, or compliance, or legal department that deals with either SOWs, legal documents, technical documents. How do you then start making sure that you're adhering to the types of regulations or legal requirements that you have on those documents. Compare and Comply actually uses a lot of the Watson technologies to be able to do that. And scaling this out in terms of how clients are really using the AI in their business is the other point of where Watson will absolutely focus going forward. >> That's awesome, Ritika. Thank you for coming on theCUBE, sharing the awesome work, and again cutting across IBM and also outside in the industry. The more data the better the potential. >> Absolutely. >> Well thanks for sharing the data. We're putting the data out there for you. theCUBE is one big data machine, we're data driven. We love doing these interviews, of course getting the experts and the product folks on theCUBE is super important to us. I'm John Furrier, more coverage for IBM Think after this short break. (upbeat music)
Wikibon Presents: Software is Eating the Edge | The Entangling of Big Data and IIoT
>> So as folks make their way over from Javits I'm going to give you the least interesting part of the evening and that's my segment, in which I welcome you here, introduce myself, and lay out what we're going to do for the next couple of hours. So first off, thank you very much for coming. As all of you know Wikibon is a part of SiliconANGLE, which also includes theCUBE, so if you look around, this is what we have been doing for the past couple of days here on theCUBE. We've been inviting some significant thought leaders from over on the show and, in incredibly expensive limousines, driving them up the street to come on to theCUBE and spend time with us and talk about some of the things that are happening in the industry today that are especially important. We tore it down, and we're having this party tonight. So we want to thank you very much for coming and look forward to having more conversations with all of you. Now what are we going to talk about? Well Wikibon is the research arm of SiliconANGLE. So we take data that comes out of theCUBE and other places and we incorporate it into our research. And we work very closely with large end users and large technology companies regarding how to make better decisions in this incredibly complex, incredibly important transformative world of digital business. What we're going to talk about tonight, and I've got a couple of my analysts assembled, and we're also going to have a panel, is this notion of software is eating the Edge. Now most of you have probably heard Marc Andreessen, the venture capitalist and developer, original developer of Netscape, many years ago talk about how software's eating the world. Well, if software is truly going to eat the world, it's going to eat at, it's going to take the big chunks, big bites, at the Edge. That's where the actual action's going to be. And what we want to talk about specifically is the entangling of the internet of things, the industrial internet of things, IIoT, with analytics. So that's what we're going to talk about over the course of the next couple of hours. To do that we're going to, I've already blown the schedule, that's on me. But to do that I'm going to spend a couple minutes talking about what we regard as the essential digital business capabilities, which includes analytics and Big Data, and includes IIoT, and we'll explain, at least in our position, why those two things come together the way that they do. But I'm going to ask the august and revered Neil Raden, Wikibon analyst, to come on up and talk about harvesting value at the Edge. 'Cause there are some, not now Neil, when we're done, when I'm done. So I'm going to ask Neil to come on up and we'll talk, he's going to talk about harvesting value at the Edge. And then Jim Kobielus will follow up with him, another Wikibon analyst, he'll talk specifically about how we're going to take that combination of analytics and Edge and turn it into the new types of systems and software that are going to sustain this significant transformation that's going on. And then after that, I'm going to ask Neil and Jim to come, going to invite some other folks up, and we're going to run a panel to talk about some of these issues and do a real question and answer. So the goal here, before we break for drinks, is to create a community feeling within the room. That includes smart people here, smart people in the audience, having a conversation ultimately about some of these significant changes, so please participate and we look forward to talking about the rest of it.
All right, let's get going! What is digital business? One of the nice things about being an analyst is that you can reach back to people who were significantly smarter than you and build your points of view on the shoulders of those giants, including Peter Drucker. Many years ago Peter Drucker made the observation that the purpose of business is to create and keep a customer. Not better shareholder value, not anything else. It is about creating and keeping your customer. Now you can argue with that; at the end of the day, if you don't have customers, you don't have a business. What we've added to that is the observation that the difference between business and digital business is essentially one thing. That's data. A digital business uses data to differentially create and keep customers. That's the only difference. If you think about the difference between taxi cab companies here in New York City, every cab that I've been in in the last three days has bothered me about Uber. The reason, the difference between Uber and a taxi cab company, is data. That's the primary difference. Uber uses data as an asset. And we think this is the fundamental feature of digital business that everybody has to pay attention to. How is a business going to use data as an asset? Is the business using data as an asset? Is a business driving its engagement with customers, the role of its product, et cetera, using data? And if they are, they are becoming a more digital business. Now when you think about that, what we're really talking about is how are they going to put data to work? How are they going to take their customer data and their operational data and their financial data and any other kind of data and ultimately turn that into superior engagement or improved customer experience or more agile operations or increased automation? Those are the kinds of outcomes that we're talking about. But it is about putting data to work. That's fundamentally what we're trying to do within a digital business. Now that leads to an observation about the crucial strategic business capabilities that every business that aspires to be more digital, or to be digital, has to put in place. And I want to be clear. When I say strategic capabilities I mean something specific. When you talk about, for example, technology architecture or information architecture, there is this notion of what capabilities does your business need? Your business needs capabilities to pursue and achieve its mission. And in the digital business these are the capabilities that are now additive to this core question, ultimately, of whether or not the company is a digital business. What are the three capabilities? One, you have to capture data. Not just do a good job of it, but better than your competition. You have to capture data better than your competition. In a way that is ultimately less intrusive on your markets and on your customers. That's, in many respects, one of the first priorities of the internet of things and people. The idea of using sensors and related technologies to capture more data. Once you capture that data you have to turn it into value. You have to do something with it that creates business value so you can do a better job of engaging your markets and serving your customers. And that essentially is what we regard as the basis of Big Data.
Including operations, including financial performance and everything else, but ultimately it's taking the data that's being captured and turning it into value within the business. The last point here is that once you have generated a model, or an insight, or some other resource that you can act upon, you then have to act upon it in the real world. We call that systems of agency, the ability to enact based on data. Now I want to spend just a second talking about systems of agency 'cause we think it's an interesting concept and it's something Jim Kobielus is going to talk about a little bit later. When we say systems of agency, what we're saying is increasingly machines are acting on behalf of a brand. Or systems, combinations of machines and people, are acting on behalf of the brand. And this whole notion of agency is the idea that ultimately these systems are now acting as the business's agent. They are at the front line of engaging customers. It's an extremely rich proposition that has subtle but crucial implications. For example I was talking to a senior decision maker at a business today and they made a quick observation, they talked about how, on their way here to New York City, they had followed a woman who was going through security, opened up her suitcase and took out a bird. And then went through security with the bird. And the reason why I bring this up now is, as TSA was trying to figure out how exactly to deal with this, the bird started talking and repeating things that the woman had said, and many of those things, in fact, might have put her in jail. Now in this case the bird is not an agent of that woman. You can't put the woman in jail because of what the bird said. But increasingly we have to ask ourselves, as we ask machines to do more on our behalf, digital instrumentation and elements to do more on our behalf, it's going to have blowback and an impact on our brand if we don't do it well. I want to draw that forward a little bit because I suggest there's going to be a new lifecycle for data. And the way that we think about it is we have the internet or the Edge, which is comprised of things and, crucially, people, using sensors, whether they be smaller processors in control towers or whether they be phones that are tracking where we go, and this crucial element here is something that we call information transducers. Now a transducer in a traditional sense is something that takes energy from one form to another so that it can perform new types of work. By information transducer I essentially mean it takes information from one form to another so it can perform another type of work. This is a crucial feature of data. One of the beauties of data is that it can be used in multiple places at multiple times and not engender significant net new costs. It's one of the few assets you can say that about. So the concept of an information transducer's really important because it's the basis for a lot of transformations of data as data flies through organizations. So we end up with the transducers storing data in the form of analytics, machine learning, business operations, other types of things, and then it goes back and it's transduced back into the real world as we program the real world, turning it into these systems of agency. So that's the new lifecycle. And increasingly, that's how we have to think about data flows. Capturing it, turning it into value and having it act on our behalf in front of markets.
That could have enormous implications for how ultimately money is spent over the next few years. So Wikibon does a significant amount of market research in addition to advising our large user customers. And that includes doing studies on cloud, public cloud, but also studies on what's happening within the analytics world. And if you take a look at it, what we basically see happening over the course of the next few years is significant investments in software and also services to get the word out. But we also expect there's going to be a lot of hardware. A significant amount of hardware that's ultimately sold within this space. And that's because of something that we call true private cloud. This concept of ultimately a business increasingly being designed and architected around the idea of data assets means that the reality, the physical realities, of how data operates, how much it costs to store it or move it, the issues of latency, the issues of intellectual property protection, as well as things like the regulatory regimes that are being put in place to govern how data gets used in between locations. All of those factors are going to drive increased utilization of what we call true private cloud. On premise technologies that provide the cloud experience but act where the data naturally needs to be processed. I'll come back to that a little bit more in a second. So we think that it's going to be a relatively balanced market, a lot of stuff is going to end up in the cloud, but as Neil and Jim will talk about, there's going to be an enormous amount of analytics that pulls an enormous amount of data out to the Edge 'cause that's where the action's going to be. Now one of the things I want to also reveal to you is we've gathered a fair amount of data, we've done a fair amount of research, around this question of where or how will data guide decisions about infrastructure? And in particular the Edge is driving these conversations. So here is a piece of research that one of our cohorts at Wikibon did, David Floyer. Taking a look at IoT Edge cost comparisons over a three year period. And it showed on the left hand side an example where the sensor towers and other types of devices were streaming data back into a central location in a wind farm, a stylized wind farm example. Very very expensive. Significant amounts of money end up being consumed, significant resources end up being consumed, by the cost of moving the data from one place to another. Now this is even assuming that latency does not become a problem. The second example that we looked at is if we kept more of that data at the Edge and processed it at the Edge. And literally it is an 85 plus percent cost reduction to keep more of the data at the Edge. Now that has enormous implications for how we think about big data, how we think about next generation architectures, et cetera. But it's these costs that are going to be so crucial to shaping the decisions that we make over the next two years about where we put hardware, where we put resources, what type of automation is possible, and what types of technology management has to be put in place.
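To make the shape of that comparison concrete, here's a back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder, not a figure from the Wikibon study; the point is only how quickly transfer and central processing costs dominate when all the raw data is shipped to one location, and how the arithmetic flips once most of it is filtered at the Edge.

```python
# Hypothetical back-of-the-envelope model of the edge-vs-cloud tradeoff.
# None of these numbers come from the Wikibon/David Floyer study; they are
# placeholders to show the shape of the calculation.
SENSOR_DATA_TB_PER_MONTH = 50         # raw telemetry from one stylized wind farm
MONTHS = 36                           # three-year window, as in the comparison
TRANSFER_COST_PER_TB = 90.0           # assumed network transfer cost, $/TB
CLOUD_PROCESS_COST_PER_TB = 25.0      # assumed central processing cost, $/TB
EDGE_FILTER_RATIO = 0.95              # fraction of data resolved at the edge
EDGE_HW_COST = 40_000.0               # assumed one-time edge hardware cost

# Option 1: stream everything to a central location.
cloud_only = MONTHS * SENSOR_DATA_TB_PER_MONTH * (
    TRANSFER_COST_PER_TB + CLOUD_PROCESS_COST_PER_TB)

# Option 2: process locally, ship only the residue upstream.
shipped_tb = SENSOR_DATA_TB_PER_MONTH * (1 - EDGE_FILTER_RATIO)
edge_first = EDGE_HW_COST + MONTHS * shipped_tb * (
    TRANSFER_COST_PER_TB + CLOUD_PROCESS_COST_PER_TB)

print(f"cloud-only: ${cloud_only:,.0f}")
print(f"edge-first: ${edge_first:,.0f}")
print(f"savings:    {1 - edge_first / cloud_only:.0%}")
```

With these invented inputs the edge-first option comes out roughly three quarters cheaper; the real study's 85-plus percent figure depends on its own measured costs, but the structure of the tradeoff is the same.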
Ultimately we think it's going to lead to a structure, an architecture, in the infrastructure as well as applications, that is informed more by moving the cloud to the data than moving the data to the cloud. Our fundamental proposition is that the norm in the industry has been to think about moving all data up to the cloud because who wants to do IT? It's so much cheaper, look what Amazon can do. Or what AWS can do. All true statements. Very very important in many respects. But most businesses today are starting to rethink that simple proposition and asking themselves, do we have to move our business to the cloud, or can we move the cloud to the business? And increasingly what we see happening, as we talk to our large customers about this, is that the cloud is being extended out to the Edge, we're moving the cloud and cloud services out to the business. Because of economic reasons, intellectual property control reasons, regulatory reasons, security reasons, any number of other reasons. It's just a more natural way to deal with it. And of course, the most important reason is latency. So with that as a quick backdrop, if I may quickly summarize, we believe fundamentally that the difference today is that businesses are trying to understand how to use data as an asset. And that requires an investment in new sets of technology capabilities that are not cheap, not simple, and require significant thought, a lot of planning, a lot of change within IT and business organizations. How we capture data, how we turn it into value, and how we translate that into real world action through software. That's going to lead to a rethinking, ultimately, based on cost and other factors, about how we deploy infrastructure. How we use the cloud so that the data guides the activity, and not the choice of cloud supplier determines or limits what we can do with our data. And that's going to lead to this notion of true private cloud and elevate the role the Edge plays in analytics and all other architectures. So I hope that was perfectly clear. And now what I want to do is I want to bring up Neil Raden. Yes, now's the time Neil! So let me invite Neil up to spend some time talking about harvesting value at the Edge. Can you see his, all right. Got it. >> Oh boy. Hi everybody. Yeah, this is a really, this is a really big and complicated topic so I decided to just concentrate on something fairly simple, but I know that Peter mentioned customers. And he also had a picture of Peter Drucker. I had the pleasure in 1998 of interviewing Peter and photographing him. Peter Drucker, not this Peter. Because I'd started a magazine called Hired Brains. It was for consultants. And Peter said, Peter said a number of really interesting things to me, but one of them was his definition of a customer was someone who wrote you a check that didn't bounce. He was kind of a wag. He was! So anyway, he had to leave to do a video conference with Jack Welch and so I said to him, how do you charge Jack Welch to spend an hour on a video conference? And he said, you know I have this theory that you should always charge your client enough that it hurts a little bit or they don't take you seriously. Well, I had the chance to talk to Jack's wife, Suzie Welch, recently and I told her that story and she said, "Oh he's full of it, Jack never paid a dime for those conferences!" (laughs) So anyway, all right, so let's talk about this. To me, things about, engineered things like the hardware and network and all these other standards and so forth, we haven't fully developed those yet, but they're coming. As far as I'm concerned, they're not the most interesting thing. The most interesting thing to me in Edge Analytics is what you're going to get out of it, what the result is going to be. Making sense of this data that's coming.
And while we're on data, something I've been thinking about a lot lately, because everybody I've talked to for the last three days just keeps talking to me about data: I have this feeling that data isn't actually quite real. Any data that we deal with is the result of some process that's captured it from something else that's actually real. In other words, it's a proxy. So it's not exactly perfect. And that's why we've always had these problems about customer A, customer A, customer A, what's their definition? What's the definition of this, that and the other thing? And with sensor data, I really have the feeling, when companies get, not you know, not companies, organizations get instrumented and start dealing with this kind of data, what they're going to find is that this is the first time, and I've been involved in analytics, I don't want to date myself, 'cause I know I look young, but I've been dealing with analytics since 1975. And everything we've ever done in analytics has involved pulling data from some other system that was not designed for analytics. But if you think about sensor data, this is data that we're actually going to catch the first time. It's going to be ours! We're not going to get it from some other source. It's going to be the real deal, to the extent that it's the real deal. Now you may say, ya know Neil, a sensor that's sending us information about oil pressure or temperature or something like that, how can you quarrel with that? Well, I can quarrel with it because I don't know if the sensor's doing it right. So we still don't know, even with that data, if it's right, but that's what we have to work with. Now, what does that really mean? It means that we have to be really careful with this data. It's ours, we have to take care of it. We don't get to reload it from source some other day. If we munge it up, it's gone forever. So that has very serious implications, but let me roll you back a little bit. The way I look at analytics is it's come in three different eras. And we're entering into the third now. The first era was business intelligence. It was basically built and governed by IT, it was system of record kind of reporting. And as far as I can recall, it probably started around 1988, or at least that's the year that Howard Dresner claims to have invented the term. I'm not sure it's true. And things happened before 1988 that were sort of like BI, but 88 was when they really started coming out; that's when we saw BusinessObjects and Cognos and MicroStrategy and those kinds of things. The second generation just popped out on everybody else. We're all looking around at BI and we were saying why isn't this working? Why are only five people in the organization using this? Why are we not getting value out of this massive license we bought? And along come companies like Tableau doing data discovery, visualization, data prep, and Line of Business people are using this now. But it's still the same kind of data sources. It's moved out a little bit, but it still hasn't really hit the Big Data thing. Now we're in the third generation, so we not only have Big Data, which has come and hit us like a tsunami, but we're looking at smart discovery, we're looking at machine learning. We're looking at AI induced analytics workflows. And then all the natural language cousins. You know, natural language processing, natural language, what's? Oh, NLQ, natural language query. Natural language generation. Anybody here know what natural language generation is?
Yeah, so what you see now is you do some sort of analysis and that tool comes up and says this chart is about the following and it used the following data, and it's blah blah blah blah blah. I think it's kind of wordy and it's going to be refined some, but it's an interesting thing to do. Now, the problem I see with Edge Analytics and IoT in general is that most of the canonical examples we talk about are pretty thin. I know we talk about autonomous cars, I hope to God we never have them, 'cause I'm a car guy. Fleet management, I think Qualcomm started fleet management in 1988; that is not a new application. Industrial controls. I seem to remember Honeywell doing industrial controls at least in the 70s, and before that, I wasn't, I don't want to talk about what I was doing, but I definitely wasn't in this industry. So my feeling is we all need to sit down and think about this and get creative. Because the real value in Edge Analytics or IoT, whatever you want to call it, the real value is going to be figuring out something that's new or different. Creating a brand new business. Changing the way an operation happens in a company, right? And I think there are a lot of smart people out there and I think there's a million apps that we haven't even talked about, so if you as a vendor come to me and tell me how great your product is, please don't talk to me about autonomous cars or fleet management, 'cause I've heard about that, okay? Now, hardware and architecture are really not the most interesting thing. We fell into that trap with data warehousing. We've fallen into that trap with Big Data. We talk about speeds and feeds. Somebody said to me the other day, what's the narrative of this company? This is a technology provider. And I said, as far as I can tell, they don't have a narrative; they have some products and they compete in a space. And when they go to clients and the clients say, what's the value of your product? They don't have an answer for that. So we don't want to fall into this trap, okay? Because IoT is going to inform you in ways you've never even dreamed about. Unfortunately some of them are going to be really stinky, you know, they're going to be really bad. You're going to lose more of your privacy, it's going to get harder to get, I dunno, a mortgage for example, I dunno, maybe it'll be easier, but in any case, it's not going to all be good. So let's really think about what you want to do with this technology to do something that's really valuable. Cost takeout is not the place to justify an IoT project. Because number one, it's very expensive, and number two, it's a waste of the technology, because you should be looking at, you know the old numerator denominator thing? You should be looking at the numerators and forget about the denominators, because that's not what you do with IoT. And the other thing is you don't want to get overconfident. Actually this is good advice about anything, right? But in this case, I love this quote by Derek Sivers. He's a pretty funny guy. He said, "If more information was the answer, then we'd all be billionaires with perfect abs." I'm not sure what's on his wishlist, but you know, those aren't necessarily the two things I would think of, okay. Now, what I said about the data, I want to explain some more. Big Data Analytics, if you look at this graphic, it depicts it perfectly. It's a bunch of different stuff falling into the funnel. All right? It comes from other places, it's not original material.
And when it comes in, it's always used as second hand data. Now what does that mean? That means that you have to figure out the semantics of this information and you have to find a way to put it together in a way that's useful to you, okay. That's Big Data. That's where we are. How is that different from IoT data? It's like I said, IoT is original. You can put it together any way you want because no one else has ever done that before. It's yours to construct, okay. You don't even have to transform it into a schema because you're creating the new application. But the most important thing is you have to take care of it, 'cause if you lose it, it's gone. It's the original data. It's the same way, in operational systems, for a long long time we've always been concerned about backup and security and everything else. You better believe this is a problem. I know a lot of people think about streaming data, that we're going to look at it for a minute, and we're going to throw most of it away. Personally I don't think that's going to happen. I think it's all going to be saved, at least for a while. Now, the governance and security, oh, by the way, I don't know where you're going to find a presentation where somebody uses a newspaper clipping about Vladimir Lenin, but here it is, enjoy yourselves. I believe that when people think about governance and security today they're still thinking along the same grids that we thought about it all along. But this is very very different, and again, I'm sorry I keep thrashing this around, but this is treasured data that has to be carefully taken care of. Now when I say governance, my experience has been over the years that governance is something that IT does to make everybody's lives miserable. But that's not what I mean by governance today. It means a comprehensive program to really secure the value of the data as an asset. And you need to think about this differently. Now the other thing is you may not get to think about it differently, because some of the stuff may end up being subject to regulation. And if the regulators start regulating some of this, then that'll take some of the degrees of freedom away from you in how you put this together, but you know, that's the way it works. Now, machine learning, I think I told somebody the other day that claims about machine learning in software products are as common as twisters in trailer parks. And a lot of it is not really what I'd call machine learning. But there's a lot of it around. And I think all of the open source machine learning and artificial intelligence that's popped up, it's great, because all those math PhDs who work at Home Depot now have something to do when they go home at night and they construct this stuff. But if you're going to have machine learning at the Edge, here's the question, what kind of machine learning would you have at the Edge? As opposed to developing your models back at, say, the cloud, when you transmit the data there. The devices at the Edge are not very powerful. And they don't have a lot of memory. So you're only going to be able to do things that have been modeled or constructed somewhere else. But that's okay. Because machine learning algorithm development is actually slow and painful. So you really want the people who know how to do this working with gobs of data, creating models and testing them offline. And when you have something that works, you can put it there.
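As a concrete, if simplified, illustration of that division of labor, here's a sketch using scikit-learn. The sensor features, the toy "fault" label, and the file name are all invented for the example; the point is that the expensive fitting happens offline, and only the finished model artifact travels to the constrained device.

```python
# Sketch of the pattern Neil describes: train where the compute and data
# live, then ship only the finished model to a constrained edge device.
# Data, features and file names are invented for illustration.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- offline, in the cloud: fit against historical sensor readings ---
X_train = np.random.rand(10_000, 3)            # e.g. temp, pressure, vibration
y_train = (X_train[:, 2] > 0.8).astype(int)    # toy "fault" label
model = LogisticRegression().fit(X_train, y_train)
joblib.dump(model, "fault_model.joblib")       # artifact pushed to the edge

# --- at the edge: load once, then score cheaply on each new reading ---
edge_model = joblib.load("fault_model.joblib")
reading = np.array([[0.4, 0.6, 0.93]])
print("fault risk:", edge_model.predict_proba(reading)[0, 1])
```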
Now there's one thing I want to talk about before I finish, and I think I'm almost finished. I wrote a book about 10 years ago about automated decision making, and the conclusion that I came up with was that little decisions add up, and that's good. But it also means you don't have to get them all right. But you don't want computers or software making decisions unattended if it involves human life, or frankly any life. Or the environment. So when you think about the applications that you can build using this architecture and this technology, think about the fact that you're not going to be doing air traffic control, you're not going to be monitoring crossing guards at the elementary school. You're going to be doing things that may seem fairly mundane. Managing machinery on the factory floor, I mean that may sound great, but really isn't that interesting. Managing well heads, drilling for oil, well I mean, it's great to the extent that it doesn't cause wells to explode, but they don't usually explode. What it's usually used for is to drive the cost out of preventative maintenance. Not very interesting. So use your heads. Come up with really cool stuff. And any of you who are involved in Edge Analytics, the next time I talk to you I don't want to hear about the same five applications that everybody talks about. Let's hear about some new ones. So, in conclusion, I don't really have anything in conclusion except that Peter mentioned something about limousines bringing people up here. On Monday I was slogging up and down Park Avenue and Madison Avenue with my client and we were visiting all the hedge funds there because we were doing a project with them. And in the miserable weather I looked at him and I said, for godsake Paul, where's the black car? And he said, that was the 90s. (laughs) Thank you. So, Jim, up to you. (audience applauding) This is terrible, go that way, this was terrible coming that way. >> Woo, don't want to trip! And let's move to, there we go. Hi everybody, how ya doing? Thanks Neil, thanks Peter, those were great discussions. So I'm the third leg in this relay race here, talking about of course how software is eating the world. And focusing on the value of Edge Analytics in a lot of real world scenarios. Programming the real world, to make the world a better place. So I'll break it out analytically in terms of the research that Wikibon is doing in the area of the IoT, but specifically how AI is being embedded really into all material reality, potentially, at the Edge. In mobile applications and industrial IoT and smart appliances and self driving vehicles. I will break it out in terms of a reference architecture for understanding what functions are being pushed to the Edge, to hardware, to our phones and so forth, to drive various scenarios in terms of real world results. So I'll move apace here. So basically AI software, or AI microservices, are being infused into Edge hardware as we speak. What we see is more vendors of smart phones and other real world appliances, and things like self driving vehicles. What they're doing is they're instrumenting their products with computer vision and natural language processing, environmental awareness based on sensing and actuation, and those capabilities and inferences that these devices use, both to provide support for the human users of these devices as well as to enable varying degrees of autonomous operation. So what I'll be talking about is how AI is a foundation for data driven systems of agency of the sort that Peter is talking about.
Infusing data driven intelligence into everything, or potentially so. As more of this capability, all these algorithms for things like, ya know, doing real time predictions and classifications, anomaly detection and so forth, as this functionality gets diffused widely and becomes more commoditized, you'll see it burned into an ever-wider variety of hardware architectures, neuro synaptic chips, GPUs and so forth. So what I've got here in front of you is a sort of a high level reference architecture that we're building up in our research at Wikibon. So AI, artificial intelligence, is a big term, a big paradigm, I'm not going to unpack it completely. Of course we don't have oodles of time so I'm going to take you fairly quickly through the high points. It's a driver for systems of agency. Programming the real world. Transducing digital inputs, the data, to analog real world results. Through the embedding of this capability in the IoT, but pushing more and more of it out to the Edge with points of decision and action in real time. And there are four AI-enabled capabilities we're seeing that are absolutely critical to software being pushed to the Edge: sensing, actuation, inference and learning. Sensing and actuation, like Peter was describing, are about capturing data from the environment within which a device or user is operating or moving. And then actuation is the fancy term for doing stuff, ya know, like industrial IoT, it's obviously machine controlled, but clearly, you know, with self driving vehicles it's steering a vehicle and avoiding crashing and so forth. Inference is the meat and potatoes, as it were, of AI. Analytics does inferences. It infers from the data the logic of the application. Predictive logic, correlations, classification, abstractions, differentiation, anomaly detection, recognizing faces and voices. We see that now with Apple; the latest version of the iPhone is embedding face recognition as the core multifactor authentication technique. Clearly that's a harbinger of what's going to be universal fairly soon, and it depends on AI. That depends on convolutional neural networks, that is some heavy hitting processing power that's necessary, and it's processing the data that's coming from your face. So that's critically important. So what we're looking at then is AI software taking root in hardware to power continuous agency. Getting stuff done. Powering decision support for human beings who have to take varying degrees of action in various environments. We don't necessarily want to let the car steer itself in all scenarios, we want some degree of override, for lots of good reasons. People want to protect life and limb, including their own. And just more data driven automation across the internet of things in the broadest sense. So unpacking this reference framework, what's happening is that AI driven intelligence is powering real time decisioning at the Edge. Real time local sensing from the data that it's capturing there, it's ingesting the data. Some, not all, of that data may be persisted at the Edge. Some, perhaps most of it, will be pushed into the cloud for other processing. When you have these highly complex algorithms that are doing AI deep learning, multilayer, to do a variety of anti-fraud and higher level, like narrative, auto-narrative roll-ups from various scenes that are unfolding.
A lot of this processing is going to begin to happen in the cloud, but a fair amount of the more narrowly scoped inferences that drive real time decision support at the point of action will be done on the device itself. Contextual actuation: the sensor data that's captured by the device, along with other data that may be coming down in real time streams through the cloud, will provide the broader contextual envelope of data needed to drive actuation, to drive various models and rules and so forth that are making stuff happen at the point of action, at the Edge. Continuous inference. What it all comes down to is that inference is what's going on inside the chips at the Edge device. And what we're seeing is a growing range of hardware architectures, GPUs, CPUs, FPGAs, ASICs, neuro synaptic chips of all sorts, playing in various combinations that are automating more and more very complex inference scenarios at the Edge. And not just individual devices; swarms of devices, like drones and so forth, are essentially an Edge unto themselves. You'll see these tiered hierarchies of Edge swarms that are playing and doing inferences of an ever more complex, dynamic nature. And the fundamental capabilities that are powering them all will be burned into the hardware that powers them. And then adaptive learning. Now I use the term learning rather than training here; training is at the core of it. Training means everything in terms of the predictive fitness, the fitness of your AI services for whatever task, predictions, classifications, face recognition, you've built them for. But I use the term learning in a broader sense. What makes your inferences get better and better, more accurate over time, is that you're training them with fresh data in a supervised learning environment. But you can have reinforcement learning if you're doing, say, robotics and you don't have ground truth against which to train the data set. You know, there's maximizing a reward function versus minimizing a loss function, the latter being the standard approach for supervised learning. There's also, of course, the approach of unsupervised learning, with cluster analysis critically important in a lot of real world scenarios. So Edge AI algorithms: clearly, deep learning, which is multilayered machine learning models that can do abstractions at higher and higher levels. Face recognition is a high level abstraction. Faces in a social environment is an even higher level of abstraction in terms of groups. Faces over time and bodies and gestures, doing various things in various environments, is an even higher level abstraction in terms of narratives that can be rolled up, are being rolled up, by deep learning capabilities of great sophistication. Convolutional neural networks for processing images, recurrent neural networks for processing time series. Generative adversarial networks for doing essentially what are called generative applications of all sorts, composing music, and a lot of it's being used for auto programming. These are all deep learning. There's a variety of other algorithm approaches I'm not going to bore you with here. Deep learning is essentially the enabler of the five senses of the IoT. Your phone has a camera, it has a microphone, and of course it has geolocation and navigation capabilities. It's environmentally aware, it's got an accelerometer and so forth embedded therein.
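Pulling the sensing, inference and actuation capabilities together, a stripped-down edge decision loop might look like the following sketch. read_sensor() and actuate() are hypothetical stand-ins for whatever hardware interface a real device exposes, and the "model" is deliberately a toy threshold rule; the point is that the decision is made locally, with no cloud round trip.

```python
# A stripped-down sense -> infer -> actuate loop of the kind described
# above. The sensor and actuator functions are hypothetical stand-ins
# for a real device's hardware interface.
import time
import random

def read_sensor():
    """Stand-in for a real sensor read (e.g. temperature in Fahrenheit)."""
    return random.gauss(70.0, 5.0)

def infer(value, history, threshold=3.0):
    """Tiny local 'model': flag readings far from the recent mean."""
    if len(history) < 10:
        return False                     # not enough context yet
    mean = sum(history) / len(history)
    return abs(value - mean) > threshold

def actuate(value):
    """Stand-in for a real-world action, e.g. throttling a machine."""
    print(f"anomalous reading {value:.1f} -- taking local action")

history = []
for _ in range(100):                     # a real device would loop forever
    v = read_sensor()
    if infer(v, history):                # decision made on the device itself
        actuate(v)
    history = (history + [v])[-50:]      # keep a short rolling window
    time.sleep(0.01)
```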
The reason that your phone and all of these devices are getting scarily sentient is that they have the sensory modalities and the AI, the deep learning, that enable them to make environmentally correct decisions in a wider range of scenarios. So machine learning is the foundation of all of this, and artificial neural networks are the foundation of deep learning. There are other approaches for machine learning I want to make you aware of, because support vector machines and these other established approaches for machine learning are not going away, but really what's driving the show now is deep learning, because it's scarily effective. And so that's where most of the investment in AI is going these days: deep learning. AI Edge platforms, tools and frameworks are just coming along like gangbusters. Much development of AI, of deep learning, happens in the context of your data lake. This is where you're storing your training data. This is the data that you use to build, test and validate your models. So we're seeing a deepening stack of Hadoop, and there's Kafka and Spark and so forth, that's driving the training (coughs) excuse me, of AI models that power all these Edge Analytics applications, so that lake will continue to broaden and deepen in terms of the scope and range of data sets and the range of AI modeling it supports. Data science is critically important in this scenario because the data scientist, the data science teams, the tools and techniques and flows of data science are the fundamental development paradigm or discipline or capability that's being leveraged to build and to train and to deploy and iterate all this AI that's being pushed to the Edge. So clearly data science is at the center; data scientists of an increasingly specialized nature are necessary to the realization of this value at the Edge. AI frameworks are coming along, you know, a mile a minute. TensorFlow, which is open source, most of these are open source, has achieved sort of a de facto standard status, and I'm using the word de facto in air quotes. There's Theano and Keras and MXNet and CNTK and a variety of other ones. We're seeing a range of AI frameworks come to market, most open source. Most are supported by most of the major tool vendors as well. So at Wikibon we're definitely tracking that, and we plan to go deeper in our coverage of that space. And then next best action powers recommendation engines. I mean next best action, decision automation of the sort of thing Neil's covered in a variety of contexts in his career, is fundamentally important to Edge Analytics and to systems of agency, 'cause it's driving the process automation, decision automation, sort of the targeted recommendations that are made at the Edge to individual users as well as to process automation. That's absolutely necessary for self driving vehicles to do their jobs and for industrial IoT. So what we're seeing is more and more recommendation engine or recommender capabilities powered by ML and DL going to the Edge, already at the Edge, for a variety of applications. Edge AI capabilities, like I said, there's sensing. And sensing at the Edge is becoming ever more rich, with mixed reality Edge modalities of all sorts for augmented reality and so forth. We're just seeing a growth in the range of sensory modalities that are enabled, or filtered and analyzed, through AI that's being pushed to the Edge, into the chip sets.
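Since computer vision is the canonical sensing modality here, this is a minimal sketch of the kind of convolutional network Jim is describing, written against the Keras API he mentions. The layer sizes and the binary "is this the enrolled face?" output are purely illustrative, nowhere near a production face recognition model.

```python
# A minimal convolutional network sketch for an image task like face
# verification. Illustrative layer sizes only; not a production model.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),                  # shrink spatial resolution
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # e.g. "is this the enrolled face?"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```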
Actuation, that's where robotics comes in. Robotics is coming into all aspects of our lives. And you know, it's brainless without AI, without deep learning and these capabilities. Inference, autonomous Edge decisioning. Like I said, there's a growing range of inferences being done at the Edge. And that's where it has to happen, 'cause that's the point of decision. Learning, training: much training, most training, will continue to be done in the cloud because it's very data intensive. It's a grind to train and optimize an AI algorithm to do its job. It's not something that you necessarily want to do, or can do, at the Edge on Edge devices, so the models that are built and trained in the cloud are pushed down through a dev ops process to the Edge, and that's the way it will work pretty much in most AI environments, Edge Analytics environments. You centralize the modeling, you decentralize the execution of the inference models. The training engines will be in the cloud.
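One concrete version of that centralize-the-modeling, decentralize-the-inference pattern, sketched with TensorFlow since it's the framework named above: build a (here trivially small) Keras model in the cloud, then convert it to TensorFlow Lite, the compact, quantizable format designed for phones and other edge hardware. The model shape and file name are illustrative placeholders.

```python
# Sketch: a model built centrally, converted for decentralized inference.
# Model architecture and file name are placeholders for illustration.
import tensorflow as tf
from tensorflow.keras import layers

# "Cloud side": define (and in a real pipeline, train) the model.
model = tf.keras.Sequential([
    layers.Dense(8, activation="relu", input_shape=(3,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Convert to TensorFlow Lite for edge deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize for small chips
tflite_bytes = converter.convert()

with open("edge_model.tflite", "wb") as f:  # the artifact a dev ops pipeline
    f.write(tflite_bytes)                   # would push down to devices
```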
Edge AI applications. I'll just run you through sort of a core list of the ones that are coming into, or have already come into, the mainstream at the Edge. Multifactor authentication: clearly the Apple announcement of face recognition is just a harbinger of the fact that that's coming to every device. Computer vision, speech recognition, NLP, digital assistants and chatbots powered by natural language processing and understanding, it's all AI powered. And it's becoming very mainstream. Emotion detection, face recognition, you know I could go on and on, but these are like the core things that everybody has access to, or will by 2020, in their core devices, mass market devices. Developers, designers and hardware engineers are coming together to pool their expertise to build and train not just the AI, but also the entire package of hardware and UX and the orchestration of real world business scenarios or life scenarios that all this embedded intelligence enables, and much of what they build in terms of AI will be containerized as microservices through Docker and orchestrated through Kubernetes as full cloud services in an increasingly distributed fabric. That's coming along very rapidly. We can see a fair amount of that already on display at Strata in terms of what the vendors are doing or announcing or who they're working with. The hardware itself, the Edge, you know at the Edge, some data will be persistent, needs to be persistent, to drive inference. And, you know, to drive a variety of different application scenarios that need some degree of historical data related to what the device in question happens to be sensing, or has sensed in the immediate past, or, you know, whatever. The hardware itself is geared towards both sensing and, increasingly, persistence and Edge driven actuation of real world results. The whole notion of drones and robotics being embedded into everything that we do, that's where that comes in. That has to be powered by low cost, low power commodity chip sets of various sorts. What we see right now in terms of chip sets is GPUs; Nvidia has gone really far, and GPUs have come along very fast in terms of powering inference engines, you know, like the Tesla cars and so forth. But GPUs are in many ways the core hardware substrate for inference engines in DL so far. But to become a mass market phenomenon, it's got to get cheaper and lower powered and more commoditized, and so we see a fair number of CPUs being used as the hardware for Edge Analytics applications. Some vendors are fairly big on FPGAs; I believe Microsoft has gone fairly far with FPGAs inside its DL strategy. ASICs, I mean, there are neuro synaptic chips, like IBM's got one. There are at least a few dozen vendors of neuro synaptic chips on the market, so at Wikibon we're going to track that market as it develops. And what we're seeing is a fair number of scenarios where it's a mixed environment, where you use one chip set architecture at the inference side of the Edge, and other chip set architectures driving the DL as processed in the cloud, playing together within a common architecture. And we see a fair number of DL environments where the actual training is done in the cloud on Spark using CPUs and parallelized in memory, but pushing TensorFlow models that might be trained through Spark down to the Edge, where the inferences are done in FPGAs and GPUs. Those kinds of mixed hardware scenarios are very, very likely to be standard going forward in lots of areas. So analytics at the Edge powering continuous results is what it's all about. The whole point is really not moving the data, it's putting the inference at the Edge and working from the data that's already captured and persistent there for the duration of whatever action or decision or result needs to be powered from the Edge. Like Neil said, cost takeout alone is not worth doing. Cost takeout alone is not the rationale for putting AI at the Edge. It's getting new stuff done, new kinds of things done, in an automated, consistent, intelligent, contextualized way to make our lives better and more productive. Security and governance are becoming more important. Governance of the models, governance of the data, governance in a dev ops context in terms of version controls over all those DL models that are built, that are trained, that are containerized and deployed. Continuous iteration and improvement of those to help them learn to make our lives better and easier. With that said, I'm going to hand it over now. It's five minutes after the hour. We're going to get going with the Influencer Panel, so what we'd like to do is I'll call Peter, and Peter's going to call our influencers. >> All right, am I live yet? Can you hear me? All right so, we've got, let me jump back in control here. We've got, again, the objective here is to have the community take on some things. And so what we want to do is I want to invite five other people up, Neil why don't you come on up as well. Start with Neil. You can sit here. On the far right hand side, Judith, Judith Hurwitz. >> Neil: I'm glad I'm on the left side. >> From the Hurwitz Group. >> From the Hurwitz Group. Jennifer Shin, who's affiliated with UC Berkeley. Jennifer are you here? >> She's here, Jennifer where are you? >> She was here a second ago. >> Neil: I saw her walk out, she may have, >> Peter: All right, she'll be back in a second. >> Here's Jennifer! >> Here's Jennifer! >> Neil: With 8 Path Solutions, right? >> Yep. >> Yeah 8 Path Solutions. >> Just get my mic. >> Take your time Jen. >> Peter: All right, Stephanie McReynolds. Far left. And finally Joe Caserta, Joe come on up. >> Stephie's with Alation. >> And to the left. So what I want to do is I want to start by having everybody just go around and introduce yourself quickly. Judith, why don't we start there. >> I'm Judith Hurwitz, I'm president of Hurwitz and Associates. We're an analyst research and thought leadership firm. I'm the co-author of eight books. Most recent is Cognitive Computing and Big Data Analytics.
I've been in the market for a couple years now. >> Jennifer. >> Hi, my name's Jennifer Shin. I'm the founder and Chief Data Scientist of 8 Path Solutions LLC. We do data science, analytics, and technology. We're actually about to do a big launch next month, with Box actually. >> Are we, sorry Jennifer, are we having a problem with Jennifer's microphone? >> Man: Just turn it back on? >> Oh, you have to turn it back on. >> It was on, oh sorry, can you hear me now? >> Yes! We can hear you now. >> Okay, I don't know how that turned back off, but okay. >> So you've got to redo all that, Jen. >> Okay, so my name's Jennifer Shin, I'm founder of 8 Path Solutions LLC, it's a data science, analytics, and technology company. I founded it about six years ago. So we've been developing some really cool technology that we're going to be launching with Box next month. It's really exciting. And I've been developing a lot of patents and some technology, as well as teaching at UC Berkeley as a lecturer in data science. >> You know Jim, you know Neil. Joe, you ready to go? >> Joe: Just broke my microphone. >> Joe's microphone is broken. >> Joe: Now it should be all right. >> Jim: Speak into Neil's. >> Joe: Hello, hello? >> I just feel not worthy in the presence of Joe Caserta. (several laughing) >> That's right, master of mics. If you can hear me, Joe Caserta, so yeah, I've been doing data technology solutions since 1986, almost as old as Neil here, but been doing specifically like BI, data warehousing, business intelligence type of work since 1996. And been wholly dedicated to Big Data solutions and modern data engineering since 2009. Where should I be looking? >> Yeah, I don't know, where is the camera? >> Yeah, and that's basically it. So my company was formed in 2001, it's called Caserta Concepts. We recently rebranded to only Caserta, 'cause what we do is way more than just concepts. So we conceptualize the stuff, we envision what the future brings, and we actually build it. And we help clients large and small who want to be leaders in innovation using data, specifically to advance their business. >> Peter: And finally Stephanie McReynolds. >> I'm Stephanie McReynolds, I head product marketing as well as corporate marketing for a company called Alation. And we are a data catalog, so we help bring together not only a technical understanding of your data, but we curate that data with human knowledge, and use automated intelligence internally within the system to make recommendations about what data to use for decision making. And some of our customers, like the City of San Diego, a large automotive manufacturer working on self driving cars, and General Electric, use Alation to help power their solutions for IoT at the Edge. >> All right, so let's jump right into it. And again, if you have a question, raise your hand, and we'll do our best to get it to the floor. But what I want to do is get seven questions in front of this group and have you guys discuss, slog, disagree, agree. Let's start here. What is the relationship between Big Data, AI, and IoT? Now, Wikibon's put forward its observation that data's being generated at the Edge, that action is being taken at the Edge, and then increasingly the software and other infrastructure architectures need to accommodate the realities of how data is going to work in these very complex systems. That's our perspective. Anybody, Judith, you want to start?
>> Yeah, so I think that if you look at AI, machine learning, all these different areas, you have to be able to have the data to learn from. Now, when it comes to IoT, I think one of the issues we have to be careful about is not all data will be at the Edge. Not all data needs to be analyzed at the Edge. For example, if the light is green, and that's good, and it's supposed to be green, do you really have to constantly analyze the fact that the light is green? You actually only really want to be able to analyze and take action when there's an anomaly. Well, if it goes purple, that's actually a sign that something might explode, so that's where you want to make sure that you have the analytics at the Edge. Not for everything, but for the things where there is an anomaly and a change. >> Joe, how about from your perspective? >> For me, I think the evolution of data is really becoming, eventually, oxygen; I mean, data's going to be the oxygen we breathe. It used to be very, very reactive, and there used to be a latency. You do something, there's a behavior, there's an event, there's a transaction, and then you go record it, and then you collect it, and then you can analyze it. And it was very, very waterfallish, right? And then eventually we figured out to put it back into the system, or at least human beings interpret it to try to make the system better. And that has really completely turned on its head; we don't do that anymore. Right now it's very, very synchronous, where as we're actually making these transactions, the machines, we don't really need, I mean human beings are involved a bit, but less and less and less. And it's just a reality, it may not be politically correct to say, but it's a reality that my phone in my pocket is following my behavior, and it knows, without telling a human being, what I'm doing. And it can actually help me do things, like get to where I want to go faster, depending on my preference, if I want to save money or save time or visit things along the way. And I think that's all integration of big data, streaming data, artificial intelligence, and I think the next thing that we're going to start seeing is the culmination of all of that. I actually, hopefully it'll be published soon, I just wrote an article for Forbes coining the term ARBI, and ARBI is the integration of Augmented Reality and Business Intelligence. Where I think, essentially, we're going to see, you know, hold your phone up to Jim's face and it's going to recognize-- >> Peter: It's going to break. >> And it's going to say exactly, you know, what are the key metrics that we want to know about Jim. If he works on my sales force, what's his attainment of goal, what is-- >> Jim: Can it read my mind? >> Potentially, based on behavior patterns. >> Now I'm scared. >> I don't think Jim's buying it. >> It will, without a doubt, be able to predict that what you've done in the past, you may, with some certain level of confidence, do again in the future, right? And is that mind reading? It's pretty close, right? >> Well, sometimes, I mean, mind reading is in the eye of the individual who wants to know. And if the machine appears to approximate what's going on in the person's head, sometimes you can't tell. So I guess we could call that the Turing machine test of the paranormal. >> Well, face recognition, micro-gesture recognition, I mean facial gestures, people can do it.
Maybe not better than a coin toss, but if it can be seen visually and captured and analyzed, conceivably some degree of mind reading can be built in. I can see when somebody's angry looking at me, so that's a possibility. That's kind of a scary possibility in a surveillance society, potentially. >> Neil: Right, absolutely. >> Peter: Stephanie, what do you think? >> Well, I hear a world of the bots versus the humans being painted here, and I think that, you know, at Alation we have a very strong perspective on this, and that is that the greatest impact, or the greatest results, are going to come when humans figure out how to collaborate with the machines. And so, yes, you want to get to the location more quickly, but the machine, as in the bot, isn't able to tell you exactly what to do such that you're just going to blindly follow it. You need to train that machine, you need to have a partnership with that machine. So a lot of the power, and I think this goes back to Judith's story, is in what is the human decision making that can be augmented with data from the machine, while the humans are actually doing the training and driving the machines in the right direction. I think that's when we get true power out of some of these solutions, so it's not just all about the technology. It's not all about the data or the AI, or the IoT; it's about how that empowers human systems to become smarter and more effective and more efficient. And I think we're playing that out in our technology in a certain way, and I think organizations that are thinking along those lines with IoT are seeing more benefits immediately from those projects. >> So I think we have general agreement on some of the things you talked about: IoT crucial for capturing information and then having action being taken, AI being crucial to defining and refining the nature of the actions that are being taken, Big Data ultimately powering how a lot of that changes. Let's go to the next one. >> So actually, I have something to add to that. So I think it makes sense, right, with IoT, why we have Big Data associated with it. If you think about what data is collected by IoT, we're talking about serial information, right? It's over time; it's going to grow exponentially just by definition, right? So every minute you collect a piece of information, that means over time it's going to keep growing, growing, growing as it accumulates. So that's one of the reasons why IoT is so strongly associated with Big Data. And it's also why you need AI, to be able to differentiate between one minute versus the next minute, right? Trying to find a better way, rather than looking at all that information and manually picking out patterns; to have some automated process for being able to filter through that much data that's being collected. >> I want to point out, though, based on what you just said, Jennifer, I want to bring Neil in at this point, that this question of IoT now generating unprecedented levels of data does introduce this idea of the primary source. Historically, what we've done within technology, or within IT certainly, is we've taken stylized data. There is no such thing as a real world accounting thing; it is a human contrivance. And we stylize data, and therefore it's relatively easy to be very precise about it. But when we start, as you noted, when we start measuring things with a tolerance down to thousandths of a millimeter, whatever that is in the metric system, now we're still sometimes dealing with errors that we have to attend to.
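(Judith's "only analyze the anomaly" point above lends itself to a concrete illustration. Below is a minimal, hypothetical sketch of an Edge-side filter that suppresses normal readings and only reports sharp deviations from the recent baseline; the window size, threshold, and sample values are all invented for the example.)

```python
from collections import deque
import math

def make_anomaly_filter(window=60, threshold=3.0):
    """Flag a reading only when it deviates sharply from the recent baseline."""
    history = deque(maxlen=window)

    def check(value):
        is_anomaly = False
        if len(history) >= 10:  # wait for a minimal baseline before judging
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / len(history)
            std = max(math.sqrt(var), 0.1)  # floor, so a flat signal doesn't flag jitter
            is_anomaly = abs(value - mean) / std > threshold
        if not is_anomaly:
            history.append(value)  # only normal readings update the baseline
        return is_anomaly

    return check

check = make_anomaly_filter()
readings = [1.0] * 50 + [1.1] * 10 + [9.0]  # steady "green light", then a spike
flagged = [r for r in readings if check(r)]
print(flagged)  # -> [9.0]: only the anomaly is sent upstream
```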
So the reality is we're not just dealing with stylized data, we're dealing with real data, and it's more frequent, but it also has special cases that we have to attend to in terms of how we use it. What do you think, Neil? >> Well, I mean, I agree with that; I think I already said that, right? >> Yes you did, okay, let's move on to the next one. >> Well, it's a doppelganger, the digital twin doppelganger that's automatically created by the very fact that you're living and interacting and so forth and so on. It's going to accumulate regardless. Now, that doppelganger may not be your agent, or might not be the foundation for your agent, unless there's some other piece of logic, like an interest graph that you build, a human being saying this is my broad set of interests, and so all of my agents out there in the IoT, you all need to be aware that when you make a decision on my behalf as my agent, this is what Jim would do. You know, I mean, there needs to be that kind of logic somewhere in this fabric to enable true agency. >> All right, so I'm going to start with you. Oh, go ahead. >> I have a real short answer to this, though. I think that Big Data provides the data and compute platform to make AI possible. For those of us who dipped our toes in the water in the 80s, we got clobbered because we didn't have the facilities, we didn't have the resources to really do AI; we just kind of played around with it. And I think the other thing about it is, if you combine Big Data and AI and IoT, what you're going to see is, a lot of the applications we develop now are very inward looking: we look at our organization, we look at our customers, we try to figure out how to sell more shoes to fashionable ladies, right? But with this technology, I think people can really expand what they're thinking about and what they model, and come up with applications that are much more external. >> What I would add to that is it also introduces being able to use engineering, right? Having engineers interested in the data. Because it's actually technical data that's collected, not just, say, preferences or information about people, but actual measurements that are being collected with IoT. So it's really interesting in the engineering space, because it opens up a whole new world for the engineers to actually look at data, and to actually combine both the hardware side as well as the data that's being collected from it. >> Well, Neil, you and I have talked about something, 'cause it's not just engineers. We have, in the healthcare industry for example, which you know a fair amount about, this notion of empirically based management. And the idea that increasingly we have to be driven by data as a way of improving the way that managers do things, the way that managers collect or collaborate, and ultimately, collectively, how they take action. So it's not just engineers; it's supposed to also inform business. What's actually happening in the healthcare world when we start thinking about some of this empirically based management? Is it working? What are some of the barriers? >> It's not a function of technology. What happens in medicine and healthcare research is, I guess you can say, it borders on fraud. (people chuckling) No, I'm not kidding. I know the New England Journal of Medicine a couple of years ago released a study and said that at least half the articles that they published turned out to be ghostwritten by pharmaceutical companies.
(man chuckling) Right, so I think the problem is that when you do a clinical study, the one that really killed me, about 10 years ago, was the Women's Health Initiative. They spent $700 million gathering this data over 20 years, and when they released it, they looked at all the wrong things, deliberately, right? So I think that's a systemic-- >> I think you're bringing up a really important point that we haven't brought up yet, and that is: can you use Big Data and machine learning to begin to take the biases out? So if you divorce your preconceived notions and your biases from the data, and let the data lead you to the logic, you start to, I think, get better over time. But it's going to take a while to get there, because we do tend to gravitate towards our biases. >> I will share an anecdote. So I had some arm pain, and I had numbness in my thumb and pointer finger, and I went to, excruciating pain, went to the hospital. So the doctor examined me, and he said, you probably have a pinched nerve, he said, but I'm not exactly sure which nerve it would be, I'll be right back. And I kid you not, he went to a computer and he Googled it. (Neil laughs) And he came back, because this little bit of information was something that could easily be looked up, right? Every nerve in your spine is connected to your different fingers, so the pointer and the thumb just happen to be your C6, so he came back and said, it's your C6. (Neil mumbles) >> You know, an interesting, I mean, that's a good example. One of the issues with healthcare data is that the data set is not always shared across the entire research community, so by making Big Data accessible to everyone, you actually start a more rational conversation or debate on, well, what are the true insights-- >> If that conversation includes what Judith talked about: the actual model that you use to set priorities and make decisions about what's actually important. So it's not just about improving, this is the test, it's not just about improving your understanding of the wrong thing; it's also testing whether it's the right or wrong thing as well. >> That's right. To be able to test that, you need to have humans in dialog with one another, bringing different biases to the table, to work through: okay, is there truth in this data? >> It's context and it's correlation, and you can have a great correlation that's garbage, you know, if you don't have the right context. >> Peter: So I want to, hold on Jim, I want to, >> It's exploratory. >> Hold on Jim, I want to take it to the next question, 'cause I want to build off of what you talked about, Stephanie, and that is that this says something about what is the Edge. And our perspective is that the Edge is not just devices; when we talk about the Edge, we're talking about human beings, and the role that human beings are going to play, both as sensors, carrying things with them, but also as actuators, actually taking action, which is not a simple thing. So what do you guys think? What does the Edge mean to you? Joe, why don't you start? >> Well, I think it could be a combination of the two. And specifically when we talk about healthcare: so I believe, in 2017, when we eat, we don't know why we're eating. Like, I think we should absolutely by now be able to know exactly, what is my protein level, what is my calcium level, what is my potassium level? And then find the foods to meet that.
What have I depleted versus what I should have, and eat very, very purposely, and not by taste-- >> And it's amazing that red wine is always the answer. >> It is. (people laughing) And tequila, that helps too. >> Jim: You're a precision foodie is what you are. (several chuckle) >> There's no reason why we should not be able to know that right now, right? And when it comes to healthcare, the biggest problem or challenge with healthcare is, no matter how great of a technology you have, you can't manage what you can't measure. And you're really not allowed to use a lot of this data, so you can't measure it, right? You can't do things very, very scientifically, right, in the healthcare world, and I think regulation in the healthcare world is really burdening advancement in science. >> Peter: Any thoughts, Jennifer? >> Yes, I teach statistics for data scientists, right, so you know, we talk about a lot of these concepts. I think what makes these questions so difficult is you have to find a balance, right, a middle ground. For instance, in the case of: are you being too biased through data? Well, you could say, we want to look at data only objectively, but then there are certain relationships that your data models might show that aren't actually causal relationships. For instance, if there's an alien that came from space and saw Earth, saw the people, everyone's carrying umbrellas, right, and then it started to rain: that alien might think, well, it's because they're carrying umbrellas that it's raining. Now, we know from the real world that that's actually not the way these things work. So if you look only at the data, that's the potential risk: that you'll start making associations, or saying something's causal, when it's actually not, right? So that's one of the, I think, big challenges. I think when it comes to looking also at things like healthcare data, right: do you collect data about anything and everything? Does it mean that, A, we need to collect all that data for the question we're looking at? Or is it actually the best, most optimal way to be able to get to the answer? Meaning, sometimes you can take some shortcuts in terms of what data you collect and still get the right answer, and not have maybe that level of specificity that's going to cost you millions extra to be able to get. >> So, Jennifer, as a data scientist, I want to build upon what you just said. And that is: are we going to start to see methods and models emerge for how we actually solve some of these problems? So for example, we know how to build a system for a stylized process like accounting, or some elements of accounting. We have methods and models that lead to technology and actions and whatnot, all the way down to the point that that system can be generated. We don't have the same notion, to the same degree, when we start talking about AI and some of these Big Data problems. We have algorithms, we have technology. But are we going to start seeing, as a data scientist, repeatability, and learning how to think the problems through, that's going to lead us to a more likely best, or at least good, result? >> So I think that's a bit of a tough question, right? Because part of it is, it's going to depend on how many of these researchers actually get exposed to real world scenarios, right? Research looks into all these papers, and you come up with all these models, but if it's never tested in a real world scenario, well, I mean, we really can't validate that it works, right?
So I think it is dependent on how much of this integration there's going to be between the research community and industry, and how much investment there is. Funding is going to matter in this case. If there's no funding on the research side, then you'll see a lot of industry folk who feel very confident about their models; but again, on the other side, of course, if researchers don't validate those models, then you really can't say for sure that it's actually more accurate, or more efficient. >> It's the issue of real world testing and experimentation. A/B testing, that's standard practice in many operationalized ML and AI implementations in the business world; but with real world experimentation in Edge analytics, what you're actually transducing is touching people's actual lives. The problem there is, like in healthcare and so forth, when you're experimenting with people's lives, somebody's going to die. I mean, in other words, that's critical; in terms of causal analysis, you've got to tread lightly on operationalizing that kind of testing in the IoT when people's lives and health are at stake. >> We still give 'em placebos. So we still test 'em. All right, so let's go to the next question. What are the hottest innovations in AI? Stephanie, I want to start with you, as someone at a company that's got kind of an interesting little thing happening: we start thinking about how do we better catalog data and represent it to a large number of people. What are some of the hottest innovations in AI as you see it? >> I think it's a little counterintuitive, what the hottest innovations are in AI, because we're at a spot in the industry where the most successful companies that are working with AI are actually incorporating it into solutions. So the best AI solutions are actually the products where you don't know there's AI operating underneath. But they're having a significant impact on business decision making, or bringing a different type of application to the market, and you know, I think there's a lot of investment that's going into AI tooling and tool sets for data scientists or researchers, but the more innovative companies are thinking through how do we really take AI and make it have an impact on business decision making, and that means kind of hiding the AI from the business user. Because if you think a bot is making a decision instead of you, you're not going to partner with that bot very easily or very readily. I worked, way at the start of my career, I worked in CRM, when recommendation engines were all the rage online and also in call centers. And the hardest thing was to get a call center agent to actually read the script that the algorithm was presenting to them. That algorithm was 99% correct most of the time, but there was this human resistance to letting a computer tell you what to tell that customer on the other side, even if it was more successful in the end. And so I think that the innovation in AI that's really going to push us forward is when humans feel like they can partner with these bots, and they don't think of it as a bot, but they think about it as assisting their work and getting to a better result-- >> Hence the augmentation point you made earlier. >> Absolutely, absolutely. >> Joe, how about you? What do you look at? What are you excited about? >> I think the coolest thing at the moment right now is chatbots. To have voice, to be able to speak with you in natural language, to do that, I think that's pretty innovative, right?
And I do think that eventually, for the average user, not for techies like me, but for the average user, I think keyboards are going to be a thing of the past. I think we're going to communicate with computers through voice, and I think this is the very, very beginning of that, and it's an incredible innovation. >> Neil? >> Well, I think we all have myopia here. We're all thinking about commercial applications. Big, big things are happening with AI in the intelligence community, in the military, the defense industry, in all sorts of things. Meteorology. And that's where, well, hopefully not on an everyday basis with the military, you really see the effect of this. But I was involved in a project a couple of years ago where we were developing AI software to detect artillery pieces in terrain from satellite imagery. I don't have to tell you what country that was; I think you can probably figure that one out, right? But there are legions of people in many, many companies that are involved in that industry. So if you're talking about the dollars spent on AI, I think the stuff that we do in our industries is probably fairly small. >> Well, it reminds me of an application of AI related to that that I actually thought was interesting: AI being applied to removing mines from war zones. >> Why not? >> Which is not a bad thing for a whole lot of people. Judith, what do you look at? >> So I'm looking at things like being able to have pre-trained models in specific solution areas. I think that that's something that's coming. Also the ability to really have a machine assist you in selecting the right algorithms based on what your data looks like and the problems you're trying to solve. Some of the things that data scientists still spend a lot of their time on, but that can be augmented; basically, we have to move to levels of abstraction before this becomes truly ubiquitous across many different areas. >> Peter: Jennifer? >> So I'm going to say computer vision. >> Computer vision? >> Computer vision. So computer vision ranges from image recognition, to be able to say what content is in the image: is it a dog, is it a cat, is it a blueberry muffin? Like the sort of popular post out there where it's a blueberry muffin versus, I think, a chihuahua, and it compares the two. And can the AI really actually detect the difference, right? So I think that's really where a lot of people who are in this space, being in both the AI space as well as data science, are looking to for the new innovations. For instance, Cloud Vision, I think that's what Google still calls it: the Vision API they've released in beta allows you to actually use an API to send your image and then have it be recognized, right, by their API. There's another startup in New York called Clarifai that also does a similar thing, as well as, you know, Amazon has their Rekognition platform as well. So I think, from images, being able to detect what's in the content, as well as, from videos, being able to say things like: how many people are entering a frame? How many people enter the store? Not having to actually go look at it and count it, but having a computer actually tally that information for you, right? >> There's actually an extra piece to that. So if I have a picture of a stop sign, and I'm an automated car, is it a picture on the back of a bus of a stop sign, or is it a real stop sign? So that's going to be one of the complications. >> Doesn't matter to a New York City cab driver. How about you, Jim?
>> Probably not. (laughs) >> The hottest thing in AI is Generative Adversarial Networks, GANs. What's hot about that? Well, I'll be very quick: most AI, most deep learning, machine learning, is analytical; it's distilling or inferring insights from the data. Generative takes that same algorithmic basis, but to build stuff. In other words, to create realistic looking photographs, to compose music, to build CAD/CAM models, essentially, that can be constructed on 3D printers. So GANs, they're a huge research focus all around the world, and are increasingly used for natural language generation. In other words, it's institutionalizing, or having a foundation for, nailing the Turing test every single time: building something with machines that looks like it was constructed by a human, and doing it over and over again to fool humans. I mean, you can imagine the fraud potential. But you can also imagine just the sheer, like, it's going to shape the world, GANs. >> All right, so I'm going to say one thing, and then we're going to ask if anybody in the audience has an idea. So the thing that I find interesting is, with traditional programs, or when you tell a machine to do something, you don't need incentives. When you tell a human being something, you have to provide incentives. Like, how do you get someone to actually read the text? And this whole question of elements within AI that incorporate incentives as a way of trying to guide human behavior is absolutely fascinating to me. Whether it's gamification, or even some things we're thinking about with blockchain and Bitcoin and related types of stuff. To my mind, that's going to have an enormous impact, some good, some bad. Anybody in the audience? I don't want to lose everybody here. What do you think, sir? And I'll try to do my best to repeat it. Oh, we have a mic. >> So my question's pretty much about what Stephanie's talking about, which is human-in-the-loop training, right? I come from a computer vision background. That's the problem: we need millions of images labeled for training, and we need humans to do that. And that's, like, you know, the workforce is essentially people that aren't necessarily part of the AI community; they're people that are just able to use that data, and analyze the data, and label that data. That's something that I think is a big problem everyone in the computer vision industry, at least, faces. I was wondering-- >> So again, the problem there is the difficulty of methodologically bringing together people who have domain expertise and people who have algorithm expertise, and having them work together? >> I think the expertise issue comes in healthcare, right? In healthcare you need experts to be labeling your images. With contextual information, where, essentially, augmented reality applications are coming in, you have ARKit and everything coming out, but there is a lack of context-based intelligence. And all of that comes through training images, and all of that requires people to do it. And that's kind of the foundational basis of AI coming forward; it's not necessarily an algorithm, right? It's: how well is the data labeled? Who's doing the labeling, and how do we ensure that it happens? >> Great question. So, for the panel: if you think about it, a consultant talks about being on the bench; how much time are they going to have to spend on trying to develop additional business? How much time should we set aside for executives to help train some of these assistants?
>> I think the key is to think of the problem a different way. You could have people manually label data, and that's one way to solve the problem. But you can also look at: what is the natural workflow of that executive, or that individual? And is there a way to gather that context automatically, using AI, right? And if you can do that, it's similar to what we do in our product: we observe how someone is analyzing the data, and from those observations we can actually create the metadata that then trains the system in a particular direction. But you have to think about solving the problem differently, of finding the workflow that you can then feed into, to make this labeling easy without the human really realizing that they're labeling the data. >> Peter: Anybody else? >> I'll just add to what Stephanie said: so in the IoT applications, all those sensory modalities, the computer vision, the speech recognition, all that, that's all potential training data. So it cross-checks against all the other models that are processing all the other data coming from that device. So the natural language understanding can be reality-checked against the images that the person happens to be commenting upon, or the scene in which they're embedded; so yeah, the data's embedded-- >> I don't think we're, we're not at the stage yet where this is easy. It's going to take time before we do start doing the pre-training of some of these details so that it goes faster, but right now, there aren't that many shortcuts. >> Go ahead, Joe. >> Sorry, so, a couple things. So one is, like, I was just caught up on your incentivizing programs to be more efficient, like humans. You know, Ethereum, which is built on blockchain, has this concept of gas, where, as the process becomes more efficient, it costs less to actually run, right? It costs less Ether, right? So the machine is actually kind of incentivized, and you don't really know what it's going to cost until the machine processes it, right? So there is some notion of that there. But as far as, like, vision, like training the machine for computer vision, I think it's through adoption and crowdsourcing: as people start using it more, they're going to be adding more pictures, very, very organically. And then the machines will be trained, and right now it's a very small handful doing it, and it's very proactively done by the Googles and the Facebooks and all of that. But as we start using it, as they start looking at my images and Jim's and Jen's images, it's going to keep getting smarter and smarter through adoption and through a very organic process. >> So, Neil, let me ask you a question. Who owns the value that's generated as a consequence of all these people ultimately contributing their insight and intelligence into these systems? >> Well, to a certain extent, the people who are contributing the insight own nothing, because the systems collect their actions and the things they do, and then that data doesn't belong to them; it belongs to whoever collected it or whoever's going to do something with it. But the other thing, getting back to the medical stuff: it's not enough to say that people will do the right thing, because a lot of them are not motivated to do the right thing. The whole grant thing, the whole, oh my god, I'm not going to go against the senior professor.
A lot of these, I knew a guy who was a doctor at the University of Pittsburgh, and they were doing a clinical study on the tubes that they put in little kids' ears, who have ear infections, right? And-- >> Google it! Who helps out? >> Anyway, I forget the exact thing, but he came out and said that the principal investigator lied when he made the presentation, that it should be this, I forget which way it went. He was fired from his position at Pittsburgh, and he has never worked as a doctor again, 'cause he went against the senior line of authority. He was-- >> Another question back here? >> Man: Yes, Mark Turner has a question. >> Not a question, I just want to piggyback on what you're saying about the transformation, maybe in healthcare, of black and white images into color images, in the case of sonograms and ultrasounds and mammograms. Do you see that happening using AI? Do you see that being, I mean, it's already happening; do you see it moving forward in that kind of way? I mean, talk more about that, about, you know, AI and black and white images being used, and how they can be transformed, made into color images, so you can see things better, doctors can perform better operations. >> So I'm sorry, but could you summarize that down? What's the question? Summarize it just, >> I had a lot of students, they're interested in the cross-pollination between AI and, say, the medical community, as far as things like ultrasounds and sonograms and mammograms, and how you can literally take a black and white image and, using algorithms and such, it can be made into a color image that can help doctors better do the work that they've already been doing, just do it better. You touched on it for like 30 seconds. >> So: how AI can be used to actually add information, in a way that's not necessarily invasive, but that ultimately improves how someone might respond to it or use it, yes? Related? I've also got something to say about medical images in a second; any of you guys want to, go ahead, Jennifer. >> Yeah, so, for one thing, you know, and it kind of goes back to what we were talking about before: when we look at, for instance, scans; at some point I was looking at CT scans, right, for lung cancer nodules. In order for me, who, I don't have a medical background, to identify where the nodule is, a doctor actually had to go in and specify which slice of the scan had the nodule, and where exactly it is. So it's on both the slice level as well as, within that 2D image, where it's located and the size of it. So the beauty of things like AI is that ultimately, right now, a radiologist has to look at every slice and actually identify this manually, right? The goal, of course, would be that one day we wouldn't have to have someone look at every slice, up to usually like 300 slices, and we'd be able to identify it in a much more automated way. And I think the reality is we're not going to get something that's going to be 100%; as with anything we do in the real world, it's always, like, a 95% chance of it being accurate. So I think it's finding that in-between of: what's the threshold that we want to use to be able to definitively say that this is a lung cancer nodule or not? I think the other thing to think about is in terms of how they're using other information. What they might use is, for instance, based on other characteristics of the person's health, they might use that as sort of a grading, right? So, you know, how dark or how light something is, to identify, maybe in that region, the prevalence of that specific variable.
So that's usually how they integrate that information into something that's already existing in the computer vision sense. I think the difficulty with this, of course, is being able to identify which variables were introduced into the data that does exist. >> So I'll make two quick observations on this, then I'll go to the next question. One is, radiologists have historically been some of the highest paid physicians within the medical community, partly because they don't have to be particularly clinical. They don't have to spend a lot of time with patients; they tend to spend time with doctors, which means they can do a lot of work in a little bit of time, and charge a fair amount of money. As we start to introduce some of these technologies that allow us, from a machine standpoint, to actually make diagnoses based on those images, I find it fascinating that you now see television ads promoting the role that the radiologist plays in clinical medicine. It's kind of an interesting response. >> It's also disruptive, as I'm seeing more and more studies showing that deep learning models processing images, ultrasounds and so forth, are getting as accurate as many of the best radiologists. >> That's the point! >> At detecting cancer. >> Now radiologists are saying, oh look, we do this great thing in terms of interacting with the patients, which they never have, because they're being disintermediated. The second thing that I'll note is one of my favorite examples of that, if I got it right, is looking at the deep space images that come out of Hubble, where they're taking data from thousands, maybe even millions of images and combining it together in interesting ways so you can actually see depth, you can actually move through, at a very, very small scale, a system that's maybe six billion light years away. Fascinating stuff. All right, so let me go to the last question here, and then I'm going to close it down, then we can have something to drink. What are the hottest, oh, I'm sorry, question? >> Yes, hi, my name's George, I'm with Blue Talon. You asked earlier the question, what's the hottest thing in the Edge and AI; I would say that it's security. It seems to me that before you can empower agency, you need to be able to authorize what they can act on, how they can act on it, who they can act on. So it seems, if you're going to move to very distributed data at the Edge and analytics at the Edge, there has to be security similarly done at the Edge. And I saw (speaking faintly) slides that called out security as a key prerequisite, and maybe Judith can comment, but I'm curious how security's going to evolve to meet this analytics at the Edge. >> Well, let me do that, and then I'll ask Jim to comment. The notion of agency is crucially important, slightly different from security, just so we're clear. And the basic idea here is, historically, folks have thought about moving data, or they've thought about moving application function; now we are thinking about moving authority. So, as you said, that's not really a security question, but this has been a problem that's been of concern in a number of different domains. How do we move authority with the resources? And that's really what informs the whole agency process. But with that said, Jim. >> Yeah, actually, I'll, yeah, thank you for bringing up security. So, identity is the foundation of security: strong identity, multifactor, face recognition, biometrics and so forth.
Clearly AI, machine learning, deep learning are powering a new era of biometrics, and, you know, it's behavioral metrics and so forth that are organic to people's use of devices and so forth. You know, getting to the point that Peter was raising, that's important: agency! Systems of agency. Your agent, you have to, you as a human being should be vouching, in a secure, tamper-proof way; your identity should be vouching for the identity of some agent, physical or virtual, that does stuff on your behalf. How can that, how should that, be managed within this increasingly distributed IoT fabric? Well, a lot of that's been worked out. It all ran through webs of trust, public key infrastructure, formats, and, you know, SAML for single sign-on and so forth. It's all about assertions, strong assertions, and vouching. I mean, there are whole workflows of things. Back in the ancient days, when I was actually a PKI analyst, three analyst firms ago, I got deep into all the guts of all those federation agreements. Something like that has to be IoT-scalable to enable systems of agency to be truly fluid, so we can vouch for our agents wherever they happen to be. We're going to keep on having, as human beings, agents all over creation; we're not even going to be aware of everywhere that our agents are, but our identity-- >> It's not just-- >> Our identity has to follow. >> But it's not just identity, it's also authorization and context. >> Permissioning, of course. >> So I may be the right person to do something yesterday, but I'm not authorized to do it in another context, in another application. >> Role-based permissioning, yeah. Or persona-based. >> That's right. >> I agree. >> And obviously it's going to be interesting to see the role that blockchain, or its follow-on technology, is going to play here. Okay, so let me throw one more question out: what are the hottest applications of AI at the Edge? We've talked about a number of them; does anybody want to add something that hasn't been talked about? Or do you want to get a beer? (people laughing) Stephanie, you raised your hand first. >> I was going to go, I'll bring something mundane to the table, actually, because I think one of the most exciting innovations with IoT and AI is actually simple things, like the City of San Diego rolling out 3,200 automated street lights that will actually help you find a parking space and reduce the amount of emissions into the atmosphere, so it has some positive environmental change impact. I mean, it's street lights; it's not the medical industry, it doesn't look like a life changing innovation. And yet if we automate streetlights and we manage our energy better, and maybe they can flicker on and off if there's a parking space there for you, that's a significant impact on everyone's life. >> And it dramatically suppresses the impact of backseat driving! >> (laughs) Exactly. >> Joe, what were you saying? >> I was just going to say, you know, there's already the technology out there where you can put a camera on a drone, with machine learning, with artificial intelligence within it, and it can look at buildings and determine whether there are rusty pipes and cracks in cement and leaky roofs and all of those things. And that's all based on artificial intelligence. And I think if you can do that, then being able to look at an x-ray and determine if there's a tumor there is not out of the realm of possibility, right? >> Neil? >> I agree with both of them; that's what I meant about external kinds of applications.
Instead of figuring out what to sell our customers, which is mostly what we hear. I just, I think all of those things are eminently doable. And boy, street lights that help you find a parking place, that's brilliant, right? >> Simple! >> It improves your life more than, I dunno, something I used on the internet recently, but I think it's great! That's, I'd like to see a thousand things like that. >> Peter: Jim? >> Yeah, building on what Stephanie and Neil were saying, it's ambient intelligence built into everything, to enable fine-grained microclimate awareness for all of us as human beings moving through the world, and to enable reading of every microclimate in buildings. In other words, you know, you have sensors on your body that are always detecting the heat, the humidity, the level of pollution or whatever, in every environment that you're in, or that you might be likely to move into fairly soon, and they can either, A, help give you guidance in real time about where to avoid, or, B, give that environment guidance about how to adjust itself, like the lighting or whatever it might be, to your specific requirements. And you know, when you have a room like this, full of other human beings, there has to be some negotiated settlement: some will find it too hot, some will find it too cold, or whatever. But I think that is fundamental in terms of reshaping the sheer quality of experience of most of our lived habitats on the planet, potentially. That's really the Edge analytics application; it depends on everybody being fully equipped with a personal area network of sensors that's communicating into the cloud. >> Jennifer? >> So I think what's really interesting about it is being able to utilize the technology we do have; it's a lot cheaper now to have a lot of these ways of measuring that we didn't have before. And whether or not engineers can then leverage what we have as ways to measure things; and then, of course, you need people like data scientists to build the right models. So you can collect all this data, but if you don't build the right model that identifies these patterns, then all that data's just collected, and it just sits in a repository. So without having the models that support the patterns that are actually in the data, you're not going to find a better way of being able to find insights in the data itself. So I think what will be really interesting is to see how existing technology is leveraged to collect data, and then how that's actually modeled, as well as to be able to see how technology is going to develop from where it is now: to being able to either collect things more sensitively, or, in the case of, say, for instance, if you're dealing with how people move, whether we can build things that we can then use to measure how we move, right? Like how we move every day, and then being able to model that in a way that is actually going to give us better insights into things like healthcare, and maybe even just our behaviors. >> Peter: Judith? >> So, I think we also have to look at it from a peer-to-peer perspective. So I may be able to get some data from one thing at the Edge, but then all those Edge devices, sensors or whatever, they all have to interact with each other, because, while we may act in silos in our business lives, in the real world, when you look at things like sensors and devices, it's how they react with each other on a peer-to-peer basis. >> All right, before I invite John up, I want to say, I'll say what my thing is, and it's not the hottest.
It's the one I hate the most: I hate AI-generated music. (people laughing) Hate it. All right, I want to thank all the panelists, every single person, some great commentary, great observations. I want to thank you very much. I want to thank everybody that joined. John, in a second you'll kind of announce who's the big winner. But the one thing I want to do is, I was listening, I learned a lot from everybody, but I want to call out the one comment that I think we all need to remember, and I'm going to give you the award, Stephanie. And that is: increasingly, we have to remember that the best AI is probably AI that we don't even know is working on our behalf. The flip side of that is, all of us have to be very cognizant of the idea that AI is acting on our behalf, and we may not know it. So, John, why don't you come on up. Who won the, whatever it's called, the raffle? >> You won. >> Thank you! >> How about a round of applause for the great panel. (audience applauding) Okay, we had people put their business cards in the basket; we're going to have that brought up. We're going to have two raffle gifts: a nice Bose headset, and a Bluetooth speaker. Got to wait for that. I just want to say thank you for coming, and for the folks watching: this is our fifth year doing our own event called Big Data NYC, which is really an extension of the landscape beyond the Big Data world, that's Cloud and AI and IoT, and other great things happening, and great experts and influencers and analysts here. Thanks for sharing your opinions. Really appreciate you taking the time to come out and share your data and your knowledge. Appreciate it. Thank you. Where's the? >> Sam's right in front of you. >> There's the thing, okay. Got to be present to win. We saw some people sneaking out the back door to go to a dinner. >> First prize first. >> Okay, first prize is the Bose headset. >> Bluetooth and noise canceling. >> I won't look. Sam, you've got to hold it down, I can see the cards. >> All right. >> Stephanie, you won! (Stephanie laughing) Okay, Sawny Cox, Sawny Allie Cox? (audience applauding) Yay, look at that! He's here! The bar's open, so help yourself, but we've got one more. >> Congratulations. Picture right here. >> Hold that, I saw you. Wake up a little bit. Okay, all right. Next one is, my kids love this. This is great, great for the beach, great for everything: a portable speaker, great gift. >> What is it? >> Portable speaker. >> It is a portable speaker, it's pretty awesome. >> Oh, you grabbed mine. >> Oh, that's one of our guys. >> (laughing) But who was it? >> Can't be related! Ava, Ava, Ava. Okay, Gene Penesko. (audience applauding) Hey! He came in! All right, look at that, the timing's great. >> Another one? (people laughing) >> Hey, thanks everybody, enjoy the night. Thank Peter Burris, head of research for SiliconANGLE, Wikibon, and the great guests and influencers and friends, and you guys in the community for coming. Thanks for watching and thanks for coming. Enjoy the party and some drinks, and that's it for the influencer panel and analyst discussion. Thank you. (logo music)
Tanmay Bakshi, IBM Honorary Cloud Advisor | Open Source Summit 2017
>> Announcer: Live from Los Angeles. It's theCUBE covering Open Source Summit North America 2017. Brought to you by the Linux Foundation and Red Hat. >> Hello everyone, welcome back. Our live coverage, theCUBE's live coverage, of the Open Source Summit in North America, it's a part of the Linux Foundation. I'm John Furrier, your host, with Stu Miniman our co-host. Our next guest is Tanmay Bakshi, who is an IBM honorary cloud advisor, algorithmist, and CUBE alumnus. Great to see you. >> Thank you very much! Glad to be here! >> You get taller every year. It was what, three years ago, two years ago? >> I believe yeah, two years ago, Interconnect 2016. >> IBM show... doing a lot of great stuff. You're an IBM VIP, you're doing a lot of work with them. IBM Champion. >> Thank you. >> Congratulations. >> Thank you. >> What's new? You're pushing any code today? >> Definitely! Now today, getting ready for my BoF that I've got tonight, it's been absolutely great. I've been working on a lot of new projects that I'm going to be talking about today and tomorrow at my keynote. Like I've been working on AskTanmay, of course you know, Interconnect 2016, very first time I presented AskTanmay. Since then, a lot has changed, I've incorporated real, custom deep learning algorithms with TensorFlow. Into AskTanmay, AskTanmay now thinks about what it's actually looking at, using Watson as well, it's really interesting. And of course, new projects that I'm working on, including DeepSPADE, which basically helps online communities detect, report, and flag spam on different websites. For example, Stack Overflow, which I'm working on right now. >> So you're doing some deep learning stuff >> Tanmay: Yes >> with IBM Watson, the team, everything else. >> Tanmay: Exactly, yes. >> What's the coolest thing you've worked on, since we last talked? (laughing) >> Well it would have to be a tie between AskTanmay, DeepSPADE, and advancement to the Cognitive Story. As you know, from last time, I've been working on lots of interesting projects, like with AskTanmay, some great new updates that you'll hear about today. DeepSPADE itself though, I'd like to get a little bit more into that. There's actually, I mean of course, everyone listening right now has used Stack Overflow or Stack Exchange at one point in their lives. And so, they've probably noticed that, a little bit, here and there, you'd see a spam message on Stack Overflow, on a comment or post. And of course there are methods to try and prevent spam on Stack Overflow, but they aren't very effective. And that's why a group of programmers, known as Charcoal SE, actually went ahead and started creating basically this suite to try and prevent spam on Stack Exchange. And they call it SmokeDetector. And it helps them to find and remove spam on Stack Exchange. >> This is so good until it goes out, and the battery needs to be replaced, and you got to get on a chair. But this whole SmokeDetector, this is a real way they help create a good, healthy community. >> Yes, exactly. So, they try and basically find spam, report to moderators, and if enough alarms are set off, they try and report it, or flag it automatically, via other people's accounts. And so basically, what I'm trying to do is, I mean, a few weeks ago, when I found out about what they're doing, I found out that they use regular expressions to try and find spam. And so they have, you know, years of people gathering experience, they're experts in this field.
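The regular-expression approach Tanmay describes is easy to picture. Here is a minimal sketch of a SmokeDetector-style rule check; the patterns below are invented for illustration, and the real SmokeDetector rule set is far larger and far more refined.

```python
import re

# Illustrative patterns only; the real rule set contains hundreds of
# carefully maintained regular expressions.
SPAM_PATTERNS = [
    re.compile(r"(?i)buy\s+cheap"),
    re.compile(r"(?i)100%\s*(free|guaranteed)"),
    re.compile(r"(?i)call\s*now[:\s]*\+?\d{7,}"),
]

def looks_like_spam(post: str) -> bool:
    """Flag a post if any known spam pattern matches."""
    return any(p.search(post) for p in SPAM_PATTERNS)

print(looks_like_spam("Buy cheap watches, 100% guaranteed!"))  # True
print(looks_like_spam("How do I reverse a list in Python?"))   # False
```

The weakness Tanmay goes on to describe is visible here: a spammer who avoids the exact phrasing slips through, which is what motivates the deep learning layer.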
And they keep, you know, adding more regular expressions to try and find spam. And since I, you know, am really really passionate about deep learning, I thought why not try and help them out, trying to augment this sort of SmokeDetector with deep learning. And so, they graciously donated their data set to me, which has a good amount of training data, training rows for me to actually train a deep learning system to classify a post as spam or non-spam. And you'll be hearing a lot more about the model architecture, the CNN-plus-GRU model that I've got running in Keras, tonight during my BoF. >> Now, machine learning could be a real benefit to spam detection, because of the patterns. >> Tanmay: Exactly. >> Spammers tend to have their own patterns, >> Tanmay: Exactly. >> as do bots. >> Tanmay: Yes, exactly, exactly. And eventually, you realize that hey, maybe we're not using the same words in every post, but there's a specific pattern of words, or specific type of word, that always appears in a spam message. And machine learning would help us combat against that. And of course, in this case, maybe we don't actually have a word, or a specific website, or a specific phone number, that would trigger a regular expression alarm. But in the context that this website appears, machine learning can tell us that, "hey, yeah, this is probably a spam post." There are lots of really interesting places where machine learning can tie in with this, and help out with the accuracy. In fact, I've been able to reach around 98% accuracy on around 15 thousand testing rows. So, I'm very glad with the results so far, and of course, I'm continuing to do all this fine-tuning and everything... >> Alright, so how old are you this year? I can't keep the numbers straight. Are you 13, 14? >> Well originally, Interconnect 2016, I was 12, but now I'm 13 years old, and I'm going to be 14 in October, October 16th. >> Okay, so you're knocking on 14? >> Tanmay: Uh, not just yet there, I'll be 14... >> So, Tanmay, you're 14, your time's done, at this point. But, one of your missions, to be serious, is helping to inspire the next generation. Especially here, at the Open Source Summit, give us a preview of what we're going to see in your keynote. >> Sure, definitely. And now, as you mentioned, in fact, I actually have a goal. Which is really to reach out to and help 100 thousand aspiring coders along their journey of learning to code, and of course then applying that code in lots of different fields. In fact I'm actually already at around 4,500 people there. Which, I'm very very excited about. But today, during my BoF, as I mentioned, I'm going to be talking in depth about the DeepSPADE and AskTanmay projects I've been working on. But tomorrow, during my keynote, you'll be hearing a lot about generally all the projects that I've been working on, and how they're impacting lots of different fields. Like, healthcare, utility, security via artificial intelligence and machine learning. >> So, when you first talked to us about AskTanmay, it's been what, almost 18 months, I think, there. What's changed, what's accelerating? I hear you throw out things like TensorFlow, not something we were talking about two years ago. >> Tanmay: Yeah. >> What have been some of the key learnings you've had, as you've really dug into this? >> Sure, in fact, this is actually something that I'm going to be covering tonight. And that is that AskTanmay, you could say, gets its DNA from AskMSR, which was made in 2002.
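For readers unfamiliar with AskMSR: its core trick, which AskTanmay inherits, is data redundancy, meaning a candidate answer that recurs across many search snippets outranks one that barely appears. A toy sketch of that voting scheme, with invented snippets and candidates; the real system does far more careful candidate extraction.

```python
from collections import Counter

def rank_candidates(snippets, candidates):
    """Score each candidate answer by how often it recurs across
    search-engine snippets, the redundancy signal AskMSR relied on."""
    votes = Counter({c: 0 for c in candidates})
    for snippet in snippets:
        text = snippet.lower()
        for cand in candidates:
            votes[cand] += text.count(cand.lower())
    return votes.most_common()

snippets = [
    "The Open Source Summit 2017 takes place in Los Angeles.",
    "Join us in Los Angeles for the Linux Foundation event.",
    "The summit draws developers from around the world.",
]
print(rank_candidates(snippets, ["Los Angeles", "San Francisco"]))
# [('Los Angeles', 2), ('San Francisco', 0)]
```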
And I took that, revived it, and basically made it into AskTanmay. In its DNA, there were specific elements, like for example, it really relies on data redundancy. If there's no data redundancy, then AskTanmay doesn't do well. If you were to ask it where it was, where's the Open Source Summit North America going to be held, it wouldn't answer correctly, because it's not redundant enough on the internet. It's mentioned once or twice, but not more than that. And so, I learned that it's currently very, I guess you could say naive in how it actually understands the data that it's collecting. However, over the past, I'd say around six or seven months, I've been able to implement BiDAF, or Bi-Directional Attention Flow, which was created by Allen AI. It's completely open-source, and it uses something that's called the SQuAD data set, or Stanford Question Answering Dataset, in order to actually take paragraphs and questions, and try to return answers as snippets from the paragraphs. And so again, integrating AskTanmay, this allows me to really reduce the data redundancy requirement, able to merge very similar answers to have, you know, better answers at the top of the list, and of course I'm able to make it smarter, it's not as naive. It actually understands the content that it's gathering from search engines. For example, Google and Bing, which I've also added search support for. So again, a lot has changed, using deep learning, but still, sort of the key points of AskTanmay: it requires very little computational power, it's very, very cross-platform, runs on any operating system, including iOS, Android, etc. And of course, from there, completely open-source. >> So how has your life changed, since all the, you've been really in the spotlight, and well-deserved I think. It's been great to have you on theCUBE multiple times, thanks for coming on. >> Thank you. No, definitely, of course. >> Dave Vellante was just calling. He wants to ask you a few questions himself. Dave, if you're watching, we'll get you on, just call right now. What's going on, what are you going to do when... Are you like happy right now? Are you cool with everything? Or is there a point where you say, "Hey I want to play a little bit with different tools", you want more freedom? What's going on? >> Well, you see, right now I'm very very excited, I'm very happy with what I'm doing. Because of course I mean, my life generally has changed quite a bit since last Interconnect, you could say. From Interconnect 2016 to 17, to now. Of course, since then, I've been able to go into lots of different fields. Not only am I working with general deep learning at IBM Watson, now I'm working with lots of different tools. And I'm working especially, in terms of like, for example Linux. What I've been doing with open-source and everything. I've been able to create, for example, AskTanmay now integrates Keras and TensorFlow. DeepSPADE is actually built entirely off of TensorFlow and Keras. And now I've also been able to venture into lots of different APIs as well. Not just with IBM Watson. Also things like, we've got the Dandelion API. AskTanmay also relies on Dandelion, which provides text similarity services for semantic and syntactic text similarity. Which, again, we'll be talking about tonight as well.
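Since DeepSPADE is built on TensorFlow and Keras, the CNN-plus-GRU classifier Tanmay mentioned might look roughly like the sketch below. Every dimension here, the vocabulary size, embedding width, and filter counts, is an illustrative assumption rather than the actual DeepSPADE architecture.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, GRU, Dense, Dropout

VOCAB_SIZE = 20000  # assumed vocabulary size
SEQ_LEN = 200       # assumed maximum tokens per post

model = Sequential([
    # Learn dense word vectors for each token in the post.
    Embedding(VOCAB_SIZE, 128, input_length=SEQ_LEN),
    # The convolution picks up local n-gram patterns typical of spam.
    Conv1D(64, 5, activation="relu"),
    MaxPooling1D(4),
    # The GRU models longer-range ordering across the post.
    GRU(64),
    Dropout(0.5),
    # Binary output: spam vs. non-spam.
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```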
So, yeah, lots has changed, and of course, with all this sort of new stuff that I'm able to show, or new media for which I'm able to share my knowledge, for example, all these, you know, CUBE interviews I've been doing, and of course all these keynotes, I'm able to really spread my message about AI, why I believe it's not only our future, but also our present. Like, for example, I also mentioned this last time. If you were to just open up your phone right now, you'd already see that half of your phone is powered by AI. It's detecting that hey, you're at your home right now, you just drove back from work, and it's this time on this day, so you probably want to open up this application. It predicts that, and provides you with that. Apart from that, things like Siri, Google Now, these are all powered by AI, they're already an integral part of our lives. And of course, what they're going to be doing in our lives to come is just absolutely great. With like, healthcare, providing artificial communication ability for people who can't communicate naturally. I think it's going to be really really interesting. >> Tanmay, it's always great to have you on theCUBE. Congratulations. >> Tanmay: Thank you very much. >> AskTanmay, good projects. Let's stay in touch, as we start to produce more collaboration, we'd love to keep promoting your work. Great job. And you're an inspiration to many. >> Tanmay: Thank you very much, glad to be here. >> Thanks for coming on theCUBE. Live coverage from the Open Source Summit's theCUBE, in Los Angeles. I'm John Furrier, with Stu Miniman. We'll be back with more live coverage after this short break. (upbeat music)
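One more aside on the Dandelion point above: the syntactic half of text similarity can be sketched in a few lines. This toy cosine-over-bag-of-words measure is only a stand-in for what a real service like Dandelion computes, which layers semantic signals on top.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Crude syntactic similarity: cosine of bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values())) *
            math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("the summit is in Los Angeles",
                        "the summit takes place in Los Angeles"))
```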
Ron Bodkin, Teradata - DataWorks Summit 2017
>> Announcer: Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. We are live at the DataWorks Summit on day two. We have had a great day and a half learning a lot about the next generation of big data, machine learning, artificial intelligence. I'm Lisa Martin, and my co-host is George Gilbert. We are next joined by a CUBE alumnus, Ron Bodkin, the VP and General Manager of Artificial Intelligence for Teradata. Welcome back to theCUBE! >> Well thank you Lisa, it's nice to be here. >> Yeah, so talk to us about what you're doing right now. Your keynote is tomorrow. >> Ron: Yeah. >> What are you doing, what is Teradata doing in helping customers to be able to leverage artificial intelligence? >> Sure, yeah so as you may know, I have been involved in this conference and the big data space for a long time as the founding CEO of Think Big Analytics. We were involved in really helping customers in the beginning of big data in the enterprise. And so, we are seeing a very similar trend in the space of artificial intelligence, right? The rapid advances in recent years in deep learning have opened up a lot of opportunity to really create value from all the data the customers have in their data ecosystems, right? So Teradata has a big role to play in having high-quality products, the Teradata database, analytic ecosystem products such as Hadoop, such as QueryGrid for connecting these systems together, right? So what we're seeing is our customers are very excited by artificial intelligence, but what we're really focused on is how do they get to the value, right? What can they do that's really going to get results, right? And we bring this perspective of having this strong solutions approach inside of Teradata, and so we have Think Big Analytics consulting for data science, we now have been building up experts in deep learning in that organization, working with customers, right? We've brought product functionality, so we're innovating around how do we keep pushing the Teradata product family forward with functionality around streaming with Listener. Functionality like, how do you take GPUs and start to think about how we can add that and make it deploy efficiently inside our customers' data centers. How can you take advantage of innovation in open source, with projects like TensorFlow and Keras becoming important for our customers. So what we're seeing is a lot of customers are excited about use cases for artificial intelligence. And tomorrow in the keynote I'm going to touch on a few of them, ranging from applications like preventative maintenance, anti-fraud in banking, to e-commerce recommendations, and we're seeing those are some of the examples of use cases where customers are saying hey, there's a lot of value in combining traditional machine learning, wide learning, with deep learning using neural nets to generalize. >> Help us understand if there's an arc where there's the mix of what's repeatable and what's packageable, or what's custom, how that changes over time, or whether it's just by solution. >> Yeah, it's a great question. Right, I mean I think there's a lot of infrastructure that any of these systems need to rest on. So having data infrastructure, having quality data that you can rely on is foundational, and so you need to get that installed and working well as a beginning point.
Obviously having repeatable products that manage data with high SLAs, and supporting not just production use, but also how do you let data scientists analyze data in a lab and make that work well. So there's that foundational data layer. Then there's the whole integration of the data science into applications, which is critical: analytics, ops, agile ways of making it possible to take the data and build repeatable processes, and those are very horizontal, right? There's some variation, but those work the same in a lot of use cases. At this stage, I'd say, in deep learning, just like in machine learning generally, you still have a lot of horizontal infrastructure. You've got Spark, you've got TensorFlow; those support use cases across many industries. But then you get to the next level, you get specific problems, and there's a lot of nuance. What modeling techniques are going to work, what data sets matter? Okay, you've got time series data and a problem like fraud. What techniques are going to make that work well? And recommendations, you may have a long tail of items to think about recommending. How do you generalize across the long tail where you can't learn? People who use some relatively small thing or go to an obscure website, or buy an obscure product, there's not enough data to say are they likely to buy something else or do something else, but how do you categorize them so you get statistical power to make useful recommendations, right? Those are things that are very specific, where there's a lot of repeatability within a specific solution area. >> This is, when you talk about the data assets, that might be specific to a customer and then I guess some third party or syndicated sources. If you have an outcome in mind, but not every customer has the same inventory of data, so how do you square that circle? >> That's a great question. And I really think that's a lot of the opportunity in the enterprise of applying analytics, so this whole summit, DataWorks, is about hey, the power of your data. What you can get by collecting your data in a well-managed ecosystem and creating value. So, there's always a nuance. It's like what's happening in your customers, what's your business process, what's special about how you interact, what's the core of your business? So I guess my view is that anybody that wants to be a winner in this new digital era and have processes that take advantage of artificial intelligence is going to have to use data as a competitive advantage and build on their unique data. Because we see a lot of times enterprises struggle with this. There's a tendency to say hey, can we just buy a packaged, off-the-shelf SaaS solution and do that? And for context, for things that are the same for everybody in an industry, that's a great choice. But if you're doing that for your core differentiation of your business, you're in deep trouble in this digital era. >> And that's a great place, sorry George, really quickly. In this day and age, every company is a technology company. You mentioned a use case in banking, fraud detection, which is huge. There's tremendous value that can be gleaned from artificial intelligence, and there's also tremendous risk to them. I'm curious, maybe just kind of a generalization. Where are your customers on this journey in terms of have they, are you going out to customers that have already embraced Hadoop and have a significant amount of data that they say, all right, we've got a lot of data here, we need to understand the context.
Where are customers in that maturity evolution? >> Sure, so I'd say that we're really fast-approaching the slope of enlightenment for Hadoop, which is to say the enthusiasm of three years ago, when people thought Hadoop was going to do everything, has kind of waned, and there's now more of an appreciation, like there's a lot of value in having a data warehouse for high-value curated data for large-scale use. There's a lot of value in having a data lake of fairly raw data that can be used for exploration in the data science arena. So there's an emerging question, like what is the best architecture for streaming and how do you drive real-time decisions, and that's still very much up in the air. So I'd say that most of our customers are somewhere on that journey. I think that a lot of them have backed off from their initial ambitions, that they bought a little too much of the hype of all that Hadoop might do, and they're realizing what it is good for, and how they really need to build a complementary ecosystem. The other thing I think is exciting though is I see the conversation is moving from the technology to the use cases. People are a lot more excited about how can we drive value in analytics, and let's work backwards from the analytics value to the data that's going to support it. >> Absolutely. >> So building on that, we talk about sort of what's core, and if you can't have something completely repeatable that's going to be core to your sustainable advantage, but if everyone is learning from data, how does a customer achieve a competitive advantage or even sustain a competitive advantage? Is it orchestrating learning that feeds, that informs processes all across the business, or is it just sort of a perpetual Red Queen effect? >> Well, that's a great question. I mean, I think there's a few things, right? There's operational excellence in every discipline, so having good data scientists, having the right data, collecting data, thinking about how do you get network effects, those are all elements. So I would say there's a table-stakes aspect, that if you're not doing this, you're in trouble, but then if you are, it's like how do you optimize and lift your game and get better at it? So that's an important factor: you see companies that say, how do we acquire data? Like one of the things that you see digital disruptors, like a Tesla, doing is changing the game by saying we're changing the way we work with our customers to get access to the data. Think of the difference: every time you buy a Tesla you sign over the rights for them to collect and use all your data, when the traditional auto OEMs are struggling to get access to a lot of the data because they have intermediaries that control the relationship and aren't willing to share. And a similar thing in other industries, you see in consumer packaged goods. You see a lot of manufacturers there saying how do we get partnerships, how do we get more accurate data? The old models of going out to the Nielsens of the world and saying give us aggregates, and we'll pay you a lot to give us a summary report, that's not working. How do we learn directly in a digital world about our consumers so we can be more relevant? So one of the things is definitely that control of data and access to data, as well as we see a lot of companies saying what are the acquisitions we can make? What are startups and capabilities that we can plug in, and complement to get data, to get analytic capability that we can then tailor for our needs?
>> It's funny that you mention Tesla having more cars on the road, collecting more data than pretty much anyone else at this point. But then there's like Stanford's sort of luminary for AI, Fei-Fei Li. She signed on I think with Toyota, because she said they sell 10 million cars a year, I'm going to be swimming in data compared to anyone else, possible exception of GM or maybe some Chinese manufacturer. So where does, how can you get around scale when using data at scale to inform your models? How would someone like a Tesla be able to get an end run around that? >> So that's the battle, the disruptor comes in, they're not at scale, but they maybe change the game in some way. Like having different terms that give them access to different kinds of data, more complete data. So that's sort of part of the answer: to disrupt an industry you need a strategy around what's different, right, like in Tesla's case an electric vehicle. And they've been investing in autonomous vehicles with AI; of course everybody in the industry is seeing that and is racing. I mean, Google really started that whole wave going a long time ago as another potential disruptor coming in with their own unique data asset. So, I think it's all about the combination of capabilities that you need. Disruptors often bring a commitment to a different business process, and that's a big challenge: a lot of times the hardest things are the business processes that are entrenched in existing organizations, and disruptors can say we're rethinking the way this gets done. I mean, the example of that in ride sharing, the Ubers and Lyfts of the world, they are re-conceiving what it means to consume automobile services. Maybe you don't want to own a car at all if you're a millennial, maybe you just want to have access to a car when you need to go somewhere. That's a good example of a disruptive business model change. >> What are some things that are on the intermediate-term horizon that might affect how you go about trying to create a sustainable advantage? And here I mean things like where deep learning might help data scientists with feature engineering so there's less need for, you can make data scientists less of a scarce resource. Or where there's new types of training for models where you need less data? Those sorts of things might disrupt the practice of achieving an advantage with current AI technology. >> You know, that's a great question. So near-term, the ability to be more efficient in data science is a big deal. There's no surprise that there's a big talent gap, big shortage of qualified data scientists in the enterprise, and one of the things that's exciting is that deep learning lets you get more information out of the data, so it learns more so that you'd have to do less feature engineering. It's not like a magic box, you just pour in raw data to deep learning and out come the answers; you still need qualified data scientists, but it's a force multiplier. There's less work to do in feature engineering, and therefore you get better results. So that's a factor. You're starting to see things like hyperparameter search, where people will create neural networks that search for the best machine learning model, and again get another level of leverage. Now, today doing that is very expensive. The amount of hardware to do that, very few organizations are going to spend millions of dollars to sort of automate the discovery of models, but things are moving so fast.
I mean, even just in the last six weeks, to have Nvidia and Google both announce significant breakthroughs in hardware. And I just had a colleague forward me a paper on recent research that says hey, this technique could produce a hundred times faster results in deep learning convergence. So you've got rapid advances in investment in the hardware and the software. Historically software improvements have outstripped hardware improvements throughout the history of computing, so it's quite reasonable to expect you'll have 10 thousand times the price performance for deep learning in five years. So things that today might cost a hundred million dollars and no one would do, could cost 10 thousand dollars in five years, and suddenly it's a no-brainer to apply a technique like that to automate something instead of hiring more scarce data scientists that are hard to find, and make the data scientists more productive so they're spending more time thinking about what's going on and less time trying out different variations of how do I configure this thing, does this work, does this, right? >> Oh gosh, Ron, we could keep chatting away. Thank you so much for stopping by theCUBE again, we wish you the best of luck in your keynote tomorrow. I think people are going to be very inspired by your passion, your energy, and also the tremendous opportunity that is really sitting right in front of us. >> Thank you, Lisa, it's a very exciting time to be in the data industry, and the emergence of AI in the enterprise, I couldn't be more excited by it. >> Oh, excellent, well your excitement is palpable. We want to thank you for watching. We are live on theCUBE at the DataWorks Summit day 2, #dws17. For my cohost George Gilbert, I'm Lisa Martin, stick around. We'll be right back. (upbeat electronic melody)
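The hyperparameter search Ron describes, running many configurations to automate the discovery of models, is the kind of workload Spark's MLlib tuning utilities already distribute today. A minimal PySpark sketch; `train` is an assumed DataFrame with `features` and `label` columns, not a reference to any Teradata product.

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(featuresCol="features", labelCol="label")

# Grid of candidate configurations; Spark farms the runs out
# across the cluster.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)

# `train` is an assumed DataFrame prepared elsewhere.
best_model = cv.fit(train).bestModel
```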
Reynold Xin, Databricks - #Spark Summit - #theCUBE
>> Narrator: Live from San Francisco, it's theCUBE, covering Spark Summit 2017. Brought to you by Databricks. >> Welcome back, we're here at theCUBE at Spark Summit 2017. I'm David Goad here with George Gilbert, George. >> Good to be here. >> Thanks for hanging with us. Well here's the other man of the hour here. We just talked with Ali, the CEO at Databricks, and now we have the Chief Architect and co-founder at Databricks, Reynold Xin. Reynold, how are you? >> I'm good. How are you doing? >> David: Awesome. Enjoying yourself here at the show? >> Absolutely, it's fantastic. It's the largest Summit. There are a lot of interesting things, a lot of interesting people who I meet. >> Well I know you're a really humble guy, but I had to ask Ali what should I ask Reynold when he gets up here. Reynold is one of the biggest contributors to Spark. And you've been with us for a long time, right? >> Yes, I've been contributing to Spark for about five or six years, and that's probably the most commits to the project, and lately I'm working more with other people to help design the roadmap for both Spark and Databricks with them. >> Well let's get started talking about some of the new developments that maybe our audience at theCUBE hasn't heard here in the keynote this morning. What are some of the most exciting new developments? >> So, I think in general, if we look at Spark, there are three directions I would say we're doubling down on. The first direction is deep learning. Deep learning is extremely hot and it's very capable, but as we alluded to earlier in a blog post, deep learning has reached sort of a mass-production point in which it shows tremendous potential but the tools are very difficult to use. And we are hoping to democratize deep learning and do for deep learning what Spark did for big data, with this new library called Deep Learning Pipelines. What it does, it integrates different deep learning libraries directly in Spark and can actually expose models in SQL. So, even the business analysts are capable of leveraging that. So, that's one area, deep learning. The second area is streaming. Streaming, again, I think that a lot of customers have aspirations to actually shorten the latency and increase the throughput in streaming. So, the Structured Streaming effort is going to be generally available, and last month alone on the Databricks platform, I think our customers processed three trillion records, last month alone, using Structured Streaming. And we also have a new effort to actually push down the latency all the way to some millisecond range. So, you can really do blazingly fast streaming analytics. And last but not least is the SQL data warehousing area. Data warehousing, I think, is a very mature area outside of big data, but from a big data point of view it's still pretty new, and there's a lot of use cases that are popping up there. And Spark, with approaches like the CBO, and also in the Databricks Runtime with DBIO, we're actually substantially improving the performance and the capabilities of data warehousing features. >> We're going to dig into some of those technologies here in just a second with George. But have you heard anything here so far from anyone that's changed your mind maybe about what to focus on next? >> So, one thing I've heard from a few customers is actually visibility and debugability of the big data jobs.
So many of them are fairly technical engineers, and some of them are less sophisticated engineers, and they have written jobs and sometimes the job runs slow. And so the performance engineer in me would think, so how do I make the job run fast? The different way to actually solve that problem is how can we expose the right information so the customer can actually understand and figure it out themselves: this is why my job is slow, and this is how I can tweak it to make it faster. Rather than giving people the fish, you actually give them the tools to fish. >> If you can call that bugability. >> Reynold: Yeah, Debugability. >> Debugability. >> Reynold: And visibility, yeah. >> Alright, awesome, George. >> So, let's go back and unpack some of those kind of juicy areas that you identified. On deep learning, you were able to distribute, if I understand things right, the predictions. You could put models out on a cluster, but the really hard part, the compute-intensive stuff, was training across a cluster. And so Deeplearning4j and I think Intel's BigDL, they were written for Spark to do that. But with all the excitement over some of the new frameworks, are they now at the point where they are as good citizens on Spark as they are on their native environments? >> Yeah so, this is a very interesting question; obviously a lot of other frameworks are becoming more and more popular, such as TensorFlow, MXNet, Theano, Keras and others. What the Deep Learning Pipelines library does is actually expose all these single-node deep learning tools, highly optimized for, say, even GPUs or CPUs, to be available as an estimator, or like a module in a pipeline, of the machine learning pipeline library in Spark. So, now users can actually leverage Spark's capability to, for example, do hyperparameter tuning. So, when you're building a machine learning model, it's fairly rare that you just run something once and you're good with it. Usually you have to fiddle with a lot of the parameters. For example, you might run over a hundred experiments to actually figure out what is the best model I can get. This is where actually Spark really shines. When you combine Spark with some deep learning library, be it BigDL or be it MXNet, be it TensorFlow, you could be using Spark to distribute that training and then do cross validation on it. So you can actually find the best model very quickly. And Spark takes care of all the job scheduling, all the fault-tolerance properties, and how do you read data in from different data sources. >> And without my dropping too much in the weeds, there was a version of that where Spark wouldn't take care of all the communications. It would maybe distribute the models and then do some of the averaging of what was done out on the cluster. Are you saying that all that now can be managed by Spark? >> In that library, Spark will be able to actually take care of picking the best model out of it. And there are different ways you can design how you define the best. The best could be some average of some different models. The best could be just picking one out of these. The best could be maybe there's a tree of models that you classify it on. >> George: And that's a hyperparameter configuration choice? >> So that is actually built-in functionality in Spark's machine learning pipeline. And now what we're doing is, now you can actually plug all those deep learning libraries directly into that as part of the pipeline to be used. Another maybe just to add, >> Yeah, yeah,
Another maybe just to add, >> Yeah, yeah, >> Another really cool functionality of the deep learning pipeline is transfer learning. So as you said, deep learning takes a very long time, it's very computationally demanding. And it takes a lot of resources, expertise to train. But with transfer learning what we allow the customers to do is they can take an existing deep learning model as well train in a different domain and they we'd retrain it on a very small amount of data very quickly and they can adapt it to a different domain. That's how sort of the demo on the James Bond car. So there is a general image classifier that we train it on probably just a few thousand images. And now we can actually detect whether a car is James Bond's car or not. >> Oh, and the implications there are huge, which is you don't have to have huge training data sets for modifying a model of a similar situation. I want to, in the time we have, there's always been this debate about whether Sparks should manage state, whether it's database, key value store. Tell us how the thinking about that has evolved and then how the integration interfaces for achieving that have evolved. >> One of the, I would say, advantages of Spark is that it's unbiased and works with a variety of storage systems, be it Cassandra, be it Edgebase, be it HDFS, be is S3. There is a metadata management functionality in Spark which is the catalog of tables that customers can define. But the actual storage sits somewhere else. And I don't think that will change in the near future because we do see that the storage systems have matured significantly in the last few years and I just wrote blog post last week about the advantage of S3 over HDFS for example. The storage price is being driven down by almost a factor of 10X when you go to the cloud. I just don't think it makes sense at this point to be building storage systems for analytics. That said, I think there's a lot of building on top of existing storage system. There's actually a lot of opportunities for optimization on how you can leverage the specific properties of the underlying storage system to get to maximum performance. For example, how are you doing intelligent caching, how do you start thinking about building indexes actually against the data that's stored for scanned workloads. >> With Tungsten's, you take advantage of the latest hardware and where we get more memory intensive systems and now that the Catalyst Optimizer has a cost based optimizer or will be, and large memory. Can you change how you go about knowing what data you're managing in the underlying system and therefore, achieve a tremendous acceleration in performance? >> This is actually one area we invested in the DBIO module as part of Databricks Runtime, and what DBIO does, a lot of this are still in progress, but for example, we're adding some form of indexing capability to add to the system so we can quickly skip and prune out all the irrelevant data when the user is doing simple point look-ups. Or if the user is doing a scan heavy workload with some predicates. That actually has to do with how we think about the underlying data structure. The storage system is still the same storage system, like S3, but were adding actually indexing functionalities on top of it as part of DBIO. >> And so what would be the application profiles? Is it just for the analytic queries or can you do the point look-ups and updates in that sort of scenario too? >> So it's interesting you're talking about updates. 
Updates are another thing that we've got a lot of feature requests on. We're actively thinking about how we will support update workloads. Now, that said, I just want to emphasize, for both use cases, doing point look-ups and updates, we're still talking about the context of an analytic environment. So we would be talking about, for example, maybe bulk updates or low-throughput updates, rather than doing transactional updates in which every time you swipe a credit card, some record gets updated. That probably belongs more on the transactional databases like Oracle or MySQL even. >> What about when you think about people who are going to run, they started out with Spark on prem, they realize they're going to put much more of their resources in the cloud, but with IIoT, industrial IoT-type applications, they're going to have Spark maybe in a gateway server on the edge? What do you think that configuration looks like? >> Really interesting, it's kind of two questions maybe. The first is the hybrid on prem, cloud solution. Again, so one of the nice advantages of Spark is the decoupling of storage and compute. So when you want to move, for example, workloads from on prem to the cloud, the one you care the most about is probably actually the data, 'cause the compute, it doesn't really matter that much where you run it, but data's the one that's hard to move. We do have customers that are leveraging Databricks in the cloud but actually reading data directly from on prem, relying on the caching solution we have that minimizes the data transfer over time. And that is one route I would say is pretty popular. Another one is, with Amazon you can literally give them just a Snowball. You give them hard drives, with trucks; the trucks will ship your data directly, put it in S3. With IoT, a common pattern we see is a lot of the edge devices would be actually pushing the data directly into some firehose like Kinesis or Kafka, or, I'm sure Google and Microsoft both have their own variants of that. And then you use Spark to directly subscribe to those topics and process them in real time with Structured Streaming. >> And so would Spark be down, let's say, at the site level, if it's not on the device itself? >> It's an interesting thought, and maybe one thing we should actually consider more in the future is how do we push Spark to the edges. Right now it's more of a centralized model, in which the devices push data into Spark, which is centralized somewhere. I've seen for example, I don't remember the exact use case, but it has to do with some scientific experiment at the North Pole. And of course there you don't have a great uplink to transfer all the data back to some national lab, and rather they would do smart parsing there and then ship the aggregated result back. There's another one, but it's less common. >> Alright, well just one minute now before the break, so I'm going to give you a chance to address the Spark community. What's the next big technical challenge you hope people will work on for the benefit of everybody? >> In general, Spark came along with two focuses. One is performance, the other one's ease of use. And I still think big data tools are too difficult to use. Deep learning tools, even harder. The barrier to entry is very high for all of these tools. I would say we might have already addressed performance to a degree that I think it's actually pretty usable. The systems are fast enough. Now, we should work on actually making (mumbles) even easier to use.
That's also what we focus a lot on at Databricks here. >> David: Democratizing access, right? >> Absolutely. >> Alright well Reynold, I wish we could talk to you all day. This is great. We are out of time now. We appreciate you coming by theCUBE and sharing your insights, and good luck with the rest of the show. >> Thank you very much, David and George. >> Thank you all for watching here, we're at theCUBE at Spark Summit 2017. Stay tuned, lots of other great guests coming up today. We'll see you in a few minutes.
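The transfer-learning demo Reynold described, retraining a general image classifier to recognize James Bond's car from a small data set, maps onto the Deep Learning Pipelines library roughly as follows. This is a sketch in the spirit of the library's published examples, not the demo's actual code; the training DataFrame and label column are assumptions.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from sparkdl import DeepImageFeaturizer

# A pre-trained InceptionV3 network supplies generic image features;
# only the small logistic-regression head is trained on the new data.
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                 modelName="InceptionV3")
lr = LogisticRegression(featuresCol="features", labelCol="is_bond_car")

pipeline = Pipeline(stages=[featurizer, lr])
model = pipeline.fit(train_images_df)  # assumed DataFrame of labeled images
```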
Day One Wrap - #SparkSummit - #theCUBE
>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017, brought to you by Databricks. (energetic music plays) >> And what an exciting day we've had here at theCUBE. We've been at Spark Summit 2017, talking to partners, to customers, to founders, technologists, data scientists. It's been a load of information, right? >> Yeah, an overload of information. >> Well, George, you've been here in the studio with me talking with a lot of the guests. I'm going to ask you to maybe recap some of the top things you've heard today for our guests. >> Okay so, well, Databricks laid down, sort of, three themes that they wanted folks to take away. Deep learning, Structured Streaming, and serverless. Now, deep learning is not entirely new to Spark. But they've dramatically improved their support for it, I think, going beyond the frameworks that were written specifically for Spark, like Deeplearning4j and BigDL by Intel. And now TensorFlow, which is the open-source framework from Google, has gotten much better support. Structured Streaming, it was not clear how much more news we were going to get, because it's been talked about for 18 months. And they really, really surprised a lot of people, including me, where they took, essentially, the processing time for an event or a small batch of events down to 1 millisecond. Whereas, before, it was in the hundreds if not higher. And that changes the type of apps you can build. And also, the Databricks guys had coined the term continuous apps, which means they operate on a never-ending stream of data, which is different from what we've had in the past where it's batch or, with a user interface, request-response. So they definitely turned up the volume on what they can do with continuous apps. And serverless, they'll talk about more tomorrow. And Jim, I think, is going to weigh in. But it, basically, greatly simplifies the ability to run this infrastructure, because you don't think of it as a cluster of resources. You just know that it's sort of out there, and you ask requests of it, and it figures out how to fulfill it. I will say, the other big surprise for me was when we had Matei, who's the creator of Spark and the chief technologist at Databricks, come on the show and say, when we asked him about how Spark was going to deal with, essentially, more advanced storage of data so that you could update things, so that you could get queries back, so that you could do analytics, and not just of stuff that's stored in Spark but stuff that Spark stores essentially below it. And he said, "You know, Databricks, you can expect to see come out with or partner with a database to do these advanced scenarios." And I got the distinct impression, after listening to the tape again, that he was talking about for Apache Spark, which is separate from Databricks, that they would do some sort of key-value store. So in other words, when you look at competitors or quasi-competitors like Confluent with Kafka or data Artisans with Flink, they don't, they're not perfect competitors. They overlap some. Now Spark is pushing its way more into overlapping with some of those solutions. >> Alright. Well, Jim Kobielus. And thank you for that, George. You've been mingling with the masses today. (laughs) And you've been here all day as well. >> Educated masses, yeah, (David laughs) who are really engaged in this stuff, yes. >> Well, great, maybe give us some of your top takeaways after all the conversations you've had today.
What Databricks, Databricks of course being the center, the developer, the primary committer in the Spark opensource community. They've done a number of very important things in terms of the announcements today at this event that push Spark, the Spark ecosystem, where it needs to go to expand the range of capabilities and their deployability into production environments. I feel the deep-learning side, announcement in terms of the deep-learning pipeline API very, very important. Now, as George indicated, Spark has been used in a fair number of deep-learning development environments. But not as a modeling tool so much as a training tool, a tool for In Memory distributed training of deep-learning models that we developed in TensorFlow, in Caffe, and other frameworks. Now this announcement is essentially bringing support for deep learning directly into the Spark modeling pipeline, the machine-learning modeling pipeline, being able to call out to deep learning, you know, TensorFlow and so forth, from within MLlib. That's very important. That means that Spark developers, of which there are many, far more than there are TensorFlow developers, will now have an easy pass to bring more deep learning into their projects. That's critically important to democratize deep learning. I hope, and from what I've seen what Databricks has indicated, that they have support currently in API reaching out to both TensorFlow and Keras, that they have plans to bring in API support for access to other leading DL toolkits such as Caffe, Caffe 2, which is Facebook-developed, such as MXNet, which is Amazon-developed, and so forth. That's very encouraging. Structured Streaming is very important in terms of what they announced, which is an API to enable access to faster, or higher-throughput Structured Streaming in their cloud environment. And they also announced that they have gone beyond, in terms of the code that they've built, the micro-batch architecture of Structured Streaming, to enable it to evolve into a more true streaming environment to be able to contend credibly with the likes of Flink. 'Cause I think that the Spark community has, sort of, had their back against the wall with Structured Streaming that they couldn't fully provide a true sub-millisecond en-oo-en latency environment heretofore. But it sounds like with this R&D that Databricks is addressing that, and that's critically important for the Spark community to continue to evolve in terms of continuous computation. And then the serverless-apps announcement is also very important, 'cause I see it as really being, it's a fully-managed multi-tenant Spark-development environment, as an enabler for continuous Build, Deploy, and Testing DevOps within a Spark machine-learning and now deep-learning context. The Spark community as it evolves and matures needs robust DevOps tools to production-ize these machine-learning and deep-learning models. Because really, in many ways, many customers, many developers are now using, or developing, Spark applications that are real 24-by-7 enterprise application artifacts that need a robust DevOps environment. And I think that Databricks has indicated they know where this market needs to go and they're pushing it with R&D. And I'm encouraged by all those signs. >> So, great. Well thank you, Jim. I hope both you gentlemen are looking forward to tomorrow. I certainly am. >> Oh yeah. >> And to you out there, tune in again around 10:00 a.m. Pacific Time. We're going to be broadcasting live here. 
From Spark Summit 2017, I'm David Goad with Jim and George, saying goodbye for now. And we'll see you in the morning. (sparse percussion music playing) (wind humming and waves crashing).