Mike Tarselli, TetraScience | CUBE Conversation May 2021
>> Yes, welcome to this CUBE Conversation. I'm Lisa Martin, excited about this conversation: it combines my background in life sciences with technology. Please welcome Mike Tarselli, the chief scientific officer at TetraScience. Mike, I'm so excited to talk to you today.

>> Thank you, Lisa, and thank you very much to theCUBE for hosting us.

>> Absolutely. So we talk about cloud and data all the time. This is going to be a very interesting conversation, especially because the events of the last, what are we on, 14 months and counting have really accelerated the need for drug discovery, and everyone's focused on that. But I want you to talk with our audience about TetraScience: who you guys are and what you do. You were founded in 2014, and you just raised $80 million in Series B. Give us an idea of who you are and what you do.

>> Got it. TetraScience, what are we? We are digital plumbers. That may seem funny, but really we take the world of data and resolve it in such a way that people can pipe it from the data sources they have, in a vendor-agnostic way, to the data targets in which they need to consume that data. Bringing that metaphor a little closer to life sciences: let's say you're a chemist, and you have a mass spec, an NMR, and some other piece of technology, and you need all of those to speak the same language. Generally speaking, all of these are going to be made by different vendors, they're all going to have different control software, and they all send their data in slightly different ways. TetraScience takes those all in. We bring them up to the cloud, our cloud-native solution. We harmonize them: we extract the data first, and then we put it into what we call our special sauce, our intermediate data schema, to harmonize it. So you have a picture and a diagram of what prototypical mass spec or HPLC or cell-counting data should look like. And then we build pipelines to export that data over to where you need it. So if you need it to live in an ELN or a LIMS, or in a visualization tool like Spotfire or Tableau, we've got you covered. Again, we're piping things from left to right, from sources to targets, and we're doing it with scientific context.
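As a rough illustration of that harmonization idea, here is a minimal sketch of what mapping one vendor's raw export onto a common intermediate schema could look like. It is purely hypothetical: the class, field names, and helper below are illustrative, not TetraScience's actual schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical "intermediate data schema" for an instrument run:
# one common shape, regardless of which vendor produced the raw file.
@dataclass
class HarmonizedRun:
    instrument_type: str                          # e.g. "HPLC", "mass spec"
    vendor: str
    method: str
    results: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # operator, timestamps, audit trail

def harmonize_hplc(raw: dict) -> HarmonizedRun:
    """Map one vendor's raw HPLC export onto the common schema (field names assumed)."""
    return HarmonizedRun(
        instrument_type="HPLC",
        vendor=raw["instrument_vendor"],
        method=raw["method_name"],
        results=[{"peak": p["id"], "area": p["area"]} for p in raw["peaks"]],
        metadata={"operator": raw.get("user"), "acquired": raw.get("timestamp")},
    )
```

A pipeline would then export such harmonized records to whichever target the lab uses, an ELN, a LIMS, or a visualization tool, without the target needing to know anything vendor-specific.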
>> That was an outstanding description. Data plumbers with a special sauce: I never would have thought I'd hear that when I woke up this morning. But I'm going to unpack this more, because one of the things I read in the press release that went out just a few weeks ago, announcing the Series B funding, said that TetraScience is pioneering a $300 billion greenfield data market, and operating, this is what got my attention, without a direct cloud-native and open-platform competitor. Why is that?

>> That's right. If you look at the way pharma data is handled today, the incumbents tend to be either on-prem solutions, with a license model, distribution into a company, and therefore maintenance costs, professional services, et cetera, or somebody who is maybe cloud, but cloud second: they started on their on-prem journey and then said, we should go to the cloud and migrate. We're cloud first, cloud native. So that's the first strong point. And the second is that, in terms of data harmonization and looking at data in a vendor-agnostic way, many companies claim to do it. But the real test of their mettle, we'll say, is whether you can do it with the scientific contextualization we offer. So yes, you can collect the data and put it on a cloud, okay, great. Yes, you may be able to do an extract, transform, and load and move it somewhere else, okay. But can you actually do that from front to back while retaining all the context of the data, while keeping all of the metadata in the right place, with veracity, with GxP readiness, with data fidelity? And when it gets over to the other side, can somebody say, oh yeah, that's all the data from all the HPLCs we control, I've got it, I see where it is, I see where to go get it, I see who created it, I see the full data trail and validation landscape, and I can rebuild it and look back at the old raw source files if I need to? I challenge someone to find another company doing that, direct, today.

>> You talk about that context, and the thing that surprises me, given how incredibly important scientific discovery is and has been since the beginning of time: why has nobody come out in the last seven years and tried to facilitate this for life sciences organizations?

>> Right. I would say that people have tried, and there are definitely strides being made in the open-source community, in the data science community, and inside pharma and biotech themselves, on this sort of build motif. If you are inside a company and you understand your own ontology and processes, you can probably design an application or a workflow, using several different tools, to get your data where it needs to go. But will it be generally useful to the bioscience community? One thing we pride ourselves on is that when we productize a connector, or an integration as we call it, we do it with many different companies' generic cases in mind. So we say, okay, you have an HPLC problem over at this top pharma, you have an HPLC problem at this biotech, and you have another one at this CRO. What are the common points between all of those? Can we distill that down to a workflow everyone's going to need? For example, a compliance workflow; everybody needs compliance. So we can look into an Empower or a UNICORN operation and say, okay, did you sign off on that? Did it come through the right way? Was the data corrupted, et cetera? That's going to be generically useful to everybody, and that's just one example of something we can do right now for anybody in biopharma.

>> Let's talk about the events of the last 14 months or so. You mentioned 10X revenue growth in 2020. COVID really highlighted the need to accelerate drug discovery, and we've seen that. But talk to me about some of the things that TetraScience has seen and done to facilitate that.

>> Yeah, these past 14 months. The global pandemic has been a challenge for everyone involved, ourselves as well. We've basically gone to a fully remote workforce. We've tried our very best to stay on top of it with remote collaboration tools, with Jira, with GitHub, with everything. However, I'll say it's actually been some of the most successful time in our company's history, because of that lack of any kind of friction from the physical world.
We've really been able to dig down and dig deep on our integrations, our connectors, our business strategy. And because of that, we've been able to deliver a lot of value to customers, because, let's be honest, we don't actually have to be on prem for what we're doing. Since we're not an on-prem solution and we're not an original equipment manufacturer, we don't have to go plug the thing into the HPLC. We don't have to be there to tune the specific wireless protocols or your AWS protocols; it can all be done remotely. So it's about building good relationships, building trust with our colleagues and clients, and making sure we're delivering and over-delivering every time. And then people say, great: when I elect a Tetra solution, I know what's going right to the cloud, I know I can pick my hosting options, and I know you're going to keep delivering more value to me every month.

>> I like that you make it sound simple, and you bring up a great point: one of the many things accelerated this last year-plus is the need to be remote, the need to still communicate and collaborate, but also the need to establish and really foster the relationships you have with existing customers and partners while everybody navigates very different challenges. I want to talk now about how you're helping customers unlock the problem that exists in every industry: data silos, and point-to-point integrations. Talk to me about how you're helping customers. Where do they start with Tetra? Where do you start that journey to unlock data value?

>> Sure. The journey to unlock data value, great question. So first I'll say that customers tend to come to us; it's the oddest thing, and we're very lucky and grateful for it, but they tend to have heard about what we've done with other companies. They'll come to us and say, listen, we've heard about a deployment you've done with Novo Nordisk, I can say that, for example, because it's publicly known. They'll say, we hear about what you've done, we understand you have deep expertise in chromatography or in bioprocess. And they'll say, here's my really sticky problem, what can you do here? Invariably they lay out a long list of instruments and software for us; we've seen lists that go past 2,000 instruments. They'll say, here are all the things we need connected, here are four or five different use cases; they'll give us 20 scientists in the room to talk through them, start to finish, and then we take somewhere between two and four weeks to think about the problem and come back and say, here's how we might solve it. Invariably, all of these problems have a data silo somewhere: there's going to be an org where preclinical doesn't see the biology, or the biology doesn't see the screening, et cetera. So we say, all right, give us one scientist from each of those, hence establishing trust, establishing input from everybody. And collaboratively we'll work with you: we'll set up an architecture diagram, a first version of a prototype connector, all the stuff they need to get moving. We'll deliver value up front, before we've ever signed a contract, and we'll say, is this a good way to go for you?
>> And they'll say either, no, thank you, or they'll say, yes, let's go forward, let's do a pilot, a proof of concept, or let's do a full production rollout. And invariably, this data silo problem can usually be resolved by, again, these genericized connectors and our intermediate data schema, which moves things into a common format. And then also organizationally: since we're already connecting all these groups in the problem statement, they tend to continue working together even when we're no longer front and center. They say, oh, we set up that thing together; let's keep thinking about how to make our data more available to one another.

>> Interesting. So culturally, within the organization, it sounds like Tetra is having significant influence: the collaboration, but also data ownership. Sometimes that becomes a sticky situation, where there are owners and they want to retain that control. Right? You're laughing; you've been through this before. I'd like to understand a little bit more about the conversation, because typically we're talking about tech, but we're also talking about science. Are you having these technical conversations with scientists as well as IT? What does the actual team, from the customer perspective, look like?

>> Oh, sure. So the technical conversation and the science conversation are going on sometimes in parallel and sometimes in the same thread entirely. Oftentimes the folks who reach out to us first are the scientists. They say, I've got a problem in my research, and IT will probably hear about this later, but let's go. And then we'll invariably say, well, let's bring in your R&D IT counterparts, because we need them to help solve it. So yes, we're usually having those conversations in parallel at first, and then we unite them into one large discussion. And we have varied team members here on the Tetra side: we have me from science, along with multiple other PhD holders and pharma lifers in our business, who can look at the scientific use cases and recommend best practices and visualizations. We also have a lot of solutions architects and delivery engineers who can look at how the platform should assemble the solution and how we can carry it through. Those two groups, three groups really, unite to provide a unified front and help the customer through, and the customer ends up providing the same thing we do: they'll give us, on the one call, a technical expert, a data and QA person, and a scientist, all in one group, and they'll say, you guys work together to make sure our org is best represented here. And I think that's a really productive way to do this, because we end up finding out things and going deeper into the connector than we would have otherwise.

>> It's very collaborative; I bet those are such interesting conversations to be a part of. So is part of the conversation there helping them understand how to establish a common vision for data across their organization?

>> Yes, that tends to be a further-reaching conversation. I'll say in the initial, short-term conversation, we don't usually say, you three scientists or engineers are going to change the fate of the entire org; that's maybe a little outside of our scope for now. But yes, that first group tends to describe a limited solution.
We help to solve that and then go one step past, and then they'll nudge somebody else in the org and say, did you see what Tetra did over here? Maybe you could use it over here in your process. And so in that way we get this cultural buy-in, and then increased collaboration inside a single company.

>> Talk to me about some customers you've worked with. I'd especially love to know some of the ones you've helped in the last year, when things have been so incredibly dynamic in the market. Give us an insight into maybe some specific customers that work with you guys.

>> Sure, I'd love to. I'll speak to the ones that are already in our case studies; you can go anytime to tetrascience.com and read all of these. We've worked with Prelude Therapeutics, for example. We looked at a high-throughput screening cascade with them, and we were able to take an instrument that was basically unloved in a corner, a Tecan liquid handler, hook it up into their ELN and their screening application, bring in and incorporate data from an external party, and merge it all together so they could actually see, out the other side, a screening cascade, and see their data in minutes as opposed to hours or days. We've also worked, as you've seen in the press release, with Novo Nordisk; we worked on automating much of the background for their chromatography fleet. And finally, we've also worked with several smaller biotechs on instantiation. They'll say, we've just started, we don't have an ELN, we don't have a LIMS, we're about to buy these 50 instruments, what can you do with us? And we'll actually help them scope what their initial data storage and harmonization strategy should even be. So we're everywhere from the enterprise, where it's fleets of thousands of instruments and we're giving data to a large number of scientists worldwide, all the way down to the small biotech with 50 people, where we're helping add value there.

>> So, a big range there. In terms of the data conversation, I'm curious: have you seen it change in the last year-plus with respect to elevating to the C-suite level, or the board saying, we've got to figure this out? Because as we saw with the race for the COVID-19 vaccine, for example, time to value and time to discovery is so critical. Is that C-suite or board involved in having conversations with you guys?

>> It's funny, because they are, but a little later. We tend to be a scientist- and user-driven solution. So at the beginning we get a power user, an engineer, or an R&D IT person in, who really has a problem to solve. As they develop with us, eventually they need approval for the time, the resources, or the budget, and then they'll go up to their VP or their CIO or someone else at the executive level and say, let's start having more of this conversation. As a tandem effort, we're starting to become involved in some thought-leadership exercises with some larger firms, and we're looking at the strategic aspect, through conferences, through white papers, et cetera, to speak more directly to that C-suite and say, hey, here's how we could fit your industry's data motif. And one other thing you said: time to value. I'll say that the TetraScience executive team actually tracks that as a metric; we're looking at driving it down every single week.

>> That's outstanding.
That's a hard one to measure, especially in a market that is so dynamic, but that time to value for your customers is critical. Again, COVID surfaced a number of things, and some silver linings, but being able to get hands on the data, to make sure you can actually pull insights from it and accelerate and facilitate drug discovery, that time to value is absolutely critical.

>> Yeah. If you look at the companies that really went first and foremost, let's look at Moderna, not our customer, by the way, but let's look at Moderna quickly as an example. Everything they do is automated; everything they do is cloud first; everything they do is global collaboration networks with harmonized data, et cetera. That is the model we believe everyone's going to move to in the next three to five years. If you look at the fact that Moderna went from sequence to initial vaccine in what, 50, 60 days, that kind of delivery is what the market will become accustomed to. So we're going to see many more pharmas and biotechs move to that cloud-first, distributed model. All the data has to go in somewhere centrally, everyone has to be able to benefit from it, and we're happy to help them get there.

>> Well, setting a new record for pace is key there, but it's also one of those silver linings that has come out of this: to show that not only was it critical to do, it can be done. We have the technology, we have the brainpower, to harmonize all of this together to drive it. So, last question: give me an insight into some of the things that are ahead for TetraScience the rest of this year.

>> Oh gosh, so many things. One of the nice parts about having funding in the bank and a dedicated team is the ability to do more. First, of course, for our enterprise pharma and biopharma clients, there are plenty more use cases, workflows, and instruments. We've just about scratched the surface, but we're going to keep growing our integrations and connectors. First of all, we want to be like a Netflix for connectors: we want you to come and say, look, do they have the connector? No? Well, don't worry, they'll have it in a month or two. So we can be basically the Swiss Army knife for every single connector you can imagine. Then we're going to be developing a lot more data apps, things you can use to derive value from your data. And then again, we're going to be looking at helping to educate everybody. How is cloud useful? Why go to the cloud with harmonization? How does this influence your compliance? How can you do bi-directional communication? Once you have harmonized, centralized data, there are lots of ways you can use it to influence your org and drive times down, again, from days and weeks to minutes and seconds. So let's get there, and I think we're going to be doing that over the next year.

>> That's awesome, never a dull moment. And you should partner with your marketing folks, because we talked about data plumbing, the special sauce, and becoming the Netflix of connectors; those are three gems you dropped on us this morning, Mike. This has been awesome. Thank you for sharing with us what TetraScience is doing, how you're really helping to fast-track a lot of the incredibly important research we all depend on, and helping to heal the world through data.
It's been a pleasure talking with you.

>> Hey, Lisa, real quickly: it's a team effort. The entire TetraScience team deserves credit for this; I'm just lucky enough to be the one speaking with you. So thank you very much for the opportunity.

>> Absolutely, and cheers to the whole TetraScience team; keep up the great work, guys. For Mike Tarselli, I'm Lisa Martin. You're watching this CUBE Conversation.
A Day in the Life of a Data Scientist
>> Hello, everyone. Welcome to the "A Day in the Life of a Data Scientist" talk. My name is Terry Chang; I'm a data scientist on the Ezmeral Container Platform team. With me, in the chat room moderating the chat, I have Matt Maccaux as well as Doug Tackett, and we're going to dive straight into what we can do with the Ezmeral Container Platform and how we can support the role of a data scientist.

Just a quick agenda: I'm going to do some introductions and set the context of what we're going to talk about, and then we're going to dive straight into the Ezmeral Container Platform. We're going to walk straight through what a data scientist will do, pretty much a day in the life of a data scientist, and then we'll have some question and answer.

Big data has been the talk of the last few years, the last decade or so. With big data, there are a lot of ways to derive meaning, and a lot of businesses are trying to optimize every decision their applications make using data. Previously there was a lot of focus on data analytics, but recently we've seen a lot of data being used for machine learning: taking any data a business can get and sending it off to the data scientists to start doing some modeling and prediction.

So that's where we see modern businesses rooted in analytics, and data science in itself is a team sport. We need more than data scientists to do all this modeling. We need data engineers to take the data, massage it, and do some data manipulation to get it right for the data scientists. We have data analysts who monitor the models, and we have the data scientists themselves, who build and iterate through multiple different models until they find one that satisfies the business needs. Once they're done, they hand off to the software engineers, who will build it into their application, whether it's a mobile app or a web app, and then we have the operations team assigning the resources and monitoring it all as well. So data science really is a team sport, and it requires a lot of different expertise.

Here's the basic machine learning pipeline we see in the industry now. At the top we have the training environment, and this is an entire loop: we'll have some registration and some inferencing, and at the center of it all is the data prep, as well as your repositories, for your data, for any of your GitHub repositories, things of that sort. So the machine learning industry follows this very basic pattern, and at a high level, glancing through it quickly, this is what the machine learning pipeline looks like on the Ezmeral Container Platform: at the top left we have our project repository, which is our persistent storage; we have training clusters; we have a notebook; and we have an inference deployment engine and a REST API, all sitting on top of a Kubernetes cluster. And the benefit of the container platform is that this is all abstracted away from the data scientist. So I will actually go straight into that.
Just to preface, before we go into the Ezmeral Container Platform: what we're going to look at is a machine learning example problem that tries to predict how long a specific taxi ride will take. With a Jupyter notebook, the data scientist can take all of this data, do their data manipulation, train a model on a specific set of features, such as the location and duration of past taxi rides, and then use the model to see what kind of prediction we can get for a future taxi ride.

So that's the example we'll talk through today. I'm going to hop out of my slides and jump into my web browser; let me zoom in on this. Here I have a Jupyter environment, and it's all running on the container platform. All I need is this link, and I can access my environment. As a data scientist, I can grab this link from my IT admin or system administrator and quickly start iterating and coding. On the left-hand side of Jupyter we have a file directory structure, already synced up to my Git repository, which I'll show in a little bit on the container platform. So I can quickly pull any files that are in my GitHub repository, I can even push with a button here, and I can open up this Python notebook.

With all the unique features of the Jupyter environment, I can start coding. Each of these cells can run Python code, and specifically, on the Ezmeral Container Platform team we've built our own in-house line magic commands: unique commands we can use to interact with the underlying infrastructure of the container platform. The first line magic command I want to mention is %attachments. When I run it, I get the available training clusters that I can send training jobs to. This specific notebook has been created for me to quickly iterate and develop a model; I don't have to use all the resources, and I don't have to allocate a full set of GPU boxes to my little Jupyter environment. With the training cluster, I can attach these individual data science notebooks to those training clusters, and the data scientists can utilize those resources as a shared environment.
So essentially, the shared, large, eight-GPU box can be shared; it doesn't have to be allocated to a single data scientist. Moving on, we have another magic command, a cell magic called %%python_training. This is how we utilize that training cluster: I prepend the cell with the %% magic and the name of the training cluster, and that tells the notebook to send the entire training cell to be trained on that training cluster's resources. So the data scientist can quickly iterate on a model, then format the model and all that code into one large cell and send it off to the training cluster. Because the training cluster is located somewhere else, it has no context of what has been done locally in this notebook, so we have to copy everything into that one large cell.

As you see here, I'm importing some libraries, defining some helper functions, and reading in my dataset. Following the typical data science modeling life cycle, we take in the data and then do some data pre-processing. Maybe the data scientist does this, maybe the data engineer does, but they have access to that data. Here I'm actually reading the data in from the project repository; I'll talk about this a little later, but all of the clusters within the container platform have access to a project repository that has been set up using the underlying data fabric. With this, I do some data pre-processing: I cleanse the data where I notice something is missing, some data looks funky, or the data types aren't correct. That all happens here in these cells.

Once that's done, I print out that the data is done cleaning, and I can start training my model. Here we split the dataset into a train/test split, so we have some data for actually training the model and some data for testing it. I split my data, create my XGBoost object to start my training (XGBoost is a decision-tree-based machine learning algorithm), fit my data with the XGBoost algorithm, and then do some prediction. In addition, I'm tracking some of the metrics and printing them out: these are the common metrics data scientists want to see during training, whether the accuracy is improving, the loss, the mean absolute error, things like that. At the end of the training job, I save the model back into the project repository, which we will have access to, and I print out the end time. So I can execute that cell; I've already executed it, so you see all of these print statements here: importing the libraries, the training was run, reading in data, et cetera, all printed out from that training job.

To access and glance through all of that, we get an output with a unique history URL. When we send the training job to the training cluster, the training cluster sends back a unique URL, which we use with the last line magic command I want to talk about, %logs. %logs parses out that response from the training cluster, and we can track in real time what is happening in that training job. So quickly, we can see that the data scientist has a sandbox environment available to them: access to their Git repository, and access to a project repository from which they can read in their data and save the model. It's a very quick, interactive environment for data scientists to do all of their work, all provisioned on the Ezmeral Container Platform and all abstracted away. I want to mention again that this URL is surfaced through the container platform, and the data scientist doesn't have to interact with anything underneath it.
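To make that workflow concrete, here is a rough sketch of what those notebook cells might look like. The magic names come straight from the talk (%attachments, %%python_training, %logs), but their exact spellings and argument forms, the cluster name, the repository paths, and the feature columns are all assumptions for illustration, not the platform's documented API.

```python
%attachments   # line magic from the demo: lists the training clusters you can attach to
```

```python
%%python_training taxi-training-cluster
# Cell magic from the demo (cluster name assumed): ships this whole cell to
# the attached training cluster, which has no local notebook context, so
# imports, helpers, and data loading are all inlined here.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Read the taxi dataset from the shared project repository (path assumed)
df = pd.read_csv("/bd-fs-mnt/project_repo/data/taxi_rides.csv").dropna()

features = ["pickup_lat", "pickup_lon", "dropoff_lat", "dropoff_lon", "hour_of_day"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["trip_duration_sec"], test_size=0.2
)

# Decision-tree-based regressor for the ride-duration prediction
model = xgb.XGBRegressor(n_estimators=200, max_depth=8)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# Save the trained model back to the project repository for registration
model.save_model("/bd-fs-mnt/project_repo/models/taxi_xgb.json")
```

```python
%logs <history-url-returned-by-the-training-cell>   # tail the remote job's output
```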
I'm going to log in as my user, and this is going to bring me to the, uh, view of the, uh, Emma lops tenant within the container platform. So this is where everything has been set up for me, the data scientist doesn't have to see this if they don't need to, but what I'll walk through now is kind of the topics that I mentioned previously that we would go back into. So first is the project repository. So this project deposited comes with each tenant that is created on the platform. >>So this is a more, nothing more than a shared collaborative workspace environment in which data scientist or any data scientist who is allocated to this tenant. They have this politics client that can visually see all their data of all, all of their code. And this is actually taking a piece of the underlying data fabric and using that for your project depository. So you can see here, I have some code I can create and see my scoring script. I can see the models that have been created within this tenant. So it's pretty much a powerful tool in which you can store your code store any of your data and have the ability to read and write from any of your Jupiter environments or any of your created clusters within this tenant. So a very cool ad here in which you can, uh, quickly interact with your data. >>The next thing I want to show is the source control. So here is where you would plug in all of your information for your source control. And if I edit this, you guys will actually see all the information that I've passed in to configure the source control. So on the backend, the container platform will take these credentials and connect the Jupiter notebooks you create within this tenant to that get repository. So this is the information that I've passed in. If GitHub is not of interest, we also have support for bit bucket here as well. So next I want to show you guys that we do have these notebook environments. So, um, the notebook environment was created here and you can see that I have a notebook called Teri notebook, and this is all running on the Kubernetes environment within the container platform. So either the data scientists can come here and create their notebook or their project admin can create the notebook. >>And all you'd have to do is come here to this notebook end points. And this, the container platform will actually map the container platform to a specific port in which you can just give this link to the data scientists. And this link will actually bring them to their own Jupiter environment and they can start doing all of their model just as I showed in that previous Jupiter environment. Next I want to show the training cluster. This is the training cluster that was created in which I can attach my notebook to start utilizing those training clusters. And then the last thing I want to show is the model, the deployment cluster. So once that model has been saved, we have a model registry in which we can register the model into the platform. And then the last step is to create a deployment clusters. So here on my screen, I have a deployment cluster called taxi deployment. >>And then all these serving end points have been configured for me. And most importantly, this endpoint model. So the deployment cluster is actually a wrap the, uh, train model with the flask wrapper and add a rest endpoint to it so quickly. I can operationalize my model by taking this end point and creating a curl command, or even a post request. So here I have my trusty postman tool in which I can format a post request. 
I've taken that endpoint from the container platform and formatted my body right here: these are some of the features I want to send to the model, and I want to know how long this specific taxi ride, at this location, at this time of day, would take. So I go ahead and send that request, and quickly I get an output: the ride duration will be about 2,600 seconds. A request along these lines is sketched below.
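For reference, a minimal sketch of that inference call in Python; the endpoint URL and port, the field names, and the response shape are placeholders, since the real values are generated by the deployment cluster for each model.

```python
import requests

# Placeholder endpoint: the real host, port, and any auth token are
# surfaced on the deployment cluster's serving-endpoint screen.
url = "https://gateway.example.com:10001/taxi_xgb/predict"
payload = {
    "pickup_lat": 40.75, "pickup_lon": -73.99,
    "dropoff_lat": 40.68, "dropoff_lon": -73.97,
    "hour_of_day": 17,
}
resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())   # e.g. {"ride_duration_sec": 2600}
```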
So, pretty much, we've walked through how a data scientist can quickly interact with their notebook and train their model. And coming into the platform, we saw the project repository and the source control; we can register the model within the platform; and then quickly we can operationalize that model with our deployment cluster and have it up and running and available for inference. So that wraps up the demo. I'm going to pass it back to Doug and Matt and see if they want to come off mute and whether there are any questions. Matt, Doug, you there?

>> Hey, hey Terry, sorry, just had some trouble getting off mute there. No, that was an excellent presentation. And I think there are generally some questions that come up when I talk to customers around how integrated into the Kubernetes ecosystem this capability is, and where this sort of Ezmeral capability starts and the open-source technologies, like Kubeflow as an example, begin.

>> Yeah, sure, Matt. So this is kind of one layer up. We have our ML Ops tenant, and this is all running on a piece of a Kubernetes cluster. If I log back out and go into the site admin view, this is where you would see all the Kubernetes clusters being created, and it's actually all abstracted away from the data scientists: they don't have to know Kubernetes, they just interact with the platform if they want to. But here in the site admin view, I have this Kubernetes dashboard, and on the left-hand side I have all my Kubernetes sections. If I just add some compute hosts, whether they're VMs or cloud compute hosts like EC2 hosts, we can have these resources abstracted away from us to then create a Kubernetes cluster. Moving on down, I have created this Kubernetes cluster utilizing those resources. If I go ahead and edit this cluster, you'll see that I have these hosts, and with a simple click-and-drop method I can move different hosts to configure my Kubernetes cluster. Once my Kubernetes cluster is configured, I can create a Kubernetes tenant, or in this case a namespace. Once I have this namespace available, I can go into that tenant, and as my user I don't actually see that it's running on Kubernetes. In addition, with our ML Ops tenants, you have the ability to bootstrap Kubeflow. Kubeflow is an open-source machine learning framework that runs on Kubernetes, and we have the ability to link that up as well. So, coming back to my ML Ops tenant: what I showed is the Ezmeral Container Platform version of ML Ops, but you see here we've also integrated Kubeflow, a nod to HPE's contribution to utilizing open source. It's actually all configured within our platform.

>> Yeah, actually, Terry, can you hear me? It's Doug. There were a couple of other questions about Kubeflow that came in. I wonder whether you could just comment on why we've chosen Kubeflow, because I know there was a question about MLflow instead, and what the differences between MLflow and Kubeflow are.

>> Yeah, sure. So, just to reiterate, there are some questions about Kubeflow, and I'm just...

>> Yeah, so obviously one of the people watching saw the Kubeflow dashboard there, I guess, and couldn't help but get excited about it. But there was another question about MLflow versus Kubeflow and what the difference is between them.

>> Yeah. So Kubeflow is an open-source framework that Google developed. It's a very powerful framework that comes with a lot of unique tools on Kubernetes. With Kubeflow, you have the ability to launch other notebooks, and the ability to utilize different Kubernetes operators, like the TensorFlow and PyTorch operators. You can utilize some of the frameworks within Kubeflow to do training, like Kubeflow Pipelines, which let you visually see your training jobs within Kubeflow. It also has a plethora of serving mechanisms, such as Seldon for deploying your machine learning models; you have KFServing, you have TF Serving. So Kubeflow is a very powerful tool for data scientists who want a full, end-to-end, open-source stack and know how to use Kubernetes; it's just another way to do your machine learning model development. MLflow, on the other hand, is a different piece of the machine learning pipeline: it mainly focuses on model experimentation and comparing different models during training, and it can be used alongside Kubeflow.

>> They're complementary, Terry, I think is what you're saying. Sorry, I know we are dramatically running out of time now. That was a really fantastic demo; thank you very much indeed.

>> Exactly, thank you. So yeah, I think that wraps it up. One last thing I want to mention: there is this slide, in case you have any other questions. You can visit hpe.com/ezmeral or hpe.com/containerplatform if you have any questions. And that wraps it up, so thank you guys.
Democratizing AI and Advanced Analytics with Dataiku x Snowflake
>> My name is Dave Vellante, and with me are two world-class technologists, visionaries, and entrepreneurs. Benoit Dageville co-founded Snowflake, and he's now the president of its product division. And Florian Douetteau is the co-founder and CEO of Dataiku. Gentlemen, welcome to theCUBE, two first-timers, love it.

>> Great to be here.

>> Now, Florian, you and Benoit have a number of customers in common, and I have said many times on theCUBE that the first era of cloud was really about infrastructure: making it more agile, taking out costs. The next generation of innovation is really coming from the application of machine intelligence to data, with the cloud as the scale platform. So is that premise relevant to you? Do you buy that? And why do you think Snowflake and Dataiku make a good match for customers?

>> I think it's because our values are aligned. It's all about lowering complexity for customers, closing the gap, democratizing access to data and access to technology. It's not only about data; data is important, but it's also about the impact of data: how can you make the best out of data, as fast as possible, as easily as possible, within an organization? And another value is the openness of the platform, building the future together: a platform that is not just a platform but a full ecosystem of partners around it, bringing the level of accessibility and flexibility you need for the 10 years ahead.

>> Yeah, so that's key, but it's not just data; it's turning data into insights. Benoit, you came out of the world of very powerful but highly complex databases, and we all know that you and the Snowflake team get very high marks for radically simplifying customers' lives. Can you talk specifically about the types of challenges your customers are using Snowflake to solve?

>> Yeah. The real challenge, before Snowflake, was to put all the data in one place and run all the computes, all the workloads you wanted to run, against that data. And of course, existing legacy platforms were not able to support that level of concurrency, that many workloads: we're talking about machine learning, data science, data engineering, data warehousing, big data workloads; running all of that in one place didn't make sense at all. And therefore, what customers did was create silos, silos of data everywhere, with different systems holding a subset of the data, and of course, then you cannot analyze this data in one place. Snowflake really solved that problem by creating a single architecture where you can put all the data in the cloud. It's really cloud-native: we really thought about how to solve that problem, how to leverage cloud and the elasticity of cloud to put all the data in one place, but at the same time not run all the workloads in the same place. Each workload that runs in Snowflake has dedicated compute resources, and that makes it very agile. Florian talked about data scientists having to run analyses: they need a lot of compute resources, but only for a few hours. With Snowflake, they can add that new workload to the system and get the compute resources they need to run it. And when it's over, they can shut it down; it will be automatically shut down, so they don't pay for resources they don't use. It's a very agile system, where you can run these analyses when you need to, and you have all the power to run all these workloads at the same time.
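As a concrete sketch of that per-workload compute model, here is what spinning up a dedicated, self-suspending warehouse for one analysis could look like with Snowflake's Python connector. The account, credentials, warehouse name, and table below are placeholders, not anything from the conversation, and the sizing would depend on the workload.

```python
import snowflake.connector

# Placeholders: account locator and credentials come from your Snowflake admin.
conn = snowflake.connector.connect(
    account="myorg-myaccount", user="data_scientist", password="***",
)
cur = conn.cursor()

# A dedicated virtual warehouse for this analysis: compute is isolated from
# every other workload, and AUTO_SUSPEND stops billing once the job goes idle.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ds_adhoc_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
           AUTO_SUSPEND   = 300    -- seconds idle before suspending
           AUTO_RESUME    = TRUE
""")
cur.execute("USE WAREHOUSE ds_adhoc_wh")
cur.execute("SELECT COUNT(*) FROM analytics.public.events")  # hypothetical table
print(cur.fetchone())
```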
>> Well, it's profound what you guys built. Of course, everybody's trying to copy it now; remember bringing the notion of compute to the data in the Hadoop days? As I say, everybody is sort of following your suit now, or trying to. Florian, I've got to say, the first data scientist I ever interviewed on theCUBE was the amazing Hilary Mason, right after she started at Bitly, and she made data science sound so compelling. But data science is hard. So, same question for you: what do you see as the biggest challenges customers are facing with data science?

>> The biggest challenge, from my perspective, is that once you solve the issue of the data silo with Snowflake, you don't want to bring in another silo, which would be a silo of skills. Essentially, there is a talent gap between the talent available on the market, or how hard it is to actually recruit and train data scientists, and what needs to be done. So you need to simplify access to technology such that every organization can make it, whatever the talent, by bridging that gap. And to get there, there is a need to break up the silos, in a collaborative approach where technologists and the business work together and actually put their hands into those data projects together.

>> It makes sense. Florian, let's stay with you for a minute, if I can. Your observation space is pretty global, so you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends, maybe differences between regions, or maybe within different industries? What are you seeing?

>> Yes, definitely. I do see trends that are not geographic so much, but much more about the maturity of certain industries and certain sectors. Certain industries invested a lot in data, data accessibility, and the ability to store data over the last few years, and have now reached a level of maturity where they can invest more and get to the next steps. It really relies on the ability of certain organizations to have built this long-term strategy a few years ago; now they're starting to reap the benefits.

>> You know, a decade ago, Florian, Hal Varian famously said that the sexy job of the next 10 years would be statisticians. And then everybody sort of changed that to data scientists, and all the statisticians became data scientists, and they got a raise. But data science requires more than just statistics acumen. What skills do you see as critical for the next generation of data science?

>> Yeah, it's a good question, because I think the first generation of data scientists became data scientists because they could write some Python quickly and be flexible. And I think the skills of the next generation of data scientists will definitely be different. It will first be about being able to speak the language of the business, meaning how you translate data insight and predictive modeling, all of this, into actionable insight or business impact. And it will be about how you collaborate with the rest of the business.
It's not just about how fast you can build something, how fast you can do a notebook in Python or build models all by yourself; it's about how you actually build this bridge with the business. And obviously those things are important, but we also have to accept that technology will evolve: there will be new tools and technologies, and data scientists will still need to keep this level of flexibility and understand quickly what are the next tools they need to use, the new languages, or whatever, to get there.

>> As you look back on 2020, what are you thinking? What are you telling people as we head into next year?

>> Yeah, I think it's very interesting. This crisis has shown us that the world really can change from one day to the next, and this has dramatic and profound aspects. For example, some companies all of a sudden saw their revenue line dropping and had to do less with data. For some companies it was the reverse: all of a sudden they were online, like Instacart, for example, and their business completely changed from one day to the other. So this agility of adjusting the resources you have to tasks and needs that can change, using a solution like Snowflake really helps with that. We saw it in our customers: some customers, from one day to the next, were growing big time because their business benefited from COVID, while others had to drop back. And what is nice with cloud is that it allows you to adjust compute resources to your business needs, and really adjust them quickly. The other aspect is understanding what is happening. We saw that all our customers basically wanted to understand: what is going to be the impact on my business, how can I adapt, how can I adjust? And for that, they needed to analyze data, and a lot of data which is not necessarily data about their business, but also data from the outside. For example, COVID data: where are the cases, what is the impact, the geographic impact of COVID over time. Access to this data is critical. This is the promise of the data cloud, right: one single place where you can put all the data of the world. Our customers all of a sudden started to consume the COVID data from our data marketplace, and we had literally thousands of customers looking at this data, analyzing it, to make good decisions. So this agility, this adapting from one hour to the next, is really critical, and that goes with data, with cloud, with adjusting resources; that doesn't exist on-premises. So indeed, I think the lesson learned is that we are living in a world which is changing all the time, and we have to understand it and adjust, and that's why cloud, somewhere, is great.

>> Excellent, thank you. You know, on theCUBE we like to talk about disruption; of course, who doesn't? And also, you look at AI and the impact that it is beginning to have.
You look at some of the industries that were getting disrupted by, you know, we talked about digital transformation and you had on the one end of the spectrum industries like publishing which are highly disrupted or taxis. And you could say Okay, well, that's, you know, bits versus Adam, the old Negroponte thing. But then the flip side of that look at financial services that hadn't been dramatically disrupted. Certainly healthcare, which is ripe for disruption Defense. So the number number of industries that really hadn't leaned into digital transformation If it ain't broke, don't fix it. Not on my watch. There was this complacency and then, >>of >>course, co vid broke everything. So, florian, I wonder if you could comment? You know what industry or industries do you think you're gonna be most impacted by data science and what I call machine intelligence or a I in the coming years and decades? >>Honestly, I think it's all of them artist, most of them because for some industries, the impact is very visible because we're talking about brand new products, drones like cars or whatever that are very visible for us. But for others, we are talking about sport from changes in the way you operate as an organization, even if financial industry itself doesn't seems to be so impacted when you look it from the consumer side or the outside. In fact, internally, it's probably impacted just because the way you use data on developer for flexibility, you need the kind off cost gay you can get by leveraging the latest technologies is just enormous, and so it will actually transform the industry that also and overall, I think that 2020 is only a where, from the perspective of a I and analytics, we understood this idea of maturity and resilience, maturity, meaning that when you've got a crisis, you actually need data and ai more than before. You need to actually call the people from data in the room to take better decisions and look for a while and not background. And I think that's a very important learning from 2020 that will tell things about 2021 and the resilience it's like, Yeah, Data Analytics today is a function consuming every industries and is so important that it's something that needs to work. So the infrastructure is to work in frustration in super resilient. So probably not on prime on a fully and prime at some point and the kind of residence where you need to be able to plan for literally anything like no hypothesis in terms of behaviors can be taken for granted. And that's something that is new and which is just signaling that we're just getting to the next step for the analytics. >>I wonder, Benoit, if you have anything to add to that. I mean, I often wonder, you know, winter machine's gonna be able to make better diagnoses than doctors. Some people say already, you know? Well, the financial services traditional banks lose control of payment systems. Uh, you know what's gonna happen to big retail stores? I mean, maybe bring us home with maybe some of your final thoughts. >>Yeah, I would say, you know, I I don't see that as a negative, right? The human being will always be involved very closely, but the machine and the data can really have, you know, see, Coalition, you know, in the data that that would be impossible for for for human being alone, you know, you know, to to discover so So I think it's going to be a compliment, not a replacement on. Do you know everything that has made us you know faster, you know, doesn't mean that that we have less work to do. 
It means that we can doom or and and we have so much, you know, to do, uh, that that I would not be worried about, You know, the effect off being more efficient and and and better at at our you know, work. And indeed, you know, I fundamentally think that that data, you know, processing off images and doing, you know, I ai on on on these images and discovering, you know, patterns and and potentially flagging, you know, disease, where all year that then it was possible is going toe have a huge impact in in health care, Onda and And as as as Ryan was saying, every you know, every industry is going to be impacted by by that technology. So So, yeah, I'm very optimistic. >>Great guys. I wish we had more time. I gotta leave it there. But so thanks so much for coming on. The Cube was really a pleasure having you.
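Benoit's description of per-workload compute that shuts itself down maps directly onto Snowflake's virtual-warehouse options. Below is a minimal sketch using the snowflake-connector-python package; the account, user, warehouse, and table names are hypothetical placeholders, while AUTO_SUSPEND and AUTO_RESUME are the actual warehouse parameters that produce the pay-only-for-what-you-use behavior he describes.

```python
# A minimal sketch of per-workload, auto-suspending compute.
# Account, user, warehouse, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",    # hypothetical account identifier
    user="data_scientist",   # hypothetical user
    password="...",          # supply real credentials in practice
)
cur = conn.cursor()

# A dedicated warehouse for one data-science workload: it resumes on
# demand and suspends after 60 idle seconds, so idle time costs nothing.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ds_adhoc_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
           AUTO_SUSPEND = 60
           AUTO_RESUME = TRUE
           INITIALLY_SUSPENDED = TRUE
""")

# Point this session at the workload's own compute; BI or data
# engineering workloads run on their own warehouses against the same data.
cur.execute("USE WAREHOUSE ds_adhoc_wh")
cur.execute("SELECT COUNT(*) FROM analytics.public.events")  # hypothetical table
print(cur.fetchone())
```

The dataset never moves; only the compute attached to it comes and goes, which is the separation Benoit credits for running many workloads at once.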
SUMMARY :
Benoit Dageville, he co-founded Snowflake. And I have said many times on the Cube that, you know, the first era of cloud was really about infrastructure. So you close the gap by democratizing access to data. And we all know that you and the Snowflake team get very high marks for. Yeah, so really the challenge, you know, before. And, you know, and so you need actually to simplify the access to. You know, it's pretty, pretty global, and so you have a unique perspective on how companies. The ability of certain leaders, certain organizations actually to have built this long-term strategy. You know, a decade ago, Florian, Hal Varian, you know, famously said that the sexy job in the next. And it will be about how you collaborate with the rest of the business. So our customers all of a sudden, you know, started to consume the COVID. You know, we talked about digital transformation and you had on the one end of the spectrum industries. What industry or industries do you think are going to be most impacted by data. The kind of resilience where you need to be able to plan for literally. I mean, I often wonder, you know, when are machines going to be able to make better diagnoses. That data, you know, processing of images and doing AI on. I gotta leave it there.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Volonte | PERSON | 0.99+ |
Florian Duetto | PERSON | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
Florian Hal Varian | PERSON | 0.99+ |
Florian | PERSON | 0.99+ |
Benoit | PERSON | 0.99+ |
Ryan | PERSON | 0.99+ |
Ben Wa | PERSON | 0.99+ |
Data Aiko | ORGANIZATION | 0.99+ |
2020 | DATE | 0.99+ |
10 years | QUANTITY | 0.99+ |
Lee | PERSON | 0.99+ |
Wa Dodgeville | PERSON | 0.99+ |
next year | DATE | 0.99+ |
python | TITLE | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
one place | QUANTITY | 0.99+ |
one hour | QUANTITY | 0.98+ |
a decade ago | DATE | 0.98+ |
Floyd | PERSON | 0.98+ |
2021 | DATE | 0.98+ |
one day | QUANTITY | 0.98+ |
both | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
first generation | QUANTITY | 0.96+ |
Adam | PERSON | 0.93+ |
Onda | ORGANIZATION | 0.93+ |
one single place | QUANTITY | 0.93+ |
florian | PERSON | 0.93+ |
each workload | QUANTITY | 0.92+ |
one | QUANTITY | 0.91+ |
four | QUANTITY | 0.9+ |
few years ago | DATE | 0.88+ |
thousands of customers | QUANTITY | 0.88+ |
Cube | COMMERCIAL_ITEM | 0.87+ |
first data scientist | QUANTITY | 0.84+ |
single | QUANTITY | 0.83+ |
Asai | PERSON | 0.82+ |
two world | QUANTITY | 0.81+ |
first era | QUANTITY | 0.74+ |
next 10 years | DATE | 0.74+ |
Negroponte | PERSON | 0.73+ |
Zaveri | ORGANIZATION | 0.72+ |
Dataiku | ORGANIZATION | 0.7+ |
Cube | ORGANIZATION | 0.64+ |
Ajai | ORGANIZATION | 0.58+ |
years | DATE | 0.57+ |
covitz | PERSON | 0.53+ |
decades | QUANTITY | 0.52+ |
Cube | PERSON | 0.45+ |
Snowflake | TITLE | 0.45+ |
Seidel | ORGANIZATION | 0.43+ |
snowflake | EVENT | 0.35+ |
Seidel | COMMERCIAL_ITEM | 0.34+ |
Democratizing AI & Advanced Analytics with Dataiku x Snowflake | Snowflake Data Cloud Summit
>> My name is Dave Vellante. And with me are two world-class technologists, visionaries and entrepreneurs. Benoit Dageville, he co-founded Snowflake and he's now the President of the Product Division, and Florian Douetteau is the Co-founder and CEO of Dataiku. Gentlemen, welcome to theCUBE, two first-timers, love it. >> Yup, great to be here. >> Now Florian, you and Benoit, you have a number of customers in common, and I've said many times on theCUBE that the first era of cloud was really about infrastructure, making it more agile, taking out costs. And the next generation of innovation is really coming from the application of machine intelligence to data, with the cloud as really the scale platform. So is that premise relevant to you, do you buy that? And why do you think Snowflake and Dataiku make a good match for customers? >> I think it's because our values align. It's all about, actually, today, the complexity our customers are facing, so you close the gap. We need to commoditize the access to data, the access to technology. It's not only about data; data is important, but it's also about the impact of data. How can you make the best out of data as fast as possible, as easily as possible, within an organization? And another value is about just the openness of the platform, building a future together. Having a platform that is not just about the platform, but also for the ecosystem of partners around it, bringing the level of accessibility and flexibility you need for the next 10 years. >> Yeah, so that's key, that it's not just data. It's turning data into insights. Now Benoit, you came out of the world of very powerful, but highly complex databases. And we all know that you and the Snowflake team get very high marks for really radically simplifying customers' lives. But can you talk specifically about the types of challenges that your customers are using Snowflake to solve? >> Yeah, so the challenge before Snowflake, I would say, was really to put all the data in one place, and run all the computes, all the workloads that you wanted to run against that data. And of course existing legacy platforms were not able to support that level of concurrency. Many workloads, we talk about machine learning, data science, data engineering, data warehouse, big data workloads, all running in one place didn't make sense at all. And therefore what customers did is create silos, silos of data everywhere, with different systems having a subset of the data. And of course now, you cannot analyze this data in one place. So Snowflake, we really solved that problem by creating a single architecture where you can put all the data into cloud. So it's really cloud native. We really thought about how to solve that problem, how to leverage cloud, and the elasticity of cloud, to really put all the data in one place. But at the same time, not run all workloads in the same place. So each workload that runs in Snowflake gets its dedicated compute resources to run. And that makes it agile, right? Florian talked about data scientists having to run analysis, so they need a lot of compute resources, but only for a few hours. And with Snowflake, they can run this new workload, add this workload to the system, get the compute resources that they need to run this workload. And then when it's over, they can shut down their system, it will automatically shut down. Therefore they would not pay for the resources that they don't use.
So it's a very agile system, where you can do this analysis when you need, and you have all the power to run all these workloads at the same time. >> Well, it's profound what you guys built. I mean, to me, of course everybody's trying to copy it now. It's like, I remember the notion of bringing compute to the data in the Hadoop days. And I think that, as I say, everybody is sort of following your suit now, or trying to. Florian, I got to say the first data scientist I ever interviewed on theCUBE, it was the amazing Hilary Mason, right after she started at Bitly, and she made data science sound so compelling, but data science is hard. So same question for you, what do you see as the biggest challenges for customers that they're facing with data science? >> The biggest challenge from my perspective is that once you solve the issue of the data silo, with Snowflake, you don't want to bring another silo, which would be a silo of skills. And essentially, there is the talent gap between the talent available on the market and how hard it is to actually find, recruit and train data scientists on what needs to be done. And so you need actually to simplify the access to technologies such that every organization can make it, whatever the talent, by bridging that gap. And to get there, there's a need of actually breaking up the silos. Having a collaborative approach, where technologists and business work together, and actually put their hands into those data projects together. >> It makes sense. Florian, let's stay with you for a minute, if I can. Your observation space is pretty, pretty global. And so you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends, maybe differences between regions, or maybe within different industries? What are you seeing? >> Yeah, definitely I do see trends that are not geographic that much, but much more in terms of maturity of certain industries and certain sectors. Certain industries invested a lot in terms of data, data access, the ability to store data, and now reach a level of maturity where they can invest more and get to the next steps. And it's really relying on the ability of certain leaders, certain organizations, actually, to have built this long-term data strategy a few years ago, and they are now reaping the benefits. >> A decade ago, Florian, Hal Varian famously said that the sexy job in the next 10 years will be statisticians. And then everybody sort of changed that to data scientist. And then everybody, all the statisticians became data scientists, and they got a raise. But data science requires more than just statistics acumen. What skills do you see as critical for the next generation of data science? >> Yeah, it's a great question because I think the first generation of data scientists became data scientists because they could do some Python quickly, and be flexible. And I think that the skills of the next generation of data scientists will definitely be different. It will be, first of all, being able to speak the language of the business, meaning how you translate data insights, predictive modeling, all of this into actionable insights or business impact. And it would be about how you collaborate with the rest of the business. It's not just how fast you can build something, how fast you can do a notebook in Python, or do predictive models of some sort.
It's about how you actually build this bridge with the business, and obviously those things are important, but we also must be cognizant of the fact that technology will evolve in the future. There will be new tools, new technologies, and they will still need to keep this level of flexibility to understand quickly what are the next tools they need to use, new languages, or whatever, to get there. >> As you look back on 2020, what are you thinking? What are you telling people as we head into next year? >> Yeah, I think it's very interesting, right? This crisis has told us that the world really can change from one day to the next. And this has dramatic and profound aspects. For example, companies all of a sudden saw their revenue line dropping, and they had to do less with data. And for some other companies it was the reverse, right? All of a sudden, they were online like Instacart, for example, and their business completely changed from one day to the other. So this agility of adjusting the resources that you have to the task and the need, which can change, using a solution like Snowflake really helps that. And we saw both in our customers. Some customers, from one day to the next, were growing big time, because they benefited from COVID, and their business benefited. But others had to drop. And what is nice with cloud, it allows you to adjust compute resources to your business needs, and really adjust them in hours. The other aspect is understanding what is happening, right? You need to analyze. We saw all our customers basically wanted to understand: what is going to be the impact on my business? How can I adapt? How can I adjust? And for that, they needed to analyze data. And of course, a lot of data which are not necessarily data about their business, but also data from the outside. For example, COVID data: where are the cases by state, what is the impact, the geographic impact of COVID over time. And access to this data is critical. So this is the premise of the data cloud, right? Having one single place, where you can put all the data of the world. So our customers obviously then started to consume the COVID data from our data marketplace. And we had literally thousands of customers looking at this data, analyzing this data, to make good decisions. So this agility, and this adapting from one hour to the next, is really critical. And that goes with data, with cloud, with adjusting resources, and that doesn't exist on premise. So indeed I think the lesson learned is we are living in a world which is changing all the time, and we have to understand it. We have to adjust, and that's why cloud, in some ways, is great. >> Excellent, thank you. On theCUBE we like to talk about disruption, of course, who doesn't? And also, I mean, you look at AI, and the impact that it's beginning to have, kind of pre-COVID. You look at some of the industries that were getting disrupted by, everybody talks about digital transformation. And you had on the one end of the spectrum, industries like publishing, which are highly disrupted, or taxis. And you can say, okay, well that's bits versus atoms, the old Negroponte thing. But then the flip side of that, you say, look at financial services, which hadn't been dramatically disrupted, certainly healthcare, which is ripe for disruption, defense. So there are a number of industries that really hadn't leaned into digital transformation: if it ain't broke, don't fix it, not on my watch. There was this complacency. And then of course COVID broke everything.
So Florian, I wonder if you could comment: what industry or industries do you think are going to be most impacted by data science, and what I call machine intelligence, or AI, in the coming years and decades? >> Honestly, I think it's all of them, or at least most of them, because for some industries, the impact is very visible, because we are talking about brand new products, drones, flying cars, or whatever, that are very visible for us. But for others, we are talking about profound changes in the way you operate as an organization. Even if the financial industry itself doesn't seem to be so impacted when you look at it from the consumer side, or the outside, in fact, internally, it's probably impacted just because the way you use data, the flexibility you need, and the kind of cost gain you can get by leveraging the latest technologies, is just enormous. And so it will actually transform those industries also. And overall, I think that 2020 is a year where, from the perspective of AI and analytics, we understood this idea of maturity and resilience: maturity, meaning that when you've got a crisis, you actually need data and AI more than before; you need to actually call the people from data in the room to take better decisions, and look forward and not backward. And I think that's a very important learning from 2020, that will tell things about 2021. And the resilience, it's like, data analytics today is a function transforming every industry, and is so important that it's something that needs to work. So the infrastructure needs to work, the infrastructure needs to be super resilient, so probably not on prem, or not fully on prem, at some point. And the kind of resilience where you need to be able to plan for literally anything, like no hypothesis in terms of behaviors can be taken for granted. And that's something that is new, and which is just signaling that we are just getting to the next step for data analytics. >> I wonder, Benoit, if you have anything to add to that. I mean, I often wonder, when are machines going to be able to make better diagnoses than doctors? Some people say already. Will the financial services, the traditional banks, lose control of payment systems? What's going to happen to big retail stores? I mean, maybe bring us home with maybe some of your final thoughts. >> Yeah, I would say I don't see that as a negative, right? The human being will always be involved very closely, but the machine and the data can really help see correlations in the data that would be impossible for a human being alone to discover. So I think it's going to be a complement, not a replacement. And everything that has made us faster doesn't mean that we have less work to do. It means that we can do more. And we have so much to do, that I would not be worried about the effect of being more efficient, and better at our work. And indeed, I fundamentally think that data, processing of images, and doing AI on these images, and discovering patterns, and potentially flagging disease way earlier than was possible, is going to have a huge impact in health care. And as Florian was saying, every industry is going to be impacted by that technology. So, yeah, I'm very optimistic. >> Great, guys, I wish we had more time. I've got to leave it there, but thanks so much for coming on theCUBE. It was really a pleasure having you.
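Benoit's data-cloud example, thousands of customers joining shared COVID data against their own tables, is easy to sketch in a few lines of SQL run through the Python connector. A marketplace share simply mounts as a read-only database you can query like any other; all the object names below are hypothetical stand-ins.

```python
# Sketch of consuming a marketplace data share: the external COVID data
# joins directly against internal tables with no copying. All account,
# warehouse, database, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="analyst", password="..."  # hypothetical
)
cur = conn.cursor()
cur.execute("USE WAREHOUSE reporting_wh")  # hypothetical warehouse

# Ask the question Benoit describes: what is COVID doing to my business?
cur.execute("""
    SELECT s.region,
           s.week,
           SUM(s.revenue)    AS revenue,
           MAX(c.case_count) AS covid_cases
    FROM   analytics.public.weekly_sales AS s   -- internal table (hypothetical)
    JOIN   covid_share.public.cases      AS c   -- mounted share (hypothetical)
           ON c.region = s.region AND c.week = s.week
    GROUP  BY s.region, s.week
    ORDER  BY s.region, s.week
""")
for row in cur.fetchmany(10):
    print(row)
```

Because the share is just another database, no ETL pipeline has to exist before the first question gets asked, which is where the agility in his answer comes from.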
SUMMARY :
and Florian Douetteau is the And the next generation of innovation, the access to data, about the types of challenges all the workloads that you of bringing compute to the And essentially, thanks to the talent gap, And so you have a unique perspective And it's really relying on the that the sexy job in the next 10 years of the next generation the resources that you have and the impact that And the kind of resilience where you need Will the financial services, and the data can really help, I've got to leave it there,
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Benoit | PERSON | 0.99+ |
Florian Douetteau | PERSON | 0.99+ |
Florian | PERSON | 0.99+ |
Benoit Dageville | PERSON | 0.99+ |
Dataiku | ORGANIZATION | 0.99+ |
2020 | DATE | 0.99+ |
Hillary Mason | PERSON | 0.99+ |
Hal Varian | PERSON | 0.99+ |
10 years | QUANTITY | 0.99+ |
Python | TITLE | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Germany | LOCATION | 0.99+ |
one hour | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
next year | DATE | 0.99+ |
Bitly | ORGANIZATION | 0.99+ |
one day | QUANTITY | 0.98+ |
2021 | DATE | 0.98+ |
A decade ago | DATE | 0.98+ |
one place | QUANTITY | 0.97+ |
Snowflake Data Cloud Summit | EVENT | 0.97+ |
Snowflake | TITLE | 0.96+ |
each workload | QUANTITY | 0.96+ |
today | DATE | 0.96+ |
first generation | QUANTITY | 0.96+ |
Benoir | PERSON | 0.95+ |
snowflake | EVENT | 0.94+ |
first era | QUANTITY | 0.92+ |
COVID | OTHER | 0.92+ |
single architecture | QUANTITY | 0.91+ |
thousand customers | QUANTITY | 0.9+ |
first data scientist | QUANTITY | 0.9+ |
one | QUANTITY | 0.88+ |
one single place | QUANTITY | 0.87+ |
few years ago | DATE | 0.86+ |
Negroponte | PERSON | 0.85+ |
Florain | ORGANIZATION | 0.82+ |
two world | QUANTITY | 0.81+ |
first | QUANTITY | 0.8+ |
Instacart | ORGANIZATION | 0.75+ |
next 10 years | DATE | 0.7+ |
hours | QUANTITY | 0.67+ |
Snowflake | EVENT | 0.59+ |
a minute | QUANTITY | 0.58+ |
theCUBE | ORGANIZATION | 0.55+ |
Adam | PERSON | 0.49+ |
Benoit Dageville and Florian Douetteau V1
>> Hello everyone, welcome back to theCUBE's wall-to-wall coverage of the Snowflake Data Cloud Summit. My name is Dave Vellante and with me are two world-class technologists, visionaries, and entrepreneurs. Benoit Dageville, he co-founded Snowflake, and he's now the president of the Product division, and Florian Douetteau is the co-founder and CEO of Dataiku. Gentlemen, welcome to theCUBE, two first-timers, love it. >> Great time to be here. >> Now Florian, you and Benoit, you have a number of customers in common. And I've said many times on theCUBE that the first era of cloud was really about infrastructure, making it more agile, taking out costs. And the next generation of innovation is really coming from the application of machine intelligence to data, with the cloud as really the scale platform. So is that premise relevant to you, do you buy that? And why do you think Snowflake and Dataiku make a good match for customers? >> I think it's because our values align. It's all about, actually, today, the complexity our customers are facing, so you close the gap; we need to commoditize the access to data, the access to technology. It's not only about data, data is important, but it's also about the impact of data. How can you make the best out of data as fast as possible, as easily as possible, within an organization? And another value is about just the openness of the platform, building a future together. Having a platform that is not just about the platform, but also for the ecosystem of partners around it, bringing the level of accessibility and flexibility you need for the next 10 years. >> Yes, so that's key, but it's not just data. It's turning data into insights. Now Benoit, you came out of the world of very powerful, but highly complex databases. And we all know that you and the Snowflake team get very high marks for really radically simplifying customers' lives. But can you talk specifically about the types of challenges that your customers are using Snowflake to solve? >> Yeah, so really the challenge before Snowflake, I would say, was really to put all the data in one place, and run all the computes, all the workloads that you wanted to run, against that data. And of course, existing legacy platforms were not able to support that level of concurrency. Many workloads, we talk about machine learning, data science, data engineering, data warehouse, big data workloads, all running in one place, didn't make sense at all. And therefore, what customers did is to create silos, silos of data everywhere, with different systems having a subset of the data. And of course now you cannot analyze this data in one place. So Snowflake, we really solved that problem by creating a single architecture where you can put all the data in the cloud. So it's really cloud native. We really thought about how to solve that problem, how to leverage cloud and the elasticity of cloud to really put all the data in one place. But at the same time, not run all workloads in the same place. So each workload that runs in Snowflake gets its dedicated compute resources to run. And that makes it very agile, right? Florian talked about data scientists having to run analysis. So they need a lot of compute resources, but only for a few hours, and with Snowflake, they can run this new workload, add this workload to the system, get the compute resources that they need to run this workload. And then when it's over, they can shut down their system. It will automatically shut down.
Therefore they would not pay for the resources that they don't use. So it's a very agile system, where you can do this analysis when you need, and you have all the power to run all these workloads at the same time. >> Well, it's profound what you guys built. To me, I mean, because everybody's trying to copy it now. It's like, I remember the notion of bringing compute to the data in the Hadoop days. And I think that, as I say, everybody is sort of following your suit now or trying to. Florian, I got to say, the first data scientist I ever interviewed on theCUBE was the amazing Hilary Mason, right after she started at Bitly. And she made data science sound so compelling, but data science is hard. So same question for you. What do you see as the biggest challenges for customers that they're facing with data science? >> The biggest challenge from my perspective is that once you solve the issue of the data silo with Snowflake, you don't want to bring another silo, which would be a silo of skills. And essentially, there is that talent gap between the talent available on the market, and how hard it is to actually find, recruit and train data scientists on what needs to be done. And so you need actually to simplify the access to technology such that every organization can make it, whatever the talent, by bridging that gap. And to get there, there is a need of actually breaking up the silos. I think a collaborative approach, where technologists and business work together, and actually put their hands into those data projects together. >> Yeah, it makes sense. So Florian, let's stay with you for a minute, if I can. Your observation space is pretty, pretty global. And so, you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends, maybe differences between regions or maybe within different industries? What are you seeing? >> Yep. Yeah, definitely, I do see trends that are not geographic that much, but much more in terms of maturity of certain industries and certain sectors. Certain industries invested a lot in terms of data, data access, the ability to store data, in the last few years, and now reach a level of maturity where they can invest more and get to the next steps. And it really relies on the ability of certain leaders, certain organizations, actually, to have built this long-term data strategy a few years ago, and they are now reaping the benefits. >> You know, a decade ago, Florian, Hal Varian famously said that the sexy job in the next 10 years will be statisticians. And then everybody sort of changed that to data scientists. And then everybody, all the statisticians became data scientists and they got a raise. But data science requires more than just statistics acumen. What skills do you see as critical for the next generation of data science? >> Yeah, it's a good question because I think the first generation of data scientists became data scientists because they could do some Python quickly and be flexible. And I think that the skills of the next generation of data scientists will definitely be different. It will be first about being able to speak the language of the business, meaning how you translate data insights, predictive modeling, all of this into actionable insights or business impact. And it will be about how you collaborate with the rest of the business. It's not just how fast you can build something, how fast you can do a notebook in Python or do predictive models of some sort.
It's about how you actually build this bridge with the business. And obviously those things are important, but we also must be cognizant of the fact that technology will evolve in the future. There will be new tools and technologies, and they will still need to keep this level of flexibility and get to understand quickly what are the next tools they need to use, or new languages, or whatever, to get there. >> Thank you for that. Benoit, let's come back to you. This year has been tumultuous, to say the least, for everyone, but it's a good time to be in tech, ironically. And if you're in cloud, it's even better. But you look at Snowflake and Dataiku, you guys have done well, despite the economic uncertainty and the challenges of the pandemic. As you look back on 2020, what are you thinking? What are you telling people as we head into next year? >> Yeah, I think it's very interesting, right. This crisis has told us that the world really can change from one day to the next. And this has dramatic and profound aspects. For example, companies all of a sudden saw their revenue line dropping and they had to do less with data. And for some of the companies it was the reverse, right? All of a sudden, they were online, like Instacart, for example, and their business completely changed from one day to the other. So this agility of adjusting the resources that you have to the task and the need, which can change, using a solution like Snowflake really helps that. And we saw both in our customers. Some customers, from one day to the next, were growing big time, because they benefited from COVID, and their business benefited, but others had to drop. And what is nice with cloud, it allows you to adjust compute resources to your business needs, and really adjust them in hours. The other aspect is understanding what is happening, right? You need to analyze. So we saw all our customers basically wanted to understand: what is going to be the impact on my business? How can I adapt? How can I adjust? And for that, they needed to analyze data. And of course, a lot of data, which are not necessarily data about their business, but also data from the outside. For example, COVID data: where are the cases by state, what is the impact, the geographic impact of COVID over time. And access to this data is critical. So this is the promise of the data cloud, right? Having one single place where you can put all the data of the world. So, our customers all of a sudden started to consume the COVID data from our data marketplace. And we had literally thousands of customers looking at this data, analyzing this data, to make good decisions. So this agility and this adapting from one hour to the next is really critical, and that goes with data, with cloud, with adjusting resources, and that doesn't exist on premise. So, indeed, I think the lesson learned is we are living in a world which is changing all the time, and we have to understand it. We have to adjust, and that's why cloud, in some ways, is great. >> Excellent, thank you. You know, in theCUBE, we like to talk about disruption, of course, who doesn't? And also, I mean, you look at AI and the impact that it's beginning to have, and kind of pre-COVID, you look at some of the industries that were getting disrupted; everyone talks about digital transformation. And you had on the one end of the spectrum, industries like publishing, which are highly disrupted, or taxis. And you can say, "Okay well, that's bits versus atoms, the old Negroponte thing."
But then the flip side of this says, look at financial services, which hadn't been dramatically disrupted, certainly healthcare, which is ripe for disruption, defense. So there are a number of industries that really hadn't leaned into digital transformation: if it ain't broke, don't fix it, not on my watch. There was this complacency. And then of course COVID broke everything. So Florian, I wonder if you could comment, what industry or industries do you think are going to be most impacted by data science and what I call machine intelligence, or AI, in the coming years and decades? >> Honestly, I think it's all of them, or at least most of them, because for some industries, the impact is very visible, because we are talking about brand new products, drones, flying cars, or whatever, that are very visible for us. But for others, we are talking about profound changes in the way you operate as an organization. Even if the financial industry itself doesn't seem to be so impacted when you look at it from the consumer side, or the outside, in fact internally, it's probably impacted just because the way you use data, the flexibility you need, and the kind of cost gain you can get by leveraging the latest technologies, is just enormous. And so it will actually transform those industries also. And overall, I think that 2020 is a year where, from the perspective of AI and analytics, we understood this idea of maturity and resilience. Maturity, meaning that when you've got a crisis, you actually need data and AI more than before; you need to actually call the people from data in the room to take better decisions, and look forward and not backward. And I think that's a very important learning from 2020, that will tell things about 2021. And the resilience, it's like, data analytics today is a function transforming every industry, and is so important that it's something that needs to work. So the infrastructure needs to work, the infrastructure needs to be super resilient, so probably not on prem, or not fully on prem, at some point. And the kind of resilience where you need to be able to plan for literally anything, like no hypothesis in terms of behaviors can be taken for granted. And that's something that is new, and which is just signaling that we are just getting into the next step for data analytics. >> I wonder, Benoit, if you have anything to add to that. I mean, I often wonder, when are machines going to be able to make better diagnoses than doctors? Some people say already. Will the financial services, the traditional banks, lose control of payment systems? What's going to happen to big retail stores? I mean, maybe bring us home with maybe some of your final thoughts. >> Yeah, I would say, I don't see that as a negative, right? The human being will always be involved very closely, but the machine and the data can really help see correlations in the data that would be impossible for a human being alone to discover. So, I think it's going to be a complement, not a replacement, and everything that has made us faster doesn't mean that we have less work to do. It means that we can do more. And we have so much to do that I would not be worried about the effect of being more efficient, and better at our work. And indeed, I fundamentally think that data, processing of images, and doing AI on these images, and discovering patterns, and potentially flagging disease way earlier than it was possible, is going to have a huge impact in health care.
And as Florian was saying, every industry is going to be impacted by that technology. So, yeah, I'm very optimistic. >> Great, guys, I wish we had more time. We've got to leave it there, but thanks so much for coming on theCUBE. It was really a pleasure having you. >> [Benoit & Florian] Thank you. >> You're welcome, but keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE.
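Florian's point that the next generation of data scientists must translate predictive modeling into business impact is easy to make concrete with a little arithmetic. A hedged sketch follows; every number in it is invented purely for illustration.

```python
# Translating a model's lift into a dollar figure, the kind of statement
# a business stakeholder can act on. All numbers are made up.
customers = 200_000          # hypothetical customer base
baseline_churn = 0.08        # assumed quarterly churn rate
model_uplift = 0.15          # assumed share of would-be churners the model retains
value_per_customer = 120.0   # assumed quarterly revenue per customer

churners_saved = customers * baseline_churn * model_uplift
revenue_retained = churners_saved * value_per_customer
print(f"customers retained per quarter: {churners_saved:,.0f}")   # 2,400
print(f"revenue retained per quarter: ${revenue_retained:,.0f}")  # $288,000
```

The modeling is the easy half; agreeing with the business on the churn baseline and the per-customer value is the bridge Florian is describing.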
SUMMARY :
And he's now the president And the next generation of the access to data, the And we all know that, you all the workloads that you the notion of bringing the access to technology such as And so, you have a unique And it's really reliant to reach out Hal Varian famously said that the sexy job And it will be about who you collaborate and the challenges of the pandemic. adjusting the resources that you have end of the spectrum, of the way you use data to I mean, I often wonder, you know, So, I think it's going to be a compliment, We got to leave it there right after this short break.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Florian | PERSON | 0.99+ |
Benoit | PERSON | 0.99+ |
Florian Douetteau | PERSON | 0.99+ |
Benoit Dageville | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
10 years | QUANTITY | 0.99+ |
Dataiku | ORGANIZATION | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
Python | TITLE | 0.99+ |
Hal Varian | PERSON | 0.99+ |
next year | DATE | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
one place | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
one hour | QUANTITY | 0.99+ |
Bitly | ORGANIZATION | 0.99+ |
Snowflake Data Cloud Summit | EVENT | 0.99+ |
a decade ago | DATE | 0.98+ |
one day | QUANTITY | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.98+ |
each level | QUANTITY | 0.98+ |
Snowflake | TITLE | 0.98+ |
2021 | DATE | 0.97+ |
today | DATE | 0.97+ |
first generation | QUANTITY | 0.97+ |
pandemic | EVENT | 0.97+ |
few years ago | DATE | 0.93+ |
thousands of customers | QUANTITY | 0.93+ |
single architecture | QUANTITY | 0.92+ |
first era | QUANTITY | 0.88+ |
Negroponte | PERSON | 0.87+ |
first data scientist | QUANTITY | 0.87+ |
Instacart | ORGANIZATION | 0.87+ |
This year | DATE | 0.86+ |
one single place | QUANTITY | 0.86+ |
two | QUANTITY | 0.83+ |
two world- | QUANTITY | 0.78+ |
each workload | QUANTITY | 0.78+ |
one | QUANTITY | 0.76+ |
Adam | PERSON | 0.74+ |
next 10 years | DATE | 0.69+ |
first timers | QUANTITY | 0.52+ |
COVID | OTHER | 0.51+ |
COVID | ORGANIZATION | 0.43+ |
COVID | EVENT | 0.37+ |
decades | DATE | 0.29+ |
Bill Schmarzo, Hitachi Vantara | CUBE Conversation, August 2020
>> Announcer: From theCUBE studios in Palo Alto, in Boston, connecting with thought leaders all around the world. This is a CUBE conversation. >> Hey, welcome back everybody, Jeff Frick here with theCUBE. We are still getting through the year of 2020. It's still the year of COVID and there's no end in sight, I think, until we get to a vaccine. That said, we're really excited to have one of our favorite guests. We haven't had him on for a while. I haven't talked to him for a long time. He used to, I think, have the record for the most CUBE appearances of probably any CUBE alumni. We're excited to have him joining us from his house in Palo Alto. Bill Schmarzo, you know him as the Dean of Big Data; he's got more titles. He's the chief innovation officer at Hitachi Vantara. He's also, we used to call him the Dean of Big Data kind of for fun. Well, Bill goes out and writes a bunch of books. And now he teaches at the University of San Francisco, School of Management as an executive fellow. He's an honorary professor at NUI Galway. I think he just likes to go to that side of the pond. And a many-time author now; go check him out, his author profile on Amazon: the "Big Data MBA," "The Art of Thinking Like A Data Scientist" and another Big Data book, kind of a workbook. Bill, great to see you. >> Thanks, Jeff, you know, I miss my time on theCUBE. These conversations have always been great. We've always kind of poked around the edges of things. A lot of our conversations have always been, I thought, very leading edge, and the title Dean of Big Data is courtesy of theCUBE. You guys were the first ones to give me that name, out of one of the very first Strata Conferences, where you dubbed me the Dean of Big Data because I taught a class there called the Big Data MBA. And look what's happened since then. >> I love it. >> It's all on you guys. >> I love it, and we've outlasted Strata; Strata doesn't exist as a conference anymore. And part of that, I think, is because Big Data is now everywhere, right? It's not the standalone thing. But there's a topic, and I'm holding in my hands a paper that you worked on with a colleague, Dr. Sidaoui, talking about what is the value of data? What is the economic value of data? And this is a topic that's been thrown around quite a bit. I think you list a total of 28 reference sources in this document. So it's a well-researched piece of material, but it's a really challenging problem. So before we kind of get into the details, you know, from your position, having done this for a long time, and I don't know what you're doing today, you used to travel every single week to go out and visit customers and actually do implementations and really help people think these through. When you think about the value, the economic value, how did you start to kind of frame that to make sense and make it kind of a manageable problem to attack? >> So, Jeff, the research project was eye-opening for me. And one of the advantages of being a professor is, you have access to all these very smart, very motivated, very free research resources. And one of the problems that I've wrestled with as long as I've been in this industry is, how do you figure out what is data worth? And so what I did is I took these research students and I stuck them on this problem. I said, "I want you to do some research. Let me understand what is the value of data?" I've seen all these different papers and analysts and consulting firms talk about it, but nobody's really got this thing licked.
And so we launched this research project at USF, professor Mouwafac Sidaoui and I together, and we were bumping along the same old path that everyone else was on, which was anchored on: how do we get data on our balance sheet? That was always the motivation, because as a company we're worth so much more because our data is so valuable, and how do I get it on the balance sheet? So we're headed down that path and trying to figure out how do you get it on the balance sheet? And then one of my research students, she comes up to me and she says, "Professor Schmarzo," she goes, "Data is kind of an unusual asset." I said, "Well, what do you mean?" She goes, "Well, you think about data as an asset. It never depletes, it never wears out. And the same dataset can be used across an unlimited number of use cases at a marginal cost equal to zero." And when she said that, it's like, "Holy crap." The light bulb went off. It's like, "Wait a second. I've been thinking about this entirely wrong for the last 30-some years of my life in this space. I've had the wrong frame. I kept thinking about this as an accounting conversation, and accounting determines valuation based on what somebody is willing to pay for it." So if you go back to Adam Smith, 1776, "Wealth of Nations," he talks about valuation techniques. And one of the valuation techniques he talks about is valuation in exchange. That is, the value of an asset is what someone's willing to pay you for it. So the value of this bottle of water is what someone's willing to pay you for it. So everybody fixates on this asset valuation in exchange methodology. That's how you put it on the balance sheet. That's how you run depreciation schedules; that dictates everything. But Adam Smith also talked about, in that book, another valuation methodology, which is valuation in use, which is an economics conversation, not an accounting conversation. And when I realized that my frame was wrong, yeah, I had the right book. I had Adam Smith, I had "Wealth of Nations." I had all that good stuff, but I hadn't read the whole book. I had missed this whole concept about the economic value, where value is determined not by how much someone's willing to pay you for it, but by the value you can drive by using it. So, Jeff, when that person made that comment, the entire research project, and I got to tell you, my entire life did a total 180, right? Just a total 180-degree change of how I was thinking about data as an asset. >> Right, well, Bill, it's funny though, and that's kind of captured in, I always think of, kind of finance versus accounting, right? And you're right on accounting. And we learn a lot of things in accounting. Basically we learn more that we don't know, but it's really hard to put it in an accounting framework, because as you said, it's not like a regular asset. You can use it a lot of times, you can use it across lots of use cases, it doesn't degrade over time. In fact, it used to be a liability, 'cause you had to buy all this hardware and software to maintain it. But if you look at the finance side, if you look at the pure-play internet companies like Google, like Facebook, like Amazon, and you look at their valuation, right? We used to have this thing, we still have this thing called goodwill, which was kind of this gap between what the market established the value of the company to be, but wasn't reflected when you summed up all the assets on the balance sheet, and you had this leftover thing; you could just plug in goodwill.
And I would hypothesize that for these big giant tech companies, the market has baked in the value of the data, has kind of put in that present value on that for a long period of time over multiple projects. And we see it captured probably in goodwill, versus being kind of called out as an individual balance sheet item. >> So I don't know accounting. I'm not an accountant, thank God, right? And I know that goodwill, if I remember from my MBA program, is something that when you buy a company and you look at the value you paid versus what it was worth, the difference gets stuck into this category called goodwill, because no one knew how to figure it out. So the company at book value was a billion dollars, but you paid five billion for it. Well, you're not an idiot, so that four billion extra you paid must be in goodwill, and they'd stick it in goodwill. And I think there's actually a way that goodwill gets depreciated as well. So it could be that, but I'm totally away from the accounting framework. I think that's distracting; trying to work within the GAAP rules is more of an inhibitor. And we talk about the Googles of the world and the Facebooks of the world and the Netflixes of the world and the Amazons, and companies that are great at monetizing data. Well, they're great at monetizing it because they're not selling it, they're using it. Google is using their data to dominate search, right? Netflix is using it to be the leader in on-demand videos. And it's how they use all the data, how they use the insights about their customers, their products, and their operations to really drive new sources of value. So to me, when you start thinking about it from an economics perspective, for example, why is the same car that I buy and an Uber driver buys worth more to the Uber driver than it is to me? Well, the bottom line is, Uber drivers are going to use that car to generate value, right? That $40,000 car they bought is worth a lot more, because they're going to use it to generate value. For me it sits in the driveway and the birds poop on it. So, right, so it's this value-in-use concept. And when organizations can make that, by the way, most organizations really struggle with this. They struggle with this value-in-use concept. When you talk to them about data monetization, they think about the chief data officer trying to sell data, knocking on doors, shaking their tin cup, saying, "Buy my data." No, no one wants your data. Your data is more valuable for how you use it to drive your operations than as something to sell to somebody else. >> Right, right. Well, one of the other things that's really important from an economics concept is scarcity, right? And a whole lot of economics is driven around scarcity, and how do you price for scarcity so that the market evens out and the price matches up to the supply? What's interesting about the data concept is, there is no scarcity anymore. And you know, you've outlined it, and everyone has giant numbers going up and to the right in terms of the quantity of the data and how much data there is and is going to be. But what you point out very eloquently in this paper is that the scarcity is around the resources to actually do the work on the data to get the value out of the data. And I think there's just this interesting step function between just raw data, which has really no value in and of itself, right? Until you start to apply some concepts to it, you start to analyze it.
And most importantly, that you have some context by which you're doing all this analysis, to then drive that value. And I thought it was a really interesting part of this paper, which is: get beyond the arguing that we're kind of discussing here, and get into some specifics where you can measure value around a specific business objective. And not only that, but then the investment of the resources on top of the data to be able to extract the value, to then drive your business process with it. So it's a really different way to think about scarcity, not on the data per se, but on the ability to do something with it. >> You're spot on, Jeff, because organizations don't fail because of a lack of use cases. They fail because they have too many. So how do you prioritize? Now that scarcity is not an issue on the data side, but it is this issue on the people resources side, you don't have unlimited data scientists, right? So how do you prioritize and focus on those opportunities that are most important? I'll tell you, that's not a data science conversation, that's a business conversation, right? And figuring out how you align organizations to identify and focus on those use cases that are most important. Like in the paper, we go through several different use cases using Chipotle as an example. The reason why I picked Chipotle is because, well, I like Chipotle. So I could go there and I could write it off as research. But think about the number of use cases where a company like Chipotle or any other company can leverage their data to drive their key business initiatives and their key operational use cases. It's almost unbounded, which, by the way, is a huge challenge. In fact, I think part of the problem we see with a lot of organizations is, because they do such a poor job of prioritizing and focusing, they try to solve the entire problem with one big fell swoop, right? It's like the old ERP big bang projects. Well, I'm just going to spend $20 million to buy this analytic capability from company X, and I'm going to install it, and then magic is going to happen. And then magic is going to happen, right? And then magic is going to happen, right? And magic never happens. We get crickets instead, because the biggest challenge isn't around how do I leverage the data; it's about where do I start? What problems do I go after? And how do I make sure the organization is bought in, to basically, use case by use case, build out your data and analytics architecture and capabilities? >> Yeah, and you start backwards from really specific business objectives in the use cases that you outline here, right? I want to increase my average ticket by X. I want to increase my frequency of visits by X. I want to increase the amount of items per order from X to 1.2X, or 1.3X. So from there you get a nice kind of big revenue hit that you can plan around, and then work backwards into the amount of effort that it takes, and then you can come up with: is this a good investment or not? So it's a really different way to get back to the value of the data, and more importantly, the analytics and the work to actually call out the information. >> The technologies, the data and analytic technologies available to us, the very composable nature of these, allow us to take this use case by use case approach. I can build out my data lake one use case at a time. I don't need to stuff 25 data sources into my data lake and hope there's something valuable in there.
I can use the first use case to say, "Oh, I need these three data sources to solve that use case. I'm going to put those three data sources in the data lake. I'm going to go through the entire curation process of making sure the data has been transformed and cleansed and aligned and enriched, and meets all the other governance, all that kind of stuff that goes on." But I'm going to do that use case by use case, 'cause a use case can tell me which data sources are most important for that given situation. And I can build up my data lake, and I can build up my analytics, one use case at a time. And there is a huge impact then, huge impact, when I build out use case by use case. That does not happen. Let me throw something that's not really covered in the paper, but it is very much covered in my new book that I'm working on, which is: in knowledge-based industries, the economies of learning are more powerful than the economies of scale. Now think about that for a second. >> Say that again, say that again. >> Yeah, the economies of learning are more powerful than the economies of scale. And what that means is, what I learned on the first use case that I build out, I can apply that learning to the second use case, to the third use case, to the fourth use case. So when I put my data into my data lake for my first use case, and the paper covers this, well, once it's in my data lake, the cost of reusing that data in a second, third and fourth use case is basically, you know, the marginal cost is zero. So I get this ability to learn about what datasets are most important and to reapply that across the organization. So this learning concept, I learn use case by use case. I don't have to do a big economies-of-scale approach and start with 25 datasets, of which only three or four might be useful, but I'm incurring the overhead for all those other non-important datasets, because I didn't take the time to go through and figure out what are my most important use cases and what data do I need to support those use cases.
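Bill's zero-marginal-cost point lends itself to a toy model. Here is a hedged sketch in Python; the use cases, source names, and dollar figures are all invented. Each use case pays an onboarding cost only for sources not already curated into the lake, so later use cases get cheaper while value keeps accumulating.

```python
# Toy model of use-case-by-use-case data lake economics: a source is
# onboarded once, so reusing it later has ~zero marginal data cost.
# Use cases, sources, costs, and values are all hypothetical.
ONBOARD_COST = 50_000  # assumed cost to curate one new data source

use_cases = [  # (name, required sources, estimated annual value)
    ("increase average ticket",  {"pos", "loyalty"},          400_000),
    ("increase visit frequency", {"pos", "loyalty", "email"}, 300_000),
    ("reduce stockouts",         {"pos", "inventory"},        250_000),
]

curated = set()
total_cost = total_value = 0
for name, sources, value in use_cases:
    new_sources = sources - curated        # only new sources cost money
    cost = ONBOARD_COST * len(new_sources)
    curated |= new_sources
    total_cost += cost
    total_value += value
    print(f"{name}: new sources {sorted(new_sources)}, "
          f"cost ${cost:,}, value ${value:,}")

print(f"total: cost ${total_cost:,}, value ${total_value:,}")
# The first use case pays for two sources; the later ones mostly reuse
# them, which is the compounding Bill calls the economies of learning.
```

Run top to bottom, the first use case carries the bulk of the onboarding cost and every later one rides on it, exactly the prioritize-then-reuse pattern the paper argues for.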
The way to think about this is, don't become data-driven, become value-driven, and value is driven from the use case, or the application, or the use of the data to solve that particular use case. So organizations get fixated on being data-driven, and I hate the term data-driven. It's as if there's some sort of frigging magic from having data. No, data has no value. It's how you use it to derive customer, product and operational insights that drive value. >> Right, so there's an interesting step function, and we talk about it all the time. You're out in the weeds, working with Chipotle lately, increasing their average ticket by 1.2 X. We talk more here kind of conceptually. And one of the great kind of conceptual holy grails within a data-driven economy is kind of working up this step function. And you've talked about it here. It's from descriptive, to diagnostic, to predictive. And then the holy grail, prescriptive, where you're way ahead of the curve. This comes into tons of stuff around unscheduled maintenance. And you know, there's a lot of specific applications, but do you think we spend too much time kind of shooting for the fourth order of greatness impact, instead of kind of focusing on the small wins? >> Well, you certainly have to build your way there. I don't think you can get to prescriptive without doing predictive, and you can't do predictive without doing descriptive and such. But let me throw a real one at you, Jeff. I think there's even one beyond prescriptive, one we're talking more and more about: autonomous analytics, right? And one of the things that paper talked about that didn't click with me at the time was this idea of orphaned analytics. You and I kind of talked about this before the call here. And one thing we noticed in the research was that a lot of these very mature organizations, who had advanced from the retrospective analytics of BI to the descriptive, to the predictive, to the prescriptive, were building one-off analytics to solve a problem and getting value from it, but never reusing those analytics over and over again. They were done one-off and then they were thrown away, and these organizations were so good at data science and analytics that it was easier for them to just build from scratch than to try to dig around and find something that was never actually built to be reused. And so I have this whole idea of orphaned analytics, right? It didn't really occur to me, it didn't make any sense to me, until I read this quote from Elon Musk, and Elon Musk made this statement. He says, "I believe that when you buy a Tesla, you're buying an asset that appreciates in value, not depreciates, through usage." I was thinking, "Wait a second, what does that mean?" He didn't actually say "through usage." He said he believes you're buying an asset that appreciates, not depreciates, in value. And of course the first response I had was, "Oh, it's like a 1964 and a half Mustang. It's rare, so everybody is going to want these things. So buy one, stick it in your garage. And 20 years later, you're bringing it out and it's worth more money." No, no, there's 600,000 of these things roaming around the streets, they're not rare. What he meant is that he is building an autonomous asset. That the more that it's used, the more valuable it's getting, the more reliable, the more efficient, the more predictive, the more safe this asset's getting.
So there is this level beyond prescriptive where we can think about, "How do we leverage artificial intelligence, reinforcement learning, deep learning, to build these assets that, the more that they are used, the smarter they get." That's beyond prescriptive. That's an environment where these things are learning. In many cases, they're learning with minimal or no human intervention. That's the real aha moment. That's what I missed with orphaned analytics, and why it's important to build analytics that can be reused over and over again. Because every time you use these analytics in a different use case, they get smarter, they get more valuable, they get more predictive. To me that's the aha moment that blew my mind. I realized I had missed that in the paper entirely. And it took me basically two years later to realize, d'oh, I missed the most important part of the paper. >> Right, well, it's an interesting take really on why the valuation, I would argue, is reflected in Tesla, which is a function of the data. And there's a phenomenal video, if you've never seen it, where they have autonomous vehicle day, it might be a year or so old. And he's got his number one engineer from, I think, the Microprocessor Group, the Computer Vision Group, as well as the autonomous driving group. And there's a couple of really great concepts I want to follow up on from what you said. One is that they have this thing called The Fleet. To your point, there's hundreds of thousands of these things, if they haven't hit a million, that are calling home, reporting home every day, as to exactly how everyone took the northbound 101 on-ramp off of University Avenue. How fast did they go? What line did they take? What G-forces did they take? And every one of those cars feeds into the system, so that when they do the autonomous update, not only are they using all the regular things that they would use to map out that 101 northbound entry, but they've got all the data from all the cars that have been doing it. And you know, when that other autonomous car a couple years ago hit the pedestrian, I think in Phoenix, which is not good, sad, killed a person, dark, tough situation. But we were doing an autonomous vehicle show, and a guy made a really interesting point, right? That when something like that happens, typically if I was in a car wreck or you're in a car wreck, hopefully not, I learn, the person that we hit learns, and maybe a couple of witnesses learn, maybe the inspector. >> But nobody else learns. >> But nobody else learns. But now with the autonomy, every single person can learn from every single experience, with every vehicle contributing data within that fleet. To your point, it's just an order of magnitude different way to think about things. >> Think about a 1% improvement compounded 365 times, it equals, I think, a 38 X improvement. The power of 1% improvements over these 600,000-plus cars that are learning. By the way, even when the autonomous FSD, the full self-driving module, isn't turned on, even when it's not turned on, it runs in shadow mode. So it's learning from the human drivers, the human overlords, it's constantly learning. And by the way, not only are they collecting all this data, I did a little research, I pulled out some of their job search ads, and they've built a giant simulator, right? And they're basically every night simulating billions and billions of more driven miles because of the simulator.
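The compounding figure quoted here checks out; as a quick sanity check, in plain Python with nothing assumed beyond the numbers in the conversation:

```python
# 1% improvement compounded daily for a year: (1 + 0.01) ** 365
print(f"{1.01 ** 365:.1f}x")  # prints 37.8x, i.e. roughly the 38 X cited
```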
And what they're building, he's going to have a simulator not only for driving, but think about all the data he's capturing as these cars are riding down the road. By the way, they don't use lidar, they use video, right? So he's driving by malls, he knows how many cars are in the mall. He's driving down roads, he knows how old the cars are and which ones should be replaced. I mean, he's sitting on this incredible wealth of data. If anybody could simulate what's going on in the world and figure out how to get out of this COVID problem, it's probably Elon Musk and the data he's captured, courtesy of all those cars. >> Yeah, yeah, it's really interesting, and we're seeing it now. There's a new autonomous drone out, the Skydio, and they just announced their commercial product. And again, it completely changes the way you think about how you use that tool, because you've just eliminated the complexity of driving. I don't want to drive that, I want to tell it what to do. And so you're seeing this whole application for the Air Force and for companies around things like measuring piles of coal, and measuring these huge assets that are volumetrically measured, that these things can go and map out, and farming, et cetera, et cetera. So the autonomy piece, that's really insightful. I want to shift gears a little bit, Bill, and talk about, you had some theories in here about thinking of data as an asset, data as a currency, data as monetization. I mean, how should people think of it? 'Cause I don't think currency is very good. It's really not kind of an exchange of value that we're doing with this kind of classic asset. I think the data-as-oil metaphor is horrible, right? To your point, it doesn't get burned up once and can't be used again. It can be used over and over and over. It's basically like feedstock for all kinds of stuff, but the feedstock never goes away. So again, is that even the right way to think about it? Do we really need to shift our conversation and get past the idea of data, and get much more into the idea of information, and actionable information, and useful information that, oh, by the way, happens to be powered by data under the covers? >> Yeah, good question, Jeff. Data is an asset in the same way that a human is an asset. But just having humans in your company doesn't drive value, it's how you use those humans. And so it's really, again, the application of the data around the use cases. So I still think data is an asset, but I'm not fixated on putting it on my balance sheet. The moment you start talking about putting it on a balance sheet, I immediately put the blinders on. It inhibits what I can do. I want to think about this as an asset that I can use to drive value, value to my customers. So I'm trying to learn more about my customers' tendencies and propensities and interests and passions, and try to learn the same thing about my products' behaviors and tendencies, and my operations' tendencies. And so I do think data is an asset, but it's a latent asset, in the sense that it has potential value, but it actually has no value per se in putting it into a balance sheet. So I think it's an asset. I worry about the accounting concept immediately hijacking what we can do with it. To me the value of data becomes how it interacts with, maybe, other assets. So maybe data itself is not so much an asset as it's fuel for driving the value of assets. So, you know, it fuels my use cases. It fuels my ability to retain and get more out of my customers.
It fuels my ability to predict when my products are going to break down, and even to have products that self-monitor, self-diagnose and self-heal. So, data is an asset, but it's only a latent asset, in the sense that it sits there and it doesn't have any value until you actually put something to it and shock it into action. >> So let's shift gears a little bit, stop talking about the data and talk about the human factors. 'Cause you said one of the challenges is people trying to bite off more than they can chew. And we have the role of chief data officer now. And to your point, maybe that mucks things up more than it helps. But in all the customer cases that you've worked on, is there a consistent kind of pattern of behavior, personality, types of projects that enables some people to grab those resources to apply to their data and have successful projects? Because to your point, there's too much data and there's too many projects, and you talk a lot about prioritization. But there are a lot of assumptions in the prioritization model, that you know a whole lot of things, especially if you're comparing project A over in group A with project B in group B, and the two may not really know the economics across that. But from an individual person who sees the potential, what advice do you give them? What kind of characteristics do you see, either in the type of the project, the type of the boss, the type of the individual, that really lend themselves to a higher probability of a successful outcome? >> So first off you need to find somebody who has a vision for how they want to use the data, and not just collect it, but how they're going to try to change the fortunes of the organization. So it always takes a visionary. It may not be the CEO, it might be somebody who's the head of marketing or the head of logistics, or it could be a CIO, it could be a chief data officer as well. But you've got to find somebody who says, "We have this latent asset we could be doing more with, and we have a series of organizational problems and challenges against which I could apply this asset. And I need to be the matchmaker that brings these together." Now, the most powerful tool I've found in marrying the latent capabilities of data with all the revenue-generating opportunities on the application side, because there's a countless number, is design thinking. Now, the reason why I think design thinking is so important is because one of the things that design thinking does a great job of is giving everybody a voice in the process of identifying, validating, valuing, and prioritizing the use cases you're going to go after. Let me say that again. The challenge organizations have is identifying, validating, valuing, and prioritizing the use cases they want to go after. Design thinking is a marvelous tool for driving organizational alignment around where we're going to start and what's going to be next, why we're going to start there, and how we're going to bring everybody together. Big data and data science projects don't die because of technology failure. Most of them die because of passive-aggressive behaviors in the organization, because you didn't bring everybody into the process. Everybody's voice didn't get a chance to be heard. And that one person whose voice didn't get a chance to be heard, they're going to get you. They may own a certain piece of data.
They may own something, and they're just lying there waiting for their chance to come up and snag it. So what you've got to do is proactively bring these people together. This is part of our value engineering process. We have a value engineering process around envisioning, where we bring all these people together. We help them to understand how data in itself is a latent asset, but how it can be used, from an economics perspective, to drive all this value. We get them all fired up on how these can solve any one of these use cases. But you've got to start with one, and you've got to embrace this idea that I can build out my data and analytic capabilities one use case at a time. And the first use case I go after and solve makes my second one easier, makes my third one easier, right? When you start going use case by use case, two really magical things happen. Number one, your marginal costs flatten. That is because you're building out your data lake one use case at a time, and you're bringing all the important data into that data lake one use case at a time. At some point in time, you've got most of the important data you need, and you get to the point where you don't need to add another data source. You've got what you need, so your marginal costs start to flatten. And by the way, if you build your analytics as composable, reusable, continuous-learning analytic assets, not as orphaned analytics, pretty soon you have all the analytics you need as well. So your marginal costs flatten. But effect number two is that, because you have the data and the analytics, I can accelerate time to value, and I can de-risk projects as I go use case by use case. And so then the biggest challenge becomes not the data and the analytics, it's getting all the business stakeholders to agree on, here's the roadmap we're going to go after. This one's first, and this one is going first because it helps to drive the value of the second and third one. And then this one drives this, and you create a whole roadmap, a rippling through of how the data and analytics are driving value across all these use cases at a marginal cost approaching zero. >> So should we have chief design thinking officers instead of chief data officers, that really actually move the data process along? I mean, I first heard about design thinking years ago, actually interviewing Dan Gordon from Gordon Biersch, and he had just hired a couple of Stanford grads, which I think is where they pioneered it, and they were doing some work around introducing, I think it was a new apple-based alcoholic beverage, an apple cider, and they talked a lot about it. And it's pretty interesting. But I mean, are you seeing design thinking proliferate into the organizations that you work with? Either formally as design thinking, or as some derivation of it that pulls in some of those attributes that you highlighted that are so key to success? >> So I think we're seeing the birth of this new role that's marrying the capabilities of design thinking with the capabilities of data and analytics. And they're calling this dude or dudette the chief innovation officer. Surprise. >> Title for someone we know. >> And I've got to tell a little story. So I have a very experienced design thinker on my team. All of our data science projects have a design thinker on them.
Every one of our data science projects has a design thinker, because the nature of how you build and successfully execute a data science project models almost exactly how design thinking works. I've written several papers on it, and it's a marvelous way; design thinking and data science are different sides of the same coin. But my respect for design thinking took a major shot in the arm, a major boost, when the design thinker on my team, whose name is John Morley, introduced me to a senior data scientist at Google. And I bought him coffee, and I said, and this is back before I even joined Hitachi Vantara, "So tell me the secret to Google's data science success? You guys are marvelous, you're doing things that no one else was even contemplating, what's your key to success?" And he giggles and laughs and he goes, "Design thinking." I go, "What the hell is that? Design thinking, I've never even heard of the stupid thing before." He goes, "I'll make a deal with you, Friday afternoon let's pop over to Stanford and I'll teach you about design thinking." So I went with him on a Friday to the d.school, the design school over at Stanford, and I was blown away, not just by how design thinking was used to ideate and explore, but by how powerful that concept is when you marry it with data science. What is data science in its simplest sense? Data science is about identifying the variables and metrics that might be better predictors of performance. It's that "might" phrase that's the real key. And who are the people who have the best insights into what variables or metrics or KPIs you might want to test? It ain't the data scientists, it's the subject matter experts on the business side. And when you use design thinking to bring those subject matter experts and the data scientists together, all kinds of magic stuff happens. It's unbelievable how well it works. And all of our projects leverage design thinking. Our whole value engineering process is built around marrying design thinking with data science, around this prioritization, around these concepts of: all ideas are worthy of consideration and all voices need to be heard. And the idea of how you embrace ambiguity and diversity of perspectives to drive innovation, it's marvelous. But I feel like I'm a lone voice out in the wilderness, crying out, "Yeah, Tesla gets it, Google gets it, Apple gets it, Facebook gets it." But you know, most other organizations in the world, they don't think like that. They think design thinking is this woo-woo thing. Oh yeah, you're going to bring people together and sing Kumbaya. It's like, "No, I'm not singing Kumbaya. I'm picking their brains, because they're going to help make my data science team much more effective, in knowing what problems we're going to go after and how I'm going to measure success and progress." >> Maybe that's the next Dean for the next 10 years, the Dean of design thinking instead of data science, and who knew they're one and the same? Well, Bill, that's super insightful. I mean, it's so validated and supported by the trends that we see all over the place, just in terms of democratization, right? Democratization of the tools, more people having access to data, more opinions, more perspectives, more people that have the ability to manipulate the data and basically experiment, does drive better business outcomes. And it's so consistent.
>> If I could add one thing, Jeff, I think what's really powerful about design thinking is, when I think about what's happening with artificial intelligence or AI, there's all these conversations about, "Oh, AI is going to wipe out all these jobs. It's going to take all these jobs away." And what we're actually finding is that if we think about machine learning driven by AI, and human empowerment driven by design thinking, we're seeing the opportunity to exploit these economies of learning at the front lines, where every customer engagement, every operational execution is an opportunity to gather not only more data, but to gather more learnings, to empower the humans at the front lines of the organization to constantly be seeking, to try different things, to explore and to learn from each of these engagements. AI to me is incredibly powerful, and I think about it as a source of driving more learning, a continuously learning and continuously adapting organization, where it's not just the machines that are doing this, but it's the humans who've been empowered to do that. And chapter nine in my new book, Jeff, is all about team empowerment, because nothing you do with AI is going to matter squat if you don't have empowered teams who know how to take and leverage that continuous learning opportunity at the front lines of customer and operational engagement. >> Bill, I couldn't say it better. I think we'll leave it there. That's a great close. When is the next book coming out? >> So today I do my second-to-last final review. Then it goes back to the editor, and he does a review, and we start looking at formatting. So I think we're probably four to six weeks out. >> Okay, well, thank you so much, congratulations on all the success. I just love how the Dean is really the Dean now, teaching all over the world, sharing the knowledge and attacking some of these big problems. And like all great economics problems, often the answer is not economics at all. You have to completely twist the lens and not think of it in that construct. >> Exactly. >> All right, Bill. Thanks again and have a great week. >> Thanks, Jeff. >> All right. He's Bill Schmarzo, I'm Jeff Frick. You're watching theCUBE. Thanks for watching, we'll see you next time. (gentle music)
Ben Nelson, Minerva Project | CUBE Conversation March 2020
(upbeat electronic music) >> Announcer: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're in our Palo Alto studios today, having a CUBE conversation. You know, nobody can really travel, conference seasons are all kind of on hold or going to digital, so there's a lot of interesting stuff going on. But thankfully we've got the capability to invite some of our community in. We're really interested in hearing from some of the leaders that we have in the community about what's going on in their world and, you know, what they're telling their people, and what can we learn. So we're excited to have a good friend of mine, we went to business school together, God, it seems like it was over 20 years ago. He's Ben Nelson, the chairman and CEO of the Minerva Project. Ben, great to see you, and welcome. >> Thanks so much, great to be here. >> Yeah. So, you have always been kind of a trailblazer, I mean, way back in the day I think you've only had like two jobs in all this time, you know. (laughing) You know, kind of changing the world of digital photography. >> Yeah, three or four, three or four. >> Three or four. >> Yeah. (laughing) >> And after a really long run, you made this move to start something new in education. >> Yeah. >> Education's a big hairy monster. There's a lot of angles. And you started the Minerva Project, and I can't believe, I looked before we got on today, that was nine years ago. So tell us about the Minerva Project, how you got started, kind of what's the mission, and then we'll get into it. >> Yeah, so Minerva exists, and it sounds somewhat lofty for an organization, but we do exist to serve this mission, which is to nurture critical wisdom for the sake of the world. We think a wiser world is a better world. We think that wisdom really is the core goal of education, and we decided that higher education is the area that is both most in need of transformation and also one that we're most capable of influencing. And so we set about actually creating our own university, demonstrating an example of what a university can do, and then helping to tool other institutions to follow in those footsteps. >> Yeah, it's a really interesting take. We're oftentimes told, if a time traveler came here from 1776, right, and walked around and looked at the way we drive, the way we communicate, the way we transact business, all these things would be so new and novel and inventive. But if you walked him over to Stanford or Harvard, he'd feel right at home, you know. >> Yeah. >> So education is still kind of locked in to this way that it's always been. So for you to kind of take a new approach, I mean, I guess it did take actually starting your own school to be able to execute and leverage some of these new methods and tools, versus trying to move what is a pretty, you know, kind of hard-to-move institutional base. >> Yeah, absolutely. And it's also, you know, because we have to remember that universities as an institution started before the printing press. So if you go and talk to pretty much any university president and ask him or her, what is the mission of a university, generically, forget, you know, your university or what have you, they'll say, "Well, generically, universities exist to create and disseminate knowledge." That's why they were founded 1000 years ago, and that's why they exist today.
And you know, on the creation of knowledge, I think there's a good argument to be made that the research mission of a university is important for the advancement of society, and that it needs to be supported, certainly directly in that regard. So much of, you know, the innovation that we benefit from today came from university labs and research. That's an important factor. But the dissemination of knowledge is a bit of an odd thing. I guess before the printing press, sure, yeah, it was kind of hard to disseminate knowledge, except if you gathered a whole bunch of people in a room and talked at them, and maybe they scribbled notes very quickly. Well, that's a decent way of disseminating knowledge, because you've got, you know, one mouth and many pieces of paper, and then they can read it later or study it. I guess that makes sense, it's somewhat efficient. But after the printing press, and certainly after the internet, the concept of a university needing to disseminate knowledge as its core mission seems kind of crazy. It can't be that that's what universities are for. But effectively they're still structured in that way. And I don't think any university president, when actually challenged in that way, would argue the point. They would say, "Oh yes, of course, well, what we really need to do is teach people how to use knowledge, or evaluate knowledge, or make sure that we communicate effectively, or understand how that knowledge can interact with other pieces of knowledge and, you know, create new ways of thinking, et cetera." But that isn't the dissemination of knowledge. And that isn't the way that universities are actually structured. >> But it's funny that you say that. Even before you get to whether they should still be trying to disseminate knowledge, they're not even using the new tools that have come along since the printing press. (laughing) To disseminate knowledge. You know, it's really interesting, as we're going through this time right now with the coronavirus, a lot of things that were kind of traditional are moving into digital, and this new tool called Zoom, it never fails to amaze me how many people are having their first Zoom call ever, right. >> Right, right. >> Ever, right? I mean, how long ago was Skype, how long ago was WebEx? These tools have been around for a really long time. But now, you know, there's kind of a critical mass of technology, where anybody can flip their laptop up, or their phone, and go. You know, you guys, just in terms of a pure kind of tools play, took advantage of the things that are available here in 2020 and 2019. So I wonder if you can share with the folks that don't have experience kind of using remote learning and remote access, you know, what are some of the lessons you learned, what are some of the best practices? What should people kind of think about, what's capable, and the things you can do with digital tools that you can't do when you're trying to get everybody in a classroom together at the same time? >> Right, so I think first and foremost, there's kind of the nuts and bolts, the basics, right. One of the things that, you know, education environments have always been able to get away with, when you've got everyone in a room and you're kind of cutting them off from the rest of life, is you sometimes don't realize that you're talking into thin air, right. That maybe, you know, students are not listening, they're not absorbing what you're saying.
But, you know, they have to show up, at least in K-12; in higher ed they don't bother showing up, and so for the 15 people who do wind up showing up from the 100-person lecture, I guess you say, "Oh, at least they're listening." But the reality is that when you're online, you're competing with everything. You're competing with the next tab, you're competing with just not showing up. It's so much easier to just, you know, open up some game or something, some YouTube video. And so you've got to make this engaging. And making it engaging isn't about being entertaining. And that's actually one of the major problems of assessing who is a good professor and who isn't. You know, people look at student reviews, right. They say, "Oh, you know, such and such was such a great professor." But when you actually track student reviews of professors against learning outcomes, there's a slight negative correlation. Right, which means that the better the students believe the professor is, the more that is actually an indicator that they've learned a little bit less. >> Right. >> That's really bizarre, intuitively. But when you actually think about it deeply, you realize that entertaining students isn't the job of a professor. It's actually teaching them. It's actually getting them to think through the material. And learning is hard, it's not easy. So you have to bring some of those techniques of engagement into online. And you can do that, but it requires a lot of interactivity. So that's aspect number one. But really the much bigger idea is that you don't just do what you do offline and then put it online, right. Technology isn't at its best when it mimics what you do without it, right. Technology didn't build an exact replica of the horse. >> Right, right. >> And say, you know, ride that. Right. It doesn't make any sense, right. Instead, what technology should do is things you cannot do offline. One of the things that worked 300, 400 years ago is that you could study a subject matter in full. One professor, one teacher could teach you pretty much everything that people needed to know in a given field. In fact, the fields themselves were collapsed, right. Science, mathematics, you know, ethics were all put under this idea called philosophy. Philosophy was everything, right. And so there really wasn't much to learn. But today, because we have so much information and so many tools to be able to process through that information, what happens is that education gets atomized. And, you know, you go through a college education being taught by 25, 30 different professors. But one professor really has no idea what you've learned previously, even when they're in a 101-102 sequence. How many times have we been in kind of the 102 class where in the first month all the professor did was repeat what happened in the 101 class, because they didn't feel comfortable that you actually learned it, or that the professor before them taught it the way they wanted it taught? And that's because education is done offline with no data. If you actually have education in a data-rich environment, you can actually design cross-cutting curricula. You can shift the professor's role from disseminating knowledge to actually mentoring students and guiding them in how to apply that knowledge. And so, once you have institutional views of curricula, you can use technology to deliver an institution-wide education.
Not by teaching you a way of thinking or a set of content, but by giving you a set of tools that broadly any professor can agree on, and then applying them to whatever context professors want to present. And that creates a much more holistic education, and it's one that can only be done using technology. >> Ben, that was a mouthful. You got into all kinds of good stuff there. (laughing) So let's break some of it down. That was fascinating. I mean, there's the asynchronous versus synchronous opportunity, if you will, to, as you said, kind of atomize education: the creation of content, right, the distribution of content, and more importantly, the consumption of content. Because why should I have to change my day if the person I want to hear is only available next Tuesday at noon Pacific, right? It makes no sense anymore. And the long-tail opportunities for this content that lives out there forever are pretty interesting. But it's a very interesting, you know, kind of point of view: if you assume that all the knowledge is already out there, then your job as an educator is to help train people to critically think about what's out there. How do I incorporate that, what are the things I should be thinking about when I'm integrating that into my decision? It's a very different way. And as you said, everything is an alt-tab away. Literally the whole world is an alt-tab away from that webinar. (laughing) Very good stuff. >> Exactly right. >> And the other piece I want to get your take on is really kind of lifetime learning. And I didn't know that you guys were, you know, kind of applying some of your principles, oh my goodness, where you actually measure effectiveness of teaching, and measure how long people hang out in the class, and measure whether it's good or not. But you're applying this really now in helping companies do digital transformation. And, you know, coming at that approach from a shift in thinking is really a different approach. I was just looking yesterday at an Andy Jassy keynote from a couple years ago, and he talked about the number one thing in digital transformation being buy-in at senior leadership and a top-down priority. So, you know, what do you see in some of your engagements? How are you applying some of these principles to help people think about change differently? >> Yeah, you know, I think recessions are a very telling time for corporate learning, right. And if you notice, what is the first budget that gets cut when economic times get tough? It's the, you know, employee learning and development budget. Those budgets just get decimated, right off the bat. And that's primarily because employees don't see much value out of it, and employers don't really measure the impact of those things. No one's saying, "Oh my God, this is such an incredible program. My employees were able to do x before this program, and then they were able to do one point five x afterward." You know, if people had that kind of training program in the traditional system, they would be multi-billion-dollar behemoths in the space. And there really are not. And that's because, again, most of education is done in content land. And it's usually very expensive, and the results are not very good. Instead, if you actually think about learning tools as opposed to information, and then applying those tools in your core business, all of a sudden you can actually see transformation.
And so when we do executive education programs, as an example, you know, we ask our learners, how much of what you've learned can you apply to your job tomorrow? Right. And we see an overwhelming majority of our students saying something like more than 80 to 90% of what they learned they can apply immediately. >> Wow, that's impressive. >> That's useful. >> Right. And why do you think that is? Is it just kind of institutions stuck in the mud? Is it the wrong incentive structure? I mean, you're talking about very simple stuff, right. Why don't you actually measure outcomes and adjust accordingly, you know. Use a data-centric methodology to improve things over time, you know. Use digital tools in a way that can get you more than you can do in a physical space. I mean, is it just inertia? I mean, I really think this is a watershed moment, because now everybody is forced into using these tools, right. And there's a lot of, you know, kind of psychology around habits and habit forming. >> Right, exactly. >> And if you do something for a certain amount of time every single day, you know, it becomes a habit. And if these stay-in-place orders, which in my mind I think we're going to be doing for a while, kind of change people's behavior and the way they use technology to interact with other folks, you know, it could be a real kind of turning point in opening everyone's eyes that digital is different than physical. It's not exactly the same. There are some things in physical that are just better, but, you know, there's a whole realm of things in digital that you cannot do when you're bound by time, location, and space. >> Exactly right. That's right. And I think the reason that it's so difficult to shift the system is because the training of people in the system, and I'm speaking specifically about higher education, really has nothing to do with education. Think about how a university professor becomes a university professor. How do they show up in a classroom? They get a bachelor's degree, where they don't learn anything about how to teach or how the mind works. They get a PhD, in which they learn nothing about how to teach or how the mind works. They do a post-doctoral research fellowship, where they research in their field, right. Then they become an associate professor or an assistant professor, non-tenured, right. And in order to get tenure they've got seven years to make it on a publishing track, because how they teach is irrelevant. And they don't get any formal training on how to teach or how the brain works, right. Then they become, you know, a junior tenured professor, a full tenured professor, right. And then maybe they become an administrator, right. And so if you think about it, all they know is their field. And I've had conversations with academics which are to me befuddling, in the sense that, you know, they'll say, "Oh, you know, everyone should learn how to think like a historian." "Oh no, everybody should learn to think like an economist." "Everyone should learn to think like a physicist." And you kind of unpack it, you say, "Well, why?" And it's, "Oh, well, because we deploy tools that nobody else deploys and it's so great." Right. And so, OK, give me an example. I had this conversation with a university president who was a historian. And that president said, "Look, you know, what we do is we look at, you know, primary source materials from hundreds of years ago, and learn to interpret what they say to us and ascertain truth from that.
"That's an incredibly important skill." I said, "OK, so what you're saying is you "examine evidence and evaluate that evidence "to see what it can actually tell you. "Isn't that what every single scientist, "social scientist, no matter what field they're in does? "Isn't that what a physicist does? "Isn't that what an economist does? "Isn't that what a psychologist does? "Right, isn't that what an English professor does?" Right actually thinking about I remember I took a mini module when I was an undergraduate with Rebecca Bushnell who is a literature professor, eventually became the dean of the college of arts and science at the University of Pennsylvania. And, we basically looked at a text written 400 years before, and tried to figure out what parts of the text were written by the author, what were transcription errors, and what was censored. That's looking at evidence. >> Right, right. >> This was an English professor. It's the exact same process. But because people know about it in their field and they think the only way to get to it is through their field, as opposed to teaching the tool, it can't get out of their own way. >> Yeah. >> And that's why I think education is so stuck right now. >> Yeah. That's crazy. And you know we're all victims of kind of the context in which we look through everything, and the lens in which we apply to everything that we see which is you know one of my things that there isn't really a kind of a truth it's what is your interpretation. And that's really you know, what is in your head. But I want to close it out. And Ben I really appreciate your time today. It's been a great conversation. And really kind of take it back to your mission which is around critical thinking. You know there's a lot of conversation lately, you know, this kind of rush to STEM as the thing. And there's certainly a lot of great job opportunities coming out of school if you're a data scientist and you can write in R. But what I think is a more interesting conversation is to get out of your own way. You know is the critical thinking as you know the AI and RPA and all these other things kind of take over more of these tasks and really this higher order need for people to think through complex problems. >> Right. >> I mean like we're going through today. Thank God people who are qualified and can see ahead and saw an exponential curve potential just really causing serious damage when we're still to head into this thing to take aggressive action. Dr. Sarah Cody here locally here you know, telling the San Jose Sharks you can't play. You know that is not an easy decision. But thankfully they did and they had the data. But really just your kind of thoughts on why you prioritize on critical thinking and you know can what you see with your students when they get out into the real world applying critical thinking not necessarily equations. >> Yeah look I think there's no better demonstration of how important critical thinking is than when you look at the kinds of advances that STEM is trying to make. Right. What happens any time we get a demonstration of the power of artificial intelligence, right. You remember a few years ago when Microsoft released it's AI engine. Right. Smartest engineers working on it, and all of a sudden it you know spat back misogynist racist types of perspectives. Why? The training set was garbage. It wasn't that the technology was bad, actually it was amazing technology. But the people who were writing it couldn't think. 
They didn't know how to think two steps ahead and say, "Wait a second, if we train it on the information, kind of the random comments we see on the internet, you know, who bothers to write anonymous comments?" Trolls, right. And so if we train it on a troll data set, it'll become an artificially intelligent troll. Right. It doesn't take a lot of critical thinking to actually realize that, but it takes some. >> Right. >> Right. And when you focus merely on those technical skills, what you wind up doing is wasting it. Right. And so if you ground people in critical thinking, and we see this with our graduates. You know, we graduated our very first class in May. And we had what, as far as I can tell, is the best graduate school placement of any graduating class in the country, as far as the quality of offers they got. We had a 94% placement rate in jobs and graduate positions, which I think is tied with the very best Ivy League institutions. And the kinds of jobs that the students are getting, and, six months into them, the kinds of reviews that their employers are giving us, look nothing like a recent undergraduate. These are oftentimes the types of jobs that are unavailable to recent undergraduates. And, you know, one student recently, he actually just told me, so it's fresh in my mind: even though he was the youngest person in his company, when the CEO of his company has a strategic question, he comes to him. And when he's in a meeting full of PhDs, everybody looks to him to run the meeting and set the agenda. He's six months out of undergrad, right. And, you know, I can give you story after story after story about each and every one of these graduates. And it's not because they were born with it. They actually had a wise education. >> Yeah. Ben, well, that's a great story. And we'll leave it there. Congratulations again to you and the team at Minerva on what you've built, and on your first graduating class. Great accomplishment, and really great to catch up, it's been too long. And when this is all over we'll have to get together and have an adult beverage. >> That would be wonderful. >> All right, Ben, thanks a lot. >> Thanks so much, Jeff. >> All right. You've been watching theCUBE. Great check-in with Ben Nelson. Thanks for watching. Everybody stay safe, and we'll see you next time. (upbeat electronic music)
Sri Ambati, H2O.ai | CUBE Conversation, August 2019
>> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE conversation. >> Hello and welcome to this special CUBE conversation here in Palo Alto, California, at theCUBE Studios. I'm John Furrier, your host of theCUBE. We're here with Sri Ambati, the founder and CEO of H2O.ai, a CUBE alum, a hot startup right in the action of all the machine learning and artificial intelligence, the democratization, the role of data in the future. It's all happening with cloud 2.0, DevOps 2.0. Great to see you, Sri. Tell us about the company, what's going on, you guys are smoking hot. Congratulations. You've got the right formula here with AI, explain what's going on. >> It started about seven years ago, and AI was just a new fad that arrived in Silicon Valley. Today we have thousands of companies in AI, and we're very excited to be partners in making more companies become AI-first. And our vision here is to democratize AI. We've made it simple, our open source made it easy for people to start adopting data science and machine learning in different functions inside their large organizations, and to apply that to different use cases across financial services, insurance, healthcare. We leapfrogged in 2016 and built our first closed-source product, it's called Driverless AI. We made it on GPUs, using the latest hardware and software innovations. Open source AI has funded the rise of automatic machine learning, which further reduces the need for extraordinary talent to build machine learning. No one has time today, and we're trying to bring that automatic machine learning at a very significant crunch time for AI, so people can consume AI better. >> You know, this is one of the things I love about the current state of the market right now, the entrepreneurial market as well as startups and some growing companies going public: there's a new breed of entrepreneurship going on around large scale, standing up infrastructure, shortening the time it takes to do something, like provisioning. In the old days, you had to get a PhD, and we're seeing this in data science. I mean, you don't have to be a Python coder. This democratization is not just a tagline. It's actually the reality of a business opportunity: whoever can provide the infrastructure and the systems for people to do it, that's an opportunity. You guys are doing that. This is a real dynamic, a new kind of dynamic in the industry.
Even though she is, she has a very she doesn't have a high school graduation. That kind of figuring out those Democratic democratization goes all the way down there. It's wise, a mortal deciding what's deciding and explaining and breaking that down into English, which which building trust is a huge aspect in a >> well. I want to get to the the talent in the time and the trust equation on the next talk track, but I want to get the hard news out there. You guys are have some news driverless a eyes, your one of your core things. What's the hard Explain the news. What's the big news? >> The big news has Bean, that is, the money ball from business and money Ball, as it has been played out, has been. The experts >> were left out of the >> field and all garden is taking over and there is no participation between experts, the domain scientists and the data scientists and what we're bringing with the new product in travel see eyes, an ability for companies to take away I and become a I companies themselves. The rial air races not between the Googles and the Amazons and Microsoft's and other guy companies, software companies. The relay race is in the word pickles. And how can a company, which is a bank or an insurance giant or a health care company take a I platforms and become, take the data, monetize the data and become a I companies themselves? >> You know, that's a really profound state. I would agree with 100% on that. I think we saw that early on in the big data world round Doop doop kind of died by the wayside. But day Volonte and we keep on team have observed and they actually predicted that the most value was gonna come from practitioners, not the vendors, because they're the ones who have the data. And you mentioned verticals. This is another interesting point. I want to get more explanation from you on Is that APS are driven by data data needs domain specific information. So you can't just say I have data. Therefore, magic happens. It's really at the edge of the domain speak or the domain feature of the application. This is where the data is this kind of supports your idea that the eyes with the company's not that are using it, not the suppliers of the technology. >> Our vision has always being hosted by maker customer service for right to be focused on the customer, and through that we actually made customer one of the product managers inside the company. And the way that the doors that opened from working where it closed with some of our leading customers was that we need to get them to participate and take a eyes, algorithms and platforms that can tune automatically. The algorithms and the right hyper parameter organizations, right features and amend the right data sets that they have. There's a whole data lake around there on their data architecture today, which data sets them and not using in my current problem solving. That's a reasonable problem in looking at that combination of these Berries. Pieces have been automated in travel a, C I. A. And the new version that we're not bringing to market is able to allow them to create their own recipes, bring your own transformers and make that automatic fit for their particular race. Do you think about this as a rebuilt all the components of a race car. They're gonna take it and apply for that particular race to win. >> So that's where driverless comes in its travels in the sense of you don't really need a full operator. It kind of operates on its own. 
>> In some sense, it's driverless, and in some sense they're taking the data scientists and giving them a power tool. Historically — before automatic machine learning, and Driverless AI is in the umbrella of automatic machine learning — they would fine-tune, learning the nuances of the data and the problem at hand, what they're optimizing for, and the right tweaks in the algorithm. So they have to understand how deep the trees are going to be, how many layers of deep learning they need, what particular variation they should deploy in natural language processing, what context they need for the long short-term memory. All these pieces they have to learn themselves. And there were only a few grandmasters, or big data scientists, in the world who could come up with the right answer for different problems. >> So you're spreading the love of AI around. So you're simplifying it: you get the big brains to work on it, and with democratization people can then participate. The machines also can learn — both humans and machines. >> Between our open source and the very maker-centric culture, we've been able to attract some of the world's top data scientists, physicists, and compiler engineers to bring this in a form factor that businesses can use. And today, one data scientist in a company like Franklin Templeton can operate at the level of 10 or hundreds of them, and bring the best in data science in a form factor that they can plug in and play. >> I was having a conversation with Kent Libby, who works with me on our platform team. We have all this data with theCUBE, and we were just talking: we want to hire a data science and AI specialist, and you go out and look around — you've got Google and Amazon and all these big players spending between $3 to $4 million per machine learning engineer, and that might be someone under the age of 30, with little experience. So the talent war is huge. I mean, the cost to just hire these guys — we can't hire these people. >> It's a global war. There's a talent shortage in China. There's a talent shortage in India. There's a talent shortage in Europe — and we have offices in Europe and in India. There's a talent shortage in Toronto and Ottawa. It is a global shortage of physicists and mathematicians and data scientists. So that's where our tools can help. And we see that — you can see Driverless AI as: you can drive to New York, or you can fly. >> I was telling my son the other day — he's taking computer science classes in school — I'm like, well, you know, machine learning and AI is kind of like dog training. You have dog training: you train the dog to do some tricks, it does some tricks. Well, if you're a coder, you want to train the machines. This is machine training. This is data science. The AI possibility is that machines have to be taught something as a base input. Machines just aren't self-learning on their own. So as you look at the science of AI, this becomes the question on the talent gap: can the talent gap be closed by machines? And you've got the time — you want speed, low latency — and trust. All these things are hard to do, all three. Balancing all three is extremely difficult. What are your thoughts on those three variables? >> So that's where we brought AI to help do AI. Driverless AI's concept is bringing AI to simplify AI. It's an expert system to do AI better, so you can actually put it in the hands of a new data scientist, and they can perform at the power of a grandmaster data scientist.
We're not disempowering the data scientist — the product is still for data scientists, because a business user would be stumped by a confusion matrix, false positives, false negatives. That's something a data scientist can understand. When you're talking about feature engineering, that's something a data scientist understands. And what Driverless AI is really doing is helping them do that rapidly and automatedly, on the latest hardware. That's where the time comes in: GPUs, TPUs, different forms of clouds — faster, cheaper, and easier. That's the democratization aspect, but it's really targeted at the data scientist, to prevent experimental error in science. Data science is a search for truth, but it's a lot of experiments to get to the truth, and a lot of error. If you can make the cost of experiments really simple and cheap, and prevent overfitting — that's a common problem in our science — prevent bias, the accidental bias that you introduce because the data is biased, you're trying to prevent the common pitfalls in doing data science. Leakage — usually your signal leaks — and how do you prevent those common pieces? That's kind of where we're revolutionizing, coming at it. But if you put that in the box, what that really unlocks is imagination. The real hard problems in the world are still the same.
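For readers who want the pitfalls Sri lists made concrete, here is a minimal sketch — in plain scikit-learn, not any H2O API — of guarding against data leakage and reading a confusion matrix. The dataset and column names are made up for illustration:

```python
# A minimal sketch of two pitfalls mentioned above: data leakage and the
# confusion matrix. Column names and the CSV file are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

df = pd.read_csv("claims.csv")                # hypothetical numeric dataset
X, y = df.drop(columns=["fraud"]), df["fraud"]

# Split FIRST. Fitting the scaler on all rows before splitting would let
# statistics from the test set leak into training -- a classic leakage bug.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)        # fit on training data only
model = LogisticRegression(max_iter=1000).fit(
    scaler.transform(X_train), y_train)

# The confusion matrix is the part a data scientist reads and a business
# user usually doesn't: rows are truth, columns are predictions.
print(confusion_matrix(y_test, model.predict(scaler.transform(X_test))))
```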
>> AI for creative people, for instance: they want infrastructure. They don't want to have to be an expert. They want that value. That's the consumerization. >> AI is really the co-founder for someone who's highly imaginative and has courage, right? And you don't have to look only to founders for courage and imagination; there are a lot of intrapreneurs in large companies who are trying to bring change to their organization. >> You know, we always say that the intellectual property game is changing from, you know, "I've got the protocol, this is locked and patented," to: you could have a workflow innovation. Change one little tweak of a process with data and powerful AI — that's the new magic IP equation. It's in the workflows, in the applications, new opportunities. Do you agree with that? >> Absolutely. The leapfrog from here is that businesses will come up with new business processes. We looked at business process optimization, and globalization can help there. But AI, as you rightfully said earlier, is training computers, not just programming them. They're schooling models on computers that can now, with data, think almost at the same level as a Go player — the leading Go player — think at the same level of an expert in that space. And if that's happening, now I can transform my business. It can run 24 by 7 at the rate at which I can assemble machines and feed it data. Data creation — making new data — becomes the real value that AI can unlock. >> H2O today is announcing Driverless AI, part of their flagship product, around recipes and the democratization of AI. Congratulations. Final point: take a minute to explain for the folks just the product — how they buy it, what's it made of, what's the commitment, how do they engage with you guys? >> It's an annual license. It's a software license. People can download it on our website and get a three-week trial — try it on their own, a free trial. The recipes are open source: about 100 recipes built by grandmasters have been made open source, and they can be plugged in, tried, and taken. Customers, of course, don't have to make their software open source. They can take this and make it theirs. And our vision here is to make every company an AI company. And that means that they have to embrace AI, learn it, take it, participate. Some of the leading innovations, companies are giving back, so you can access them in the open source. But the real vision here is to build that community of AI practitioners inside large organizations. Our teams are global, and we're here to support that transformation for some of the largest customers. >> So my problem of hiring an AI person — you could help me solve that right today. Okay, so to those watching: please go get their stuff — and come here, we've got a job opening. That's the goal. But that's the dream. >> That is the dream. And I have watched you over the last 10 years. You've been an entrepreneur with fierce passion. We want AI to be a partner, so you can take your message to a wider audience and build monetization on the data you have created. Businesses are — after the big data warlords we have, and data privacy is going to come eventually — I think businesses are the second largest owners of data. They just don't know how to monetize it and unlock value from it. >> Well, you know, we love data. We want to be data-driven. We want to go faster. I love the driverless vision. Driverless AI, H2O.ai, here in theCUBE — I'm John Furrier. Breaking news here in Silicon Valley from the startup H2O.ai. Thanks for watching. Thank you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Microsoft | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
2016 | DATE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
New York | LOCATION | 0.99+ |
China | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Amazons | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Ottawa | LOCATION | 0.99+ |
India | LOCATION | 0.99+ |
Toronto | LOCATION | 0.99+ |
August 2019 | DATE | 0.99+ |
hundreds | QUANTITY | 0.99+ |
100 recipes | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
Googles | ORGANIZATION | 0.99+ |
three week | QUANTITY | 0.99+ |
24 | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
10 | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Today | DATE | 0.99+ |
seven | QUANTITY | 0.99+ |
Sri Ambati | PERSON | 0.99+ |
One | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Libby | PERSON | 0.98+ |
3 | QUANTITY | 0.98+ |
Two | QUANTITY | 0.97+ |
$4,000,000 | QUANTITY | 0.97+ |
Franklin Templeton | ORGANIZATION | 0.97+ |
both | QUANTITY | 0.96+ |
three variables | QUANTITY | 0.95+ |
thousands of companies | QUANTITY | 0.94+ |
Jon | PERSON | 0.93+ |
three | QUANTITY | 0.92+ |
H2O.ai | ORGANIZATION | 0.91+ |
Palo ALTO | LOCATION | 0.9+ |
English | OTHER | 0.89+ |
h 20 dot | OTHER | 0.86+ |
H 20 dot ay | ORGANIZATION | 0.86+ |
Volonte | PERSON | 0.84+ |
Dev Ops 2.0 | TITLE | 0.82+ |
one day | QUANTITY | 0.82+ |
last 10 years | DATE | 0.81+ |
Palo Alto, California | LOCATION | 0.8+ |
second largest | QUANTITY | 0.79+ |
about seven years ago | DATE | 0.79+ |
Cubes Studios | ORGANIZATION | 0.77+ |
CEO | PERSON | 0.76+ |
Lem | PERSON | 0.76+ |
one data scientist | QUANTITY | 0.76+ |
under | QUANTITY | 0.76+ |
four people | QUANTITY | 0.73+ |
30 | QUANTITY | 0.71+ |
Dottie | ORGANIZATION | 0.66+ |
Iris | PERSON | 0.65+ |
Bean | PERSON | 0.63+ |
python coder | TITLE | 0.59+ |
California | LOCATION | 0.58+ |
h 20 | OTHER | 0.57+ |
Cube | COMMERCIAL_ITEM | 0.56+ |
Go | TITLE | 0.55+ |
age of | QUANTITY | 0.52+ |
go | TITLE | 0.51+ |
Cuban | OTHER | 0.49+ |
Cuba | ORGANIZATION | 0.47+ |
John | PERSON | 0.44+ |
Oneness | ORGANIZATION | 0.43+ |
Leigh Martin, Infor | Inforum DC 2018
>> Live from Washington, D.C., it's theCUBE! Covering Inforum D.C. 2018. Brought to you by Infor. >> Well, welcome back to Washington, D.C., We are live here at the Convention Center at Inforum 18, along with Dave Vellante, I'm John Walls. It's a pleasure now, welcome to theCUBE, Leigh Martin, who is the Senior Director of the Dynamic Science Labs at Infor, and good afternoon to you Leigh! >> Good afternoon, thank you for having me. >> Thanks for comin' on. >> Thank you for being here. Alright, well tell us about the Labs first off, obviously, data science is a big push at Infor. What do you do there, and then why is data science such a big deal? >> So Dynamic Science Labs is based in Cambridge, Massachusetts, we have about 20 scientists with backgrounds in math and science areas, so typically PhDs in Statistics and Operations Research, and those types of areas. And we've really been working over the last several years to build solutions for Infor customers that are math and science based. So, we work directly with customers, typically through proof of concept, so we'll work directly with customers, we'll bring in their data, and we will build a solution around it. We like to see them implement it, and make sure we understand that they're getting the value back that we expect them to have. Once we prove out that piece of it, then we look for ways to deliver it to the larger group of Infor customers, typically through one of the Cloud Suites, perhaps functionality that's built into a Cloud Suite, or something like that. >> Well, give me an example, I mean it's so, as you think-- you're saying that you're using data that's math and science based, but, for application development or solution development if you will. How? >> So, I'll give you an example, so we have a solution called Inventory Intelligence for Healthcare, it's moving towards a more generalized name of Inventory Intelligence, because we're going to move it out of the healthcare space and into other industries, but this is a product that we built over the last couple of years. We worked with a couple of customers, we brought in their Lawson data, so their Lawson customers, we bring the data into an area where we can work on it, we have a scientist in our team, actually, she's one of the Senior Directors in the team, Dawn Rose, who led the effort to design and build this, design and build the algorithm underlying the product; and what it essentially does is, it allows hospitals to find the right level of inventory. Most hospitals are overstocked, so this gives them an opportunity to bring down their inventory levels to a manageable place without increasing stockouts, so obviously, it's very important in healthcare that you're not having a lot of stockouts. And so, we spent a lot of time working with these customers, really understanding what the data was like that they were giving to us, and then Dawn and her team built the algorithm that essentially says, here's what you've done historically, right? So it's based on historic data, at the item level, at the location level. What've you done historically, and how can we project out the levels you should have going forward, so that they're at the right level where you're saving money, but again, you're not increasing stockouts, so. So, it's a lot of time and effort to bring those pieces together and build that algorithm, and then test it out with the customers, try it out a couple of times, you make some tweaks based on their business process and exactly how it works.
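Infor doesn't spell out the algorithm in this conversation, so the following is only a textbook-style sketch of the kind of item-level calculation an inventory tool like this might start from — a reorder point with safety stock sized to a target service level. All numbers and names are made up for illustration:

```python
# A hypothetical sketch of a classic reorder-point calculation, NOT Infor's
# actual algorithm: size safety stock from historic demand variability so
# inventory comes down without raising the stockout rate.
import statistics
from math import sqrt

daily_usage = [12, 9, 15, 11, 30, 8, 14, 10, 13, 12]  # historic usage, one item/location
lead_time_days = 4
z = 1.65  # ~95% service level; raise z to trade more stock for fewer stockouts

mean_d = statistics.mean(daily_usage)
sigma_d = statistics.stdev(daily_usage)

safety_stock = z * sigma_d * sqrt(lead_time_days)
reorder_point = mean_d * lead_time_days + safety_stock

print(f"safety stock ~ {safety_stock:.0f} units, reorder at {reorder_point:.0f}")
```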
And then, like I said, we've now built that out into originally a stand-alone application, and in about a month, we're going to go live in Cloud Suite Financials, so it's going to be a piece of functionality inside of Cloud Suite Financials. >> So, John, if I may, >> Please. >> I'm going to digress for a moment here because the first data scientist that I ever interviewed was the famous Hilary Mason, who's of course now at Cloudera, and she told me at the time that the data scientist is a part mathematician, part scientist, part statistician, part data hacker, part developer, and part artist. >> Right. (laughs) >> So, you know it's an amazing field that Hal Varian, who is the Google economist, said, "It's going to be the hottest field in the next 10 years." And this has sort of proven true, but Leigh, my question is, so you guys are practitioners of data science, and then you bring that into your product, and what we hear from a lot of data scientists, other than that sort of, you know, panoply of skill sets, is, they spend more time wrangling data, and the tooling isn't there for collaboration. How are you guys dealing with that? How has that changed inside of Infor? >> It is true. And we actually really focus on first making sure we understand the data and the context of the data, so it's really important if you want to solve a particular business problem that a customer has, to make sure you understand exactly what is the definition of each and every piece of data that's in all of those fields that they sent over to you, before you try to put 'em inside an algorithm and make them do something for you. So it is very true that we spend a lot of time cleaning and understanding data before we ever dive into the problem solving aspect of it. And to your point, there is a whole list of other things that we do after we get through that phase, but it's still something we spend a lot of time on today, and that has been the case for a long time now. Wherever we can, we apply new tools and new techniques, but actually just the simple act of going in there and saying, "What am I looking at, how does it relate?" Let me ask the customer to clarify this to make sure I understand exactly what it means. That part doesn't go away, because we're really focused on solving the customer's problem and then making sure that we can apply that to other customers, so really knowing what the data is that we're working with is key. So I don't think that part has actually changed too much, there are certainly tools that you can look at. People talk a lot about visualization, so you can start thinking, "Okay, how can I use some visualization to help me understand the data better?" But just that whole act of understanding data is key and core to what we do, because we want to build the solution that really answers the business problem. >> The other thing that we hear a lot from data scientists is that they help you figure out what questions you actually have to ask. So, it sort of starts with the data, they analyze the data, maybe you visualize the data, as you just pointed out, and all these questions pop out. So what is the process that you guys use? You have the data, you've got the data scientist, you're looking at the data, you're probably asking all these questions. You get, of course, questions from your customers as well. You're building models maybe to address those questions, training the models to get better and better and better, and then you infuse that into your software.
So, maybe, is that the process? Is it a little more complicated than that? Maybe you could fill in the gaps. >> Yeah, so, my personal opinion, and I think many of my colleagues would agree with me on this, is that starting with the business problem, for us, is really the key. There are ways to go about looking at the data and then pulling out the questions from the data, but generally, that is a long and involved process, because it takes a lot of time to really get that deep into the data. So when we work, we really start with, what's the business problem that the customer's trying to solve? And then, what's the data that needs to be available for us to be able to solve that? And then, build the algorithm around that. So for us, it's really starting with the business problem. >> Okay, so what are some of the big problems? We heard this morning that there's a problem in that there's more job openings than there are candidates, and productivity, business productivity, is not being impacted. So there are two big chewy problems that data scientists could maybe attack, and you guys seem to be passionate about those, so. How does data science help solve those problems? >> So, I'll start off by saying at Infor there's actually — I talked about the folks that are in our office in Cambridge, but there's quite a bit of data science going on outside of our team. We are the data science team, but there are lots of places inside of Infor where this is happening, either in products that contain some sort of algorithmic approach — the HCM team for sure, the talent science team which works on HCM, that's a team that's led by Jill Strange, and we work with them on certain projects in certain areas. They are very focused on solving some of those people-related problems. For us, we work a little bit more on some of the other areas — sort of the manufacturing and distribution areas, and we work with the healthcare side of things, >> So supply chain, healthcare? >> Exactly. So some of the other areas, because, like I said, there are some strong teams out there that do data science, it's just also incorporated with other things, like the talent science team. So, there's lots of examples of it out there. In terms of how we go about building it, like I was saying, we work on answering the business question upfront, understanding the data, and then really sitting with the customer and building that out, and so the problems that come to us are often through customers who have particular things that they want to answer. So, a lot of it is driven by customer questions and particular problems that they're facing. Some of it is driven by us. We have some ideas about things that we think would be really useful to customers. Either way, it ends up being a customer collaboration with us and with the product team that eventually we'll want to roll it out to, to make sure that we're answering the problem in the way that the product team really feels it can be rolled out to customers, and better used, and more easily used by them. >> I presume it's a non-linear process, it's not like somebody comes to you with a problem, and it's okay, we're going to go look at that. Okay now, we got an answer, I mean it's-- Are you more embedded into the development process than that? Can you just explain that?
>> So, we do have a development team in Prague that does work with us, and it depends on whether we think we're going to actually build a more-- a product with aspects to it like a UI, versus just a back end solution. It depends on how we've decided we want to proceed with it. So, for example, I was talking about Inventory Intelligence for Healthcare; we also have Pricing Science for Distribution. Both of those were built initially with UIs on them, and customers could buy those separately. Now that we're in the Cloud Suites, those are both being incorporated into the Cloud Suite. So, going back to where I was talking about our team in Prague, we sometimes build product, sort of a fully encased product, working with them, and sometimes we work very closely with the development teams from the various Cloud Suites. And the product management team is always there to help us, to figure out sort of the long term plan and how the different pieces fit together. >> You know, kind of big picture, you've got AI right, and then machine learning, pumping all kinds of data your way. So, in a historical time frame, this is all pretty new, this confluence right? And in terms of development, where do you see it like 10 years from now, 20 years from now? What potential is there? We've talked about human potential, unlocking human potential, we'll unlock it with that kind of technology. What are we looking at, do you think? >> You know, I think that's such a fascinating area, and area of discussion, and sort of forward thinking. I do believe in sort of this idea of augmented intelligence, and I think Charles was talking a little bit about that this morning, although not in those particular terms; but this idea that computers and machines and technology will actually help us do better, and be better, and be more productive. So this idea of doing sort of the rote everyday tasks, that we no longer have to spend time doing, that'll free us up to think about the bigger problems, and hopefully, and my best self wants to say we'll work on famine, and poverty, and all those problems in the world that really need our brains to focus on, and work. And the other interesting part of it is, if you think about sort of the concept of singularity: are computers ever going to actually be able to think for themselves? That's sort of another interesting piece when you talk about what's going to happen down the line. Maybe it won't happen in 10 years, maybe it will never happen, but there's definitely a lot of people out there, who are well known in sort of tech and science, who talk about that, and talk about the fears related to that. That's a whole other piece, but it's fascinating to think about 10 years, 20 years from now, where we are going to be on that spectrum. >> How do you guys think about bias in AI and data science? Because humans express bias, tribalism — that's inherent in human nature. If machines are sort of mimicking humans, how do you deal with that and adjudicate? >> Yeah, and it's definitely a concern. There's a lot of writings out there and articles out there right now about bias in machine learning and in AI, and it's definitely a concern. So, just being aware of it, I think, is the first step, right? Because, as scientists and developers develop these algorithms, going into it consciously knowing that this is something they have to protect against, I think is the first step, for sure.
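As a concrete illustration of what "protecting against" bias can mean in practice, here is a minimal sketch of one common check — comparing a model's positive-prediction rate across groups (demographic parity). Real fairness auditing goes much further, and the data here is made up:

```python
# A minimal, hypothetical bias check: does the model approve one group at a
# noticeably different rate than another? A large gap is a flag to dig in,
# not a verdict by itself.
import pandas as pd

preds = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   0,   1,   0,   0,   1,   0,   1],  # model decisions
})

rates = preds.groupby("group")["approved"].mean()
print(rates)
print("parity gap:", abs(rates["A"] - rates["B"]))  # large gap -> investigate
```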
And then, I was just reading an article recently about another company (laughs) who is building sort of a bias tracker — a way to actually monitor your algorithm and identify places where there is perhaps bias coming in. So, I do think we'll start to see more of those things. It gets very complicated, because when you start talking about deep learning and networks and AI, it's very difficult to actually understand what's going on under the covers, right? It's really hard to get in and say this is the reason why your AI told you this; that's very hard to do. So, it's not going to be an easy process, but I think that we're going to start to see that kind of technology come. >> Well, we heard this morning about some sort of systems that could help, my interpretation, automate, speed up, and minimize the hassle of performance reviews. >> Yes. (laughs) >> And that's the classic example of, an assertive woman is called abrasive or aggressive, an assertive man is called a great leader — so it's just a classic example of bias. I mentioned Hilary Mason, rock star data scientist who happens to be a woman; you happen to be a woman. Your thoughts as a woman in tech, and maybe, can AI help resolve some of those biases? >> Yeah. Well, first of all I want to say, I'm very pleased to work in an organization where we have some very strong leaders who happen to be women, so I mentioned Dawn Rose, who designed our IIH solution, and I mentioned Jill Strange, who runs the talent science organization. Half of my team is women, so, particularly inside of sort of the science area inside of Infor, I've been very pleased with the way we've built out some of that skill set. And I'm also an active member of WIN, so the Women's Infor Network is something I'm very involved with, so I meet a lot of people across our organization, a lot of women across our organization, who are just really strong technology supporters, really intelligent, sort of go-getter type of people, and it's great to see that inside of Infor. I think there's a lot of work to be done, for sure. And you can always find stories, whether it's coming out of Silicon Valley or other places, where you hear some really sort of arcane-sounding things that are still happening in the industry, and so some of those things, it's disappointing, certainly, to hear that. But I think, Van Jones said something this morning about how — and I liked the way he said it, and I'm not going to be able to say it exactly — but he said something along the lines of, "The ground is there, the formation is starting, to get us moving in the right direction." And I think I'm hopeful for the future, that we're heading in that way, and I think, you know, again, he sort of said something like, "Once the groundswell starts going in that direction, people will really jump in, and will see the benefits of being more diverse." Whether it's having more women, or having more people of color, however things expand, that's just going to make us all better, and more efficient, and more productive, and I think that's a great thing. >> Well, and I think there's a spectrum, right? And on one side of the spectrum, there's intolerable and unacceptable behavior, which should just be zero tolerance in my opinion, and that's a passion of ours at theCUBE. The other side of that spectrum is inclusion, and it's a challenge that we have as a small company, and I remember having a conversation earlier this year with an individual.
And we talked about quotas, and I don't think that's the answer. Her comment was, "No, that's not the answer, you have to endeavor to reach deeper beyond your existing network." Which is hard sometimes for us, 'cause you're so busy, you're running around, it's like okay, it's the convenient thing to do. But you've got to peel the onion on that network, and actually take the extra time and make it a priority. I mean, your thoughts on that? >> No, I think that's a good point. I mean, if I think about who my circle is, right? And the people that I know and I interact with. If I only reach out to the smallest group of people, I'm not really getting out beyond my initial circle. So I think that's a very good point, and I think we have to find ways to be more interactive, and pull from different areas. And I think it's interesting, so coming back to data science for a minute, if you sort of think about the evolution of how we got to today, now we're really pulling people from science areas, and math areas, and technology areas, and data scientists are coming from lots of places, right? And you don't always have to have a PhD, right? You don't necessarily have to come up through that system to be a good data scientist, and I think, to see more of that, and really people going beyond just sort of the traditional circles and the traditional paths to really find people that you wouldn't normally identify, to bring into that path, is going to help us, just in general, be more diverse in our approach. >> Well, it certainly seems like it's embedded in the company culture. I think that's a great reason for you to be optimistic going forward, not only about your job, but about the way the company is going about it. >> What would you advise young people generally, who want to crack into the data science field, but specifically, women, who clearly are underrepresented in technology? >> Yeah, so, I think we're starting to see more and more women enter the field. Again, it's one of those things — people know it, and because people are aware of it, there's more of a tendency to be more inclusive. But I definitely think, just go for it, right? I mean, if it's something you're interested in, and you want to try it out, go to a coding camp, and take a science class, and there are so many online resources now — I mean, there are the massive online courses that you can take. So, even if you're hesitant about it, there are ways you can kind of be at home, and try it out, and see if that's the right thing for you. >> Just dip your toe in the water. >> Yes, exactly, exactly! Try it out and see, and then just decide if that's the right thing for you, but I think there's a lot of different ways to sort of check it out. Again, you can take a course, you can actually get a degree, there's a wide range of things that you can do to kind of experiment with it, and then find out if that's right for you. >> And if you're not happy with the hiring opportunities out there, just start a company, that's my advice. >> That's right. (laughing together) >> Agreed, I definitely agree! >> We thank you-- we appreciate the time, and great advice, too. >> Thank you so much. >> Leigh Martin joining us here at Inforum 18, we are live in Washington, D.C., you're watching the exclusive coverage, right here, on theCUBE. (bubbly music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
John Walls | PERSON | 0.99+ |
Hal Varian | PERSON | 0.99+ |
Jill Strange | PERSON | 0.99+ |
Dynamic Science Labs | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Leigh Martin | PERSON | 0.99+ |
Washington, D.C. | LOCATION | 0.99+ |
Cambridge | LOCATION | 0.99+ |
Prague | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Charles | PERSON | 0.99+ |
Leigh | PERSON | 0.99+ |
Infor | ORGANIZATION | 0.99+ |
Van Jones | PERSON | 0.99+ |
Dawn | PERSON | 0.99+ |
WIN | ORGANIZATION | 0.99+ |
first step | QUANTITY | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Dawn Rose | PERSON | 0.99+ |
Cambridge, Massachusetts | LOCATION | 0.99+ |
Cloud Suite | TITLE | 0.99+ |
Women's Infor Network | ORGANIZATION | 0.98+ |
Convention Center | LOCATION | 0.98+ |
one | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
both | QUANTITY | 0.97+ |
10 years | QUANTITY | 0.97+ |
this morning | DATE | 0.96+ |
Cloud Suites | TITLE | 0.96+ |
first | QUANTITY | 0.96+ |
one side | QUANTITY | 0.95+ |
Cloud Suite Financials | TITLE | 0.93+ |
each | QUANTITY | 0.92+ |
two big chewy problems | QUANTITY | 0.92+ |
about 20 scientists | QUANTITY | 0.92+ |
D.C. | LOCATION | 0.9+ |
earlier this year | DATE | 0.9+ |
20 years | QUANTITY | 0.88+ |
last couple of years | DATE | 0.88+ |
DC | LOCATION | 0.87+ |
first data scientist | QUANTITY | 0.85+ |
Inforum 18 | ORGANIZATION | 0.83+ |
ORGANIZATION | 0.79+ | |
Half of my team | QUANTITY | 0.76+ |
years | DATE | 0.75+ |
couple | QUANTITY | 0.74+ |
Inventory Intelligence | TITLE | 0.71+ |
years | QUANTITY | 0.69+ |
HCM | ORGANIZATION | 0.68+ |
about a month | QUANTITY | 0.68+ |
next 10 years | DATE | 0.68+ |
2018 | DATE | 0.66+ |
20 | DATE | 0.63+ |
theCUBE | ORGANIZATION | 0.62+ |
last | DATE | 0.55+ |
Inforum | ORGANIZATION | 0.54+ |
zero | QUANTITY | 0.52+ |
Economist | TITLE | 0.51+ |
Cloud | TITLE | 0.49+ |
Inventory | ORGANIZATION | 0.47+ |
Inforum | EVENT | 0.42+ |
Dr. Swaine Chen, Singapore Genomics Institute | AWS Public Sector Summit 2018
>> Live from Washington D.C., it's theCUBE. Covering AWS Public Sector Summit 2018. Brought to you by Amazon Web Services and its ecosystem partners. (upbeat music) >> Hey welcome back everyone, we're here live in Washington D.C. for Amazon Web Services Public Sector Summit, I'm John Furrier with Stu Miniman. Our next guest is Dr. Swaine Chen, Senior Research Scientist in Infectious Diseases at the Genome Institute of Singapore, and also an assistant professor of medicine at the National University of Singapore. Great to have you on, I know you've been super busy, you were on stage yesterday, we tried to get you on today, thanks for coming in and kind of bringing it in to our two days of coverage here. >> Thank you for having me, I'm very excited to be here. >> So we were in between breaks here and we're talking about some of the work around DNA sequencing, but, you know, it's super fascinating. I know you've done some work there, but I want to talk first about your presence here at the Public Sector Summit. You were on stage — tell your story, 'cause you have a very interesting presentation around some of the cool things you're doing in the cloud. Take a minute to explain. >> That's right, so one of the big things that's happening in genomics is the rate of data acquisition is outstripping Moore's Law, right? So for a single institute to try to keep up with compute for that, we really can't do it. So that really is the big driver for us to move to cloud, and why we're on AWS. And so then, of course, once we can do that, once we can sort of have this capacity, there's lots of things — my research is mostly on infectious diseases, so one of the cases where, all of a sudden, you've got a huge amount of data you need to process would be something like an outbreak. And that just happens — it happens unexpectedly. So we had one of these that happened that I talked about; the keynote yesterday was on Group B Streptococcus. This is a totally unexpected disease. And so all of a sudden we had all this data we had to process, and try to figure out what was going on with that outbreak. And unfortunately we're pretty sure that there are going to be other outbreaks coming up in the future as well, and just being able to be prepared for that — AWS helps us provide some of that capacity, and we're, you know, continuously trying to upgrade our analytics for that as well. >> So give me an example of kind of where this kind of hits home for you, where it works. What is it doing specifically? Is it changing the timeframe? Is it changing the analysis? Where is the impact for you? >> Yeah so it's all of this, right? It's all the sort of standard things that AWS is providing all of the other companies. So it's cheaper for us to just pay for what we use, especially when we have super spiky workloads, like in the case of an outbreak, right? If all of a sudden we need to sort of take over the cluster internally, well, there's going to be a lot of people screaming about that, right? So we can kick that out to the cloud, just pay for what we use, we don't have to sort of requisition all the hardware to do that, so it really helps us along these things. And it also gives us the capacity to think about, you know, as data just comes in more and more, we start to think about, let's just increase our scale. This is something that's been happening sort of incessantly in science, incessantly in genomics. So as just an example from my work and my lab: we're studying infectious diseases, we're studying mostly bacterial genomics.
So the genomes of bacteria that cause infections. We've increased our scale 100x in the last four years in terms of the data sets that we're processing. And we see the samples coming in — we're going to do another 10x in the next two years. We just really wouldn't have been able to do that on our current hardware. >> Yeah, Dr. Chen, fascinating space. For years there was discussion of, well, how much it costs — the cost to be able to do everything has gone down. But what has been fascinating is, you've talked about that data outstripping Moore's Law, and not only what you can do but in collaboration with others now, because there's many others around the globe that are doing this. So talk about that level of data, and how the cloud enables that. >> Yeah so that's actually another great point. So genomics is very strong on open source, especially in the academic community. Whenever we publish a paper, all the genomic data that's in that paper, it gets — uh oh (laughs). Whenever we, whenever we publish-- >> Hall's closing in three minutes. >> Three-minute cloud countdown. >> Three minutes, okay. Whenever we publish a paper, that data goes up and gets submitted to these public databases. So when I talk about 100x scale, that's really incorporating, worldwide, all the data that's present for that species. So as an example — I talked about Group B Streptococcus — another bacteria we study a lot is E. coli, Escherichia coli. That causes diarrhea, it causes urinary tract infections, bloodstream infections. When we pull down a data set locally, in Singapore, with 100, 200, 300 strains, we can now integrate that with a global database of 10,000, 20,000 strains and just gain a global perspective on that. We get higher resolution, and really AWS helps us to pull in from these public databases, and gives us the scale to burst out that processing of that many more strains. >> So, the DNA piece of your work, does that tie into this at all? I mean, obviously you've done a lot of work with the DNA side, was that playing into this as well? >> The? >> The DNA work you've done in the past? >> Yeah so all of the stuff that we're doing is DNA, basically. There are other frontiers that have been explored quite a lot — looking at RNA, and looking at proteins and carbohydrates and lipids — but at the Genome Institute in Singapore, we're very focused on the genetics, and mostly are doing DNA. >> How has the culture changed in academic communities with cloud computing? We're seeing sharing, certainly a key part of data sharing. Can you talk about that dynamic, and what's different now than it was say five to even 10 years ago? >> Huh, I'd say that the academic community has always been pretty open, right? It's always been a very strong, open-source-compatible kind of community, right? So data was always supposed to be submitted to public databases. It didn't always happen, but I think as the data scale goes up, and we see the value of having a global perspective on infectious diseases and looking for the source of an outbreak, the imperative to share data is stronger, right? Looking at outbreaks like Ebola, where in the past people might have tried to hold data back because they wanted to publish on it — from a public health point of view, the imperative to share that data immediately is much stronger now that we see the value of having it out there. So I would say that's one of the biggest changes: the imperative is there more.
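To make the earlier point about integrating local strains with global databases concrete, here is a minimal sketch of the kind of comparison behind it: compute pairwise SNP distances between strains and group the ones that fall under a closeness threshold. Real pipelines align sequencing reads to a reference genome first; the sequences, names, and threshold here are made up:

```python
# A toy sketch of SNP-distance comparison between bacterial isolates.
# All sequences and the cutoff are hypothetical placeholders.
from itertools import combinations

strains = {
    "SG_001":   "ATGGCTA",  # local isolates (toy sequences)
    "SG_002":   "ATGGCTT",
    "GLOBAL_7": "ATGACTA",  # strain pulled from a public database
}

def snp_distance(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

THRESHOLD = 2  # "same cluster" cutoff, in SNPs

for (n1, s1), (n2, s2) in combinations(strains.items(), 2):
    d = snp_distance(s1, s2)
    tag = "possible cluster" if d <= THRESHOLD else "unrelated"
    print(f"{n1} vs {n2}: {d} SNPs -> {tag}")
```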
>> I agree — I think the academic people I talk to always want to share; it might just not be uploaded fast enough. So time is key. But I've got to ask you a personal question: in all the work you've done, you've seen a lot of outbreaks. This is kind of like scary stuff. Have you had those aha moments, just like mind-blowing moments where you go, oh my God, we did that because of the cloud? I mean, can you point to some examples where it's like, that is awesome, that's great stuff? >> Well, so we certainly have quite a few examples. I mean, outbreaks are just unexpected. Figuring out any of them and being able to impact, or sort of say, this is how this transmission is, or this is what the source is, this is how we should try to control this outbreak — I mean, all of those are great stories. I would say that, you know, to be honest, we're still early in our transition to the cloud, and we're kind of running a hybrid environment right now. Like, really when we need to burst out, then we'll do that with the cloud. But most of our examples — so far, you know, we're still early in this for cloud. >> So the spiky workloads are the key value for you, when the spikes hit. >> Yeah. >> So what excites you about the future of the technology — what do you believe we'll be able to do as we just accelerate, prices go down, access to more information, access to more? What do you think we're going to see in this field in the next, you know, one to three years? >> Oh, I think one of the biggest changes that's going to happen is we're going to shift completely how we do, for example, outbreaks, right? We're going to shift completely how we do outbreak detection. It's already happening in the U.S. and Europe. We're trying to implement this in Singapore as well. Basically, the way we detect outbreaks right now is we see a rise in the number of cases — you see it at the hospitals, you see a cluster of cases of people getting sick. And what defines a cluster? You kind of need enough of these cases that it sort of statistically goes above your baseline. But when we look at genomic data, we can actually find clusters of outbreaks that are buried in the baseline, because we just have higher resolution. We can see the same bacteria causing infections in groups of people. It might be a small outbreak, it might be self-limited. But we can see this stuff happening, and it's buried below the baseline. So this is really what's going to happen: instead of waiting until a bunch of people get sick before you know that there's an outbreak, we're going to see it in the baseline, or as it's coming up, with two, three, five cases. We can save hundreds of infections. And that's one of the things that's super exciting about moving towards the future, where sequencing is just going to be a lot cheaper. Sequencing will be faster. Yeah, it's a super exciting time. >> And more research is a flywheel — more research will come over the top. >> Yep, exactly, exactly. >> That's great work. Dr. Swaine Chen, thanks for coming on theCUBE. We really appreciate-- >> No, thank you. >> Congratulations, great talk on the keynote yesterday, really appreciate it. This is theCUBE bringing you all the action here as we close down our reporting. They're going to shut us down — theCUBE will go on until they pull the plug, literally. Thanks for watching. I'm John Furrier, with Stu Miniman and Dave Vellante. Amazon Web Services Public Sector Summit — thanks for watching. (upbeat techno music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
Singapore | LOCATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
John Ferrier | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
Three minutes | QUANTITY | 0.99+ |
Swaine Chen | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Genome Institute | ORGANIZATION | 0.99+ |
Chen | PERSON | 0.99+ |
three minutes | QUANTITY | 0.99+ |
Escherichia coli | OTHER | 0.99+ |
100 | QUANTITY | 0.99+ |
10x | QUANTITY | 0.99+ |
U.S. | LOCATION | 0.99+ |
Washington D.C. | LOCATION | 0.99+ |
two days | QUANTITY | 0.99+ |
Europe | LOCATION | 0.99+ |
three | QUANTITY | 0.99+ |
100x | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
today | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
three years | QUANTITY | 0.98+ |
Public Sector Summit | EVENT | 0.98+ |
E. coli | OTHER | 0.97+ |
Dr. | PERSON | 0.97+ |
five | QUANTITY | 0.96+ |
Ebola | EVENT | 0.96+ |
Amazon Web Services Public Sector Summit | EVENT | 0.96+ |
The Medicinal National University of Singapore | ORGANIZATION | 0.96+ |
outbreak | EVENT | 0.95+ |
theCUBE | ORGANIZATION | 0.95+ |
Singapore Genomics Institute | ORGANIZATION | 0.94+ |
10,000, 20,000 strains | QUANTITY | 0.94+ |
AWS Public Sector Summit 2018 | EVENT | 0.94+ |
Amazons Web Services Public Sector Summit | EVENT | 0.94+ |
outbreaks | EVENT | 0.93+ |
first | QUANTITY | 0.92+ |
five cases | QUANTITY | 0.91+ |
hundreds of infections | QUANTITY | 0.91+ |
Moore | PERSON | 0.91+ |
last four years | DATE | 0.87+ |
Group B Streptococcus | OTHER | 0.84+ |
200, 300 strains | QUANTITY | 0.83+ |
next two years | DATE | 0.81+ |
single institute | QUANTITY | 0.81+ |
Streptococcus | OTHER | 0.76+ |
Genome institute of Singapore | ORGANIZATION | 0.76+ |
10 years ago | DATE | 0.75+ |
Group B | OTHER | 0.67+ |
people | QUANTITY | 0.5+ |
Scientist | PERSON | 0.48+ |
Infectious Disease | ORGANIZATION | 0.45+ |
bunch | QUANTITY | 0.38+ |
theCUBE | TITLE | 0.37+ |
Raghu Raghuram, VMware | VMware Radio 2018
>> [Narrator] From San Francisco, it's theCUBE. Covering Radio 2018, brought to you by VMware. >> Hey, welcome back everyone. This is theCUBE's exclusive coverage of Radio 2018. We are in San Francisco for VMware's Radio 2018. It's their R&D fiesta, party — as Steve Herrod, the former CTO, said, it's like a sales kickoff for engineers. It's a great time, but it's also serious. A lot of real serious discussion, and of course people are flexing their technical muscle and stretching their minds. And I'm here with one of the chief operators, one of the main principals and a legend at VMware, Raghu Raghuram. Chief Operating Officer — new title — Chief Operating Officer, Products and Cloud Services. >> That's right. >> Great to see you. >> Great to see you, John. >> What year did you join VMware? >> 2003 (chuckling) >> 15 years. >> So, you've seen many of these Radios. >> Yes, it's one of the highlights of the year for me. >> Yeah, super important architect of VMware, great part of the community, leader, architect of the AWS relationship. >> [Raghu] Sure. >> Part of that movement with Andy Jassy, Sanjay Poonen. This is the 14th year of Radio, and VMware has changed a lot since you joined. It's now a world class organization, getting check marks for one of the best places to work. Certainly for engineers it's like a great party environment. Take a minute to explain the Radio culture — it's the 14th year, there's t-shirts behind us to commemorate the key milestones — where it's come from, where it's gone, your thoughts on the program and the community. >> Yeah, I mean this is in fact one of the unique characteristics of VMware. I have checked around with my peers in the industry and I don't think any other tech company of our size does this. Radio stands for R&D Innovation Offsite. Like you said, we started fourteen years ago just to take a bunch of engineers out from their daily grinds and say, "what could we be building fundamentally that's groundbreaking?" So, I would say it's a cross between a wild science fair and a research conference. In fact, both of these go hand in hand at this place. People publish papers and there is a selection committee, just like in serious conferences. In fact, Ray had some amazing stats for this year's submissions, and the selection is very, very rigorous. At the same time, you'll go upstairs and you'll see the exhibition hall where there are all kinds of things that are displayed — things that could very well be incremental things in the next release, and things that are wild and wacky, off the wall, that we might never ever do. So, it's really the full gamut. Another interesting thing is we've gone bigger. We are getting people from pretty much all parts of the company. I think there is representation from 25 countries. >> [John] How many engineering centers are there, roughly? I mean, there's core centers and then you have engineers all over the world. How many engineers, ballpark? >> I would say, in terms of medium to big size centers, there are probably over a dozen across the globe, and literally every continent. Clearly, in the US we have four big centers. In Europe, we have three at least. In Asia, we have another three or four. So, we definitely have over 10. >> I mean, everyone who knows VMware and also knows theCUBE — well, this is our ninth year covering VMworld — all you gotta do is look at VMworld and you can tell one thing right out of the gate: very community oriented. All the decisions are made in the community.
Also, people who know VMware know you're very much an engineering organization. >> [Raghu] Yep. >> This is not like a lot of marketing fluff. Although you do have some good marketing here and there, the point is it's an engineering culture with community. This is unique. I've seen companies that don't walk the talk on "community engineering." They have silos, there's a lot of infighting. How has VMware preserved a culture of innovation amongst peers when it's competitive as hell inside VMware? You want to be smart, achieve the success. But also, VMware has always been in a moving market. How do you guys do it? What's the secret sauce? >> I mean, there's not a single thing. Like you said, culture is something that happens over time and is preserved over time, and is preserved through people. It's not like anything you can write down, right? Of course you can write it down, but it won't be worth the paper it's written on unless it's practiced every day by other people. And so, I think that is the key thing here. Right from the get go, customer-centric innovation has ruled the roost here. So, the question to ask always is: great innovation — look at it from a customer point of view. I think that matters a lot here. Secondly, there is a lot of emphasis on breaking the rules, in terms of doing something disruptive, right? And the engineers that come here tend to be the kind to respond to that, right? And then lots of venues. Like, this is not the only thing that we do, right? We do these things called borathons, which are our internal version of hackathons. We do regional versions of these things. Each of the teams, like the business units, have their own little R&D innovation activities that go on. >> They have a playground. They can basically go outside the scope of their job. >> Exactly. >> Get an idea, a passion, an idea, and go after it, and not have to worry about anything. >> Yup, exactly. >> [John] With a path to commercialization, if it hits. >> Yeah, that's what I was gonna say. We have a fairly high success rate, I would say, of taking things that we see here and turning them into product, and eventually into monetizable businesses, or the things that go into product features. >> Give some examples of historical successes, notables, and then also talk about some ones that aren't notable that have come out. I know a lot has come out of this — the numbers are clear. What are some highlights that have come out of the Radio event that have been blockbuster successes? >> A lot of the things that you see in the networking today came out of Radio. Things about doing security and networking from the hypervisor up came from here. What you see today as vSAN had its roots here. What you see today with AppDefense and the security stuff had its roots here. A lot of the features that are in vSphere today, especially storage vMotion and so on and so forth, were first showcased here. This goes on and on and on. We also have a lot of things that have shown up here that we have not pursued. For example, almost like an eBay for VM capacity. We didn't pursue it. God knows, that could've been a huge idea. (laughing) >> It's the misses too. >> Yeah, there's the misses too. But that's the whole point of this. >> Yeah. There's parts to creativity. How much creativity goes on at this event? I mean, there's certainly a lot of barnstorming, brainstorming, or whatever you wanna call it. A lot of interaction, physical face to face.
How much creativity is happening, you think, here? >> Yeah, so a few years back they introduced a couple of things. One is instant birds-of-a-feather sessions, where you can literally go to a whiteboard and say, "hey, let's discuss this topic," and set up a time, and then people show up. There's this other one they call Lightning Rounds, which literally happens over drinks, I think tomorrow or something, where people come in and it's sort of a mini gauntlet where nothing is scripted. All sorts of crazy ideas keep flowing. I would say those are two examples where there's a lot of on-the-spot creativity. As a company, the R&D teams have gotten more dispersed. This is the opportunity for people to get together, even within the same business unit or across business units, and say let's go solve this problem. You and I have been talking about this on email, let's talk about it face to face. Hey, let's bring somebody else in that's relevant to this conversation as well. So, those are the kind of things that go on here that spark the creativity. And then of course, the exhibits. When people start thinking about these exhibits and talking to the people that are showing there, other ideas get spawned off as well. >> Raghu, talk about, just from your experience, you've got a great track record, certainly within VMware, going back to the early 2000s. What is your observation on the innovation formula? What's been the constant in innovation? As the waves have changed- I mean, I've been in Palo Alto for 19 years now, in my 20th year. Even Palo Alto's changed. So, the world's changed, modernized. And we'll get to the Amazon deal in a second. Certainly cloud's here. What have you seen as the constant innovation variable? >> What I would say is this. Fundamentally, the people that we tend to recruit into VMware are by and large what we call, or at least I call, platform thinkers. So, they think of building a fundamental piece of technology that could possibly be used in 10 different ways, and they build it for one particular use case. And then the question goes back to, now that we've done this, what else can we do with this foundational technology? If you look at vSphere, it's the same thing. If you look at networking, same thing. Storage is the same thing. So, I would say that is the constant. That's one constant here. Which is, how do you build fundamentally a platform that could be used in very different ways? >> Some will also say systems thinking. >> Exactly, so that's a compliment. >> The cloud is a system. >> (mumbles) I think Paul Maritz had that picture in 2010. Although some of the calls didn't come out, he kind of generally had the architecture. >> Yeah, yeah >> He nailed it (laughs) >> There are a few people like Paul in the world, and absolutely he nailed it. >> Dave and I would give him a lot of credit for that. Okay, let's talk about Amazon Web Services. Certainly Radio's now in its 14th year. At what point did the cloud start clicking in? You said there were some misses, the eBay for VMs. Certainly cloud is on the radar. >> Yeah >> And vCloud, we know what happened there. Pat talked about how you guys really took that opportunity, which is, you made lemonade out of some lemons there with that product. That's my words, not his. When did cloud first appear on the horizon at Radio, and how do you see that happening now as we talk multi-cloud? >> You missed the alumni session today.
One of the early engineers said that when he was interviewed by Mendel in 1999 (Mendel Rosenblum is of course the founder and first chief scientist here), Mendel foresaw it even then. When the engineer asked him, "how are we gonna make money on this?", he thought there would be a day when people just rent computer capacity from a data center instead of going out and buying gear. In some ways- >> He predicted >> He predicted >> Cloud operations >> Back in the company's starting days. But really I think we saw this in 2005, 2006, 2007. At the same time, actually, as Amazon saw this. But the big difference was we were growing 100% a year on the core business, and we had our hands full that way. We felt like, as a software company, the way to play it was by delivering technology for other people to build it. So, that's when it really made its way here, in Radio and in the products. >> And by the way, it wasn't obvious to many people in the industry at that time, or to Amazon. I've had many conversations with Andy Jassy, and he now uses the term "being misunderstood." They were completely misunderstood unless you were an entrepreneur who was using EC2 just to save on seed money. 'Cause it was a dream for entrepreneurs at that time. I remember that clearly. That was not obvious. It really wasn't obvious until about 2009, 2010. So you guys were growing. Missed that. Radio is not about missing it. It's about identifying. >> Exactly. >> So, how does it translate today for Amazon? >> The Amazon relationship, if you think about the technical underpinnings of it, clearly we did vCloud Air. We learned a lot from that. Among some of our engineers, the question that was asked was, "what if we could run a cloud on top of other people's clouds?" And we did experiments with nested virtualization. We did experiments with bare metal. And that was the start of our model. So, that's one of the early technical indicators of what we could do on other people's clouds. So, that's a big thing. The rest of the things we're doing with respect to elastically growing capacity and all those things came from experiments that showed up here. So, that was the connection back to Radio. In terms of the Amazon partnership itself, a lot of it was driven from the customer end. As we were thinking about VCN not working the way we wanted it to work, we went back to the customers and said, "what is wrong with this picture?" And the answer that came back was very clear. They said, we like the hybrid idea, but we want the hybrid to be VMware on prem and Amazon in the cloud, because 70% of our customers turned out to be AWS customers. And at the same time AWS was hearing the same thing. Why don't you guys team up instead of being either or? That's what led to the partnership. >> Your team at VMware came at the cloud native piece? >> Yeah >> Aspect of it. So Kubernetes is on the horizon. Not on the horizon, in your face. And you've got service mesh over the top. >> Yep, yep >> That's up the stack. It's networking. >> Yep, exactly. >> Still needs to do networking. >> Yeah, exactly. >> It's like, you guys must be like, hey, we love what's going on up there. Come down the stack. >> Yeah. So, the boundary between what is application platform and infrastructure platform is constantly changing. Kubernetes, when it started out, people said oh, it's an application platform. Now it turns out it's actually infrastructure. Same thing in networking.
So what we see is, the things we're doing at the lower levels of the infrastructure constructs, the same ideas get applied at the next level up. That's why we love Kubernetes. We love Service Mesh. We love similar concepts that are coming about in storage and security. It's one- >> A unified stack is coming. >> Yep, exactly. >> Just someone fix networking and then the holy grail, programmable networks. >> Yep >> When are they coming? >> At the application level. >> Let's go >> Yeah >> Holy grail is finally here. It's not where you thought it was gonna be. >> It is at both places, right. I mean, it's tying back to the conventional Layer 2, Layer 3 stuff, because that's also important still. >> Raghu, I love having a chat with you. It's great to chat. >> Good to see you again, John. >> Super impressive with the work you've been doing. Love the cloud deal with Amazon, you know that. Love what's going on with Kubernetes and containerization. Love what's going on with Service Mesh, the unified stack. Love cryptocurrency, which I didn't get to ask you about. >> Yep >> Thumbs up? >> Crazy things going on there too >> Thumbs up, okay, thumbs up. >> We're watching the cryptocurrency. >> Watching, token economics coming right behind it. It's theCUBE bringing you all the action here at Radio. We're the signal. 2018, Radio 2018. It's theCUBE with Raghu. I'll be right back with more coverage after this short break. (upbeat music)
Data Science for All: It's a Whole New Game
>> There's a movement that's sweeping across businesses everywhere, here in this country and around the world. And it's all about data. Today businesses are being inundated with data, to the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization, extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12, received my networking certs by 18, and earned a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a sure passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in-depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm, as data science continues to become democratized and extend beyond the domain of the data scientist, and why there's also a mandate for all of us to become data literate now that data science for all drives our AI culture. And we're going to be able to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data, and putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight, who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join the conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI mean to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it, you said you've been talking about this for years.
You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with have changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging, this world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And it's probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part, but I think we're all starting to hear more about AI. And it's incredible that this buzzword is happening. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though in terms of how businesses can really adopt AI and get started? >> Look, I think there are four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition. The first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens. And suddenly you've got structured and unstructured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with.
And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is, we need to evolve our data architecture to be ready for AI. And the way I think about that is, it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant, limited to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data, because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And the way I think about machine learning, it's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting, but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier, because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah, I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about everybody having data first in their mind, and then using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing though, because we have several organizations asking: is this a highly specialized role, just for data scientists? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal.
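Rob's point about automating algorithm selection can be made concrete in a few lines. A minimal sketch, assuming scikit-learn and a public sample dataset rather than anything from IBM's actual tooling: score each candidate model with cross-validation and keep the winner.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A public sample dataset stands in for real customer data.
X, y = load_breast_cancer(return_X_y=True)

# Candidate algorithms; a real pipeline could enumerate many more.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "gradient_boosted_trees": GradientBoostingClassifier(),
}

# Score every candidate with 5-fold cross-validation and keep the winner,
# which is the "automated algorithm selection" step in miniature.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)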
But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy, but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone across the board when it comes to data literacy. And I think people that are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first, and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater, because I have heard that there is an MS in data analytics offering. And they are always on the forefront of new technologies and new majors, and on trend. And I've heard that the job placement for people graduating with that MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier, because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch, Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business, from marketing campaigns to promotions to scheduling to ads. And their goal was to transform data into business insights and really take the burden off of their IT team, which was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM Integrated Analytics System: a data warehouse with data science tools built in and in-memory data processing. And just like that, they were ready for AI.
And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable for everybody: you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients, we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about bringing analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring the analytics to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models, and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey, move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path and they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And it's worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And the Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? We talk about data science for all; is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software superpowers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train, like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But we can also train more and make the ones we have more productive. The way I think about it is there are kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types.
And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientist's role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kinds of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody whose background is in IT, the question really is: is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item in the budget? >> So I'm afraid that some people might view it that way. As just another line item. But I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look, it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud was. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier about how data analytics is moving into all different kinds of industries. And I can tell you I've personally been affected by how important the analysis is, working in pediatric cancer for the last seven years. I personally implement virtual reality headsets in pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But phase two of this project is putting little sensors in the hardware that gather breathing and heart rate, to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step, always critical. And now we're going to get to the fun hands-on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right.
Thank you Rob, and now we've been joined by Siva Anne, who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob, break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. So he's going to play the role of a financial adviser who wants to better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there, Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using the customer's demographic data, his stock portfolio, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position in Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has got some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda.
And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks, and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools at the hand of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning. Pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources: Db2 Warehouse on Cloud, IBM's Integrated Analytics System, and a Hortonworks-powered Hadoop platform for the news feeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance, closer to where the data resides, to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currencies, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see, when you deploy analytics next to your data, even a financial adviser, just with the click of a button, is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that?
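The hedge check Siva ran comes down to one computation: two stocks pair as a hedge when their daily returns are negatively correlated. A minimal pandas sketch of that idea, using synthetic price series as stand-ins for the real Ferrari and Honda data; in the demo this runs in-database, next to the data, at far larger scale.

import numpy as np
import pandas as pd

# Synthetic daily prices: a shared market factor drives the two series in
# opposite directions, so their returns come out negatively correlated.
rng = np.random.default_rng(42)
days = pd.date_range("2017-01-02", periods=250, freq="B")
factor = rng.normal(size=len(days))
prices = pd.DataFrame({
    "RACE": 100 * np.exp(np.cumsum(0.01 * factor + 0.005 * rng.normal(size=len(days)))),
    "HMC": 60 * np.exp(np.cumsum(-0.01 * factor + 0.005 * rng.normal(size=len(days)))),
}, index=days)

# The hedge test: correlation of daily returns; strongly negative means
# the second stock offsets swings in the first.
returns = prices.pct_change().dropna()
corr = returns["RACE"].corr(returns["HMC"])
print(f"correlation of daily returns: {corr:.2f}")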
Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of The Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So, data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thoughts? >> Yeah, for example, at Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. That's a data point by itself. >> Okay. Well, what we're seeing is that data science, as a theme, is actually for everyone now. But what's happening is that it's now meeting non-technical people. And what we're seeing is that when non-technical people are implementing these tools, or coming at these tools without a baseline of data literacy, they're often times using them in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing, where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level, at a much greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non-technical people. >> Well, you can have all the data in the world, and I think it speaks to this: if you're not making the proper moves with it, forget it. It means nothing at the same time. >> No, absolutely. I mean, I think that when you look at the huge explosion in data, there comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we run a fellowship where we help companies hire from a really talented group of folks, who are truly data scientists and who know all those kinds of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow the data science practice that they have. And we help clients like McKinsey, BCG, Bain train up their customers, also their clients, also their workers, to be more data talented. And to build up that data science capability.
And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that. Or not utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi cloud world. How do organizations start to become driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground in to a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off from how, it's going to not feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is that getting non technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model? So that actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room. 
They're consulted, but they're not a stakeholder. >> Can I actually-- >> Yeah, yeah, please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non-technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and create a new kind of environment where technical people also talk seamlessly with non-technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. One of the trends in-- >> A major trend. >> --data science training is that it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is for data engineers. People who are more on the software engineering side learning more about the stats and math. And then people who are traditionally on the stats side learning more about the engineering. And then managers and data analysts learning about both. >> Michael, I think you said something of interest too, because I think we can look at IBM Watson as an example, and working in healthcare. The human component. Because often times we talk about machine learning and AI and data, and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about neural nets trained on textual data. So looking at different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses, and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would find all these patterns in there that were latent, that had a lot of things that maybe we would cringe at if we saw them. And I think that's one of the really important aspects of the human element, right? It's being able to come in and say, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules, that they can't engage in discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say, okay, look, the classic example would be zip code: you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that.
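Michael's zip-code example can be checked mechanically before a model ever ships. A minimal sketch of that kind of proxy audit, where the file and column names are hypothetical: if zip code alone nearly determines the protected attribute, it is acting as a proxy and needs special handling.

import pandas as pd

# Hypothetical loan application data with hypothetical column names.
applications = pd.read_csv("loan_applications.csv")

# For each zip code, take the share of its single most common race value.
# An average share near 1.0 means zip code effectively encodes race, so the
# model would be discriminating by proxy even with race itself excluded.
homogeneity = (applications.groupby("zip_code")["race"]
               .agg(lambda s: s.value_counts(normalize=True).max())
               .mean())
print(f"average within-zip majority share: {homogeneity:.2%}")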
So you may, by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model, and that's where you need a human element to step in and say, okay, hold on. Slow things down. This isn't the right way to go. >> And the people who have-- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like, let me have at it. >> And here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have less of these kinds of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> --one here. If you look at how to enable machine intelligence in an organization, there are three pillars that I have in my head, which are the culture, the talent, and the technology infrastructure. And I believe, and I saw in working very closely with Fortune 100 and 200 companies, that the talent piece is actually the most important, the most crucial, and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no, I mean, I think that's sort of how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're in a more regulated industry, there are a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is to take the thousand actuaries and statisticians that we have and get all of them trained up to become data scientists and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead, Nir. >> So this is actually a trend that we have been seeing in the past year or so: companies starting to look at how to upskill and find talent within the organization. So they can actually move people to become more literate, and navigate them from analyst to data scientist, and from data scientist to machine learning engineer. So this is actually a trend that has been happening already for a year or so. >> Yeah, but I also find that after they've gone through that training and gotten people skilled up in data science, the next problem that I get is executives coming to say, we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills.
We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is, look, you're still making decisions in the same way. And you're still not involving enough of the non-technical people. Especially from marketing, and CMOs are now much more responsible for driving growth in their companies. But often times it's so hard to change the old way of marketing, which is still very segmentation-based, you know, demographic-variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways it comes up quite often, especially in large Fortune 500 enterprises, is that they're not very comfortable with, for example, open source architecture and open source tools. And there is some residual bias that that's somehow dangerous, a security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well, a lot of the talent really wants to use these kinds of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform, because they want, and they understand, the value of having that kind of open source technology work seamlessly on their platforms. So I think that just goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies, and a lot of the ones we work with, have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, they've already gone through the first phase of data science work, which I explain was all about the tools: getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non-technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now. >> And I think that communication between the technical stakeholders and management and leadership, that's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing, and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity: anything you want to add to this conversation?
>> I think one thing to conclude is to say that for companies that are not data driven, it's about time to hit refresh and figure out how they transition the organization to become data driven. To become agile and nimble so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately, they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great guys? And thank you to everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using data science experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data no matter where it is and what it is. How you can use your favorite tools from open source. And finally how you can build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new Github integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects and take advantage of Github's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macro economic data, which she was able to find in the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributed file system. 
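As a rough illustration of what inspecting that parquet news data might look like from a notebook, here is a minimal PySpark sketch. The HDFS path and column names are invented for illustration; the demo does not show the actual cluster layout.

```python
# Hypothetical sketch: inspecting news data stored as parquet on HDFS.
# The path and column names are illustrative, not from the actual demo.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("news-data-exploration")
         .getOrCreate())

# Parquet files on HDFS can be read directly into a DataFrame.
news = spark.read.parquet("hdfs:///data/nasdaq/news/")

news.printSchema()  # confirm the ingested structure
news.select("symbol", "headline", "published_at").show(5, truncate=False)
```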
To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This, and the federation capabilities that Big SQL offers, dramatically simplifies data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data, and to identify the features that have some predictive power. We're trying to assess what ultimately is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you already invested in, like our very own SPSS as well as SAS. Through a new import feature, you can easily import models created with those tools. This helps you avoid vendor lock-in, and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System, used the Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate between logistic regression and gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they build. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models. 
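The model bake-off Daniel describes, comparing logistic regression against a gradient boosted tree on a binary classification problem, can be approximated by hand. A minimal sketch with scikit-learn follows; the synthetic data stands in for the customer profile data, which is not shown in the demo.

```python
# Illustrative sketch of the binary-classification comparison described
# above, using scikit-learn and synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "will this customer purchase auto stocks?"
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosted_tree": GradientBoostingClassifier(),
}

# Train each candidate and compare a performance metric, which is the
# step the platform automates in the demo.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```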
Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else, with just a few clicks. So here we're publishing our model to the Watson machine learning service. It happens to be in the IBM cloud, and it's also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that the stock affinity model we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you at no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob, obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You've got to break it down for me, because I think when we zoom out we can see the big picture. What can better data science deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture, because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientists some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing how we get to data science for all. 
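What consuming one of those deployed scoring endpoints might look like from an application is sketched below. The URL, token, and payload shape are hypothetical placeholders, not the actual Watson Machine Learning API; the real service defines its own request contract.

```python
# Hypothetical sketch of consuming a deployed scoring endpoint from an app.
# The endpoint URL, auth token, and payload fields are invented for
# illustration; consult the service's API docs for the real contract.
import requests

SCORING_URL = "https://example-ml-service/v1/deployments/stock-affinity/score"
HEADERS = {"Authorization": "Bearer <token>",
           "Content-Type": "application/json"}

payload = {
    "fields": ["age", "portfolio_value", "sector_preference"],
    "values": [[42, 250000, "automotive"]],
}

response = requests.post(SCORING_URL, json=payload, headers=HEADERS,
                         timeout=10)
response.raise_for_status()
print(response.json())  # e.g. predicted affinity and class probabilities
```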
Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, those are some pretty cool features for how organizations can get to this. You can see how IBM's Data Science Experience supports all teams. You saw data analysts, data scientists, application developers, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools, partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytics System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world who will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources in all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with the line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise, was named by Amazon.com as the number one best non-fiction book of 2012. 
He's currently the Editor in Chief of the award winning website, FiveThirtyEight, and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting. I started my career at ESPN. I started as a production assistant, then later went back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many overall offensive juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk MLB. It's huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. 
Because there's so many teams in front offices putting such a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven, where you analyze where this guy hits the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing, and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams, across all sports pretty much, have realized the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really affects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, and once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in the NFL, we're having the analysis and wearables where they could now showcase on screen, if they wanted to, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks it ruins it. Look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As a fan, or I guess an employee of the team, or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. 
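The three-point logic Nate describes is just expected value arithmetic. A tiny sketch makes it concrete; the shooting percentages below are illustrative assumptions, not actual NBA figures.

```python
# Expected points per shot, with illustrative (not actual) percentages.
# Far from the basket, a long two and a three are made at similar rates,
# so the extra point dominates the expected value.
long_two_pct = 0.40   # assumed make rate on a long two-point shot
three_pct = 0.35      # assumed make rate on a three-point shot

ev_long_two = 2 * long_two_pct   # 0.80 expected points per shot
ev_three = 3 * three_pct         # 1.05 expected points per shot

print(f"long two: {ev_long_two:.2f} pts/shot, three: {ev_three:.2f} pts/shot")
```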
>> If you have a mid-fielder in a soccer game though, only exerting 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that if you make a run and you're out of position, that's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy defy that kind of old school mentality that you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially, as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular one that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly, where you understand probabilities, understand correlations, pays off. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed that number one, modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So even if I'm very confident that I have a better model, even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. 
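That correlation point is the crux of why careful models gave the underdog a much higher chance than naive ones. A toy Monte Carlo sketch follows; the state list, margins, error sizes, and win condition are all invented for illustration, not a reconstruction of FiveThirtyEight's model. The structural effect it shows is that a shared error term makes similar states flip together.

```python
# Toy Monte Carlo illustrating correlated polling errors across states.
# Every number here is an invented assumption; the point is the effect
# of correlation, not a real forecast.
import random

STATES = {"WI": 3.0, "MI": 3.5, "PA": 2.5, "OH": -1.0}  # assumed poll margins

def win_probability(correlated: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        # Shared national error hits every state at once when correlated.
        shared = random.gauss(0, 2.5) if correlated else 0.0
        sigma = 1.5 if correlated else 3.0  # keeps total error comparable
        flipped = sum(
            1 for margin in STATES.values()
            if margin + shared + random.gauss(0, sigma) < 0
        )
        if flipped >= 3:  # invented condition: underdog needs most of these
            wins += 1
    return wins / trials

# Independent errors rarely flip several states at once;
# a shared error term flips them together far more often.
print(f"independent errors: {win_probability(False):.3f}")
print(f"correlated errors:  {win_probability(True):.3f}")
```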
And so, being aware of the limitations, to some extent intrinsic in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek behind the curtain, understand how you operate, but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days, as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision, where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of 10 predictions or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the combination of not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes, including for not being data driven enough, but the best excuse any journalist has is, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7 job. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? 
>> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization never had any mistakes, never had any corrections, something's wrong, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model, it's not just the final number, here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. But quite often we're saying we want you to trust us, and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data, or you share your thinking at least in lieu of the data, and you can definitely do both, is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game, where they lost to Troy or something, and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning, where you have more of a built-in tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is wrong. And I think kind of what you learn is like, hey, if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know, if you have so to speak 1,000 lines of code and they each perform something different, you never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. 
In a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where, you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time, is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not, by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Seriously though, in business, how have organizations in the last three to five years evolved with this data boom? How are you seeing it from a consultant's point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. So FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that I think had frankly been a problem for a long time, I'm optimistic, based on the young people in our office, it's a little anecdotal, but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as a perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving in that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book, or I guess people have a copy of the book. 
In some ways it's saying hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. So that means that getting invested in it now, getting invested both in technology and the human capital side, is important. >> Final question for you as we run out of time. 2018 and beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify, for example, how fast a fast break is. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now oversee things more than I do them myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. 
I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending our cloud and cognitive summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits in the back of the room, on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.
Wikibon Presents: Software is Eating the Edge | The Entangling of Big Data and IIoT
>> So as folks make their way over from Javits, I'm going to give you the least interesting part of the evening, and that's my segment, in which I welcome you here, introduce myself, and lay out what we're going to do for the next couple of hours. So first off, thank you very much for coming. As all of you know, Wikibon is a part of SiliconANGLE, which also includes theCUBE, so if you look around, this is what we have been doing for the past couple of days here on theCUBE. We've been inviting some significant thought leaders from over at the show and, in incredibly expensive limousines, driving them up the street to come on theCUBE and spend time with us and talk about some of the things that are happening in the industry today that are especially important. We tore it down, and we're having this party tonight. So we want to thank you very much for coming and look forward to having more conversations with all of you. Now what are we going to talk about? Well Wikibon is the research arm of SiliconANGLE. So we take data that comes out of theCUBE and other places and we incorporate it into our research. And we work very closely with large end users and large technology companies regarding how to make better decisions in this incredibly complex, incredibly important, transformative world of digital business. What we're going to talk about tonight, and I've got a couple of my analysts assembled, and we're also going to have a panel, is this notion of software is eating the Edge. Now most of you have probably heard Marc Andreessen, the venture capitalist and developer, original developer of Netscape many years ago, talk about how software's eating the world. Well, if software is truly going to eat the world, it's going to take the big chunks, the big bites, at the Edge. That's where the actual action's going to be. And what we want to talk about specifically is the entangling of the internet of things, or the industrial internet of things, IIoT, with analytics. So that's what we're going to talk about over the course of the next couple of hours. To do that we're going to, I've already blown the schedule, that's on me. But to do that I'm going to spend a couple minutes talking about what we regard as the essential digital business capabilities, which include analytics and Big Data, and include IIoT, and we'll explain, at least in our position, why those two things come together the way that they do. But I'm going to ask the august and revered Neil Raden, Wikibon analyst, to come on up and talk about harvesting value at the Edge. 'Cause there are some, not now Neil, when we're done, when I'm done. So I'm going to ask Neil to come on up and we'll talk, he's going to talk about harvesting value at the Edge. And then Jim Kobielus will follow up with him, another Wikibon analyst, he'll talk specifically about how we're going to take that combination of analytics and Edge and turn it into the new types of systems and software that are going to sustain this significant transformation that's going on. And then after that, I'm going to ask Neil and Jim to come back, and invite some other folks up, and we're going to run a panel to talk about some of these issues and do a real question and answer. So the goal here, before we break for drinks, is to create a community feeling within the room. That includes smart people here, smart people in the audience, having a conversation ultimately about some of these significant changes, so please participate and we look forward to talking about the rest of it. 
All right, let's get going! What is digital business? One of the nice things about being an analyst is that you can reach back to people who were significantly smarter than you and build your points of view on the shoulders of those giants, including Peter Drucker. Many years ago Peter Drucker made the observation that the purpose of business is to create and keep a customer. Not better shareholder value, not anything else. It is about creating and keeping your customer. Now you can argue with that, but at the end of the day, if you don't have customers, you don't have a business. What we've added to that is the observation that the difference between business and digital business is essentially one thing. That's data. A digital business uses data to differentially create and keep customers. That's the only difference. If you think about the difference between taxi cab companies here in New York City, every cab that I've been in in the last three days has bothered me about Uber. The reason, the difference between Uber and a taxi cab company, is data. That's the primary difference. Uber uses data as an asset. And we think this is the fundamental feature of digital business that everybody has to pay attention to. How is a business going to use data as an asset? Is the business using data as an asset? Is a business driving its engagement with customers, the role of its product, et cetera, using data? And if they are, they are becoming a more digital business. Now when you think about that, what we're really talking about is how are they going to put data to work? How are they going to take their customer data and their operational data and their financial data and any other kind of data and ultimately turn that into superior engagement or improved customer experience or more agile operations or increased automation? Those are the kinds of outcomes that we're talking about. But it is about putting data to work. That's fundamentally what we're trying to do within a digital business. Now that leads to an observation about the crucial strategic business capabilities that every business that aspires to be more digital, or to be digital, has to put in place. And I want to be clear. When I say strategic capabilities I mean something specific. When you talk about, for example, technology architecture or information architecture, there is this notion of what capabilities does your business need? Your business needs capabilities to pursue and achieve its mission. And in the digital business these are the capabilities that are now additive to this core question, ultimately, of whether or not the company is a digital business. What are the three capabilities? One, you have to capture data. Not just do a good job of it, but better than your competition. You have to capture data better than your competition. In a way that is ultimately less intrusive on your markets and on your customers. That's, in many respects, one of the first priorities of the internet of things and people. The idea of using sensors and related technologies to capture more data. Once you capture that data you have to turn it into value. You have to do something with it that creates business value so you can do a better job of engaging your markets and serving your customers. And that essentially is what we regard as the basis of Big Data. 
Including operations, including financial performance and everything else, but ultimately it's taking the data that's being captured and turning it into value within the business. The last point here is that once you have generated a model, or an insight or some other resource that you can act upon, you then have to act upon it in the real world. We call that systems of agency, the ability to enact based on data. Now I want to spend just a second talking about systems of agency 'cause we think it's an interesting concept and it's something Jim Kobielus is going to talk about a little bit later. When we say systems of agency, what we're saying is increasingly machines are acting on behalf of a brand. Or systems, combinations of machines and people, are acting on behalf of the brand. And this whole notion of agency is the idea that ultimately these systems are now acting as the business's agent. They are at the front line of engaging customers. It's an extremely rich proposition that has subtle but crucial implications. For example I was talking to a senior decision maker at a business today and they made a quick observation: on their way here to New York City they had followed a woman who was going through security, opened up her suitcase and took out a bird. And then went through security with the bird. And the reason why I bring this up now is as TSA was trying to figure out how exactly to deal with this, the bird started talking and repeating things that the woman had said, and many of those things, in fact, might have put her in jail. Now in this case the bird is not an agent of that woman. You can't put the woman in jail because of what the bird said. But increasingly we have to ask ourselves, as we ask machines to do more on our behalf, digital instrumentation and elements to do more on our behalf, it's going to have blowback and an impact on our brand if we don't do it well. I want to draw that forward a little bit because I suggest there's going to be a new lifecycle for data. And the way that we think about it is we have the internet or the Edge, which is comprised of things and crucially people, using sensors, whether they be smaller processors in control towers or whether they be phones that are tracking where we go, and this crucial element here is something that we call information transducers. Now a transducer in a traditional sense is something that takes energy from one form to another so that it can perform new types of work. By information transducer I essentially mean something that takes information from one form to another so it can perform another type of work. This is a crucial feature of data. One of the beauties of data is that it can be used in multiple places at multiple times and not engender significant net new costs. It's one of the few assets about which you can say that. So the concept of an information transducer's really important because it's the basis for a lot of transformations of data as data flies through organizations. So we end up with the transducers storing data in the form of analytics, machine learning, business operations, other types of things, and then it gets transduced back into the real world as we program the real world, turning it into these systems of agency. So that's the new lifecycle. And increasingly, that's how we have to think about data flows. Capturing it, turning it into value and having it act on our behalf in front of markets.
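To make the transducer idea concrete, here is a minimal sketch in Python. Everything in it, the sensor name, the threshold, the action labels, is hypothetical; it only illustrates the capture-to-value-to-action flow described above, not any real system.

```python
# Hypothetical "information transducer": data captured at the Edge is
# converted from one form (raw sensor readings) to another (an insight,
# then an action) without consuming the original data.

from dataclasses import dataclass
from statistics import mean

@dataclass
class Reading:
    sensor_id: str
    value: float  # e.g., vibration amplitude from a turbine

def to_insight(readings: list) -> float:
    """First transduction: raw readings into an analytic form (an average)."""
    return mean(r.value for r in readings)

def to_action(insight: float, threshold: float = 0.8) -> str:
    """Second transduction: the analytic form back into a real-world step."""
    return "throttle_turbine" if insight > threshold else "no_op"

readings = [Reading("turbine-07", v) for v in (0.72, 0.91, 0.88)]
print(to_action(to_insight(readings)))  # -> throttle_turbine
```

Note that the same readings could be transduced again, into a dashboard, a model, an audit log, without any net new cost to the original data, which is the point being made about data as an asset.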
That could have enormous implications for how ultimately money is spent over the next few years. So Wikibon does a significant amount of market research in addition to advising our large user customers. And that includes doing studies on cloud, public cloud, but also studies on what's happening within the analytics world. And if you take a look at it, what we basically see happening over the course of the next few years is significant investments in software and also services to get the word out. But we also expect there's going to be a lot of hardware. A significant amount of hardware that's ultimately sold within this space. And that's because of something that we call true private cloud. This concept of ultimately a business increasingly being designed and architected around the idea of data assets means that the reality, the physical realities of how data operates, how much it costs to store it or move it, the issues of latency, the issues of intellectual property protection, as well as things like the regulatory regimes that are being put in place to govern how data gets used in between locations. All of those factors are going to drive increased utilization of what we call true private cloud. On premise technologies that provide the cloud experience but act where the data naturally needs to be processed. I'll come a little bit more to that in a second. So we think that it's going to be a relatively balanced market, a lot of stuff is going to end up in the cloud, but as Neil and Jim will talk about, there's going to be an enormous amount of analytics that pulls an enormous amount of data out to the Edge 'cause that's where the action's going to be. Now one of the things I also want to reveal to you is we've done a fair amount of research around this question of where or how will data guide decisions about infrastructure? And in particular the Edge is driving these conversations. So here is a piece of research that one of our colleagues at Wikibon did, David Floyer. Taking a look at IoT Edge cost comparisons over a three year period. It showed on the left hand side an example where the sensor towers and other types of devices were streaming data back into a central location in a wind farm, a stylized wind farm example. Very very expensive. Significant amounts of money end up being consumed, significant resources end up being consumed, by the cost of moving the data from one place to another. Now this is even assuming that latency does not become a problem. The second example that we looked at is if we kept more of that data at the Edge and processed it at the Edge. And literally it is an 85-plus percent cost reduction to keep more of the data at the Edge. Now that has enormous implications for how we think about big data, how we think about next generation architectures, et cetera. But it's these costs that are going to be so crucial to shaping the decisions that we make over the next two years about where we put hardware, where we put resources, what type of automation is possible, and what types of technology management has to be put in place. Ultimately we think it's going to lead to a structure, an architecture in the infrastructure as well as applications, that is informed more by moving cloud to the data than moving the data to the cloud. That's kind of our fundamental proposition: the norm in the industry has been to think about moving all data up to the cloud because who wants to do IT? It's so much cheaper, look what Amazon can do. Or what AWS can do. All true statements. Very very important in many respects.
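As a back-of-the-envelope illustration of that kind of comparison, here is a toy calculation in Python. The volumes, prices and reduction factor are invented for the sketch, they are not David Floyer's actual study inputs, but with plausible numbers the edge-first architecture lands in the same 85-plus percent territory.

```python
# Toy three-year cost comparison: stream everything to the cloud versus
# filter/aggregate at the Edge and ship only the remainder. All numbers
# below are invented assumptions for the sketch, not the study's inputs.

SENSOR_GB_PER_DAY = 5_000        # raw data produced across the wind farm
TRANSFER_COST_PER_GB = 0.09      # assumed backhaul/egress price
EDGE_REDUCTION = 0.95            # fraction handled locally, never shipped
EDGE_HW_COST = 40_000            # assumed three-year cost of edge compute

days = 3 * 365                   # the study's three-year window

all_to_cloud = SENSOR_GB_PER_DAY * days * TRANSFER_COST_PER_GB
edge_first = (SENSOR_GB_PER_DAY * days * (1 - EDGE_REDUCTION)
              * TRANSFER_COST_PER_GB) + EDGE_HW_COST

print(f"all-to-cloud: ${all_to_cloud:,.0f}")
print(f"edge-first:   ${edge_first:,.0f}")
print(f"reduction:    {1 - edge_first / all_to_cloud:.0%}")  # ~87% here
```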
But most businesses today are starting to rethink that simple proposition and asking themselves do we have to move our business to the cloud, or can we move the cloud to the business? And increasingly what we see happening as we talk to our large customers about this is that the cloud is being extended out to the Edge, we're moving the cloud and cloud services out to the business. Because of economic reasons, intellectual property control reasons, regulatory reasons, security reasons, any number of other reasons. It's just a more natural way to deal with it. And of course, the most important reason is latency. So with that as a quick backdrop, if I may quickly summarize, we believe fundamentally that the difference today is that businesses are trying to understand how to use data as an asset. And that requires an investment in new sets of technology capabilities that are not cheap, not simple, and require significant thought, a lot of planning, a lot of change within IT and business organizations. How we capture data, how we turn it into value, and how we translate that into real world action through software. That's going to lead to a rethinking, ultimately, based on cost and other factors, about how we deploy infrastructure. How we use the cloud so that the data guides the activity and not the choice of cloud supplier determines or limits what we can do with our data. And that's going to lead to this notion of true private cloud and elevate the role the Edge plays in analytics and all other architectures. So I hope that was perfectly clear. And now what I want to do is I want to bring up Neil Raden. Yes, now's the time Neil! So let me invite Neil up to spend some time talking about harvesting value at the Edge. Can you see his, all right. Got it. >> Oh boy. Hi everybody. Yeah, this is a really, this is a really big and complicated topic so I decided to just concentrate on something fairly simple, but I know that Peter mentioned customers. And he also had a picture of Peter Drucker. I had the pleasure in 1998 of interviewing Peter and photographing him. Peter Drucker, not this Peter. Because I'd started a magazine called Hired Brains. It was for consultants. And Peter said a number of really interesting things to me, but one of them was his definition of a customer: someone who wrote you a check that didn't bounce. He was kind of a wag. He was! So anyway, he had to leave to do a video conference with Jack Welch and so I said to him, how do you charge Jack Welch to spend an hour on a video conference? And he said, you know I have this theory that you should always charge your client enough that it hurts a little bit or they don't take you seriously. Well, I had the chance to talk to Jack's wife, Suzy Welch, recently and I told her that story and she said, "Oh he's full of it, Jack never paid "a dime for those conferences!" (laughs) So anyway, all right, so let's talk about this. To me, the engineered things, like the hardware and network and all these other standards and so forth, we haven't fully developed those yet, but they're coming. As far as I'm concerned, they're not the most interesting thing. The most interesting thing to me in Edge Analytics is what you're going to get out of it, what the result is going to be. Making sense of this data that's coming.
And while we're on data, something I've been thinking about a lot lately, because everybody I've talked to for the last three days just keeps talking to me about data: I have this feeling that data isn't actually quite real. That any data that we deal with is the result of some process that's captured it from something else that's actually real. In other words, it's a proxy. So it's not exactly perfect. And that's why we've always had these problems about customer A, customer A, customer A, what's their definition? What's the definition of this, that and the other thing? And with sensor data, I really have the feeling, when companies get, not, you know, not companies, organizations get instrumented and start dealing with this kind of data, what they're going to find is that this is the first time, and I've been involved in analytics, I don't want to date myself, 'cause I know I look young, but I've been dealing with analytics since 1975. And everything we've ever done in analytics has involved pulling data from some other system that was not designed for analytics. But if you think about sensor data, this is data that we're actually going to catch the first time. It's going to be ours! We're not going to get it from some other source. It's going to be the real deal, to the extent that it's the real deal. Now you may say, ya know Neil, a sensor that's sending us information about oil pressure or temperature or something like that, how can you quarrel with that? Well, I can quarrel with it because I don't know if the sensor's doing it right. So we still don't know, even with that data, if it's right, but that's what we have to work with. Now, what does that really mean? That we have to be really careful with this data. It's ours, we have to take care of it. We don't get to reload it from source some other day. If we munge it up it's gone forever. So that has very serious implications, but let me roll you back a little bit. The way I look at analytics is it's come in three different eras. And we're entering into the third now. The first era was business intelligence. It was basically built and governed by IT, it was system of record kind of reporting. And as far as I can recall, it probably started around 1988, or at least that's the year that Howard Dresner claims to have invented the term. I'm not sure it's true. And things happened before 1988 that were sort of like BI, but '88 was when they really started coming out, that's when we saw BusinessObjects and Cognos and MicroStrategy and those kinds of things. The second generation just popped out on everybody else. We were all looking around at BI and we were saying why isn't this working? Why are only five people in the organization using this? Why are we not getting value out of this massive license we bought? And along come companies like Tableau doing data discovery, visualization, data prep, and Line of Business people are using this now. But it's still the same kind of data sources. It's moved out a little bit, but it still hasn't really hit the Big Data thing. Now we're in the third generation, so we not only have Big Data, which has come and hit us like a tsunami, but we're looking at smart discovery, we're looking at machine learning. We're looking at AI induced analytics workflows. And then all the natural language cousins. You know, natural language processing, natural language, what's it, NLQ, natural language query. Natural language generation. Anybody here know what natural language generation is?
Yeah, so what you see now is you do some sort of analysis and that tool comes up and says this chart is about the following and it used the following data, and it's blah blah blah blah blah. I think it's kind of wordy and it's going to get refined some, but it's an interesting thing to do. Now, the problem I see with Edge Analytics and IoT in general is that most of the canonical examples we talk about are pretty thin. I know we talk about autonomous cars, I hope to God we never have them, 'cause I'm a car guy. Fleet Management, I think Qualcomm started Fleet Management in 1988, that is not a new application. Industrial controls. I seem to remember Honeywell doing industrial controls at least in the 70s, and before that, I wasn't, I don't want to talk about what I was doing, but I definitely wasn't in this industry. So my feeling is we all need to sit down and think about this and get creative. Because the real value in Edge Analytics or IoT, whatever you want to call it, the real value is going to be figuring out something that's new or different. Creating a brand new business. Changing the way an operation happens in a company, right? And I think there's a lot of smart people out there and I think there's a million apps that we haven't even talked about, so if you as a vendor come to me and tell me how great your product is, please don't talk to me about autonomous cars or Fleet Management, 'cause I've heard about that, okay? Now, hardware and architecture are really not the most interesting thing. We fell into that trap with data warehousing. We've fallen into that trap with Big Data. We talk about speeds and feeds. Somebody said to me the other day, what's the narrative of this company? This is a technology provider. And I said as far as I can tell, they don't have a narrative, they have some products and they compete in a space. And when they go to clients and the clients say, what's the value of your product? They don't have an answer for that. So we don't want to fall into this trap, okay? Because IoT is going to inform you in ways you've never even dreamed about. Unfortunately some of them are going to be really stinky, you know, they're going to be really bad. You're going to lose more of your privacy, it's going to get harder to get, I dunno, a mortgage for example, I dunno, maybe it'll be easier, but in any case, it's not going to all be good. So let's really think about what you want to do with this technology to do something that's really valuable. Cost takeout is not the place to justify an IoT project. Because number one, it's very expensive, and number two, it's a waste of the technology because you should be looking at, you know the old numerator denominator thing? You should be looking at the numerators and forget about the denominators because that's not what you do with IoT. And the other thing is you don't want to get overconfident. Actually this is good advice about anything, right? But in this case, I love this quote by Derek Sivers. He's a pretty funny guy. He said, "If more information was the answer, "then we'd all be billionaires with perfect abs." I'm not sure what's on his wishlist, but you know, those aren't necessarily the two things I would think of, okay. Now, what I said about the data, I want to explain some more. Big Data Analytics, if you look at this graphic, it depicts it perfectly. It's a bunch of different stuff falling into the funnel. All right? It comes from other places, it's not original material.
And when it comes in, it's always used as second hand data. Now what does that mean? That means that you have to figure out the semantics of this information and you have to find a way to put it together in a way that's useful to you, okay. That's Big Data. That's where we are. How is that different from IoT data? It's like I said, IoT is original. You can put it together any way you want because no one else has ever done that before. It's yours to construct, okay. You don't even have to transform it into a schema because you're creating the new application. But the most important thing is you have to take care of it 'cause if you lose it, it's gone. It's the original data. It's the same way, in operational systems for a long long time we've always been concerned about backup and security and everything else. You better believe this is a problem. I know a lot of people think about streaming data, that we're going to look at it for a minute, and we're going to throw most of it away. Personally I don't think that's going to happen. I think it's all going to be saved, at least for a while. Now, the governance and security, oh, by the way, I don't know where you're going to find a presentation where somebody uses a newspaper clipping about Vladimir Lenin, but here it is, enjoy yourselves. I believe that when people think about governance and security today they're still thinking along the same grids that we thought about it all along. But this is very very different and again, I'm sorry I keep thrashing this around, but this is treasured data that has to be carefully taken care of. Now when I say governance, my experience has been over the years that governance is something that IT does to make everybody's lives miserable. But that's not what I mean by governance today. It means a comprehensive program to really secure the value of the data as an asset. And you need to think about this differently. Now the other thing is you may not get to think about it differently, because some of the stuff may end up being subject to regulation. And if the regulators start regulating some of this, then that'll take some of the degrees of freedom away from you in how you put this together, but you know, that's the way it works. Now, machine learning. I think I told somebody the other day that claims about machine learning in software products are as common as twisters in trailer parks. And a lot of it is not really what I'd call machine learning. But there's a lot of it around. And I think all of the open source machine learning and artificial intelligence that's popped up, it's great, because all those math PhDs who work at Home Depot now have something to do when they go home at night and they construct this stuff. But if you're going to have machine learning at the Edge, here's the question: what kind of machine learning would you have at the Edge? As opposed to developing your models back at, say, the cloud, when you transmit the data there. The devices at the Edge are not very powerful. And they don't have a lot of memory. So you're only going to be able to do things that have been modeled or constructed somewhere else. But that's okay. Because machine learning algorithm development is actually slow and painful. So you really want the people who know how to do this working with gobs of data, creating models and testing them offline. And when you have something that works, you can put it there.
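A minimal sketch of that split, assuming a scikit-learn style workflow; the feature layout, the stand-in "failure" label and the file name are all made up for illustration:

```python
# Train where the data and compute live; ship only the fitted model to the
# Edge, which does nothing but score new readings. All names are hypothetical.

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- offline, in the cloud or data center: slow, data-hungry training ---
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))               # historical sensor features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in "failure" label
model = LogisticRegression().fit(X, y)
joblib.dump(model, "edge_model.joblib")        # the artifact pushed to devices

# --- on the Edge device: cheap, memory-light scoring only ---
deployed = joblib.load("edge_model.joblib")
reading = np.array([[0.4, -0.1, 2.2]])         # one fresh sensor reading
print(deployed.predict(reading))               # act locally on the result
```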
Now there's one thing I want to talk about before I finish, and I think I'm almost finished. I wrote a book about 10 years ago about automated decision making and the conclusion that I came up with was that little decisions add up, and that's good. But it also means you don't have to get them all right. But you don't want computers or software making decisions unattended if it involves human life, or frankly any life. Or the environment. So when you think about the applications that you can build using this architecture and this technology, think about the fact that you're not going to be doing air traffic control, you're not going to be monitoring crossing guards at the elementary school. You're going to be doing things that may seem fairly mundane. Managing machinery on the factory floor, I mean that may sound great, but really isn't that interesting. Managing well heads, drilling for oil, well I mean, it's great to the extent that it doesn't cause wells to explode, but they don't usually explode. What it's usually used for is to drive the cost out of preventative maintenance. Not very interesting. So use your heads. Come up with really cool stuff. And any of you who are involved in Edge Analytics, the next time I talk to you I don't want to hear about the same five applications that everybody talks about. Let's hear about some new ones. So, in conclusion, I don't really have anything in conclusion except that Peter mentioned something about limousines bringing people up here. On Monday I was slogging up and down Park Avenue and Madison Avenue with my client and we were visiting all the hedge funds there because we were doing a project with them. And in the miserable weather I looked at him and I said, for God's sake, Paul, where's the black car? And he said, that was the 90s. (laughs) Thank you. So, Jim, up to you. (audience applauding) This is terrible, go that way, this was terrible coming that way. >> Woo, don't want to trip! And let's move to, there we go. Hi everybody, how ya doing? Thanks Neil, thanks Peter, those were great discussions. So I'm the third leg in this relay race here, talking about of course how software is eating the world. And focusing on the value of Edge Analytics in a lot of real world scenarios. Programming the real world to make the world a better place. So I will talk, I'll break it out analytically in terms of the research that Wikibon is doing in the area of the IoT, but specifically how AI is being embedded, really, into all material reality, potentially, at the Edge: in mobile applications and industrial IoT and smart appliances and self-driving vehicles. I will break it out in terms of a reference architecture for understanding what functions are being pushed to the Edge, to hardware, to our phones and so forth, to drive various scenarios in terms of real world results. So I'll move apace here. So basically AI software, or AI microservices, are being infused into Edge hardware as we speak. What we see is more vendors of smart phones and other real world appliances and things like self-driving vehicles. What they're doing is they're instrumenting their products with computer vision and natural language processing, environmental awareness based on sensing and actuation, and those capabilities and inferences that these devices perform both to provide support for the human users of these devices as well as to enable varying degrees of autonomous operation. So what I'll be talking about is how AI is a foundation for data driven systems of agency of the sort that Peter is talking about.
Infusing data driven intelligence into everything, or potentially so. As more of this capability, all these algorithms for things like, ya know, doing real time predictions and classifications, anomaly detection and so forth, as this functionality gets diffused widely and becomes more commoditized, you'll see it burned into an ever-wider variety of hardware architectures, neurosynaptic chips, GPUs and so forth. So what I've got here in front of you is a sort of a high level reference architecture that we're building up in our research at Wikibon. So AI, artificial intelligence, is a big term, a big paradigm, I'm not going to unpack it completely. Of course we don't have oodles of time so I'm going to take you fairly quickly through the high points. It's a driver for systems of agency. Programming the real world. Transducing digital inputs, the data, to analog real world results. Through the embedding of this capability in the IoT, but pushing more and more of it out to the Edge with points of decision and action in real time. And there are four AI-enabled capabilities we're seeing that are absolutely critical to software being pushed to the Edge: sensing, actuation, inference and learning. Sensing and actuation, like Peter was describing, it's about capturing data from the environment within which a device or user is operating or moving. And then actuation is the fancy term for doing stuff, ya know, like industrial IoT, it's obviously machine control, but clearly, you know, in self-driving vehicles it's steering the vehicle and avoiding crashing and so forth. Inference is the meat and potatoes, as it were, of AI. Analytics does inferences. It infers, from the data, the logic of the application. Predictive logic, correlations, classification, abstractions, differentiation, anomaly detection, recognizing faces and voices. We see that now with Apple: the latest version of the iPhone is embedding face recognition as the core multifactor authentication technique. Clearly that's a harbinger of what's going to be universal fairly soon, and it depends on AI. That depends on convolutional neural networks, that is some heavy hitting processing power that's necessary, and it's processing the data that's coming from your face. So that's critically important. So what we're looking at then is AI software taking root in hardware to power continuous agency. Getting stuff done. Powering decision support for human beings who have to take varying degrees of action in various environments. We don't necessarily want to let the car steer itself in all scenarios, we want some degree of override, for lots of good reasons. They want to protect life and limb, including their own. And just more data driven automation across the internet of things in the broadest sense. So unpacking this reference framework, what's happening is that AI driven intelligence is powering real time decisioning at the Edge. Real time local sensing from the data that it's capturing there, it's ingesting the data. Some, not all, of that data may be persistent at the Edge. Some, perhaps most of it, will be pushed into the cloud for other processing.
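A toy loop tying together three of the four capabilities enumerated above, sense, infer, actuate, with learning assumed to happen off-device. The sensor, the threshold "model" and the actuator here are all stand-ins, not real device APIs:

```python
# Toy Edge agent loop: sense -> infer -> actuate. Learning stays in the
# cloud; the device only executes a model someone else built.

import random

def sense() -> float:
    """Capture data from the environment (stubbed as a random temperature)."""
    return random.uniform(20.0, 90.0)

def infer(temperature: float) -> bool:
    """Run the locally deployed model; a fixed threshold stands in for it."""
    return temperature > 75.0            # "overheating" decision

def actuate(overheating: bool) -> None:
    """Do stuff in the real world based on the inference."""
    print("spin up cooling" if overheating else "steady state")

for _ in range(3):                       # the continuous on-device loop
    actuate(infer(sense()))
```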
When you have these highly complex algorithms that are doing AI deep learning, multilayer, to do a variety of anti-fraud and higher level, auto-narrative roll-ups from various scenes that are unfolding, a lot of this processing is going to begin to happen in the cloud, but a fair amount of the more narrowly scoped inferences that drive real time decision support at the point of action will be done on the device itself. Contextual actuation: the sensor data that's captured by the device, along with other data that may be coming down in real time streams through the cloud, will provide the broader contextual envelope of data needed to drive actuation, to drive various models and rules and so forth that are making stuff happen at the point of action, at the Edge. Continuous inference. What it all comes down to is that inference is what's going on inside the chips at the Edge device. And what we're seeing is a growing range of hardware architectures, GPUs, CPUs, FPGAs, ASICs, neurosynaptic chips of all sorts, playing in various combinations that are automating more and more very complex inference scenarios at the Edge. And not just individual devices, swarms of devices, like drones and so forth, are essentially an Edge unto themselves. You'll see these tiered hierarchies of Edge swarms that are playing and doing inferences of ever more complex dynamic nature. And the fundamental capabilities that are powering them all will be burned into the hardware that powers them. And then adaptive learning. Now I use the term learning rather than training here; training is at the core of it. Training means everything in terms of the predictive fitness, or the fitness of your AI services, for whatever task, predictions, classifications, face recognition, you've built them for. But I use the term learning in a broader sense. What makes your inferences get better and better, more accurate over time, is that you're training them with fresh data in a supervised learning environment. But you can have reinforcement learning if you're doing, like say, robotics and you don't have ground truth against which to train the data set. You know, there's maximizing a reward function versus minimizing a loss function, the latter being the standard approach for supervised learning. There's also, of course, the approach of unsupervised learning, with cluster analysis critically important in a lot of real world scenarios. So Edge AI algorithms: clearly, deep learning, which is multilayered machine learning models that can do abstractions at higher and higher levels. Face recognition is a high level abstraction. Faces in a social environment is an even higher level of abstraction, in terms of groups. Faces over time and bodies and gestures, doing various things in various environments, is an even higher level abstraction, in terms of narratives that can be rolled up, are being rolled up, by deep learning capabilities of great sophistication. Convolutional neural networks for processing images, recurrent neural networks for processing time series. Generative adversarial networks for doing essentially what's called generative applications of all sorts, composing music, and a lot of it's being used for auto programming. These are all deep learning. There's a variety of other algorithmic approaches I'm not going to bore you with here. Deep learning is essentially the enabler of the five senses of the IoT. Your phone has a camera, it has a microphone, it has, of course, geolocation and navigation capabilities. It's environmentally aware, it's got an accelerometer and so forth embedded therein.
The reason that your phone and all of these devices are getting scary sentient is that they have the sensory modalities and the AI, the deep learning, that enables them to make environmentally correct decisions in a wider range of scenarios. So machine learning is the foundation of all of this, or rather, for deep learning, artificial neural networks are the foundation of that. But there are other approaches for machine learning I want to make you aware of, because support vector machines and these other established approaches for machine learning are not going away, but really what's driving the show now is deep learning, because it's scary effective. And so that's where most of the investment in AI is going these days, into deep learning. AI Edge platforms, tools and frameworks are just coming along like gangbusters. Much development of AI, of deep learning, happens in the context of your data lake. This is where you're storing your training data. This is the data that you use to build and test and validate your models. So we're seeing a deepening stack of Hadoop and there's Kafka and Spark and so forth that are driving the training (coughs) excuse me, of AI models that power all these Edge Analytics applications, so that lake will continue to broaden and deepen in terms of the scope and the range of data sets and the range of AI modeling it supports. Data science is critically important in this scenario because the data scientists, the data science teams, the tools and techniques and flows of data science are the fundamental development paradigm or discipline or capability that's being leveraged to build and to train and to deploy and iterate all this AI that's being pushed to the Edge. So clearly data science is at the center; data scientists of an increasingly specialized nature are necessary to the realization of this value at the Edge. AI frameworks are coming along, like, you know, a mile a minute. TensorFlow, which is open source, most of these are open source, has achieved sort of almost de facto standard status, I'm using the words de facto in air quotes. There's Theano and Keras and MXNet and CNTK and a variety of other ones. We're seeing a range of AI frameworks come to market, most open source. Most are supported by most of the major tool vendors as well. So at Wikibon we're definitely tracking that, we plan to go deeper in our coverage of that space.
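To make the train-in-the-cloud, run-at-the-Edge flow concrete in one of the frameworks named above, here is a hedged TensorFlow sketch. The tiny model, the random stand-in data and the file name are illustrative only; it assumes the tensorflow and numpy packages are installed.

```python
# Sketch: train a tiny Keras model "in the cloud," then convert it to a
# compact artifact (TensorFlow Lite) of the kind an Edge runtime consumes.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(np.random.rand(256, 4),               # 4 sensor features per row
          np.random.randint(0, 2, 256),         # stand-in binary labels
          epochs=2, verbose=0)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(converter.convert())                # artifact pushed to devices
```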
And then next best action, which powers recommendation engines. I mean, next best action decision automation of the sort Neil's covered in a variety of contexts in his career is fundamentally important to Edge Analytics, to systems of agency, 'cause it's driving the process automation, decision automation, sort of the targeted recommendations that are made at the Edge to individual users as well as to process automation. That's absolutely necessary for self driving vehicles to do their jobs and for industrial IoT. So what we're seeing is more and more recommendation engine or recommender capabilities powered by ML and DL going to the Edge, already at the Edge, for a variety of applications. Edge AI capabilities, like I said, there's sensing. And sensing at the Edge is becoming ever more rich: mixed reality Edge modalities of all sorts for augmented reality and so forth. We're just seeing a growth in the range of sensory modalities that are enabled or filtered and analyzed through AI that are being pushed to the Edge, into the chip sets. Actuation, that's where robotics comes in. Robotics is coming into all aspects of our lives. And you know, it's brainless without AI, without deep learning and these capabilities. Inference, autonomous Edge decisioning. Like I said, there's a growing range of inferences being done at the Edge. And that's where it has to happen 'cause that's the point of decision. Learning, training: much training, most training, will continue to be done in the cloud because it's very data intensive. It's a grind to train and optimize an AI algorithm to do its job. It's not something that you necessarily want to do or can do at the Edge, at Edge devices, so the models that are built and trained in the cloud are pushed down through a dev ops process to the Edge, and that's the way it will work pretty much in most AI environments, Edge analytics environments. You centralize the modeling, you decentralize the execution of the inference models. The training engines will be in the cloud. Edge AI applications. I'll just run you through sort of a core list of the ones that have come into, or are coming into, the mainstream at the Edge. Multifactor authentication: clearly the Apple announcement of face recognition is just a harbinger of the fact that that's coming to every device. Computer vision, speech recognition, NLP, digital assistants and chat bots powered by natural language processing and understanding, it's all AI powered. And it's becoming very mainstream. Emotion detection, face recognition, you know, I could go on and on, but these are like the core things that everybody has access to, or will by 2020, on their core devices, mass market devices. Developers, designers and hardware engineers are coming together to pool their expertise to build and train not just the AI, but also the entire package of hardware and UX and the orchestration of real world business scenarios, or life scenarios, that all this embedded intelligence enables, and much of what they build in terms of AI will be containerized as microservices through Docker and orchestrated through Kubernetes as full cloud services in an increasingly distributed fabric. That's coming along very rapidly. We can see a fair amount of that already on display at Strata in terms of what the vendors are doing or announcing or who they're working with.
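As a sketch of what one of those containerized inference microservices might look like at its simplest, here is a standard-library-only Python service. The endpoint shape and the threshold "model" are invented; a real deployment would load a trained artifact at start-up and sit behind the Docker and Kubernetes machinery described above.

```python
# Minimal containerizable inference microservice, standard library only.
# POST {"features": [...]} to / and get an anomaly verdict back.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # stand-in for a real model loaded when the container starts
    return {"anomaly": sum(features) > 10.0}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = predict(json.loads(body)["features"])
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```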
The hardware itself, the Edge, you know, at the Edge some data will be persistent, needs to be persistent, to drive inference. That's, you know, to drive a variety of different application scenarios that need some degree of historical data related to what the device in question happens to be sensing, or has sensed in the immediate past, or, you know, whatever. The hardware itself is geared towards both sensing and, increasingly, persistence and Edge driven actuation of real world results. The whole notion of drones and robotics being embedded into everything that we do, that's where that comes in. That has to be powered by low cost, low power commodity chip sets of various sorts. What we see right now in terms of chip sets is GPUs; Nvidia has gone real far and GPUs have come along very fast in terms of powering inference engines, you know, like the Tesla cars and so forth. But GPUs are in many ways the core hardware substrate for inference engines in DL so far. But to become a mass market phenomenon, it's got to get cheaper and lower powered and more commoditized, and so we see a fair number of CPUs being used as the hardware for Edge Analytics applications. Some vendors are fairly big on FPGAs; I believe Microsoft has gone fairly far with FPGAs inside its DL strategy. ASICs, I mean, there's neurosynaptic chips, like IBM's got one. There's at least a few dozen vendors of neurosynaptic chips on the market, so at Wikibon we're going to track that market as it develops. And what we're seeing is a fair number of scenarios where it's a mixed environment, where you use one chip set architecture at the inference side of the Edge, and other chip set architectures that are driving the DL as processed in the cloud, playing together within a common architecture. And we see a fair number of DL environments where the actual training is done in the cloud on Spark using CPUs and parallelized in memory, but pushing TensorFlow models that might be trained through Spark down to the Edge where the inferences are done in FPGAs and GPUs. Those kinds of mixed hardware scenarios are very, very likely to be standard going forward in lots of areas. So analytics at the Edge powering continuous results is what it's all about. The whole point is really not moving the data, it's putting the inference at the Edge and working from the data that's already captured and persistent there for the duration of whatever action or decision or result needs to be powered from the Edge. Like Neil said, cost takeout alone is not worth doing. Cost takeout alone is not the rationale for putting AI at the Edge. It's getting new stuff done, new kinds of things done, in an automated, consistent, intelligent, contextualized way to make our lives better and more productive. Security and governance are becoming more important. Governance of the models, governance of the data, governance in a dev ops context in terms of version controls over all those DL models that are built, that are trained, that are containerized and deployed. Continuous iteration and improvement of those to help them learn to make our lives better and easier. With that said, I'm going to hand it over now. It's five minutes after the hour. We're going to get going with the Influencer Panel, so what we'd like to do is I'll call Peter, and Peter's going to call our influencers. >> All right, am I live yet? Can you hear me? All right so, we've got, let me jump back in control here. We've got, again, the objective here is to have the community take on some things. And so what we want to do is I want to invite five other people up, Neil why don't you come on up as well. Start with Neil. You can sit here. On the far right hand side, Judith, Judith Hurwitz. >> Neil: I'm glad I'm on the left side. >> From the Hurwitz Group. >> From the Hurwitz Group. Jennifer Shin who's affiliated with UC Berkeley. Jennifer are you here? >> She's here, Jennifer where are you? >> She was here a second ago. >> Neil: I saw her walk out she may have, >> Peter: All right, she'll be back in a second. >> Here's Jennifer! >> Here's Jennifer! >> Neil: With 8 Path Solutions, right? >> Yep. >> Yeah 8 Path Solutions. >> Just get my mic. >> Take your time Jen. >> Peter: All right, Stephanie McReynolds. Far left. And finally Joe Caserta, Joe come on up. >> Stephanie's with Alation >> And to the left. So what I want to do is I want to start by having everybody just go around and introduce yourself quickly. Judith, why don't we start there. >> I'm Judith Hurwitz, I'm president of Hurwitz and Associates. We're an analyst research and thought leadership firm. I'm the co-author of eight books. Most recent is Cognitive Computing and Big Data Analytics.
It's been in the market for a couple years now. >> Jennifer. >> Hi, my name's Jennifer Shin. I'm the founder and Chief Data Scientist at 8 Path Solutions LLC. We do data science, analytics and technology. We're actually about to do a big launch next month, with Box actually. >> Are we, sorry Jennifer, are we having a problem with Jennifer's microphone? >> Man: Just turn it back on? >> Oh you have to turn it back on. >> It was on, oh sorry, can you hear me now? >> Yes! We can hear you now. >> Okay, I don't know how that turned back off, but okay. >> So you got to redo all that Jen. >> Okay, so my name's Jennifer Shin, I'm founder of 8 Path Solutions LLC, it's a data science, analytics and technology company. I founded it about six years ago. So we've been developing some really cool technology that we're going to be launching with Box next month. It's really exciting. And I have, I've been developing a lot of patents and some technology as well as teaching at UC Berkeley as a lecturer in data science. >> You know Jim, you know Neil, Joe, you ready to go? >> Joe: Just broke my microphone. >> Joe's microphone is broken. >> Joe: Now it should be all right. >> Jim: Speak into Neil's. >> Joe: Hello, hello? >> I just feel not worthy in the presence of Joe Caserta. (several laughing) >> That's right, master of mics. If you can hear me, Joe Caserta, so yeah, I've been doing data technology solutions since 1986, almost as old as Neil here, but been doing specifically like BI, data warehousing, business intelligence type of work since 1996. And been doing, wholly dedicated to Big Data solutions and modern data engineering since 2009. Where should I be looking? >> Yeah I don't know where the camera is. >> Yeah, and that's basically it. So my company was formed in 2001, it's called Caserta Concepts. We recently rebranded to only Caserta 'cause what we do is way more than just concepts. So we conceptualize the stuff, we envision what the future brings and we actually build it. And we help clients large and small who just want to be leaders in innovation using data, specifically, to advance their business. >> Peter: And finally Stephanie McReynolds. >> I'm Stephanie McReynolds, I head product marketing as well as corporate marketing for a company called Alation. And we are a data catalog, so we help bring together not only a technical understanding of your data, but we curate that data with human knowledge and use automated intelligence internally within the system to make recommendations about what data to use for decision making. And some of our customers, like the City of San Diego, a large automotive manufacturer working on self driving cars, and General Electric, use Alation to help power their solutions for IoT at the Edge. >> All right so let's jump right into it. And again if you have a question, raise your hand, and we'll do our best to get it to the floor. But what I want to do is I want to get seven questions in front of this group and have you guys discuss, slog it out, disagree, agree. Let's start here. What is the relationship between Big Data, AI and IoT? Now Wikibon's put forward its observation that data's being generated at the Edge, that action is being taken at the Edge, and then increasingly the software and other infrastructure architectures need to accommodate the realities of how data is going to work in these very complex systems. That's our perspective. Anybody, Judith, you want to start?
>> Yeah, so I think that if you look at AI, machine learning, all these different areas, you have to be able to have the data to learn from. Now when it comes to IoT, I think one of the issues we have to be careful about is not all data will be at the Edge. Not all data needs to be analyzed at the Edge. For example if the light is green and that's good and it's supposed to be green, do you really have to constantly analyze the fact that the light is green? You actually only really want to be able to analyze and take action when there's an anomaly. Well if it goes purple, that's actually a sign that something might explode, so that's where you want to make sure that you have the analytics at the Edge. Not for everything, but for the things where there is an anomaly and a change. >> Joe, how about from your perspective? >> For me I think the evolution of data is really becoming, eventually data's going to be the oxygen we breathe. It used to be very very reactive and there used to be like a latency. You do something, there's a behavior, there's an event, there's a transaction, and then you go record it and then you collect it, and then you can analyze it. And it was very very waterfallish, right? And then eventually we figured out to put it back into the system. Or at least human beings interpreted it to try to make the system better, and that has really been completely turned on its head; we don't do that anymore. Right now it's very, it's synchronous, where, as we're actually making these transactions, the machines, we don't really need, I mean human beings are involved a bit, but less and less and less. And it's just a reality, it may not be politically correct to say, but it's a reality that my phone in my pocket is following my behavior, and it knows without telling a human being what I'm doing. And it can actually help me do things like get to where I want to go faster depending on my preference, if I want to save money or save time or visit things along the way. And I think that's all integration of big data, streaming data, artificial intelligence, and I think the next thing that we're going to start seeing is the culmination of all of that. I actually, hopefully it'll be published soon, I just wrote an article for Forbes with the term ARBI, and ARBI is the integration of Augmented Reality and Business Intelligence. Where I think essentially we're going to see, you know, hold your phone up to Jim's face and it's going to recognize-- >> Peter: It's going to break. >> And it's going to say exactly, you know, what are the key metrics that we want to know about Jim. If he works on my sales force, what's his attainment of goal, what is-- >> Jim: Can it read my mind? >> Potentially based on behavior patterns. >> Now I'm scared. >> I don't think Jim's buying it. >> It will, without a doubt, be able to predict that what you've done in the past you may, with some certain level of confidence, do again in the future, right? And is that mind reading? It's pretty close, right? >> Well, sometimes, I mean, mind reading is in the eye of the individual who wants to know. And if the machine appears to approximate what's going on in the person's head, sometimes you can't tell. So I guess, I guess we could call that the Turing machine test of the paranormal. >> Well, face recognition, micro gesture recognition, I mean facial gestures, people can do it.
Maybe not better than a coin toss, but if it can be seen visually and captured and analyzed, conceivably some degree of mind reading can be built in. I can see when somebody's angry looking at me, so that's a possibility. That's kind of a scary possibility in a surveillance society, potentially. >> Neil: Right, absolutely. >> Peter: Stephanie, what do you think? >> Well, I hear a world of it's the bots versus the humans being painted here, and I think that, you know, at Alation we have a very strong perspective on this, and that is that the greatest impact, or the greatest results, are going to come when humans figure out how to collaborate with the machines. And so yes, you want to get to the location more quickly, but it's not that the machine, as in the bot, is able to tell you exactly what to do and you're just going to blindly follow it. You need to train that machine, you need to have a partnership with that machine. So, a lot of the power, and I think this goes back to Judith's story, is in what human decision making can be augmented with data from the machine, while the humans are actually doing the training and driving the machines in the right direction. I think that's when we get true power out of some of these solutions, so it's not just all about the technology. It's not all about the data or the AI, or the IoT, it's about how that empowers human systems to become smarter and more effective and more efficient. And I think we're playing that out in our technology in a certain way, and I think organizations that are thinking along those lines with IoT are seeing more benefits immediately from those projects. >> So I think we have general agreement on some of the things you talked about: IoT crucial for capturing information and then having action being taken, AI crucial to defining and refining the nature of the actions that are being taken, Big Data ultimately powering how a lot of that changes. Let's go to the next one. >> So actually I have something to add to that. So I think it makes sense, right, with IoT, why we have Big Data associated with it. If you think about what data is collected by IoT. We're talking about serial information, right? It's over time, it's going to grow exponentially just by definition, right? So every minute you collect a piece of information, that means over time it's going to keep growing, growing, growing as it accumulates. So that's one of the reasons why the IoT is so strongly associated with Big Data. And also why you need AI to be able to differentiate between one minute versus the next minute, right? Trying to find a better way rather than looking at all that information and manually picking out patterns. To have some automated process for being able to filter through that much data that's being collected. >> I want to point out though, based on what you just said Jennifer, I want to bring Neil in at this point, that this question of IoT now generating unprecedented levels of data does introduce this idea of the primary source. Historically what we've done within technology, or within IT certainly, is we've taken stylized data. There is no such thing as a real world accounting thing. It is a human contrivance. And we stylize data and therefore it's relatively easy to be very precise about it. But when we start, as you noted, when we start measuring things with a tolerance down to thousandths of a millimeter, whatever that is, metric system, now we're still sometimes dealing with errors that we have to attend to.
So, the reality is we're not just dealing with stylized data, we're dealing with real data, and it's more, more frequent, but it also has special cases that we have to attend to in terms of how we use it. What do you think Neil? >> Well, I mean, I agree with that, I think I already said that, right. >> Yes you did, okay let's move on to the next one. >> Well it's a doppelganger, the digital twin doppelganger that's automatically created by the very fact that you're living and interacting and so forth and so on. It's going to accumulate regardless. Now that doppelganger may not be your agent, or might not be the foundation for your agent, unless there's some other piece of logic, like an interest graph that you build, a human being saying this is my broad set of interests, and so all of my agents out there in the IoT, you all need to be aware that when you make a decision on my behalf as my agent, this is what Jim would do. You know, I mean, there needs to be that kind of logic somewhere in this fabric to enable true agency. >> All right, so I'm going to start with you. Oh go ahead. >> I have a real short answer to this though. I think that Big Data provides the data and compute platform to make AI possible. For those of us who dipped our toes in the water in the 80s, we got clobbered because we didn't have the, we didn't have the facilities, we didn't have the resources to really do AI, we just kind of played around with it. And I think that the other thing about it is if you combine Big Data and AI and IoT, what you're going to see is people, a lot of the applications we develop now are very inward looking, we look at our organization, we look at our customers. We try to figure out how to sell more shoes to fashionable ladies, right? But with this technology, I think people can really expand what they're thinking about and what they model and come up with applications that are much more external. >> Actually what I would add to that is it also introduces being able to use engineering, right? Having engineers interested in the data. Because it's actually technical data that's collected, not just, say, preferences or information about people, but actual measurements that are being collected with IoT. So it's really interesting in the engineering space because it opens up a whole new world for the engineers to actually look at data and to actually combine both that hardware side as well as the data that's being collected from it. >> Well, Neil, you and I have talked about something, 'cause it's not just engineers. We have in the healthcare industry for example, which you know a fair amount about, there's this notion of empirical based management. And the idea that increasingly we have to be driven by data as a way of improving the way that managers do things, the way the managers collect or collaborate and ultimately collectively how they take action. So it's not just engineers, it's supposed to also inform business. What's actually happening in the healthcare world when we start thinking about some of this empirical based management? Is it working? What are some of the barriers? >> It's not a function of technology. What happens in medicine and healthcare research is, I guess you can say it borders on fraud. (people chuckling) No, I'm not kidding. I know the New England Journal of Medicine a couple of years ago released a study and said that at least half the articles that they published turned out to be ghostwritten by pharmaceutical companies.
(man chuckling) Right, so I think the problem is that when you do a clinical study, the one that really killed me about 10 years ago was the Women's Health Initiative. They spent $700 million gathering this data over 20 years. And when they released it they looked at all the wrong things deliberately, right? So I think that's a systemic-- >> I think you're bringing up a really important point that we haven't brought up yet, and that is, can you use Big Data and machine learning to begin to take the biases out? So if you let the, if you divorce your preconceived notions and your biases from the data and let the data lead you to the logic, you start to, I think, get better over time, but it's going to take a while to get there because we do tend to gravitate towards our biases. >> I will share an anecdote. So I had some arm pain, and I had numbness in my thumb and pointer finger and I went to, excruciating pain, went to the hospital. So the doctor examined me, and he said you probably have a pinched nerve, he said, but I'm not exactly sure which nerve it would be, I'll be right back. And I kid you not, he went to a computer and he Googled it. (Neil laughs) And he came back because this little bit of information was something that could easily be looked up, right? Every nerve in your spine is connected to your different fingers, so the pointer and the thumb just happen to be your C6, so he came back and said, it's your C6. (Neil mumbles) >> You know an interesting, I mean that's a good example. One of the issues with healthcare data is that the data set is not always shared across the entire research community, so by making Big Data accessible to everyone, you actually start a more rational conversation or debate on well, what are the true insights-- >> If that conversation includes what Judith talked about, the actual model that you use to set priorities and make decisions about what's actually important. So it's not just about improving, this is the test. It's not just about improving your understanding of the wrong thing, it's also testing whether it's the right or wrong thing as well. >> That's right, to be able to test that you need to have humans in dialog with one another, bringing different biases to the table, to work through: okay, is there truth in this data? >> It's context and it's correlation and you can have a great correlation that's garbage. You know, if you don't have the right context. >> Peter: So I want to, hold on Jim, I want to, >> It's exploratory. >> Hold on Jim, I want to take it to the next question 'cause I want to build off of what you talked about Stephanie, and that is that this says something about what is the Edge. And our perspective is that the Edge is not just devices. That when we talk about the Edge, we're talking about human beings and the role that human beings are going to play, both as sensors, or carrying things with them, but also as actuators, actually taking action, which is not a simple thing. So what do you guys think? What does the Edge mean to you? Joe, why don't you start? >> Well, I think it could be a combination of the two. And specifically when we talk about healthcare. So I believe in 2017, when we eat, we don't know why we're eating, like I think we should absolutely by now be able to know exactly what is my protein level, what is my calcium level, what is my potassium level? And then find the foods to meet that.
What have I depleted versus what I should have, and eat very, very purposely and not by taste-- >> And it's amazing that red wine is always the answer. >> It is. (people laughing) And tequila, that helps too. >> Jim: You're a precision foodie is what you are. (several chuckle) >> There's no reason why we should not be able to know that right now, right? And when it comes to healthcare, the biggest problem or challenge with healthcare is, no matter how great a technology you have, you can't manage what you can't measure. And you're really not allowed to use a lot of this data, so you can't measure it, right? You can't do things very scientifically, right, in the healthcare world, and I think regulation in the healthcare world is really burdening advancement in science. >> Peter: Any thoughts Jennifer? >> Yes, I teach statistics for data scientists, right, so you know we talk about a lot of these concepts. I think what makes these questions so difficult is you have to find a balance, right, a middle ground. For instance, in the case of are you being too biased through data, well, you could say we want to look at data only objectively, but then there are certain relationships that your data models might show that aren't actually causal relationships. For instance, if an alien came from space and saw earth, saw the people, everyone's carrying umbrellas, right, and then it started to rain, that alien might think, well, it's because they're carrying umbrellas that it's raining. Now we know from the real world that that's actually not the way these things work. So if you look only at the data, that's the potential risk. That you'll start making associations or saying something's causal when it's actually not, right? So that's one of the, one of the I think big challenges. I think when it comes to looking also at things like healthcare data, right? Do you collect data about anything and everything? Does it mean that, A, we need to collect all that data for the question we're looking at? Or is it actually the best, most optimal way to be able to get to the answer? Meaning sometimes you can take some shortcuts in terms of what data you collect and still get the right answer, and not have maybe that level of specificity that's going to cost you millions extra to be able to get.
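Jennifer's umbrella example can be made concrete with a few lines of simulation. This is a minimal sketch, not something shown at the panel: a hidden common cause (clouds) drives both umbrella-carrying and rain, so the two correlate strongly even though neither causes the other. All names and probabilities here are hypothetical.

```python
# Spurious correlation sketch: clouds cause both umbrellas and rain,
# so umbrellas and rain correlate without any causal link between them.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
clouds = rng.random(n) < 0.3                                       # hidden common cause
umbrellas = np.where(clouds, rng.random(n) < 0.8, rng.random(n) < 0.05)
rain = np.where(clouds, rng.random(n) < 0.7, rng.random(n) < 0.02)

# Strong correlation, yet umbrellas have no effect on rain.
print(np.corrcoef(umbrellas, rain)[0, 1])                          # roughly 0.6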
>> So Jennifer, as a data scientist, I want to build upon what you just said. And that is, are we going to start to see methods and models emerge for how we actually solve some of these problems? So for example, we know how to build a system for a stylized process like accounting, or some elements of accounting. We have methods and models that lead to technology and actions and whatnot, all the way down to the point that that system can be generated. We don't have the same notion to the same degree when we start talking about AI and some of these Big Data systems. We have algorithms, we have technology. But are we going to start seeing, as a data scientist, repeatability and learning and how to think the problems through, that's going to lead us to a more likely best, or at least good, result? >> So I think that's a bit of a tough question, right? Because part of it is, it's going to depend on how many of these researchers actually get exposed to real world scenarios, right? Researchers look into all these papers, and you come up with all these models, but if it's never tested in a real world scenario, well, I mean we really can't validate that it works, right? So I think it is dependent on how much of this integration there's going to be between the research community and industry, and how much investment there is. Funding is going to matter in this case. If there's no funding on the research side, then you'll see a lot of industry folk who feel very confident about their models, but again, on the other side of course, if researchers don't validate those models then you really can't say for sure that it's actually more accurate, or it's more efficient. >> It's the issue of real world testing and experimentation. A/B testing, that's standard practice in many operationalized ML and AI implementations in the business world, but with real world experimentation in Edge analytics, what you're actually transducing or touching is people's actual lives. The problem there is, like in healthcare and so forth, when you're experimenting with people's lives, somebody's going to die. I mean, in other words, that's critical: in terms of causal analysis, you've got to tread lightly on operationalizing that kind of testing in the IoT when people's lives and health are at stake. >> We still give 'em placebos. So we still test 'em. All right, so let's go to the next question. What are the hottest innovations in AI? Stephanie, I want to start with you, as someone at a company that's got kind of an interesting little thing happening. We start thinking about how do we better catalog data and represent it to a large number of people. What are some of the hottest innovations in AI as you see it? >> I think it's a little counterintuitive, what the hottest innovations are in AI, because we're at a spot in the industry where the most successful companies that are working with AI are actually incorporating it into solutions. So the best AI solutions are actually the products that you don't know there's AI operating underneath. But they're having a significant impact on business decision making, or bringing a different type of application to the market, and, you know, I think there's a lot of investment that's going into AI tooling and tool sets for data scientists or researchers, but the more innovative companies are thinking through how do we really take AI and make it have an impact on business decision making, and that means kind of hiding the AI from the business user. Because if you think a bot is making a decision instead of you, you're not going to partner with that bot very easily or very readily. Way at the start of my career, I worked in CRM when recommendation engines were all the rage, online and also in call centers. And the hardest thing was to get a call center agent to actually read the script that the algorithm was presenting to them. That algorithm was 99% correct most of the time, but there was this human resistance to letting a computer tell you what to tell that customer on the other side, even if it was more successful in the end. And so I think that the innovation in AI that's really going to push us forward is when humans feel like they can partner with these bots and they don't think of it as a bot, but they think of it as assisting their work and getting to a better result-- >> Hence the augmentation point you made earlier. >> Absolutely, absolutely. >> Joe, how 'about you? What do you look at? What are you excited about? >> I think the coolest thing at the moment right now is chat bots. To have voice, to be able to speak with a computer in natural language, to do that, I think that's pretty innovative, right?
And I do think that eventually, for the average user, not for techies like me, but for the average user, I think keyboards are going to be a thing of the past. I think we're going to communicate with computers through voice, and I think this is the very, very beginning of that, and it's an incredible innovation. >> Neil? >> Well, I think we all have myopia here. We're all thinking about commercial applications. Big, big things are happening with AI in the intelligence community, in the military, the defense industry, in all sorts of things. Meteorology. And that's where, well, hopefully not on an everyday basis with the military, you really see the effect of this. But I was involved in a project a couple of years ago where we were developing AI software to detect artillery pieces in terrain from satellite imagery. I don't have to tell you what country that was. I think you can probably figure that one out, right? But there are legions of people in many, many companies that are involved in that industry. So if you're talking about the dollars spent on AI, I think the stuff that we do in our industries is probably fairly small. >> Well it reminds me of an application I actually thought was interesting about AI related to that: AI being applied to removing mines from war zones. >> Why not? >> Which is not a bad thing for a whole lot of people. Judith, what do you look at? >> So I'm looking at things like being able to have pre-trained data sets in specific solution areas. I think that that's something that's coming. Also the ability to really have a machine assist you in selecting the right algorithms based on what your data looks like and the problems you're trying to solve. Some of the things that data scientists still spend a lot of their time on, but that can be augmented. Basically, we have to move to levels of abstraction before this becomes truly ubiquitous across many different areas. >> Peter: Jennifer? >> So I'm going to say computer vision. >> Computer vision? >> Computer vision. So computer vision ranges from image recognition, being able to say what content is in the image. Is it a dog, is it a cat, is it a blueberry muffin? Like that sort of popular post out there where it's a blueberry muffin versus, I think, a chihuahua, and it compares the two. And can the AI really actually detect the difference, right? So I think that's really where a lot of people who are in this space, being in both the AI space as well as data science, are looking for the new innovations. For instance, Cloud Vision, I think that's what Google still calls it, the Vision API they've released in beta, allows you to actually use an API to send your image and then have it be recognized, right, by their API. There's another startup in New York called Clarifai that also does a similar thing, and Amazon has their Rekognition platform as well. So from images, being able to detect what's in the content, as well as from videos, being able to say things like how many people are entering a frame? How many people enter the store? Not having to actually go look at it and count it, but having a computer actually tally that information for you, right?
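For readers who haven't used the cloud vision services Jennifer mentions, a label-detection call is only a few lines. This is a minimal sketch assuming the google-cloud-vision Python client; the image file and credential setup are hypothetical, and the exact class names have shifted across client versions.

```python
# Minimal label-detection sketch with the google-cloud-vision client.
# Assumes `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS
# pointing at a service-account key; details vary by client version.
import io
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with io.open("muffin_or_chihuahua.jpg", "rb") as f:   # hypothetical image file
    image = vision.Image(content=f.read())            # vision.types.Image in older clients

response = client.label_detection(image=image)
for label in response.label_annotations:
    # Each annotation carries a description and a confidence score.
    print(label.description, round(label.score, 3))
```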
>> There's actually an extra piece to that. So if I have a picture of a stop sign, and I'm an automated car, is it a picture on the back of a bus of a stop sign, or is it a real stop sign? So that's going to be one of the complications. >> Doesn't matter to a New York City cab driver. How 'about you Jim? >> Probably not. (laughs) The hottest thing in AI is Generative Adversarial Networks, GANs. What's hot about that? Well, I'll be very quick. Most AI, most deep learning, machine learning, is analytical; it's distilling or inferring insights from the data. Generative takes that same algorithmic basis, but to build stuff. In other words, to create realistic-looking photographs, to compose music, to build CAD/CAM models, essentially, that can be constructed on 3D printers. So GANs are a huge research focus all around the world, and are increasingly used for natural language generation. In other words, it's institutionalizing, or having a foundation for, nailing the Turing test every single time: building something with machines that looks like it was constructed by a human, and doing it over and over again to fool humans. I mean, you can imagine the fraud potential. But you can also imagine just the sheer, like, it's going to shape the world, GANs.
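As a concrete anchor for the adversarial setup Jim describes, here is a toy sketch in PyTorch, not anything shown at the event: a generator learns to mimic a simple 1-D Gaussian while a discriminator tries to tell its samples from real ones. All sizes, learning rates, and the target distribution are arbitrary choices for illustration.

```python
# Toy GAN sketch: the generator learns to mimic a 1-D Gaussian (mean 3.0).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator

loss = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # samples from the target distribution
    fake = G(torch.randn(64, 8))                 # generator's attempts

    # Discriminator step: label real as 1, fake as 0.
    opt_d.zero_grad()
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    opt_g.zero_grad()
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())     # should drift toward 3.0
```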
>> All right, so I'm going to say one thing, and then we're going to ask if anybody in the audience has an idea. So the thing that I find interesting is that traditional programs, when you tell a machine to do something, don't need incentives. When you tell a human being something, you have to provide incentives. Like, how do you get someone to actually read the text? And this whole question of elements within AI that incorporate incentives as a way of trying to guide human behavior is absolutely fascinating to me. Whether it's gamification, or even some things we're thinking about with blockchain and bitcoin and related types of stuff. To my mind that's going to have an enormous impact, some good, some bad. Anybody in the audience? I don't want to lose everybody here. What do you think, sir? And I'll try to do my best to repeat it. Oh, we have a mic. >> So my question's pretty much about what Stephanie's talking about, which is human-in-the-loop training, right? I come from a computer vision background. That's the problem: we need millions of images trained, we need humans to do that. And, you know, the workforce is essentially people that aren't necessarily part of the AI community; they're people that are just able to use that data and analyze the data and label that data. That's something that I think is a big problem everyone in the computer vision industry, at least, faces. I was wondering-- >> So again, the problem is the difficulty of methodologically bringing together people who have domain expertise and people who have algorithm expertise, and getting them working together? >> I think the expertise issue comes in healthcare, right? In healthcare you need experts to be labeling your images with contextual information. Where essentially, with augmented reality applications coming in, you have ARKit and everything coming out, there is a lack of context-based intelligence. And all of that comes through training images, and all of that requires people to do it. And that's kind of the foundational basis of AI going forward; it's not necessarily an algorithm, right? It's how well the data is labeled. Who's doing the labeling, and how do we ensure that it happens? >> Great question. So for the panel: if you think about it, a consultant talks about being on the bench. How much time are they going to have to spend on trying to develop additional business? How much time should we set aside for executives to help train some of these assistants? >> I think the key is to think of the problem a different way. You could have people manually label data, and that's one way to solve the problem. But you can also look at what is the natural workflow of that executive, or that individual. And is there a way to gather that context automatically using AI, right? And if you can do that, it's similar to what we do in our product: we observe how someone is analyzing the data, and from those observations we can actually create the metadata that then trains the system in a particular direction. But you have to think about solving the problem differently, of finding the workflow that you can feed into, to make this labeling easy without the human really realizing that they're labeling the data. >> Peter: Anybody else? >> I'll just add to what Stephanie said: in the IoT applications, all those sensory modalities, the computer vision, the speech recognition, all that, that's all potential training data. So it cross-checks against all the other models that are processing all the other data coming from that device. So the natural language understanding can be reality-checked against the images that the person happens to be commenting upon, or the scene in which they're embedded, so yeah, the data's embedded-- >> I don't think we're at the stage yet where this is easy. It's going to take time before we do start doing the pre-training of some of these details so that it goes faster, but right now, there aren't that many shortcuts. >> Go ahead Joe. >> Sorry, so a couple things. One is, I was just caught up on your incentivizing programs to be more efficient, like humans. You know, Ethereum, the blockchain, has this concept of gas. Where, as the process becomes more efficient, it costs less to actually run, right? It costs less ether, right? So the machine is actually kind of incentivized, and you don't really know what it's going to cost until the machine processes it, right? So there is some notion of that there. But as far as vision, training the machine for computer vision, I think it's through adoption and crowdsourcing. So as people start using it more, they're going to be adding more pictures, very, very organically. And then the machines will be trained, and right now it's a very small handful doing it, and it's very proactive by the Googles and the Facebooks and all of that. But as we start using it, as they start looking at my images and Jim's and Jen's images, it's going to keep getting smarter and smarter through adoption and through a very organic process. >> So Neil, let me ask you a question. Who owns the value that's generated as a consequence of all these people ultimately contributing their insight and intelligence into these systems? >> Well, to a certain extent the people who are contributing the insight own nothing, because the systems collect their actions and the things they do, and then that data doesn't belong to them; it belongs to whoever collected it or whoever's going to do something with it. But the other thing, getting back to the medical stuff: it's not enough to say that people will do the right thing, because a lot of them are not motivated to do the right thing. The whole grant thing, the whole oh my god, I'm not going to go against the senior professor.
A lot of these... I knew a guy who was a doctor at the University of Pittsburgh, and they were doing a clinical study on the tubes that they put in little kids' ears who have ear infections, right? And-- >> Google it! Who helps out? >> Anyway, I forget the exact thing, but he came out and said that the principal investigator lied when he made the presentation, that it should be this, I forget which way it went. He was fired from his position at Pittsburgh and he has never worked as a doctor again. 'Cause he went against the senior line of authority. He was-- >> Another question back here? >> Man: Yes, Mark Turner has a question. >> Not a question, just want to piggyback on what you're saying about the transformation, maybe in healthcare, of black and white images into color images, in the case of sonograms and ultrasounds and mammograms. Do you see that happening using AI? You see that being, I mean it's already happening, do you see it moving forward in that kind of way? I mean, talk more about that, about, you know, AI and black and white images being used, and how they can be transformed, made into color images so you can see things better, doctors can perform better operations. >> So I'm sorry, but could you summarize that down? What's the question? Summarize it just, >> I had a lot of students, they're interested in the cross-pollination between AI and, say, the medical community, as far as things like ultrasounds and sonograms and mammograms, and how you can literally take a black and white image and it can, using algorithms and such, be made into a color image that can help doctors better do the work that they've already been doing, just do it better. You touched on it for like 30 seconds. >> So, how AI can be used to actually add information, in a way that's not necessarily invasive but that ultimately improves how someone might respond to it or use it, yes? Related? I've also got something to say about medical images in a second. Any of you guys want to, go ahead Jennifer. >> Yeah, so for one thing, you know, and it kind of goes back to what we were talking about before: when we look at, for instance, scans, like at some point I was looking at CT scans, right, for lung cancer nodules. In order for me, who doesn't have a medical background, to identify where the nodule is, a doctor actually had to go in and specify which slice of the scan had the nodule and where exactly it is. So it's on both the slice level as well as, within that 2D image, where it's located and the size of it. So the beauty of things like AI is that right now a radiologist has to look at every slice and actually identify this manually, right? The goal of course would be that one day we wouldn't have to have someone look at every slice, usually around 300 slices, and we'd be able to identify it in a much more automated way. And I think the reality is we're not going to get something that's going to be 100%. As with anything we do in the real world, it's always, like, a 95% chance of it being accurate. So I think it's finding that in-between: what's the threshold that we want to use to be able to definitively say this is a lung cancer nodule or not.
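Jennifer's threshold question has a standard formulation. The sketch below, with entirely synthetic data standing in for radiologist labels and model scores, picks an operating threshold from a ROC curve using Youden's J statistic, one common heuristic; nothing here comes from an actual clinical model.

```python
# Sketch: choosing a decision threshold for a nodule detector from its
# predicted probabilities. y_true and y_score are synthetic stand-ins.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)                                 # hypothetical labels
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, 1000), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)            # Youden's J: maximize TPR minus FPR
print(f"threshold={thresholds[best]:.2f}  TPR={tpr[best]:.2f}  FPR={fpr[best]:.2f}")
```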
I think the other thing to think about is how they're using other information. For instance, based on other characteristics of the person's health, they might use that as sort of a grading, right? So, you know, how dark or how light something is, identifying maybe in that region the prevalence of that specific variable. So that's usually how they integrate that information into something that already exists in the computer vision sense. I think the difficulty with this, of course, is being able to identify which variables were introduced into the data that does exist. >> So I'll make two quick observations on this, then I'll go to the next question. One is, radiologists have historically been some of the highest paid physicians within the medical community, partly because they don't have to be particularly clinical. They don't have to spend a lot of time with patients. They tend to spend time with doctors, which means they can do a lot of work in a little bit of time, and charge a fair amount of money. As we start to introduce some of these technologies that allow us, from a machine standpoint, to actually make diagnoses based on those images, I find it fascinating that you now see television ads promoting the role that the radiologist plays in clinical medicine. It's kind of an interesting response. >> It's also disruptive, as I'm seeing more and more studies showing that deep learning models processing images, ultrasounds and so forth, are getting as accurate as many of the best radiologists. >> That's the point! >> Detecting cancer. >> Now radiologists are saying, oh look, we do this great thing in terms of interacting with the patients, which they never have, because they're being disintermediated. The second thing that I'll note is one of my favorite examples of that, if I got it right, is looking at the images, the deep space images, that come out of Hubble. Where they're taking data from thousands, maybe even millions of images and combining it together in interesting ways, so you can actually see depth. You can actually move through, at a very, very small scale, a system that's 150, well, maybe that can't be that much, maybe six billion light years away. Fascinating stuff. All right, so let me go to the last question here, and then I'm going to close it down, then we can have something to drink. What are the hottest, oh I'm sorry, question? >> Yes, hi, my name's George, I'm with Blue Talon. You asked earlier the question, what's the hottest thing in the Edge and AI; I would say that it's security. It seems to me that before you can empower agency, you need to be able to authorize what they can act on, how they can act on it, who they can act on. So it seems if you're going to move to very distributed data at the Edge and analytics at the Edge, there has to be security similarly done at the Edge. And I saw (speaking faintly) slides that called out security as a key prerequisite, and maybe Judith can comment, but I'm curious how security's going to evolve to meet this analytics at the Edge. >> Well, let me do that and I'll ask Jen to comment. The notion of agency is crucially important, slightly different from security, just so we're clear. And the basic idea here is that historically folks have thought about moving data, or they thought about moving application function; now we are thinking about moving authority. So as you said, that's not really a security question, but this has been a problem that's been of concern in a number of different domains. How do we move authority with the resources? And that's really what informs the whole agency process. But with that said, Jim. >> Yeah actually, I'll, yeah, thank you for bringing up security, so: identity is the foundation of security. Strong identity, multifactor, face recognition, biometrics and so forth.
Clearly AI, machine learning, deep learning are powering a new era of biometrics, and, you know, behavioral metrics and so forth that are organic to people's use of devices. Getting to the point that Peter was raising is important: agency! Systems of agency. Your agent, you have to, you as a human being should be vouching, in a secure, tamper-proof way, your identity should be vouching for the identity of some agent, physical or virtual, that does stuff on your behalf. How can that, how should that be managed within this increasingly distributed IoT fabric? Well, a lot of that's been worked out. It all ran through webs of trust, public key infrastructure, formats, and, you know, SAML for single sign-on and so forth. It's all about assertions, strong assertions and vouching. I mean, there are whole workflows of things. Back in the ancient days, when I was actually a PKI analyst three analyst firms ago, I got deep into all the guts of all those federation agreements. Something like that has to be IoT-scalable to enable systems of agency to be truly fluid. So we can vouch for our agents wherever they happen to be. We're going to keep on having, as human beings, agents all over creation; we're not even going to be aware of everywhere that our agents are, but our identity-- >> It's not just-- >> Our identity has to follow. >> But it's not just identity, it's also authorization and context. >> Permissioning, of course. >> So I may be the right person to do something yesterday, but I'm not authorized to do it in another context in another application. >> Role-based permissioning, yeah. Or persona-based. >> That's right. >> I agree. >> And obviously it's going to be interesting to see the role that blockchain, or its follow-on technology, is going to play here. Okay, so let me throw one more question out. What are the hottest applications of AI at the Edge? We've talked about a number of them; does anybody want to add something that hasn't been talked about? Or do you want to get a beer? (people laughing) Stephanie, you raised your hand first. >> I was going to go, I bring something mundane to the table actually, because I think one of the most exciting innovations with IoT and AI are actually simple things, like the City of San Diego rolling out 3200 automated street lights that will actually help you find a parking space and reduce the amount of emissions into the atmosphere, so it has some positive environmental impact. I mean, it's street lights, it's not the medical industry, it doesn't look like a life-changing innovation, and yet if we automate streetlights and we manage our energy better, and maybe they can flicker on and off if there's a parking space there for you, that's a significant impact on everyone's life. >> And dramatically suppress the impact of backseat driving! >> (laughs) Exactly. >> Joe, what were you saying? >> I was just going to say, you know, there's already technology out there where you can put a camera on a drone, with machine learning, with artificial intelligence within it, and it can look at buildings and determine whether there are rusty pipes and cracks in cement and leaky roofs and all of those things. And that's all based on artificial intelligence. And I think if you can do that, to be able to look at an x-ray and determine if there's a tumor there is not out of the realm of possibility, right? >> Neil? >> I agree with both of them; that's what I meant about external kinds of applications.
Instead of figuring out what to sell our customers. Which is mostly what we hear. I just, I think all of those things are eminently doable. And boy, street lights that help you find a parking place, that's brilliant, right? >> Simple! >> It improves your life more than, I dunno, something I used on the internet recently. But I think it's great! I'd like to see a thousand things like that. >> Peter: Jim? >> Yeah, building on what Stephanie and Neil were saying, it's ambient intelligence built into everything, to enable fine-grained microclimate awareness of all of us as human beings moving through the world. And to enable reading of every microclimate in buildings. In other words, you know, you have sensors on your body that are always detecting the heat, the humidity, the level of pollution or whatever, in every environment that you're in, or that you might be likely to move into fairly soon, and they can either help give you guidance in real time about where to avoid, or give that environment guidance about how to adjust itself, the lighting or whatever it might be, to your specific requirements. And, you know, when you have a room like this, full of other human beings, there has to be some negotiated settlement. Some will find it too hot, some will find it too cold, or whatever, but I think that is fundamental in terms of reshaping the sheer quality of experience of most of our lived habitats on the planet, potentially. That's really the Edge analytics application that depends on everybody being fully equipped with a personal area network of sensors that's communicating into the cloud. >> Jennifer? >> So I think what's really interesting about it is being able to utilize the technology we do have; it's a lot cheaper now to have a lot of these ways of measuring that we didn't have before. And whether or not engineers can then leverage what we have as ways to measure things, and then of course you need people like data scientists to build the right models. So you can collect all this data, but if you don't build the right model that identifies these patterns, then all that data just sits in a repository. So without models that capture the patterns that are actually in the data, you're not going to find a better way of finding insights in the data itself. So I think what will be really interesting is to see how existing technology is leveraged to collect data, and then how that's actually modeled, as well as to see how technology is going to develop from where it is now, to be able to either collect things more sensitively or, in the case of, say, how people move, whether we can build things that we can then use to measure how we move, right? Like how we move every day, and then being able to model that in a way that is actually going to give us better insights into things like healthcare, and maybe even just our behaviors. >> Peter: Judith? >> So, I think we also have to look at it from a peer-to-peer perspective. So I may be able to get some data from one thing at the Edge, but then all those Edge devices, sensors or whatever, they all have to interact with each other. Because we may, in our business lives, act in silos, but in the real world, when you look at things like sensors and devices, it's how they react with each other on a peer-to-peer basis. >> All right, before I invite John up, I want to say, I'll say what my thing is, and it's not the hottest.
It's the one I hate the most. I hate AI-generated music. (people laughing) Hate it. All right, I want to thank all the panelists, every single person, some great commentary, great observations. I want to thank you very much. I want to thank everybody that joined. John, in a second you'll kind of announce who's the big winner. But the one thing I want to do is, as I was listening, I learned a lot from everybody, but I want to call out the one comment that I think we all need to remember, and I'm going to give you the award, Stephanie. And that is: increasingly we have to remember that the best AI is probably AI that we don't even know is working on our behalf. The flip side of that is that all of us have to be very cognizant of the idea that AI is acting on our behalf and we may not know it. So, John, why don't you come on up. Who won the, whatever it's called, the raffle? >> You won. >> Thank you! >> How 'about a round of applause for the great panel. (audience applauding) Okay, we had people put their business cards in the basket; we're going to have that brought up. We're going to have two raffle gifts, some nice Bose headsets and a speaker, a Bluetooth speaker. Got to wait for that. I just want to say thank you for coming, and for the folks watching: this is our fifth year doing our own event called Big Data NYC, which is really an extension of the landscape beyond the Big Data world, where Cloud and AI and IoT and other great things happen, with great experts and influencers and analysts here. Thanks for sharing your opinions. Really appreciate you taking the time to come out and share your data and your knowledge. Appreciate it. Thank you. Where's the? >> Sam's right in front of you. >> There's the thing, okay. Got to be present to win. We saw some people sneaking out the back door to go to a dinner. >> First prize first. >> Okay, first prize is the Bose headset. >> Bluetooth and noise canceling. >> I won't look. Sam, you've got to hold it down, I can see the cards. >> All right. >> Stephanie, you won! (Stephanie laughing) Okay, Sawny Cox, Sawny Allie Cox? (audience applauding) Yay, look at that! He's here! The bar's open, so help yourself, but we got one more. >> Congratulations. Picture right here. >> Hold that, I saw you. Wake up a little bit. Okay, all right. Next one is, my kids love this. This is great, great for the beach, great for everything: a portable speaker, great gift. >> What is it? >> Portable speaker. >> It is a portable speaker, it's pretty awesome. >> Oh, you grabbed mine. >> Oh, that's one of our guys. >> (laughing) But who was it? >> Can't be related! Ava, Ava, Ava. Okay, Gene Penesko. (audience applauding) Hey! He came in! All right, look at that, the timing's great. >> Another one? (people laughing) >> Hey, thanks everybody, enjoy the night; thank Peter Burris, head of research for SiliconANGLE, Wikibon, and the great guests and influencers and friends. And you guys in the community, for coming. Thanks for watching and thanks for coming. Enjoy the party and some drinks, and that's it for the influencer panel and analyst discussion. Thank you. (logo music)
Ali Ghodsi, Databricks - #SparkSummit - #theCUBE
>> Narrator: Live from San Francisco, it's the Cube. Covering Spark Summit 2017. Brought to you by Databricks. (upbeat music) >> Welcome back to the Cube, day two at Spark Summit. It's very exciting. I can't wait to talk to this gentleman. We have the CEO of Databricks, Ali Ghodsi, joining us. Ali, welcome to the show. >> Thank you so much. >> David: Well, we sat here and watched the keynote this morning with Databricks, and you delivered. Some big announcements. Before we get into some of that, I want to ask you: it's been about a year and a half since you transitioned from VP of Products and Engineering into a CEO role. What's the most fun part of that, and maybe what's the toughest part? >> Oh, I see. That's a good question, and that's a tough question too. The most fun part is... you know, you touch many more facets of the business. So in engineering, it's all the tech, and you're dealing only with engineers, mostly. Customers are one hop away; there's a product management layer between you and the customers. So you're very inwards focused. As a CEO you're dealing with marketing, finance, sales, these different functions. And then, externally, with media, with stakeholders, a lot of customer calls. There are many, many more facets of the business that you're seeing. And it also gives you a perspective that you couldn't have before. You see how the pieces fit together, so you actually can have a better perspective and see further out than you could before. Before, I was more in my own silo, where I was seeing sort of just the things relating to engineering. So that's the best part. >> You're obviously working closely with customers. You introduced a few customers this morning up on stage. But after the keynote, did you hear any reactions from people? What are they saying? >> Yes, the keynote was recent, so on my way here I've had multiple people sort of... a couple of people high-fived me just before I got up on stage here. On the server-less offering, people are really excited about that. Less devops, less configuration, let them focus on the innovation; they want that. So that's something that's celebrated. Yesterday-- >> Recap that real quickly for our audience here, what the server-less offering is. >> Absolutely, so it's very simple. We want lots of data scientists to be able to do machine learning without having to worry about the infrastructure underneath it. So we have something called server-less pools, and with server-less pools you can just have lots of data scientists use it. Under the hood, this pool of resources shrinks and expands automatically. It adds storage, if needed. And you don't have to worry about the configuration of it. And it also makes sure that it's isolating the different data scientists. So if one data scientist happens to run something that takes much more resources, it won't affect the other data scientists that are sharing that. So the short story of it is: you cut costs significantly, you can now have 3000 people share the same resources, and it enables them to move faster, because they don't have to worry about all the devops that they otherwise have to do.
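Server-less pools were a Databricks product feature, so their exact configuration isn't reproduced here. As a rough public analogue, the Databricks REST API of that era let you create a cluster that "shrinks and expands automatically" by giving it autoscaling bounds instead of a fixed size. The sketch below assumes that clusters/create endpoint; the host, token, and all field values are hypothetical.

```python
# Rough analogue of an autoscaling shared pool via the Databricks REST API
# (clusters/create). Host, token, and all values here are hypothetical.
import requests

resp = requests.post(
    "https://example.cloud.databricks.com/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "cluster_name": "shared-data-science-pool",
        "spark_version": "2.1.x-scala2.11",                  # era-appropriate guess
        "node_type_id": "r3.xlarge",
        "autoscale": {"min_workers": 2, "max_workers": 50},  # shrink/expand bounds
    },
)
print(resp.json())
```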
>> David: George, is that a really big deal? >> Well, we know that whenever there's infrastructure that gets between a developer or data scientist and their outcomes, that's friction. I'd be curious to put that into a bigger perspective, which is: if you go back several years, what were the class of apps that Spark was being used for, and in conjunction with what other technologies? Then bring us forward to today, and then maybe look out three years. >> Ali: Yeah, that's a great question. So from the very beginning, data is key for any of these predictive analytics that we are doing. So that was always a key thing. But back then we saw more Hadoop data lakes. There were more data lakes, data reservoirs, data marts that people were building out. We saw also a lot of traditional data warehousing. These days, we see more and more things moving to the cloud. The Hadoop data lake, we've seen, oftentimes at enterprises, being transformed into cloud blob storage. That's cheaper, it's replicated, it's on many continents. That's something that we've seen happen. And we work across any of these, frankly. From the very beginning, one of Spark's strengths is that it integrates really well wherever your data is. And there's a huge community of developers around it, over 1000 people now that have contributed to it. Many of these people are in other organizations, they're employed by other companies, and their job is to make sure that Databricks or Spark works really, really well with, say, Cassandra or with S3. That's a shift that we're seeing. In terms of applications people are building, it's moving more into production. Four years ago much more of it was interactive, exploratory. Now we're seeing production use cases. The fraud analytics use case that I mentioned, that's running continuously, and the requirements there are different. You can't go down for ten minutes on a Saturday morning at 4 a.m. when you're doing credit card fraud, because that's a lot of fraud, and that affects the business of, say, Capital One. So that's much more crucial for them. >> So what would be the surrounding infrastructure and applications to make that whole solution work? Would you plug into a traditional system of record at the sales order entry kind of process point? Are you working off sort of semi-real-time or near real-time data? And did you train the models on the data lake? How did the pieces fit together? >> Unfortunately the answer depends on the particular architecture that the customer has. Every enterprise is slightly different. But it's not uncommon that the data is coming in, they're using Spark structured streaming in Databricks to get it into S3, so that's one piece of the puzzle. Then when it ends up there, from then on it funnels out to many different use cases. It could be a data warehousing use case, where they're just using interactive SQL on it. So that's the traditional interactive use case. But it could be a real-time use case, where it's actually taking the data that it's processed, detecting anomalies, and putting triggers in other systems, and then those systems downstream will react to those triggers for anomalies. But it could also be that it's periodically training models and storing the models somewhere. Oftentimes it might be in Cassandra, or in Redis, or something of that sort. It will store the model there, and then some web application can take it from there, do point queries to it and say, okay, I have a particular user that came in, here's George now, quickly look up his feature vector, figure out what product recommendations we should show to this person, and then it takes it from there. >> So in those cases, Cassandra or Redis, they're playing the serving layer. But generating the prediction model is coming from you, and they're just doing the inferencing, the prediction itself.
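The first leg of the pipeline Ali describes, streaming incoming events into cloud storage with Spark Structured Streaming, might look like the following minimal sketch. The S3 paths, schema, and JSON source are hypothetical stand-ins, not any specific customer's setup.

```python
# Sketch: stream JSON events into S3 as Parquet with Structured Streaming.
# Paths and schema are hypothetical; downstream SQL, anomaly detection,
# and periodic model training would all read from the Parquet output.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("amount", DoubleType())
          .add("ts", TimestampType()))

events = (spark.readStream
          .schema(schema)
          .json("s3a://example-bucket/incoming/"))      # hypothetical landing area

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .start())

query.awaitTermination()
```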
So if you look out several years, without asking you for the roadmap, which you can feel free to give, how do you see that scope of apps expanding, or the share of an existing app like that? >> Yeah, I think there are two interesting trends that I believe in; I'll be foolish enough to make predictions. One is that I think that data warehousing, as we know it today, will continue to exist. However, it will be transformed, and all the data warehousing solutions that we have today will add predictive capabilities, or they will disappear. So let me motivate that. If you have a data warehouse with customer data in it and a fact table, you have all your transactions there, you have all your products there. Today, you can plug in BI tools, and on top of that you can see what's my business health today and yesterday. But you can't ask it: tell me about tomorrow. Why not? The data is there. Why can I not ask it, with this customer data: tell me which of these customers are going to churn, or which ones of them should I reach out to because I can possibly upsell them? Why wouldn't I want to do that? I think everyone would want to do that, and every data warehousing solution in ten years will have these capabilities. Now with Spark SQL you can do that, and the announcement yesterday showed you also how you can bake models, machine learning models, and export them so a SQL analyst can just access them directly with no machine learning experience. It's just a simple function call and it just works. So that's one prediction I'll make. The second prediction I'll make is that we're going to see lots of revolutions in different industries, beyond the traditional 'get people to click on ads' and understanding social behavior. We're going to go beyond that. So for those use cases it will be closer to the things I mentioned, like Shell, and what you need to do there is involve these domain experts. The domain experts will come in, the doctors, or the machine specialists; you have to involve them in the loop. And they'll be able to transform maybe much less exotic applications; it's not the super high-tech Silicon Valley stuff, but it's nevertheless extremely important to every enterprise, in every vertical, on the planet. That's, I think, the exciting part of where predictions will go in the next decade or two. >> If I were to try and pick out the most man-bites-dog kind of observation in there, you know, it's supposed to be the unexpected thing, I would say it's where you said all data warehouses are going to become predictive services. Because what we've been hearing is sort of the other side of that coin, which is that all the operational databases will get all the predictive capabilities. But you said something very different. I guess my question is: are you seeing the advanced analytics going to the data warehouse because the repository of data is going to be bigger there, and so you can either build better models, or because it's not burdened with transaction SLAs, so you can serve up predictions quicker? >> Data warehousing has been about basic statistics. SQL, the language that's used, is there to get descriptive statistics: tables with averages and medians, that's statistics. Why wouldn't you want to have advanced statistics, which now does predictions on it? It just so happens that SQL is not the right interface for that. So it's going to be very natural for people who have already been asking statistical questions for the last 30 years of their customer data, these massive troves of data that they have stored.
Why wouldn't they want to also say, okay, now give me more advanced statistics? I'm not an expert on advanced statistics, but you ask the system: tell me what I should watch out for. Which of these customers do I talk to? Which of the products are in trouble? Which parts of my business are not doing well now? Predict the future for me. >> George: When you're doing that, though, you're now doing it on data that has a fair amount of latency built into it, because that's how it got into the data warehouse. Whereas if it's in the operational database, it's really low latency, typically low latency stuff. Where and why do you see that distinction? >> I do think also that we'll see more and more real-time engines take over. If you do things in real time, you can do it for a fraction of the cost. So we'll also see those capabilities come in. So you don't have to... Your question is, why would you want to batch everything once a week into a central warehouse, and I agree with that. It will be streaming in live, and then on that you can do predictions, you can do basic analytics. I think basically the lines will blur between all these technologies that we're seeing. In some sense, Spark actually was the precursor to all that. Spark already was unifying machine learning, SQL, ETL, real-time, and you're going to see that appear everywhere.
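Ali's "simple function call" point, a SQL analyst invoking a trained model with no machine learning experience, can be approximated in plain open-source PySpark by registering the model behind a SQL function. This is a generic sketch of that idea, not the Databricks model-export feature announced at the event; the model, view, and column names are all hypothetical.

```python
# Generic sketch: expose a trained churn model to SQL users as a function.
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType
from sklearn.linear_model import LogisticRegression
import numpy as np

spark = SparkSession.builder.appName("sql-ml-sketch").getOrCreate()

# Stand-in for a model trained elsewhere in the pipeline.
X = np.random.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
model = LogisticRegression().fit(X, y)

def churn_probability(spend, tenure):
    return float(model.predict_proba([[spend, tenure]])[0][1])

spark.udf.register("churn_probability", churn_probability, DoubleType())

# A SQL analyst can now score customers with a plain function call.
spark.createDataFrame([(0.9, 0.2), (0.1, 0.3)], ["spend", "tenure"]) \
     .createOrReplaceTempView("customers")
spark.sql("SELECT spend, tenure, churn_probability(spend, tenure) AS churn_risk "
          "FROM customers").show()
```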
>> You mentioned Shell as an example, one of your customers; you also had HP, Capital One, and you developed this unified analytics platform that's solving some of their common problems. Now that you're in the mood to make predictions, what do you think are going to be the most compelling use cases or industries where you're going to see Databricks going in the future? >> That's a hard one. Right now, I think healthcare. There are a lot of data sets, there's a lot of gene sequencing data. They want to be able to use machine learning. In fact, I think those industries are being transformed slowly from using classical statistics into machine learning. We've actually helped some of these companies do that. We've set up workshops and they've gotten people trained. And now they're hiring machine learning experts that are coming in. So that's one, I think: the healthcare industry, whether it's for drug testing, clinical trials, even diagnosis. That's a big one. I do think industrial IoT as well; these are big companies with lots of equipment, they have tons of sensor data, massive data sets. There are a lot of predictions that they can do on that. So that's a second one, I would say. The financial industry, they've always been about predictions, so it makes a lot of sense that they continue doing that. Those are the biggest ones for Databricks. But now also, as other verticals slowly move into the cloud, we'll see more of other use cases as well. But those are the biggest ones I see right now. It's hard to say where it will be ten years from now, or 15. Things are going so fast that it's hard to even predict six months. >> David: Do you believe IoT is going to be a big business driver? >> Yes, absolutely. >> I want to circle back to where you said that we've got different types of databases, but we're going to unify the capabilities. Without saying, it's not like one wins, one loses. >> Ali: Yes, I didn't want to do that. >> So describe maybe the characteristics of what a database that complements Spark really well might look like. >> That's hard for me to say. The capabilities of Spark, I think, are here to stay. The ability to ETL a variety of data that doesn't have structure, where Structured Query Language, SQL, is not fit for it, that is really important, and it's going to become more important, since data is the new oil, as they say. Well, then it's going to be very important to be able to work with all kinds of data and get that into the systems. There are more things being created every day, devices, IoT, whatever it is, that are spewing out this data in different forms and shapes. So being able to work with that variety, that's going to be an important property. So they'll have to do that. That's the ETL portion, or the ELT portion. The real-time portion: not having to do this in a batch manner once a week, because now time is a competitive advantage. So if I'm one week behind you, that means I'm going to lose out. So doing that in real time, or near real time, that's going to be really important. So that's going to come as well, I think, and people will demand that. That's going to be a competitive advantage. Wherever you can add that secret sauce, it's going to add value to the customers. And then finally the predictive stuff, adding the predictive stuff. But I think people will want to continue to also do all the old stuff they've been doing. I don't think that's going to go away. Those bring value to customers; they want to do all those traditional use cases as well. >> So what about now, where customers expect to have some, it's not clear how much, on-prem application platform like Spark, some in the cloud, now that you've totally reordered the TCO equation, but then also at the edge for IoT-type use cases? Do you have to slim down Spark to work at the edge? If you have server-less working in the cloud, does that mean you have to change the management paradigm on-prem? What does that mix look like? How does someone, you know, how does a Fortune 200 company get their arms around that? >> Ali: Yeah, this is the surprising thing, the most surprising thing for me in the last year: how many of those Fortune 200s that I was talking to three years ago were saying 'no way, we're not going into the cloud. You don't understand the regulations that we are facing, or the amount of data that we have.' Or 'we can do it better,' or 'the security requirements that we have, no one can match that.' To now, those very same companies are saying 'absolutely, we're going.' It's not about if, it's about when. Now I would be hard-pressed to find any enterprise that says 'no, we're not going to go, ever.' And some companies we've even seen go from the cloud to on-prem, and then now back. Because the prices are getting more competitive in the cloud. Because now there are at least three major players that are competing, and they're well-funded companies. In some sense, you have ad money and office money and retail money being thrown at this problem. Prices are getting competitive. Very soon, most IT folks will realize there's no way we can do this faster, or better, or more reliably or securely ourselves. >> David: We've got just a minute to go here before the break, so we're going to kind of wrap it up here. And we've got over 3000 people here at Spark Summit, so it's the Spark community. I want you to talk to them for a moment. What problems do you want them to work on the most? And what are we going to be talking about a year from now at this table? >> The second one is harder. So I think the Spark community is doing a phenomenal job. I'm not going to tell them what to do.
They should continue doing what they are doing already, which is integrating Spark in the ecosystem, adding more and more integrations with the greatest technologies that are happening out there. Continue the innovation, and we're super happy to have them here. We'll continue it as well, we'll continue to host this event, and look forward to also having a Spark Summit in Europe, and also the East Coast soon. >> David: Okay, so I'm not going to ask you to make any more predictions. >> Alright, excellent. >> David: Ali, this is great stuff today. Thank you so much for taking some time and giving us more insight after the keynote this morning. Good luck with the rest of the show. >> Thank you. >> Thanks, Ali. And thank you for watching. That's Ali Ghodsi, CEO from Databricks. We are at Spark Summit 2017 here, on the Cube. Thanks for watching, stay with us. (upbeat music)
Sam Lightstone, IBM - Chief Data Scientist, USA - #theCUBE
>> Hey, welcome back here, ready, Jeff Frick here with the Cube. We're at the Chief Data Scientist USA conference in downtown San Francisco, and we're really excited to have a representative from IBM, Sam Lightstone, distinguished engineer from IBM, join us. Sam, great to see you. >> Thank you very much, pleasure to be here. >> Absolutely. So we cover a ton of IBM events, we're at World of Watson, lots of developer conferences, the big event in New York earlier this year around Strata, so you know, we're big fans of all the things that IBM is doing, and of Rob Thomas and the Spark group, so I could go on and on, but we won't go there. We'll talk about what you were talking about earlier today, and kind of let the cat out of the bag, which is always exciting, breaking news, I don't know exactly how we would describe it, but you talked about something new, IBM Data Confluence. Could you share, what's that all about? >> Yeah, so it's a whole new idea, a whole new paradigm, that we were incubating right now inside of IBM, and it's not yet available, but we're hoping to start trials in a January-ish timeframe. But it comes from a realization that so much data is about to come upon us from distributed data sources. You know, everybody's got not only your cell phone, but increasingly data is coming from Internet of Things. You're going to have data coming from your car, data coming from your glasses, some smart meters on your house, and it's a deluge of data. And the way that people like to do data science on this data today is they pull this data from these devices and put it into a central repository, which is a perfectly legitimate strategy, but it means that you're creating copies of the data, and there's a certain complexity of dragging that data through the internet into some central repository. So the idea that we had with Data Confluence is to leave the data where it is, and allow all these different data sources, if you can imagine cars, you can imagine cell phones or smart meters on buildings, allow them to find one another and collaborate on data science problems like a computational mesh, so that we can bring hundreds of thousands, millions of microprocessors to bear on the data where it lives, without moving it around. And our theory is not only is that simpler for everyone, because the data doesn't have to move around, but we can actually bring more computation to bear, because every one of those data sources has compute and has persistence, and you can multiply the opportunities. >> Right, and you took a chance, you ran a live demo, which is, you know, always risky business at anything, but there were really interesting concepts that you highlighted, kind of an organically forming, adapting constellation, right, of these sources, and the example you used, they were solar panels. But for them to do this kind of automatically, if you will, as opposed to someone going in and scripting and building the structure, because tomorrow, as you demonstrated in your demo, you might want to add more, so exactly, that dynamic function is pretty interesting. >> Yeah, and it's a very powerful concept and a very necessary concept, and the reason it's so necessary is these devices could be anywhere, right? And you could have most of your devices in New York but a few of them in the Yukon or Alaska or something, and you don't want them to all be equally connected, right? So it's important to create this network that is sort of geospatially aware and connectivity aware, not just sort of hard-coded.
You know, so one aspect of that is to be sensitive to network latency and topology, that's one reason why it has to be automatic. The other reason it has to be automatic is if you really want this to scale to thousands of devices, you can't have some programmer trying to figure out who connects to what, right? It's just too hard. So making it really adaptive and automatic is super important. Another thing that's really important for the Internet of Things, depending on the circumstance, but if you can imagine cell phones for example, you can have a network of thousands, millions of phones, but at any point in time some of those phones are going to be turned off. So the network has to be adaptive to the possibility that devices go offline, whether intentionally, like a phone, or perhaps unintentionally because they break. You know, if you have a device on a smart meter, it may simply break, and then that particular device is offline for a period of time, right? So the network has to be resilient to that, and that's part of what we've been building, in particular using technology that we incubated in our UK labs in Hursley. So it's been a great collaboration across IBM, this is not just one set of people in one lab, but actually a corporate collaboration, and really our goal is to make this, as you say, automatic, but I would say beyond automatic, to make it resilient, right? It's got to be resilient and fault tolerant, because the complexities that we could be dealing with are just too large for a human being to deal with. >> Right, and clearly it's distributed, right, that's the big thing. You guys are leveraging IBM Bluemix cloud, you know, all this stuff doesn't happen without cloud capabilities, and in the demo you did here, the data center was in San Jose and the actual data elements were in Toronto. So, you know, Amazon and Microsoft and Google always get talked about a lot within the cloud space, but really IBM is a major player, if not in that top three, certainly right there in the fourth position as a leader in cloud, and then what this cloud enables, and then really kind of with the whole cognitive push, you know, that's a priority for Ginni and the team, to really bring more intelligence. >> That's exactly right, and with Data Confluence, you know, we're hoping not only to tap into data science on distributed systems for IoT, and also for enterprise use cases as well, but really to take it to the next level of hybrid cloud, because these data sources could be in the cloud and they could be on-premises, they could be anywhere in the world, and you can mix and match, and that's really a very powerful capability for our customers. Many companies now are struggling as their data is now part cloud and part on-premises. >> Right, and in the compute as well, right? You could shift compute from the edge to the cloud in a dynamic fashion, based on what the optimal solution is, or as you said, sometimes the edge is offline and you can't do it there. >> It's exactly right.
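Data Confluence has no public API (Sam says trials were only planned for January), so the sketch below is purely illustrative of the computational-mesh idea he describes: each node summarizes its own data locally, only the small summaries travel, and offline devices simply drop out of the merge. All names and data here are invented:

```python
# Illustrative only: Data Confluence was an incubation project with no public API.
# The idea: each device computes a small local summary next to its data; only
# summaries move, never the raw readings, and a coordinator merges them.

def local_summary(readings):
    """Runs on the device, where the data lives."""
    return {"n": len(readings), "total": sum(readings)}

def merge(summaries):
    """Runs anywhere; merging partials is cheap and order-independent."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n if n else None

# Hypothetical mesh of solar panels, as in the demo; an offline or empty node
# simply drops out of the merge instead of breaking the computation.
panel_data = {"panel_ny": [3.1, 2.9, 3.4], "panel_yukon": [1.2, 1.0], "panel_ak": []}
online = [local_summary(r) for r in panel_data.values() if r]
print(merge(online))  # global mean without moving raw readings
```

The resilience Sam emphasizes falls out of the structure: because the merge is associative and tolerant of missing partials, nodes can join, leave, or fail without any central re-scripting.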
>> So kind of a cool story, you said this came out of something called Blue Unicorn. What is Blue Unicorn? >> You know, fantastic, so Blue Unicorn was an initiative that a few of us got together on inside of IBM, you probably know some of these folks, Rob Thomas, who I think you've interviewed, Karachay Leah, and myself, and the three of us got together and we said, you know, we want to find a more effective way to tap into the creative juices of our staff. We've got some of the greatest minds in the world working at IBM, we hire brilliant people, PhDs, masters, out of the top schools all over the world, and all too often we hire these people and we tell them what they should be working on. Wouldn't it be better if we could find a repeatable process for them to come to us and say, here's the next big innovation that IBM should have? And Blue Unicorn came out of that desire to tap into and nurture this creative passion of our staff, and was really designed almost like an internal VC initiative. So people would come to us with proposals, and we've got those proposals, we start out with hundreds, and vetted it down to dozens, then down to just a small few that we would fund. The ones that we funded, you know, would go through periodic reviews until eventually we ended up with a very small set that are still being incubated. And Data Confluence happened to have been one of those projects. >> Awesome. So it's different than kind of the 10% thing, this is actually almost like an internal, you put your proposal together, you pitch it, as if it were an internal VC, you get funded, and then you go do that with your team, right? >> One thing I would say is, you know, as we're setting up, we're trying to find ways to make it work, make it efficient, and one of the best filtering factors that we came up with is that people had to show us running code before it was funded. >> Right, right. >> And that was amazing, because that meant people had to work nights and weekends, they had to have that level of passion and commitment for their idea to get to that level of vetting, and that was incredible. That definitely filtered the people who were super passionate about what they were doing from the people who just said, yeah, I'd like to tinker, and that was tremendous. >> Okay, and then you're here at the show, a small show, tight group, kind of multi-industry. Any good takeaways, surprises from the last couple of days here at the Chief Data Scientist USA show? >> You know, it's been an amazing conference actually, some great speakers, some great insights. I think one of the most useful insights for me was, I was curious to hear from this audience, what is the duration of data that is important to them? Do they need to see data from the last hour, the last month, the last year, the last 10 years? And of course it does vary from problem to problem, but many people said, you know, for the work that I do, I need about three months to build a model, and then once I have a model, I'm really looking at the last two to four weeks of data to gain data science insight. And that was a very important point for me, especially as we continue our work on analytics and data science at IBM, it's very important for us to understand the range of data that people are using. >> Shorter than you expected? >> Sure, yeah, it's shorter, because I know certainly in the data warehousing space that I've been working in a lot of my career, people do data analytics on, you know, six months, a year, or three years, right? So this definitely is somewhat of a shift, and it tells us something about our society, that things are moving faster, and data that's older than six months is usually not as interesting anymore. >> Yeah, it really shows kind of the dynamic real-time nature, it's not, analyzing just the old stuff is interesting, but not nearly as interesting as being on top of the stream. The other thing that's funny, Beth Comstock kicked off the GE Minds and Machines event a couple of days ago.
She said we even walk faster in cities, they've done the studies, so everything is continuing to speed up, right? All right, so a year from now, you're back here, what are we going to be talking about? >> Wow, okay, well, you know, we just launched, a few months or a few weeks ago actually, the Watson Data Platform, a huge event for us, and it really is for us the foundation, the data foundation, of all the cognitive computing that IBM is coming out with. It's going to bring together data science and data storage and collaboration amongst analysts and data scientists, all one platform for all your data needs. I'm hoping that a year from now I'm going to speak to you about how Data Confluence is a core part of that platform, and we're going to be running analytics on millions of devices all over the world. >> All right, Sam, well, thanks for taking a few minutes, I know you've got to go catch an airplane, thanks for stopping by and sharing your insight. >> Thank you. >> All right, that's Sam Lightstone, I'm Jeff Frick, you're watching the Cube, thanks for watching.
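Sam's observation, roughly three months of data to build a model and the trailing two to four weeks to score it, is a simple windowing pattern. A pandas sketch with invented columns and dates:

```python
import pandas as pd

# Hypothetical event data; in practice this would come from a warehouse or stream.
df = pd.DataFrame({
    "ts": pd.date_range("2016-01-01", periods=120, freq="D"),
    "value": range(120),
})

latest = df["ts"].max()
train = df[df["ts"] >= latest - pd.Timedelta(days=90)]   # ~3 months to fit a model
score = df[df["ts"] >= latest - pd.Timedelta(days=28)]   # last 2-4 weeks for insight

print(len(train), len(score))  # 91 rows to train on, 29 to gain insight from
```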
Next-Generation Analytics Social Influencer Roundtable - #BigDataNYC 2016 #theCUBE
>> Narrator: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors, CISCO, IBM, NVIDIA, and our ecosystem sponsors, now here's your host, Dave Valante. >> Welcome back to New York City, everybody, this is the Cube, the worldwide leader in live tech coverage, and this is a Cube first, we've got a nine person, actually eight person panel of experts, data scientists, all alike. I'm here with my co-host, James Cubelis, who has helped organize this panel of experts. James, welcome. >> Thank you very much, Dave, it's great to be here, and we have some really excellent brain power up there, so I'm going to let them talk. >> Okay, well thank you again-- >> And I'll interject my thoughts now and then, but I want to hear them. >> Okay, great, we know you well, Jim, we know you'll do that, so thank you for that, and appreciate you organizing this. Okay, so what I'm going to do to our panelists is ask you to introduce yourself. I'll introduce you, but tell us a little bit about yourself, and talk a little bit about what data science means to you. A number of you started in the field a long time ago, perhaps data warehouse experts before the term data science was coined. Some of you started probably after Hal Varian said it was the sexiest job in the world. (laughs) So think about how data science has changed and or what it means to you. We're going to start with Greg Piateski, who's from Boston. A Ph.D., KDnuggets, Greg, tell us about yourself and what data science means to you. >> Okay, well thank you Dave and thank you Jim for the invitation. Data science in a sense is the second oldest profession. I think people have this built-in need to find patterns, and whatever we find we want to organize the data, but we do it well on a small scale, and we don't do it well on a large scale, so really, data science takes our need and helps us organize what we find, the patterns that we find that are really valid and useful and not just random. I think this is a big challenge of data science. I've actually started in this field before the term data science existed. I started as a researcher and organized the first few workshops on data mining and knowledge discovery, and the term data mining became less fashionable, became predictive analytics, now it's data science, and it will be something else in a few years. >> Okay, thank you, Eves Mulkearns, Eves, I of course know you from Twitter. A lot of people know you as well. Tell us about your experiences and what data science means to you. >> Well, data science to me is, if you take the two words, the data and the science, the science holds a lot of expertise and skills there, it's statistics, it's mathematics, it's understanding the business and putting that together with the digitization of what we have. It's not only the structured data or the unstructured data that you store in the database, try to get out, and try to understand what is in there, but even video that is coming in, and then trying to find, like Greg already said, the patterns in there and bringing value to the business, looking from a technical perspective, but still linking that to the business insights, and you can do that on a technical level, but then you don't know yet what you need to find, or what you're looking for. >> Okay great, thank you. Craig Brown, Cube alum. How many people have been on the Cube actually before? >> I have. >> Okay, good. I always like to ask that question.
So Craig, tell us a little bit about your background and, you know, data science, how has it changed, what's it all mean to you? >> Sure, so I'm Craig Brown, I've been in IT for almost 28 years, and that was obviously before the term data science, but I've evolved from, I started out as a developer. And evolved through the data ranks, as I called it, working with data structures, working with data systems, data technologies, and now we're working with data pure and simple. Data science to me is an individual or team of individuals that dissect the data, understand the data, help folks look at the data differently than just the information that, you know, we usually use in reports, and get more insights on how to utilize it and better leverage it as an asset within an organization. >> Great, thank you Craig, okay, Jennifer Shin? Math is obviously part of being a data scientist. You're good at math I understand. Tell us about yourself. >> Yeah, so I'm a senior principal data scientist at the Nielsen Company. I'm also the founder of 8 Path Solutions, which is a data science, analytics, and technology company, and I'm also on the faculty in the Master of Information and Data Science program at UC Berkeley. So math is part of it, I teach statistics for data science actually this semester, and I think for me, I consider myself a scientist primarily, and data science is a nice day job to have, right? Something where there's industry need for people with my skill set in the sciences, and data gives us a great way of being able to communicate sort of what we know in science in a way that can be used out there in the real world. I think the best benefit for me is that now that I'm a data scientist, people know what my job is, whereas before, maybe five, ten years ago, no one understood what I did. Now, people don't necessarily understand what I do now, but at least they understand kind of what I do, so it's still an improvement. >> Excellent. Thank you Jennifer. Joe Caserta, you're somebody who started in the data warehouse business, and saw that snake swallow a basketball and grow into what we now know as big data, so tell us about yourself. >> So I've been doing data for 30 years now, and I wrote the Data Warehouse ETL Toolkit with Ralph Kimball, which is the best selling book in the industry on preparing data for analytics, and with the big paradigm shift that's happened, you know, for me the past seven years has been, instead of preparing data for people to analyze data to make decisions, now we're preparing data for machines to make the decisions, and I think that's the big shift from data analysis to data analytics and data science. >> Great, thank you. Miriam, Miriam Fridell, welcome. >> Thank you. I'm Miriam Fridell, I work for Elder Research, we are a data science consultancy, and I came to data science, sort of, through a very circuitous route. I started off as a physicist, went to work as a consultant and software engineer, then became a research analyst, and finally came to data science. And I think one of the most interesting things to me about data science is that it's not simply about building an interesting model and doing some interesting mathematics, or maybe wrangling the data, all of which I love to do, but it's really the entire analytics lifecycle, and the value that you can actually extract from data at the end, and that's one of the things that I enjoy most, is seeing a client's eyes light up, or a wow, I didn't really know we could look at data that way, that's really interesting.
I can actually do something with that, so I think that, to me, is one of the most interesting things about it. >> Great, thank you. Justin Sadeen, welcome. >> Absolutely, thank you, thank you. So my name is Justin Sadeen, I work for Morph EDU, an artificial intelligence company in Atlanta, Georgia, and we develop learning platforms for non-profit and private educational institutions. So I'm a Marine Corps veteran turned data enthusiast, and so what I think about data science is the intersection of information, intelligence, and analysis, and I'm really excited about the transition from big data into smart data, and that's what I see data science as. >> Great, and last but not least, Dez Blanchfield, welcome mate. >> Good day. Yeah, I'm the one with the funny accent. So data science for me is probably the funniest job I've ever had to describe to my mom. I've had quite a few different jobs, and she's never understood any of them, and this one she understands the least. I think a fun way to describe what we're trying to do in the world of data science and analytics now is it's the equivalent of high altitude mountain climbing. It's like the extreme sport version of the computer science world, because we have to be this magical unicorn of a human that can understand plain English problems from the C-suite down and then translate it into code, either solo or as teams of developers. And so there's this black art that we're expected to be able to transmogrify, from something that we just in plain English say, I would like to know X, and we have to go and figure it out. So there's this neat extreme sport view I have of rushing down the side of a mountain on a mountain bike and just dodging rocks and trees and things occasionally, because invariably, we do have things that go wrong, and they don't quite give us the answers we want. But I think we're at an interesting point in time now with the explosion in the types of technology that are at our fingertips, and the scale at which we can do things now. Once upon a time we would sit at a terminal and write code and just look at data and watch it in columns, and then we ended up with spreadsheet technologies at our fingertips. Nowadays it's quite normal to instantiate a small high performance distributed cluster of computers, effectively a super computer, in a public cloud, and throw some data at it and see what comes back. And we can do that on a credit card. So I think we're at a really interesting tipping point now where this coinage of data science needs to be slightly better defined, so that we can help organizations who have weird and strange questions that they want to ask, tell them solutions to those questions, and deliver on them in, I guess, a commodity deliverable. I want to know xyz and I want to know it in this time frame and I want to spend this much amount of money to do it, and I don't really care how you're going to do it. And there's so many tools we can choose from and there's so many platforms we can choose from, it's this little black art of computing, if you'd like, we're effectively making it up as we go in many ways, so I think it's one of the most exciting challenges that I've had, and I'm pretty sure I speak for most of us in that we're lucky that we get paid to do this amazing job. That we get to make up on a daily basis, in some cases. >> Excellent, well okay. So we'll just get right into it. I'm going to go off script-- >> Do they have unicorns down under? I think they have some strange species, right?
>> Well we put the pointy bit on the back. You guys have it on the front. >> So I was at an IBM event on Friday. It was a chief data officer summit, and I attended what was called the Data Divas' breakfast. It was a women in tech thing, and one of the CDOs, she said that 25% of chief data officers are women, which is much higher than you would normally see in the profile of IT. We happen to have 25% of our panelists are women. Is that common? Miriam and Jennifer, is that common for the data science field? Or is this a higher percentage than you would normally see-- >> James: Or a lower percentage? >> I think certainly for us, we have hired a number of additional women in the last year, and they are phenomenal data scientists. I don't know that I would say, I mean I think it's certainly typical that this is still a male-dominated field, but I think like many male-dominated fields, physics, mathematics, computer science, I think that that is slowly changing and evolving, and I think certainly, that's something that we've noticed in our firm over the years at our consultancy, as we're hiring new people. So I don't know if I would say 25% is the right number, but hopefully we can get it closer to 50. Jennifer, I don't know if you have... >> Yeah, so I know at Nielsen we have actually more than 25% of our team is women, at least the team I work with, so there seems to be a lot of women who are going into the field. Which isn't too surprising, because with a lot of the issues that come up in STEM, one of the reasons why a lot of women drop out is because they want real world jobs and they feel like they want to be in the workforce, and so I think this is a great opportunity, with data science being so popular, for these women to actually have a job where they can still maintain that engineering and science background that they learned in school. >> Great, well Hillary Mason, I think, was the first data scientist that I ever interviewed, and I asked her what are the sort of skills required, and the first question that we wanted to ask, I just threw other women in tech in there, 'cause we love women in tech, is about this notion of the unicorn data scientist, right? It's been put forth that the skill sets required to be a data scientist are so numerous that it's virtually impossible to have a data scientist with all those skills. >> And I love Dez's extreme sports analogy, because that plays into the whole notion of data science, we like to talk about the theme now of data science as a team sport. Must it be an extreme sport is what I'm wondering, you know. The unicorns of the world seem to be... Is that realistic now in this new era? >> I mean when automobiles first came out, they were concerned that there wouldn't be enough chauffeurs to drive all the people around. Is there an analogy with data, to be a data-driven company? Do I need a data scientist, and does that data scientist, you know, need to have this unbelievable mixture of skills? Or are we doomed to always have a skill shortage? Open it up. >> I'd like to have a crack at that, so it's interesting, when automobiles were a thing, when they first brought cars out, and before they, sort of, were modernized by the likes of Ford's Model T, when we got away from the horse and carriage, they actually had human beings walking down the street with a flag warning the public that the horseless carriage was coming, and I think data scientists are very much like that.
That we're kind of expected to go ahead of the organization and try and take the challenges we're faced with today and see what's going to come around the corner. And so we're like the little flag-bearers, if you'd like, in many ways of, this is where we're at today, tell me where I'm going to be tomorrow, and try and predict the day after as well. It is very much becoming a team sport though. But I think the concept of data science being a unicorn has come about because the coinage hasn't been very well defined, you know, if you were to ask 10 people what a data scientist were, you'd get 11 answers, and I think this is a really challenging issue for hiring managers and C-suites when they say, I want data science, I want big data, I want an analyst. They don't actually really know what they're asking for. Generally, if you ask for a database administrator, it's a well-described job spec, and you can just advertise it and some 20 people will turn up and you interview to decide whether you like the look and feel and smell of 'em. When you ask for a data scientist, there's 20 different definitions of what that one data science role could be. So we don't initially know what the job is, we don't know what the deliverable is, and we're still trying to figure that out, so yeah. >> Craig, what about you? >> So from my experience, when we talk about data science, we're really talking about a collection of experiences with multiple people. I've yet to find, at least from my experience, a data science effort with a lone wolf. So you're talking about a combination of skills, and so no one individual needs to have all that makes a data scientist a data scientist, but you definitely have to have the right combination of skills amongst a team in order to accomplish the goals of the data science team. So from my experiences and from the clients that I've worked with, we refer to the data science effort as a data science team. And I believe that's very appropriate to the team sport analogy. >> For us, we look at a data scientist as a full stack web developer, a jack of all trades, I mean they need to have a multitude of backgrounds, coming from a programmer, from an analyst. You can't find one subject matter expert, it's very difficult. And if you're able to find a subject matter expert, you know, through the lifecycle of product development, you're going to require that individual to interact with a number of other members from your team who are analysts, and then you just end up, well, training this person to be, again, a jack of all trades, so it comes full circle. >> I own a business that does nothing but data solutions, and we've been in business 15 years, and the transition over time has been going from being a conventional wisdom run company, with a bunch of experts at the top, to becoming more of a data-driven company using data warehousing and BI, but now the trend is absolutely analytics driven. So if you're not becoming an analytics-driven company, you are going to be behind the curve very very soon, and it's interesting that IBM is now coining the phrase of a cognitive business. I think that is absolutely the future. If you're not a cognitive business from a technology perspective, and an analytics-driven perspective, you're going to be left behind, that's for sure.
So in order to stay competitive, you know, you need to really think about data science, think about how you're using your data, and I also see that what's considered the data expert has evolved over time too, where it used to be just someone really good at writing SQL, or someone really good at writing queries in any language, but now it's becoming more of an interdisciplinary action where you need soft skills and you also need the hard skills, and that's why I think there's more females in the industry now than ever. Because you really need to have a really broad breadth of experiences that really wasn't required in the past. >> Greg Piateski, you have a comment? >> So there are not too many unicorns in nature or as data scientists, so I think organizations that want to hire data scientists have to look for teams, and there are a few unicorns like Hillary Mason or maybe Usama Fayyad, but they generally tend to start companies and are very hard to retain as data scientists. What I see is another evolution, automation, and you know, platforms like IBM Watson are eventually a great advance for data scientists in the short term, but probably what's likely to happen in the longer term is kind of more and more of those skills becoming subsumed by the machine learning layer within the software. How long will it take, I don't know, but I have a feeling that the paradise for data scientists may not be very long lived. >> Greg, I have a follow up question to what I just heard you say. When a data scientist, let's say a unicorn data scientist, starts a company, as you've phrased it, and the company's product is built on data science, do they give up being a data scientist in the process? It would seem that they become a data scientist of a higher order if they've built a product based on that knowledge. What is your thoughts on that? >> Well, I know a few people like that, so I think maybe they remain data scientists at heart, but they don't really have the time to do the analysis and they really have to focus more on strategic things. For example, today actually is the birthday of Google, 18 years ago, so Larry Page and Sergey Brin wrote a very influential paper back in the '90s about PageRank. Have they remained data scientists? Perhaps a very very small part, but that's not really what they do, so I think those unicorn data scientists could quickly evolve to have to look for teams to really capture those skills. >> Clearly they come to a point in their career where they build a company based on teams of data scientists and data engineers and so forth, which relates to the topic of team data science. What is the right division of roles and responsibilities for team data science? >> Before we go, Jennifer, did you have a comment on that? >> Yeah, so I guess I would say, for me, when data science came out and there was, you know, the Venn diagram that came out about all the skills you were supposed to have? I took a very different approach than all of the people who I knew who were going into data science. Most people started interviewing immediately, they were like, this is great, I'm going to get a job. I went and learned how to develop applications, and learned computer science, 'cause I had never taken a computer science course in college, and made sure I trued up that one part where I didn't know these things or have the skills from school, so I went headfirst and just learned it, and now I have actually a lot of technology patents as a result of that.
So to answer Jim's question, actually, I started my company about five years ago. And it originally started out as a consulting firm slash data science company, then it evolved, and one of the reasons I went back into the industry, and now I'm at Nielsen, is because you really can't do the same sort of data science work when you're actually doing product development. It's a very very different sort of world. You know, when you're developing a product you're developing a core feature or functionality that you're going to offer clients and customers, so I think definitely you really don't get to have that wide range of sort of looking at 8 million models and testing things out. That flexibility really isn't there as your product starts getting developed. >> Before we go into the team sport, the hard skills that you have, are you all good at math? Are you all computer science types? How about math? Are you all math? >> What were your GPAs? (laughs) >> David: Anybody not math oriented? Anybody not love math? You don't love math? >> I love math, I think it's required. >> David: So math yes, check. >> You dream in equations, right? You dream. >> Computer science? Do I have to have computer science skills? At least the basic knowledge? >> I don't know that you need to have formal classes in any of these things, but I think certainly, as Jennifer was saying, if you have no skills in programming whatsoever and you have no interest in learning how to write SQL queries or R or Python, you're probably going to struggle a little bit. >> James: It would be a challenge. >> So I think yes, I have a Ph.D. in physics, I did a lot of math, it's my love language, but I think you don't necessarily need to have formal training in all of these things, but I think you need to have a curiosity and a love of learning, and so if you don't have that, you still want to learn and however you gain that knowledge I think, but yeah, if you have no technical interests whatsoever, and don't want to write a line of code, maybe data science is not the field for you. Even if you don't do it every day. >> And statistics as well? You would put that in that same general category? How about data hacking? You got to love data hacking, is that fair? Eves, you have a comment? >> Yeah, I think so. While we've been discussing that, for me, the most important part is that you have a logical mind and you have the capability to absorb new things and the curiosity you need to dive into that. While I don't have an education in IT or whatever, I have a background in chemistry, and those things that I learned there, I apply to information technology as well, and for the part that you say, okay, I'm a tech-savvy guy, I'm interested in the tech part of it, you need to speak that business language, and if you can do that crossover and understand what other skill sets or parts of the roles are telling you, I think the communication in that aspect is very important. >> I'd like to throw in just something really quickly, and I think there's an interesting thing that happens in IT, particularly around technology. We tend to forget that we've actually solved a lot of these problems in the past. If we look in history, if we look around the Second World War, and Bletchley Park in the UK, where you had a very similar experience as humans that we're having currently around the whole issue of data science, so there was an interesting challenge with the Enigma and the Shark code, right?
And there was a bunch of men put in a room and told, you're mathematicians and you come from universities, and you can crack codes, but they couldn't. And so what they ended up doing was running these ads, and putting challenges, they actually put, I think it was crossword puzzles, in the newspaper, and this deluge of women came out of all kinds of different roles, without math degrees, without science degrees, but could solve problems, and they were thrown at the challenge of cracking codes, and invariably, they did the heavy lifting on a daily basis, converting messages from one format to another, so that this very small team at the end could actually get in play with the sexy piece of it. And I think we're going through a similar shift now with what we refer to as data science in the technology and business world. Where the people who are doing the heavy lifting aren't necessarily what we'd think of as the traditional data scientists, and so, there have been some unicorns and we've championed them, and they're great. But I think the shift's going to be to accountants, actuaries, and statisticians who understand the business, and come from an MBA-style background, that can learn the relevant pieces of math and models that we need to apply to get the data science outcome. I think we've already been here, we've solved this problem, we've just got to learn not to try and reinvent the wheel, 'cause the media hypes this whole thing of data science as exciting and new, but we've been here a couple of times before, and there's a lot to be learned from that, in my view. >> I think we had Joe next. >> Yeah, so I was going to say that data science is a funny thing. To use the word science is kind of a misnomer, because there is definitely a level of art to it, and I like to use the analogy, when Michelangelo would look at a block of marble, everyone else looked at the block of marble and see a block of marble. He looks at a block of marble and he sees a finished sculpture, and then he figures out, what tools do I need to actually make my vision? And I think data science is a lot like that. We hear a problem, we see the solution, and then we just need the right tools to do it, and I think part of consulting and data science in particular, it's not so much what we know out of the gate, but it's how quickly we learn. And I think everyone here, what makes them brilliant, is how quickly they could learn any tool that they need to see their vision get accomplished. >> David: Justin? >> Yeah, I think you make a really great point, for me, I'm a Marine Corps veteran, and the reason I mentioned that is 'cause I work with two veterans who are problem solvers. And I think that's what data scientists really are, in the long run, are problem solvers, and you mentioned a great point that, yeah, I think just problem solving is the key. You don't have to be a subject matter expert, just be able to take the tools and intelligently use them. >> Now when you look at the whole notion of team data science, what is the right mix of roles, like role definitions within a high-quality or a high-performing data science team? Now IBM, with, of course, our announcement of Project DataWorks and so forth, we're splitting the role division in terms of data scientists versus data engineers versus application developers versus business analysts, is that the right breakdown of roles?
Or what would the panelists recommend in terms of understanding what kind of roles make sense within, like I said, a high performing team that's looking to develop applications that depend on data, machine learning, and so forth? Anybody want to? >> I'll tackle that. So the teams that I have created over the years, made up of these data science teams that I brought into customer sites, have a combination of developer capabilities, and some of them are IT developers, but some of them were developers of things other than applications. They designed buildings, they did other things with their technical expertise besides building technology. The other piece besides the developer is the analytics, and analytics can be taught as long as they understand how algorithms work and the code behind the analytics, in other words, how are we analyzing things, and from a data science perspective, we are leveraging technology to do the analyzing through the tool sets, so ultimately, as long as they understand how tool sets work, then we can train them on the tools. Having that analytic background is an important piece. >> Craig, is it easier to, I'll go to you in a moment Joe, is it easier to cross train a data scientist to be an app developer than to cross train an app developer to be a data scientist, or does it not matter? >> Yes. (laughs) And not the other way around. It depends on the-- >> It's easier to cross train a data scientist to be an app developer than-- >> Yes. >> The other way around. Why is that? >> Developing code can be as difficult as the tool set one uses to develop code. Today's tool sets are very user friendly, whereas with developing code, it is very difficult to teach a person to think along the lines of developing code when they don't have any idea of the aspects of code, of building something. >> I think it was Joe, or you next, or Jennifer, who was it? >> I would say that one of the reasons for that is data scientists will probably know if the answer's right after you process data, whereas a data engineer might be able to manipulate the data but may not know if the answer's correct. So I think that is one of the reasons why having a data scientist learn the application development skills might be an easier time than the other way around. >> I think Miriam had a comment? Sorry. >> I think that what we're advising our clients to do is to not think, before data science and before analytics became so required by companies to stay competitive, it was more of a waterfall, you have a data engineer build a solution, you know, then you throw it over the fence and the business analyst would have at it, where now, it must be agile, and you must have a scrum team where you have the data scientist and the data engineer and the project manager and the product owner and someone from the chief data office all at the table at the same time and all accomplishing the same goal. Because all of these skills are required, collectively, in order to solve this problem, and it can't be done daisy-chained anymore, it has to be a collaboration. And that's why I think Spark is so awesome, because you know, Spark is a single interface that a data engineer can use, a data analyst can use, and a data scientist can use. And now with what we've learned today, having a data catalog on top so that the chief data office can actually manage it, I think is really going to take Spark to the next level. >> James: Miriam?
>> I wanted to comment on your question to Craig about is it harder to teach a data scientist to build an application or vice versa, and one of the things that we have worked on a lot in our data science team is incorporating a lot of best practices from software development, agile, scrum, that sort of thing, and I think particularly with a focus on deploying models, that we don't just want to build an interesting data science model, we want to deploy it, and get some value. You need to really incorporate these processes from someone who might know how to build applications, and that, I think, for some data scientists can be a challenge, because one of the fun things about data science is you get to get into the data, and you get your hands dirty, and you build a model, and you get to try all these cool things, but then when the time comes for you to actually deploy something, you need deployment-grade code in order to make sure it can go into production at your client site and be useful, for instance, so I think that there's an interesting challenge on both ends, but one of the things I've definitely noticed with some of our data scientists is it's very hard to get them to think in that mindset, which is why you have a team of people, because everyone has different skills and you can mitigate that. >> Dev-ops for data science? >> Yeah, exactly. We call it insight ops, but yeah, I hear what you're saying. Data science is becoming increasingly an operational function as opposed to strictly exploratory or developmental. Did someone else have a, Dez? >> One of the things I was going to mention, one of the things I like to do when someone gives me a new problem is take all the laptops and phones away. And we just end up in a room with a whiteboard. And developers find that challenging sometimes, so I had this one line where I said to them, don't write the first line of code until you actually understand the problem you're trying to solve, right? And I think where the data science focus has changed the game for organizations who are trying to get some systematic repeatable process that they can throw data at and just keep getting answers and things, no matter what the industry might be, is that developers will come with a particular mindset on how they're going to codify something without necessarily getting the full spectrum and understanding the problem in the first place. What I'm finding is the people that come at data science tend to have more of a hacker ethic. They want to hack the problem, they want to understand the challenge, and they want to be able to get it down to plain English simple phrases, and then apply some algorithms and then build models, and then codify it, and so most of the time we sit in a room with whiteboard markers just trying to build a model in a graphical sense and make sure it's going to work and that it's going to flow, and once we can do that, we can codify it. I think when you come at it from the other angle, from the developer ethic, and you're like, I'm just going to codify this from day one, I'm going to write code. I'm going to hack this thing out and it's just going to run and compile. Often, you don't truly understand what you're trying to get to at the end point, and you can just spend days writing code, and I think someone made the comment that sometimes you don't actually know whether the output is actually accurate in the first place. So I think there's a lot of value being provided from the data science practice.
Over understanding the problem in plain English at a team level, so what am I trying to do from the business consulting point of view? What are the requirements? How do I build this model? How do I test the model? How do I run a sample set through it? Train the thing and then make sure what I'm going to codify actually makes sense in the first place, because otherwise, what are you trying to solve in the first place? >> Wasn't that Einstein who said if I had an hour to solve a problem, I'd spend 55 minutes understanding the problem and five minutes on the solution, right? It's exactly what you're talking about. >> Well I think, I will say, getting back to the question, the thing with building these teams, I think a lot of times what people don't talk about is that engineers are actually very very important for data science projects and data science problems. For instance, if you were just trying to prototype something or just come up with a model, then data science teams are great, however, if you need to actually put that into production, that code that the data scientist has written may not be optimal, so as we scale out, it may be actually very inefficient. At that point, you kind of want an engineer to step in and actually optimize that code, so I think it depends on what you're building, and that kind of dictates what kind of division you want among your teammates, but I do think that a lot of times, the engineering component is really undervalued out there. >> Jennifer, it seems that the data engineering function, data discovery and preparation and so forth, is becoming automated to a greater degree, but if I'm listening to you, I don't hear that data engineering as a discipline is becoming extinct in terms of a role that people can be hired into. You're saying that there's a strong ongoing need for data engineers to optimize the entire pipeline to deliver the fruits of data science in production applications, is that correct? So they play that very much operational role as the backbone for... >> So I think a lot of times businesses will go to data scientists to build a better model, to build a predictive model, but that model may not be something that you really want to implement out there when there's like a million users coming to your website, 'cause it may not be efficient, it may take a very long time, so I think in that sense, it is important to have good engineers, and your whole product may fail, you may build the best model, it may have the best output, but if you can't actually implement it, then really what good is it? >> What about calibrating these models? How do you go about doing that and sort of testing that in the real world? Has that changed over time? Or is it... >> So one of the things that I think can happen, and we found with one of our clients, is when you build a model, you do it with the data that you have, and you try to use a very robust cross-validation process to make sure that it's robust and it's sturdy, but one thing that can sometimes happen is after you put your model into production, there can be external factors, societal or whatever, things that have nothing to do with the data that you have or the quality of the data or the quality of the model, which can actually erode the model's performance over time. So as an example, we think about cell phone contracts, right?
Those have changed a lot over the years, so maybe five years ago, the type of data plan you had might not be the same that it is today, because a totally different type of plan is offered, so if you're building a model on that to say, predict who's going to leave and go to a different cell phone carrier, the validity of your model over time is going to completely degrade, based on nothing that you have, that you put into the model, or the data that was available, so I think you need to have this sort of model management and monitoring process to take these factors into account and then know when it's time to do a refresh.
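Miriam's two-step approach, robust cross-validation before deployment and performance monitoring after, can be sketched with scikit-learn. The churn-style data below is synthetic and the drift threshold is a judgment call, not anything prescribed by the panel:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.integers(0, 2, 1000)  # stand-in churn data

# 1. Robust validation before deployment: k-fold cross-validation.
model = LogisticRegression()
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())

# 2. Monitoring after deployment: score fresh labeled batches and flag when
#    performance erodes, e.g. because the plans on offer have changed.
model.fit(X, y)
baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
X_new, y_new = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)  # this month
current = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
if current < baseline - 0.05:  # the 0.05 tolerance is an arbitrary choice
    print("model drift detected: time to refresh")
```

The key point from the discussion is that step 2 never stops: the model can degrade for reasons that were never in the training data at all.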
And I think we're rushing so quickly at doing analysis on data that comes to us in various formats and at high velocity that it's very important for us to actually stop and do peer reviews of the models and the data and the output as well, because otherwise we start making decisions very quickly about things that may or may not be true. It's very easy to get the data to paint any picture you want, and you gave the example of the five different attempts at that thing. I had this shoot-out thing as well where, within a team, I'll get two different people to do exactly the same thing in completely different rooms and come back and challenge each other, and it's quite amazing to see the looks on their faces when they're like, oh, I didn't see that, and then go back and do it again, and just keep iterating until we get to the point where they both get the same outcome. In fact, there's a really interesting anecdote about when the UNIX operating system was being written: a couple of the authors went away and wrote the same program without realizing the other was doing it, and when they came back, they actually had, line for line, the same piece of C code, 'cause they'd actually gotten to a truth, a perfect version of that program. I think we need to often look at, when we're building models and playing with data, if we can't come at it from different angles and get the same answer, then maybe the answer isn't quite true yet, so there's a lot of risk in that. And it's the same with presentation, you know; you can paint any picture you want with the dashboard, but who's actually validating whether the dashboard's painting the correct picture? >> James: Go ahead, please. >> There is actually a science behind data visualization: if you're doing trending, it's a line graph; if you're doing comparative analysis, it's a bar graph; if you're doing percentages, it's a pie chart. There is a certain science to it; it's not as much of a mystery as the novice thinks. But what makes it challenging is that, just like any presentation, you have to consider your audience. Whenever we're delivering a solution, either insight or just data in a grid, we really have to consider who is the consumer of this data, and actually cater the visual to that person or to that particular audience. And that is part of the art, and that is what makes a great data scientist. >> The consumer may in fact be the source of the data itself, like in a mobile app, so you're tuning their visualization and then their behavior is changing as a result, and then the data on their changed behavior comes back, so it can be a circular process.
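That chart-selection heuristic is mechanical enough to write down. Here is a toy sketch; the intent categories are illustrative assumptions, and as the panelist says, real chart choice also has to weigh the audience.

```python
# A toy encoding of the heuristic: trend -> line, comparison -> bar,
# percentages/composition -> pie. Categories here are assumptions for
# illustration, not an exhaustive taxonomy.
CHART_FOR_INTENT = {
    "trend": "line graph",           # change over time
    "comparison": "bar graph",       # categories side by side
    "composition": "pie chart",      # percentages, parts of a whole
    "distribution": "histogram",     # shape of one variable
    "relationship": "scatter plot",  # two continuous variables
}

def suggest_chart(intent: str) -> str:
    """Return a default chart type for a stated analytic intent."""
    try:
        return CHART_FOR_INTENT[intent.lower()]
    except KeyError:
        raise ValueError(f"unknown intent {intent!r}; try one of {sorted(CHART_FOR_INTENT)}")

print(suggest_chart("trend"))  # line graph
```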
>> So Jim, at a recent conference, you were tweeting about the citizen data scientist, and you got emasculated by-- >> I spoke there too. >> Okay. >> TWI on that same topic, I got-- >> Kirk Borne I hear came after you. >> Kirk meant-- >> Called foul, flag on the play. >> Kirk meant well. I love Claudia Imhoff too, but yeah, it's a controversial topic. >> So I wonder what our panel thinks of that notion, citizen data scientist. >> Can I respond about citizen data scientists? >> David: Yeah, please. >> I think this term was introduced by a Gartner analyst in 2015, and I think it's a very dangerous and misleading term. I think we definitely want to democratize the data and give access to more people, not just data scientists, but managers, BI analysts; but when there is already a term for such people, we can call them business analysts, because it implies some training, some understanding of the data. If you use the term citizen data scientist, it implies that without any training you take some data and then you find something there, and, as Dez mentioned, we've seen many examples; it's very easy to find completely spurious random correlations in data. So we don't want citizen dentists to treat our teeth or citizen pilots to fly planes, and if data's important, having citizen data scientists is equally dangerous. I think Gartner actually did not use the term citizen data scientist in their 2016 hype cycle, so hopefully they will put this term to rest. >> So Gregory, you apparently are defining citizen to mean incompetent as opposed to simply self-starting. >> Well, self-starting is very different, but I don't think that was the intention. In terms of data democratization, there is a big trend toward automation. There are many tools; for example, companies like DataRobot, and probably IBM, have interesting machine learning capabilities aimed at automation. I recently started a page on KDnuggets for automated data science solutions, and there are already 20 different firms that provide different levels of automation. So one can deliver full automation with maybe some expertise built in, but it's very dangerous to run part of the process with an automated tool and at some point ask citizen data scientists to take the wheel. >> I want to chime in on that. >> David: Yeah, pile on. >> I totally agree with all of that. The comment I just want to quickly put out there is that the space we're in is a very young and rapidly changing world, and what we haven't had yet is time to stop and take a deep breath and actually define ourselves. If you look at computer science in general, a lot of the traditional roles have had 10 or 20 years of history, and so through the hiring process and the development of those spaces, we've actually had time to breathe and define what those jobs are; we know what a systems programmer is, and we know what a database administrator is. But we haven't yet had a chance as a community to stop and breathe and say, well, what do we think these roles are? And so to fill that void, the media creates coinages, and I think this is the risk we've got now: the concept of a citizen data scientist was just a term coined to fill a void, because no one quite knew what to call somebody who didn't come from a data science background but was tinkering around data science. I think that's something we need to sit up and pay attention to, because if we don't own that and drive it ourselves, then somebody else is going to fill the void, and they'll create these very frustrating concepts like citizen data scientist, which drives us all crazy. >> James: Miriam's next. >> So I wanted to comment; I agree with both of the previous comments, but in terms of a citizen data scientist, and whether you're a citizen data scientist or an actual data scientist, whatever that means, I think one of the most important things you can have is a sense of skepticism, right? Because you can get spurious correlations and it's like, wow, my predictive model is so excellent, you know? And being aware of things like leaks from the future, right? This actually isn't predictive at all; it's a result of the thing I'm trying to predict. So one thing we try to do is, if something really looks too good, we need to go back in and make sure: did we not look at the data correctly? Is something missing? Did we have a problem with the ETL? A healthy sense of skepticism is important to make sure that you're not taking a spurious correlation and trying to derive some significant meaning from it.
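Gregory's and Miriam's warnings are easy to reproduce. A small sketch on purely synthetic data: screen enough random features against a random target and one of them will look impressively predictive in-sample, then evaporate on fresh data.

```python
# Spurious correlation by brute force: 2,000 noise features, 50 rows,
# a noise target. The best in-sample correlation looks "real" but is an
# artifact of the search; out of sample it collapses. All data synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2000                       # few observations, many candidate features
X = rng.normal(size=(n, k))
y = rng.normal(size=n)                # the target is pure noise

corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)]
best = int(np.argmax(corrs))
print(f"best in-sample |correlation|: {corrs[best]:.2f}")   # typically > 0.4

# The same "winning" feature, scored on fresh draws from the same null process:
X_new, y_new = rng.normal(size=(n, k)), rng.normal(size=n)
print(f"same feature out of sample:   {abs(np.corrcoef(X_new[:, best], y_new)[0, 1]):.2f}")
```

This is exactly the too-good-to-be-true case Miriam flags: the cure is held-out data, plus the skepticism to go looking for the leak.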
>> I think there's a Dilbert cartoon that I saw that described that very well. Joe, did you have a comment? >> I think that in order for citizen data scientists to really exist, we need to have more maturity in the tools that they would use. My vision is that the BI tools of today are all going to be replaced with natural language processing and search: just be able to open up a search bar and say give me sales by region, and, to take that one step further into the future, it should actually be able to answer what are my sales going to be next year? It should trigger a simple linear regression, or be able to say which features of the televisions are actually affecting sales and run a clustering algorithm. I think hopefully that will be the future, but I don't see any of that today, and I think in order to have a true citizen data scientist, you would need to have that, and that is pretty sophisticated stuff.
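The plumbing behind Joe's "what are my sales going to be next year?" could be as small as intent detection plus a regression trigger. A minimal sketch; the keyword matching, the sales figures, and the model choice are all illustrative stand-ins, not a real natural language interface.

```python
# Hypothetical mini "NLP BI" dispatcher: crude keyword intent detection
# that triggers a simple linear regression over yearly totals. The data
# and the matching rules are invented for illustration.
import numpy as np

SALES_BY_YEAR = {2012: 4.1, 2013: 4.6, 2014: 5.0, 2015: 5.7}  # $M, synthetic

def answer(question: str) -> str:
    q = question.lower()
    if "next year" in q:                                  # forecast intent
        years = np.array(sorted(SALES_BY_YEAR))
        totals = np.array([SALES_BY_YEAR[y] for y in years])
        slope, intercept = np.polyfit(years, totals, 1)   # simple linear regression
        nxt = int(years.max()) + 1
        return f"projected {nxt} sales: ${slope * nxt + intercept:.1f}M"
    if "by region" in q:                                  # aggregation intent
        return "sales by region: (a grouped aggregate query would run here)"
    return "sorry, I can't parse that question yet"

print(answer("What are my sales going to be next year?"))
```

A production version would need real intent parsing, model selection, and guardrails, which is Joe's point about how sophisticated "simple" really is.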
>> I think for me, I can relate to the idea of the citizen data scientist. For instance, when I was in graduate school, I started doing some research on FDA data, an open source data set of about 4.2 million data points. Technically, when I graduated, the paper was still not published, and so in some sense you could think of me as a citizen data scientist, right? I wasn't getting funding, I wasn't doing it for school, but I was still continuing my research. So I'd like to hope that, with all the new data sources out there, there might be scientists, or people who were maybe kept out of a field, people who wanted to be in STEM and for whatever life circumstance couldn't be, who might be encouraged to actually go and look into the data and maybe build better models or validate information that's out there. >> So Justin, I'm sorry, you had one comment? >> It seems the term data science was coined before academia adopted formalized training for data science. But yeah, like Dez said, you can make data work for whatever problem you're trying to solve; whatever answer you see, you want data to work around it, you can make it happen. And I kind of consider that like scope creep in project management, call it data creep: you're so hyper-focused on a solution, so set on finding the answer, that you create an answer that works for that solution, but it may not be the correct answer. I think the crossover discussion works well for that case. >> So but the term comes up 'cause there's a frustration, I guess, right? Data science skills are not plentiful, and it's potentially a bottleneck in an organization. Supposedly 80% of your time is spent on cleaning data, is that right? Is that fair? So there's a problem. How much of that can be automated, and when? >> I'll have a shot at that. So I think there's a shift that's going to come about where we're going to move from centralized data sets to data at the edge of the network, and this is something that's happening very quickly now, where we can't just haul everything back to a central spot. When the internet of things actually wakes up, things like the Boeing 787 Dreamliner, which has 6,000 sensors in it and produces half a terabyte of data per flight, will force the issue. There are 87,400 flights per day in domestic airspace in the U.S.; that's 43.5 petabytes of raw data, which is about three years' worth of disk manufacturing in total, right? We're never going to copy that across to one place, and we can't process it, so I think the challenge we've got ahead of us is looking at how we're going to move the intelligence and the analytics to the edge of the network and pre-cook the data in different tiers: have a look at the raw material we get, boil it down to a slightly smaller data set, bring a metadata version of that back, and eventually get to the point where we've only got the very minimum data set and data points we need to make key decisions. Without that, we're already at the point where we have too much data; we can't munch it fast enough, and we can't spin up enough tin even if we switch the cloud on, and that's just this never-ending deluge of noise, right? You've got that signal-versus-noise problem, so we're now seeing a shift where people are looking at how we move the intelligence back to the edge of the network, which we actually solved some time ago in the security space. You know, spam filtering: if an email hits Google on the west coast of the U.S., they create a checksum for that spam email, it immediately goes into a database, and it never gets through on the opposite coast, because they already know it's spam. They recognize that email coming in: that's evil, stop it. So we've already fixed this in security with intrusion detection, we've fixed it in spam; now we need to take that learning and bring it into business analytics, if you like, see where we're finding patterns in behavior, and roll that out to the edge of the network. So if I'm seeing demand over here for tickets when a new show goes on sale, I need to be able to see where else I'm going to see that demand and start responding to it before the demand comes about. I think that's a shift we're going to see quickly, because we'll never keep up with the data munching challenge, and the volume's just going to explode.
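Dez's numbers roughly check out (87,400 flights times half a terabyte is about 43,700 TB, close to the 43.5 PB a day he cites), and the tiered reduction he describes is easy to sketch: summarize at the edge, ship only the summary upstream, keep the raw stream local. The sensor values below are synthetic stand-ins.

```python
# Edge-side tiered reduction: boil a raw sensor window down to the few
# statistics worth shipping to the center. Raw readings stay at the edge.
import json
import statistics

def edge_summarize(readings, sensor_id):
    """Reduce a raw window of readings to a compact summary for upstream."""
    return {
        "sensor": sensor_id,
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "stdev": statistics.stdev(readings),
        "min": min(readings),
        "max": max(readings),
    }

raw = [20.1, 20.3, 19.8, 35.0, 20.2, 20.0]  # one sensor's recent window (synthetic)
print(json.dumps(edge_summarize(raw, "engine-temp-4")))
# a few hundred bytes go upstream instead of the full raw stream
```

The same shape fits his spam analogy: compute a checksum or signature at one edge, share only that, and every other edge can act on it without ever seeing the raw data.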
>> David: We just have a couple minutes. >> That does sound like a great topic for a future Cube panel: data science at the edge of the fog. >> I've got a hundred questions around that. So we're wrapping up here, just got a couple minutes. Final thoughts on this conversation, or any other pieces that you want to punctuate? >> I think one thing that's been really interesting for me, being on this panel, is hearing all of my co-panelists talking about common themes and things that we are also experiencing, which isn't a surprise, but it's interesting to hear how ubiquitous some of the challenges are. And at the announcement earlier today, some of the things that they're talking about and thinking about, we're also talking about and thinking about. So I think it's great to hear that we're all in different countries and different places, but we're experiencing a lot of the same challenges, and that's been really interesting for me to hear. >> David: Great, anybody else, final thoughts? >> To echo Dez's thoughts: we're never going to catch up with the amount of data that's produced, so it's about transforming big data into smart data. >> I could just say that with the shift from normal data, small data, to big data, the answer is automate, automate, automate. We've been talking about advanced algorithms and machine learning for the science, for changing the business, but there also needs to be machine learning and advanced algorithms for the back room, where we're actually getting smarter about how we ingest and how we fix data as it comes in, because we can actually train the machines to understand data anomalies and what we want to do with them over time. And I think the further upstream we get with data correction, the less work there will be downstream. I also think that the concept of being able to fix data at the source is gone; that's behind us. Right now, the data that we're using to analyze to change the business, we typically have no control over. Like Dez said, it's coming from sensors and machines and the internet of things, and if it's wrong, it's always going to be wrong, so we have to figure out how to deal with that in our laboratory. >> Eaves, final thoughts? >> I think it's a mind shift, being a data scientist. If you look back, why did you start developing or writing code? Because you liked to code, just for the sake of building a nice algorithm or a piece of software, whatever. Now, with the spirit of a data scientist, you're looking at a problem and saying, this is where I want to go, so you have more of a top-down approach than a bottom-up approach. You have the big picture, and that is what you really need as a data scientist: look across technologies, look across departments, look across everything, and then, on top of that, try to apply as many skills as you have available. That's the kind of unicorn they're trying to find, because it's pretty hard to find people with that wide a vision on everything that is happening within the company. You need to be aware of technology, you need to be aware of how a business is run and how it fits within a cultural environment, and you have to work with people; all those things together, to my belief, make it very difficult to find those good data scientists. >> Jim? Your final thought? >> My final thought is that this is an awesome panel, and I'm so glad that you've come to New York, and I'm hoping that you all stay, of course, for the IBM Data First launch event that will take place this evening about a block over at Hudson Mercantile. So that's pretty much it. Thank you, I really learned a lot. >> I want to second Jim's thanks; really, a great panel, awesome expertise. Really appreciate you taking the time, and thanks to the folks at IBM for putting this together. >> And I'm a big fan of most of you, all of you, on this session here, so it's great just to meet you in person. Thank you. >> Okay, and I want to thank Jeff Frick for being a human curtain there with the sun setting here in New York City. Well, thanks very much for watching. We are going to be across the street at the IBM announcement; we're going to be on the ground. We open up again tomorrow at 9:30 at Big Data NYC, Big Data Week, Strata plus Hadoop World. Thanks for watching, everybody; that's a wrap from here. This is the Cube, we're out. (techno music)