Jesse Rothstein, ExtraHop | AWS re:Inforce 2019
>> Live from Boston, Massachusetts, it's The Cube, covering AWS re:Inforce 2019. Brought to you by Amazon Web Services and its ecosystem partners. >> Welcome back, everyone, to live coverage of AWS re:Inforce, their first conference. This is The Cube, here in Boston, Massachusetts. I'm John Furrier with my co-host Dave Vellante. Our guest is Jesse Rothstein, CTO and co-founder of ExtraHop and a Cube alumni. Great to see you again. VMworld, re:Invent, and now the new conference, re:Inforce. Not a summit: re:Inforce is a branded event around cloud security. This is in your wheelhouse. >> Thank you for having me. Yeah, it's a spectacular event. Unbelievable turnout. I think there are 8,000 people here, maybe more. I know that's what they were expecting. For an event that was conceived of, or at least announced, barely six months ago, the turnout's just great. >> We've had many conversations in the past, on the Cube and elsewhere, about cloud security now having its own conference. It's not a security conference like Black Hat or DEF CON, which cover security more broadly. This is really focused on cloud security and the nuances involved for on-premises and cloud as it's evolving. There's certainly a lot more change coming, and this kind of spins in your direction. You were talking about this earlier. >> It absolutely does. First, it speaks to market demand. Clearly, there was demand for a cloud-security-focused conference, and that's why this exists. Every survey that I've seen lists security extremely high on the list of anxieties, or even causes for delay, in shifting workloads to the cloud. So Amazon takes security extremely seriously. And then my own personal view is that cloud security has been somewhat nascent and immature. And we're seeing, hopefully, some rapid maturation in that market. >> Certainly a lot of motivated people want to see it go faster, and they're spending on building that out. So I've got to ask you-- >> Before you get off the show, can I actually say something, if I may?
I mean, it's been a long time coming. To your point, Jesse, there was a real need for it, and I think Amazon deserves a lot of credit for that. But at the same time, I think Amazon deserves a little criticism. The message at re:Invent has always been: we've got the best security, we've got the most features, so come on in. And the whole theme here, the shared responsibility model, which I'd love to get into, I think was somewhat misunderstood because of some of that high-level messaging. So I wanted to put that out there as a topic that we might touch on. >> Great. Let's talk about it. >> Okay, so I do think the shared responsibility model was misunderstood. I think the messaging was: hey, the cloud is more secure than your existing data centers, come on in. And I think a lot of people naively entered the waters and then realized, oh, wait a minute, there's a lot that we still have to secure. We can't just set it and forget it. Do you agree with that? >> I think that's a controversial topic. I do agree with it. I think it continues to be misunderstood. The shared responsibility model, in some ways, is Amazon saying: we're going to secure the infrastructure and we're going to give you the tools. But organizations are still expected to follow best practices, certainly, and implement their own, hopefully best-in-class, security operations. >> It's highly nuanced. You can say sharing data increases visibility into threats and also makes for quality alerts. But I think it's a little bit biased, Dave, for Amazon to say "shared responsibility," because they essentially want to share in the security posture. They're saying, we'll do this, you do that, so it's inherently shared. So why wouldn't they say that? >> Well, I guess they're not going to say, "We want to own everything." Well, I guess my take from this show is that I really like their focus on that.
I think they shone a light on it, for the good of the industry and the community. But it is a bit nuanced. >> It is nuanced, and they've made some controversial, perhaps even contradictory, statements. In the keynote yesterday, I was amused to hear that security is everyone's job, which is something I wholeheartedly believe in. But at the same time, Steve, or Stephen rather, said that he didn't believe in DevSecOps, and that seemed a little bit at odds. >> Stephen Schmidt. >> Stephen Schmidt, CISO of AWS. >> But at the same time, there was a narrative around security as code. So yes, there were some contradictions in the messaging. They were small ones, they were nuanced, but there remains some confusion. And that's why people look to the ecosystem to help, of course. >> And this goes back to my earlier point. I believe that cloud security is really quite nascent. When we look at the landscape of vendors, we see a number of vendors that really are on-prem security solutions trying to shoehorn themselves into the cloud. We see a lot of what is essentially vulnerability scanning and static image scanning. But we don't see, in my opinion, that many really best-in-class security solutions. And I think until relatively recently it was very hard to enable some of them. That's why I'd love to talk about the VPC traffic mirroring announcement, because I think that was actually the most impactful announcement. >> I want to get to that. So this is new. By the way, the other feedback we've heard on the Cube is that the sessions here have been so good, because you can dig deeper than what you can get at re:Invent, given its size. This is a good example. Explain that story, because the traffic mirroring has been one of the most important stories. >> Well, unlike re:Invent,
I think this show is more about education than it is about announcements. Now, Amazon announced a few new services going into GA, but these were services, for the most part, that we already knew were coming, like Control Tower and Security Hub. But the VPC traffic mirroring was really the announcement of this show. And, gosh, it's been a long time in coming. One closely held belief I've had for a long time is that, in the fullness of time, there's really nothing of value that you can do on-prem that you wouldn't eventually be able to do in the cloud. And it's just been a head-scratcher for me why, for so many years, we've been unable to get any sort of view, mirror, or tap of the traffic for diagnostic or analytic purposes. It's something you could do on-prem so easily, with a SPAN port or a network tap, and in the cloud we've been having to do backflips and workarounds and software taps and things like that. But with this announcement, it's finally here. It's native. >> Explain VPC traffic mirroring for the folks watching who might not know it. What is it, and why is it important? >> So VPC traffic mirroring is a network tap that is built into EC2 networking. What it means is that you can configure a VPC traffic mirror for individual EC2 instances, actually down to the ENI level. You can configure filters, and you can send that traffic to a target for analysis purposes. That analysis could be for diagnostics, but I think much more important is security. ExtraHop really began as a network analytics platform. We do network detection and response. So this ability to analyze the traffic in real time, to run predictive models against it, and to detect suspicious behaviors and potential threats in real time, I think is absolutely game-changing for someone's security posture. >> And you guys have been on the doorstep of this day in, day out. So this is like a great benefit to you guys.
As a company, I can see that. That's a great thing for you guys. What's the impact for the customers? What is the good news that comes out of the traffic mirroring for them? What's the impact on their environment? >> Well, it's all about friction. First, I want to clarify that we've been running in AWS for over six years, six or seven years, so we've had that solution. But it's required some friction in the deployment process, because our customers had to install some sort of software tap, which was usually an agent that was gathering the packets in some sort of promiscuous mode and then sending them to us in a tunnel. Whereas now, this is built into the service, into the infrastructure. There's no performance penalty at all. You can configure it. You have IAM roles and policies to secure it. All of the friction goes away. I think, for the first time in cloud history, you can now get extremely high-quality network security analytics with practically the flip of a switch. >> So it's not another thing to manage. It's, like you say, inherent to the network. John and I have heard this week at this event from practitioners that they want to see fewer merely incremental security products and more step functions. What they mean by that is: we want products that actually take action, or give us a script that we can implement, or actually fix the problem for us. Will this announcement, and others that you guys are involved in, take that next step toward more proactive security for these guys? >> So a couple of thoughts on that. First, the answer is yes, it can, and you're absolutely right. Remediation is extremely important, especially for attacks that are fast and destructive. When you think about attack patterns, there are attacks that are low and slow.
There are attacks that are advanced and persistent. But the attacks that are fast and destructive move at a speed that is really beyond the ability of humans to respond. For those sorts of attacks, I think you absolutely need some sort of automated remediation. The most common solutions are some form of blocking the traffic, quarantining the traffic, or maybe locking the accounts. Blocking, quarantining, and locking are my top three, and then various forms of auditing and forensics go along the way. Amazon actually has a very good toolbox for that already, and there are security orchestration products that can help. For products like ExtraHop, the ability to feed a detection into an action is actually a trivial form of integration that we offer out of the box. So the answer is yes. But let me go back to the incrementalist approach that you mentioned as well. I think about the space in really, really broad strokes. Organizations for the last 10 years or so have invested heavily in prevention and protection. A lot of this is your perimeter defense and endpoint protection, and the technologies have gotten better. Firewalls have turned into next-generation firewalls, and antivirus agents have turned into next-generation antivirus or endpoint detection and response. But I strongly believe that network security has, in some ways, just lagged behind, and it's really ripe for innovation. That's what we've really spent the last decade building. >> And that's why you're excited about the VPC traffic mirroring, because it allows for parallel analytics, and so more real time. >> More real time. But the network has great properties that nothing else has. When you think about network security, the network itself is as close to ground truth as you can get. It's very hard to tamper with, and it's impossible to turn off. Those are great properties for cybersecurity.
And you can't say that about something like logs, which are from time to time disabled and scrubbed. You certainly can't say that about endpoint agents, which are often worked around and in some cases even used as a vector for attack. >> I'm going to ask you about that point. I get that. The next question that comes to my mind is the surface area. With IoT expanding, and with cloud, you have a sprawling surface area. The surface area is growing just by default, by natural evolution: connecting to the cloud, people backhauling their data into the cloud. All this is good stuff. >> Absolutely. Call it the attack surface, and it is absolutely growing, perhaps at an exponential rate. >> Talk about that dynamic, a sprawling attack area, because that's just the environment now. What's the best practice to figure out security posture? >> Great question. People talk a lot about the dissolution of the perimeter, and I think that's a bit of a debate. Regardless of your views on that, we can all agree that the perimeter is changing, that workloads are moving around, and that users are becoming more mobile. But I think an extremely important point is that just about every enterprise is hybrid. So we actually need protection for a hybrid attack surface. That's an area where I believe ExtraHop offers a great solution, because we have a solution that runs on premises, in physical data centers or on campuses. No matter how many workloads you move to the cloud, you still have some sort of user on some sort of laptop or workstation in some sort of campus environment. We work in private cloud environments that are virtualized. And then, of course, we work in public cloud environments. Another announcement that we just made at this show, which I also think is game-changing, is our Reveal(x) Cloud offering. So this is a SaaS.
This is a SaaS-based network detection and response solution. I talked about removing friction by mirroring the traffic, but in this case all you have to do is mirror the traffic, point it to our SaaS, and we'll do all of the management. >> So is that in the marketplace yet? >> That is in the marketplace. We launched it yesterday. >> So it's a great integration point for you guys to get more customers on board. >> And I think solutions like ours are absolutely best practices, and required, to secure this hybrid attack surface. >> In the marketplace, what was that experience like, working with Amazon? >> Amazon was actually great to work with. I don't mean to say that with disbelief. When you work with such a large company, you kind of have certain expectations, and they exceeded all of my expectations in terms of their responsiveness. They worked with us extremely closely to get into the marketplace. They recommended partners who could help accelerate our efforts. But in addition to the marketplace, we actually worked with them closely on the VPC traffic mirroring feature. That was something we began talking with them about as far back as, I think, last December, even before re:Invent. They were extremely responsive to our feedback. They moved very, very quickly. They've actually just been a delight to work with. >> Here's a question. You were talking about the non-immutability of logs, and how they go offline sometimes. And yet at the same time there's been tens of billions of dollars of value creation from that industry. Are there things that are magic there, or things that you can learn from the analytics of analyzing logs, that you could bring over to what you're positioning as a more modern and cloud-like approach? Or is there some kind of barrier to entry to doing that? Can you shed some light on that, Jesse?
>> That's a great question, and this is where I'll say it's a "genius of the and" situation, not a "tyranny of the or." I'm not telling people, don't collect your logs or analyze them. Of course you should do that; that's a best practice. But that space, the log analysis and SIEM market, has become so mature that chances are you're already doing that. And I'm not going to tell organizations that they shouldn't have some sort of endpoint protection. Of course you should. But what I am saying is that the network itself is a very fundamental data source that has all of those properties that are really good for cybersecurity. The ability to analyze what's going on in your environment in real time, to understand which users are involved, which resources are accessed, whether these behavioral patterns are suspicious, and whether they represent potential threats, I think that's very powerful. I have a whole threat research team that we've built that just runs attack simulations, and they run attack tools, so that we can take behavioral profiles and understand what these look like in the environment. We build predictive models around how we expect resources and users and endpoints to behave. And when they deviate from those models, that's how we know something suspicious is going on. So this is definitely a "genius of the and" situation. >> John, it reminds me of what you're very fond of saying: hey, what got you here is not likely to move you forward. And that's kind of the takeaway for practitioners. >> Yeah. I mean, you've got to build on your success. Having economies of scale is about not having diseconomies of scale, meaning you're constantly reinventing your product and not building on the success. You're going to have more success if you stay on that trajectory. It's just basic competitive strategy, product strategy.
But the thing that's interesting here is that as you get more successful and you continue to raise the bar, which is an Amazon term, they work with you better. So if you're raising the bar and you did your own network security, they're probably like, okay, now we get parallel traffic mirroring. >> That's true. But I think we've also heard that Amazon is, I think they call it, maniacally customer-focused, right? So I think this traffic mirroring capability really is due to customer demand. In fact, if you were at the keynote when they made the announcement, that was the announcement where I feel like every phone in the whole auditorium went up. That's the announcement where I think there's a lot of excitement. For security practitioners in particular, and SecOps teams, I think this really reduces some anxiety they have, because cloud workloads tend to be quite opaque. You have logs, you have audit logs, but it's very difficult to know what's actually going on there and who is actually accessing that environment. And, even more important, where is my data going? This is where we can see everything from a supply chain attack to data exfiltration. It's extremely important to be able to have that visibility into these clouds. >> We agree. We've been saying on the Cube for many, many years now that the network is the last bottleneck, really, where that script gets flipped upside down and workloads are dictating DevOps. Now the network piece is here, so I think this is going to create a lot of innovation. That's our belief. We'd love to follow up more in Palo Alto when we get back on this hybrid cloud. I think that's a huge opportunity. I think it creates a blind spot for companies, because that's where the attackers will go. They'll know that hybrid is rolling out, and that'll be a vulnerability area. >> Well, you know, it's an arms race. Network security is not new.
It's been around for decades. But the attackers and the attacks have become more sophisticated, and as a result the defenders need to raise their game as well. This is why, on the one hand, there's so much hype, and I think machine learning in some ways is oversold. But in other ways, it is a great tool in our arsenal. The machine learning, the predictive models, the behavioral models, they really do work. And it really is the next evolution for defensive capabilities. >> Thanks for coming on. Great insight. One last question: the beer. You guys have done that in the past. >> It's been a while since we've done that, but it comes from the early days. When I founded the company, people would ask about the name ExtraHop: "Oh, are you guys an online brewery?" And we would joke. We said, no, that would be "Extra Hops." But we embraced it, and we actually worked with a local brewer that has since been acquired by a major beverage brand. >> I didn't know that. >> We built our own label, and it was the ExtraHop Wired IPA. It was extremely well received. Every time we visited a customer, they'd ask us to bring beer. >> That's great. You've got to go back to the proven formula. Thanks for the insights. Let's follow up when we get back to Palo Alto in our studio on this hybrid thing; it's a compelling conversation. Network security, network analytics, innovation areas where all the action's happening here in Boston. This is the Cube's coverage of AWS re:Inforce. We'll be right back.
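Editor's sketch: the traffic mirroring setup described in the interview (a mirror source at the ENI level, a filter, and a target that receives the copied packets) maps onto four EC2 API calls. The Python below is a minimal sketch under stated assumptions, not ExtraHop's or AWS's reference procedure. It assumes a boto3-style EC2 client is passed in; the function name `mirror_eni`, the descriptions, and the single "accept all inbound TCP" rule are illustrative choices that do not come from the interview.

```python
def mirror_eni(ec2, source_eni_id, target_eni_id, session_number=1):
    """Mirror inbound TCP traffic from one ENI to an analysis ENI.

    `ec2` is assumed to behave like boto3.client("ec2"). The IDs,
    descriptions and the single filter rule are illustrative placeholders.
    """
    # 1. Register the analysis appliance's ENI as the mirror target.
    target = ec2.create_traffic_mirror_target(
        NetworkInterfaceId=target_eni_id,
        Description="analysis sensor (hypothetical)",
    )
    target_id = target["TrafficMirrorTarget"]["TrafficMirrorTargetId"]

    # 2. Create a filter to hold the mirroring rules.
    flt = ec2.create_traffic_mirror_filter(Description="inbound TCP only")
    filter_id = flt["TrafficMirrorFilter"]["TrafficMirrorFilterId"]

    # 3. Accept all inbound TCP (IANA protocol number 6).
    ec2.create_traffic_mirror_filter_rule(
        TrafficMirrorFilterId=filter_id,
        TrafficDirection="ingress",
        RuleNumber=100,
        RuleAction="accept",
        Protocol=6,
        SourceCidrBlock="0.0.0.0/0",
        DestinationCidrBlock="0.0.0.0/0",
    )

    # 4. Tie the source ENI, the filter and the target together in a session.
    session = ec2.create_traffic_mirror_session(
        NetworkInterfaceId=source_eni_id,
        TrafficMirrorTargetId=target_id,
        TrafficMirrorFilterId=filter_id,
        SessionNumber=session_number,
    )
    return session["TrafficMirrorSession"]["TrafficMirrorSessionId"]
```

With real credentials, the client would presumably come from `boto3.client("ec2")`, and the caller would need IAM permissions for the four calls, in line with the IAM roles and policies mentioned in the interview. Taking the client as a parameter keeps the sketch testable with a stub.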
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Jesse Rothstein | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Steven Schmidt | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Steven | PERSON | 0.99+ |
David Lattin | PERSON | 0.99+ |
yesterday | DATE | 0.99+ |
BBC | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Jesse | PERSON | 0.99+ |
First | QUANTITY | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
8000 people | QUANTITY | 0.99+ |
seven years | QUANTITY | 0.99+ |
last December | DATE | 0.99+ |
Stephen Step Rather | PERSON | 0.99+ |
first time | QUANTITY | 0.99+ |
over six years | QUANTITY | 0.99+ |
tens | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
One last question | QUANTITY | 0.99+ |
Extra Cube | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
six months ago | DATE | 0.98+ |
WS | ORGANIZATION | 0.98+ |
80 | QUANTITY | 0.98+ |
11 | QUANTITY | 0.98+ |
first | QUANTITY | 0.97+ |
this year | DATE | 0.97+ |
first conference | QUANTITY | 0.97+ |
Is Maura | TITLE | 0.97+ |
this week | DATE | 0.96+ |
Amazon Web | ORGANIZATION | 0.95+ |
VPC | PERSON | 0.95+ |
Kino | ORGANIZATION | 0.94+ |
2019 | DATE | 0.92+ |
two instances | QUANTITY | 0.92+ |
Cube | COMMERCIAL_ITEM | 0.92+ |
Disick | ORGANIZATION | 0.91+ |
decades | QUANTITY | 0.9+ |
Cube | ORGANIZATION | 0.89+ |
$1,000,000,000 | QUANTITY | 0.88+ |
Chapman | PERSON | 0.87+ |
VM World Reinvent | EVENT | 0.86+ |
eight | QUANTITY | 0.85+ |
top three | QUANTITY | 0.83+ |
Watchtower | TITLE | 0.83+ |
ExtraHop | ORGANIZATION | 0.81+ |
Wired P. | ORGANIZATION | 0.79+ |
last decade | DATE | 0.77+ |
G ET | ORGANIZATION | 0.75+ |
Rob | PERSON | 0.73+ |
God | PERSON | 0.66+ |
Con | EVENT | 0.64+ |
A W s | EVENT | 0.63+ |
last 10 years | DATE | 0.57+ |
years | QUANTITY | 0.56+ |
Mawr | PERSON | 0.56+ |
Prem | ORGANIZATION | 0.53+ |
SEC | ORGANIZATION | 0.53+ |
Def | ORGANIZATION | 0.52+ |
MacOS | TITLE | 0.48+ |
onyx | COMMERCIAL_ITEM | 0.42+ |
Black | ORGANIZATION | 0.37+ |
Data Science: Present and Future | IBM Data Science For All
>> Announcer: Live from New York City it's The Cube, covering IBM data science for all. Brought to you by IBM. (light digital music) >> Welcome back to data science for all. It's a whole new game. And it is a whole new game. >> Dave Vellante, John Walls here. We've got quite a distinguished panel. So it is a new game-- >> Well we're in the game, I'm just happy to be-- (both laugh) Have a swing at the pitch. >> Well let's see what we have here. Five distinguished members of our panel. It'll take me a minute to get through the introductions, but believe me they're worth it. Jennifer Shin joins us. Jennifer's the founder of 8 Path Solutions, the director of data science at Comcast and part of the faculty at UC Berkeley and NYU. Jennifer, nice to have you with us, we appreciate the time. Joe McKendrick, an analyst and contributor at Forbes and ZDNet. Joe, thank you for being here as well. Another ZDNetter next to him, Dion Hinchcliffe, who is a vice president and principal analyst at Constellation Research and also contributes to ZDNet. Good to see you, sir. To the back row, but that doesn't mean anything about the quality of the participation here. Bob Hayes with a killer Batman shirt on by the way, which we'll get to explain in just a little bit. He runs Business over Broadway. And Joe Caserta, who is the founder of Caserta Concepts. Welcome to all of you. Thanks for taking the time to be with us. Jennifer, let me just begin with you. Obviously as a practitioner you're very involved in the industry, you're on the academic side as well. We mentioned Berkeley, NYU, steep experience. So I want you to kind of take your foot in both worlds and tell me about data science. I mean where do we stand now from those two perspectives? How have we evolved to where we are? And how would you describe, I guess the state of data science? >> Yeah so I think that's a really interesting question. There's a lot of changes happening.
In part because data science has now become much more established, both on the academic side as well as in industry. So now you see some of the bigger problems coming out. People have managed to have data pipelines set up. But now there are these questions about models and accuracy and data integration. So the really cool stuff from the data science standpoint. We get to get really into the details of the data. And I think on the academic side you now see undergraduate programs, not just graduate programs, but undergraduate programs being involved. UC Berkeley just did a big initiative that they're going to offer data science to undergrads. So that's huge news for the university. So I think there's a lot of interest from the academic side to continue data science as a major, as a field. But I think in industry one of the difficulties you're now having is businesses are now asking that question of ROI, right? What do I actually get in return in the initial years? So I think there's a lot of work to be done and just a lot of opportunity. It's great because people now understand data science better, but I think data scientists have to really think about that seriously and take it seriously and really think about how am I actually getting a return, or adding a value to the business? >> And there's a lot to be said, is there not, just in terms of increasing the workforce, the acumen, the training that's required now. It's still a relatively new discipline. So is there a shortage issue? Or is there just a great need? Is the opportunity there? I mean how would you look at that? >> Well I always think there's opportunity to be smart. If you can be smarter, you know it's always better. It gives you advantages in the workplace, it gets you an advantage in academia. The question is, can you actually do the work? The work's really hard, right? You have to learn all these different disciplines, you have to be able to technically understand data.
Then you have to understand it conceptually. You have to be able to model with it, you have to be able to explain it. There's a lot of aspects that you're not going to pick up overnight. So I think part of it is endurance. Like are people going to feel motivated enough and dedicate enough time to it to get very good at that skill set. And also of course, you know in terms of industry, will there be enough interest in the long term that there will be a financial motivation for people to keep staying in the field, right? So I think it's definitely a lot of opportunity. But that's always been there. Like I tell people I think of myself as a scientist and data science happens to be my day job. That's just the job title. But if you are a scientist and you work with data you'll always want to work with data. I think that's just an inherent need. It's kind of a compulsion, you just kind of can't help yourself, but dig a little bit deeper, ask the questions, you can't not think about it. So I think that will always exist. Whether or not it's an industry job in the way that we see it today, and like five years from now, or 10 years from now. I think that's something that's up for debate. >> So all of you have watched the evolution of data and how it affects organizations for a number of years now. If you go back to the days when data warehouse was king, we had a lot of promises about 360 degree views of the customer and how we were going to be more anticipatory and more responsive. In many ways the decision support systems and the data warehousing world didn't live up to those promises. They solved other problems for sure. And so everybody was looking for big data to solve those problems. And they've begun to attack many of them. We talked earlier in The Cube today about fraud detection, it's gotten much, much better. Certainly retargeting of advertising has gotten better. But I wonder if you could comment, you know maybe start with Joe.
As to the effect that data and data science has had on organizations in terms of fulfilling that vision of a 360 degree view of customers and anticipating customer needs. >> So. Data warehousing, I wouldn't say failed. But I think it was unfinished in order to achieve what we need done today. At the time I think it did a pretty good job. I think it was the only place where we were able to collect data from all these different systems, have it in a single place for analytics. The big difference, I think, between data warehousing and data science is data warehouses were primarily made for consumption by human beings. To be able to have people look through some tool and be able to analyze data manually. That really doesn't work anymore, there's just too much data to do that. So that's why we need to build a science around it so that we can actually have machines actually doing the analytics for us. And I think that's the biggest stride in the evolution over the past couple of years, that now we're actually able to do that, right? It used to be very, you know you go back to when data warehouses started, you had to be a deep technologist in order to be able to collect the data, write the programs to clean the data. But now your average casual IT person can do that. Right now I think we're back in data science where you have to be a fairly sophisticated programmer, analyst, scientist, statistician, engineer, in order to do what we need to do, in order to make machines actually understand the data. But I think part of the evolution, we're just in the forefront. We're going to see over the next, not even years, within the next year I think a lot of new innovation where the average person within business and definitely the average person within IT will be able to as easily say, "What are my sales going to be next year?" as it is to say, "What were my sales last year?" Where now it's a big deal.
Right now in order to do that you have to build some algorithms, you have to be a specialist on predictive analytics. And I think, you know as the tools mature, as people using data mature, and as the technology ecosystem for data matures, it's going to be easier and more accessible. >> So it's still too hard. (laughs) That's something-- >> Joe C.: Today it is yes. >> You've written about and talked about. >> Yeah no question about it. We see this citizen data scientist. You know we talked about the democratization of data science but the way we talk about analytics and warehousing and all the tools we had before, they generated a lot of insights and views on the information, but they didn't really give us the science part. And I think that what's missing is the forming of the hypothesis, the closing of the loop. We now have use of this data, but are we changing, are we thinking about it strategically? Are we learning from it and then feeding that back into the process? I think that's the big difference between data science and the analytics side. But, you know just like Google made search available to everyone, not just people who had highly specialized indexers or crawlers. Now we can have tools that make these capabilities available to anyone. You know going back to what Joe said I think the key thing is we now have tools that can look at all the data and ask all the questions. 'Cause we can't possibly do it all ourselves. Our organizations are increasingly awash in data. Which is the life blood of our organizations, but we're not using it, you know, this whole concept of dark data. And so I think the concept, or the promise of opening these tools up for everyone to be able to access those insights and activate them, I think that, you know, that's where it's headed. >> This is kind of where the T shirt comes in right? So Bob if you would, so you've got this Batman shirt on. 
We talked a little bit about it earlier, but it plays right into what Dion's talking about. About tools and, I don't want to spoil it, but you go ahead (laughs) and tell me about it. >> Right, so. Batman is a superhero, but he doesn't have any supernatural powers, right? He can't fly on his own, he can't become invisible on his own. But the thing is he has the utility belt and he has these tools he can use to help him solve problems. For example he has the bat ring when he's confronted with a building that he wants to get over, right? So he pulls it out and uses that. So as data professionals we have all these tools now that these vendors are making. We have IBM SPSS, we have Data Science Experience, IBM Watson, that these data pros can now use as part of their utility belt and solve problems that they're confronted with. So if you're ever confronted with like a churn problem and you have somebody who has access to that data they can put that into IBM Watson, ask a question and it'll tell you what's the key driver of churn. So it's not that you have to be a superhuman to be a data scientist, but these tools will help you solve certain problems and help your business go forward. >> Joe McKendrick, do you have a comment? >> Does that make the Batmobile the Watson? (everyone laughs) Analogy? >> I was just going to add that, you know all of the billionaires in the world today and none of them decided to become Batman yet. It's very disappointing. >> Yeah. (Joe laughs) >> Go ahead Joe. >> And I just want to add some thoughts to our discussion about what happened with data warehousing. I think it's important to point out as well that data warehousing, as it existed, was fairly successful but for larger companies. Data warehousing is a very expensive proposition, it remains an expensive proposition. Something that's in the domain of the Fortune 500. But today's economy is based on a very entrepreneurial model. The Fortune 500s are out there of course it's ever shifting. 
But you have a lot of smaller companies a lot of people with start ups. You have people within divisions of larger companies that want to innovate and not be tied to the corporate balance sheet. They want to be able to go through, they want to innovate and experiment without having to go through finance and the finance department. So there's all these open source tools available. There's cloud resources as well as open source tools. Hadoop of course being a prime example where you can work with the data and experiment with the data and practice data science at a very low cost. >> Dion mentioned the C word, citizen data scientist last year at the panel. We had a conversation about that. And the data scientists on the panel generally were like, "Stop." Okay, we're not all of a sudden going to turn everybody into data scientists however, what we want to do is get people thinking about data, more focused on data, becoming a data driven organization. I mean as a data scientist I wonder if you could comment on that. >> Well I think so the other side of that is, you know there are also many people who maybe didn't, you know follow through with science, 'cause it's also expensive. A PhD takes a lot of time. And you know if you don't get funding it's a lot of money. And for very little security if you think about how hard it is to get a teaching job that's going to give you enough of a pay off to pay that back. Right, the time that you took off, the investment that you made. So I think the other side of that is by making data more accessible, you allow people who could have been great in science, have an opportunity to be great data scientists. And so I think for me the idea of citizen data scientist, that's where the opportunity is. I think in terms of democratizing data and making it available for everyone, I feel as though it's something similar to the way we didn't really know what KPIs were, maybe 20 years ago. 
People didn't use it as readily, didn't teach it in schools. I think maybe 10, 20 years from now, some of the things that we're building today from data science, hopefully more people will understand how to use these tools. They'll have a better understanding of working with data and what that means, and just data literacy right? Just being able to use these tools and be able to understand what data's saying and actually what it's not saying. Which is the thing that most people don't think about. But you can also say that data doesn't say anything. There's a lot of noise in it. There's too much noise to be able to say that there is a result. So I think that's the other side of it. So yeah I guess for me, in terms of the citizen data scientist, I think it's a great idea to have that, right? But at the same time of course everyone kind of emphasized you don't want everyone out there going, "I can be a data scientist without education, without statistics, without math," without understanding of how to implement the process. I've seen a lot of companies implement the same sort of process from 10, 20 years ago just on Hadoop instead of SQL. Right and it's very inefficient. And the only difference is that you can build more tables wrong than they could before. (everyone laughs) Which is, I guess, an accomplishment. >> For less. >> And for less, it's cheaper, yeah. >> It is cheaper. >> Otherwise we're like I'm not a data scientist but I did stay at a Holiday Inn Express last night, right? >> Yeah. (panelists laugh) And there's like a little bit of pride that like they used 2,000, you know they used 2,000 computers to do it. Like a little bit of pride about that, but you know of course maybe not a great way to go. I think 20 years ago we couldn't do that, right? One computer was already an accomplishment to have that resource. 
So I think you have to think about the fact that if you're doing it wrong, you're going to just make that mistake bigger, which is also the other side of working with data. >> Sure, Bob. >> Yeah I have a comment about that. I've never liked the term citizen data scientist or citizen scientist. I get the point of it and I think employees within companies can help in the data analytics problem by maybe being a data collector or something. I mean I would never have just somebody become a scientist based on a few classes he or she takes. It's like saying like, "Oh I'm going to be a citizen lawyer." And so you come to me with your legal problems, or a citizen surgeon. Like you need training to be good at something. You can't just be good at something just 'cause you want to be. >> John: Joe you wanted to say something too on that. >> Since we're in New York City I'd like to use the analogy of a real scientist versus a data scientist. So a real scientist requires tools, right? And the tools are not new, like microscopes and a laboratory and a clean room. And these tools have evolved over years and years, and since we're in New York we could walk within a 10 block radius and buy any of those tools. It doesn't make us a scientist because we use those tools. I think with data, you know making, making the tools evolve and become easier to use, you know like Bob was saying, it doesn't make you a better data scientist, it just makes the data more accessible. You know we can go buy a microscope, we can go buy Hadoop, we can buy any kind of tool in a data ecosystem, but it doesn't really make you a scientist. I'm very involved in the NYU data science program and the Columbia data science program, like these kids are brilliant. You know these kids are not someone who is, you know just trying to run a day to day job, you know in corporate America. I think the people who are running the day to day job in corporate America are going to be the recipients of data science. 
Just like people who take drugs, right? As a result of a smart data scientist coming up with a formula that can help people, I think we're going to make it easier to distribute the data that can help people with all the new tools. But it doesn't really make it, you know the access to the data and tools available doesn't really make you a better data scientist. Without, like Bob was saying, without better training and education. >> So how-- I'm sorry, how do you then, if it's not for everybody, but yet I'm the user at the end of the day at my company and I've got these reams of data before me, how do you make it make better sense to me then? So that's where machine learning comes in or artificial intelligence and all this stuff. So how at the end of the day, Dion? How do you make it relevant and usable, actionable to somebody who might not be as practiced as you would like? >> I agree with Joe that many of us will be the recipients of data science. Just like you had to be a computer scientist at one point to develop programs for a computer, now we can get the programs. You don't need to be a computer scientist to get a lot of value out of our IT systems. The same thing's going to happen with data science. There's far more demand for data science than there ever could be produced by, you know having an ivory tower filled with data scientists. Which we need those guys, too, don't get me wrong. But we need to productize it and make it available in packages such that it can be consumed. The outputs and even some of the inputs can be provided by mere mortals, whether that's machine learning or artificial intelligence or bots that go off and run the hypotheses and select the algorithms maybe with some human help. We have to productize it. This is the concept of data science as a service, which is becoming a thing now. It's, "I need this, I need this capability at scale. I need it fast and I need it cheap." The commoditization of data science is going to happen. 
>> That goes back to what I was saying about, the recipient also of data science is also machines, right? Because I think the other thing that's happening now in the evolution of data is that, you know the data is, it's so tightly coupled. Back when you were talking about data warehousing you have all the business transactions then you take the data out of those systems, you put them in a warehouse for analysis, right? Maybe they'll make a decision to change that system at some point. Now the analytics platform and the business application are very tightly coupled. They become dependent upon one another. So you know people who are using the applications are now able to take advantage of the insights of data analytics and data science, just through the app. Which never really existed before. >> I have one comment on that. You were talking about how do you get the end user more involved, well like we said earlier data science is not easy, right? As an end user, I encourage you to take a stats course, just a basic stats course, understanding what a mean is, variability, regression analysis, just basic stuff. So you as an end user can get more, or glean more insight from the reports that you're given, right? If you go to France and don't know French, then even if people speak really slowly to you in French, you're not going to get it. You need to understand the language of data to get value from the technology we have available to us. >> Incidentally French is one of the languages that you have the option of learning if you're a mathematician. So math PhDs are required to learn a second language. France being the country of algebra, that's one of the languages you could actually learn. Anyway tangent. But going back to the point. So statistics courses, definitely encourage it. I teach statistics. And one of the things that I'm finding as I go through the process of teaching it I'm actually bringing in my experience. 
And by bringing in my experience I'm actually kind of making the students think about the data differently. So the other thing people don't think about is the fact that like statisticians typically were expected to do, you know, just basic sort of tasks. In a sense that their knowledge is specialized, right? But the day to day operations was they ran some data, you know they ran a test on some data, looked at the results, interpreted the results based on what they were taught in school. They didn't develop that model, a lot of times they just understood what the tests were saying, especially in the medical field. So when you think about things like, we have words like population, census. Which is when you take data from every single, you have every single data point versus a sample, which is a subset. It's a very different story now that we're collecting faster than it used to be. It used to be the idea that you could collect information from everyone. Like it happens once every 10 years, we built that in. But nowadays you know, you hear about Facebook, for instance, I think they claimed earlier this year that their data was more accurate than the census data. So now there are these claims being made about which data source is more accurate. And I think the other side of this is now statisticians are expected to know data in a different way than they were before. So it's not just changing as a field in data science, but I think the sciences that are using data are also changing their fields as well. >> Dave: So is sampling dead? >> Well no, because-- >> Should it be? (laughs) >> Well if you're sampling wrong, yes. That's really the question. >> Okay. You know it's been said that the data doesn't lie, people do. Organizations are very political. Oftentimes you know, lies, damned lies and statistics, Benjamin Disraeli. Are you seeing a change in the way in which organizations are using data in the context of the politics. 
So, some strong P&L manager, say, gets data and crafts it in a way that he or she can advance their agenda. Or they'll maybe attack a data set that probably should drive them in a different direction, but might be antithetical to their agenda. Are you seeing data, you know we talked about democratizing data, are you seeing that reduce the politics inside of organizations? >> So you know we've always used data to tell stories at the top level of an organization, that's what it's all about. And I still see very much that no matter how much data science or, the access to the truth through looking at the numbers, that story telling is still the political filter through which all that data still passes, right? But with the advent of things like blockchain, more and more corporate records and corporate information is going to end up in these open and shared repositories where there is no alternate truth. It'll come back to whoever tells the best stories at the end of the day. So I still see the organizations are very political. We are seeing now more open data though. Open data initiatives are a big thing, both in government and in the private sector. It is having an effect, but it's slow and steady. So that's what I see. >> Um, um, go ahead. >> I was just going to say as well. Ultimately I think data driven decision making is a great thing. And it's especially useful at the lower tiers of the organization where you have the routine day to day decisions that could be automated through machine learning and deep learning. The algorithms can be improved on a constant basis. On the upper levels, you know that's why you pay executives the big bucks in the upper levels to make the strategic decisions. And data can help them, but ultimately, data, IT, technology alone will not create new markets, it will not drive new businesses, it's up to human beings to do that. The technology is the tool to help them make those decisions. 
But creating businesses, growing businesses, is very much a human activity. And that's something I don't see ever getting replaced. Technology might replace many other parts of the organization, but not that part. >> I tend to be a foolish optimist when it comes to this stuff. >> You do. (laughs) >> I do believe that data will make the world better. I do believe that data doesn't lie, people lie. You know I think as we start, I'm already seeing trends in industries, all different industries where, you know conventional wisdom is starting to get trumped by analytics. You know I think it's still up to the human being today to ignore the facts and go with what they think in their gut and sometimes they win, sometimes they lose. But generally if they lose the data will tell them that they should have gone the other way. I think as we start relying more on data and trusting data through artificial intelligence, as we start making our lives a little bit easier, as we start using smart cars for safety, before replacement of humans. As we start, you know, using data really and analytics and data science really as the bumpers, instead of the vehicle, eventually we're going to start to trust it as the vehicle itself. And then it's going to make lying a little bit harder. >> Okay, so great, excellent. Optimism, I love it. (John laughs) So I'm going to play devil's advocate here a little bit. There's a couple elephant in the room topics that I want to, to explore a little bit. >> Here it comes. >> There was an article today in Wired. And it was called, Why AI is Still Waiting for Its Ethics Transplant. And, I will just read a little segment from there. It says, new ethical frameworks for AI need to move beyond individual responsibility to hold powerful industrial, government and military interests accountable as they design and employ AI. 
When tech giants build AI products, too often user consent, privacy and transparency are overlooked in favor of frictionless functionality that supports profit driven business models based on aggregate data profiles. This is from Kate Crawford and Meredith Whittaker who founded AI Now. And they're calling for sort of, almost clinical trials on AI, if I could use that analogy. Before you go to market you've got to test the human impact, the social impact. Thoughts. >> And also have the ability for a human to intervene at some point in the process. This goes way back. Is everybody familiar with the name Stanislav Petrov? He's the Soviet officer who back in 1983, he was in the control room, I guess somewhere outside of Moscow in the control room, which detected a nuclear missile attack against the Soviet Union coming out of the United States. Ordinarily I think if this was an entirely AI driven process we wouldn't be sitting here right now talking about it. But this gentleman looked at what was going on on the screen and, I'm sure he's accountable to his authorities in the Soviet Union. He probably got in a lot of trouble for this, but he decided to ignore the signals, ignore the data coming out of, from the Soviet satellites. And as it turned out, of course he was right. The Soviet satellites were seeing glints of the sun and they were interpreting those glints as missile launches. And I think that's a great example why, you know every situation of course doesn't mean the end of the world, (laughs) it was in this case. But it's a great example why there needs to be a human component, a human ability for human intervention at some point in the process. >> So other thoughts. I mean organizations are driving AI hard for profit. Best minds of our generation are trying to figure out how to get people to click on ads. Jeff Hammerbacher is famous for saying it. >> You can use data for a lot of things, data analytics, you can solve, you can cure cancer. 
You can make customers click on more ads. It depends on what your goal is. But, there are ethical considerations we need to think about. When we have data that will have a racial bias against blacks and have them have higher prison sentences or so forth or worse credit scores, so forth. That has an impact on a broad group of people. And as a society we need to address that. And as scientists we need to consider how are we going to fix that problem? Cathy O'Neil in her book, Weapons of Math Destruction, excellent book, I highly recommend that your listeners read that book. And she talks about these issues about if AI, if algorithms have a widespread impact, if they adversely impact a protected group. And I forget the last criteria, but like we need to really think about these things as a people, as a country. >> So I always think the idea of ethics is interesting. So I had this conversation come up a lot of times when I talk to data scientists. I think as a concept, right as an idea, yes you want things to be ethical. The question I always pose to them is, "Well in the business setting how are you actually going to do this?" 'Cause I find the most difficult thing working as a data scientist, is to be able to make the day to day decision of when someone says, "I don't like that number," how do you actually get around that. If that's the right data to be showing someone or if that's accurate. And say the business decides, "Well we don't like that number." Many people feel pressured to then change the data, change, or change what the data shows. So I think being able to educate people to be able to find ways to say what the data is saying, but not going past some line where it's a lie, where it's unethical. 'Cause you can also say what data doesn't say. You don't always have to say what the data does say. You can leave it as, "Here's what we do know, but here's what we don't know." There's a don't know part that many people will omit when they talk about data. 
So I think, you know especially when it comes to things like AI it's tricky, right? Because I always tell people I don't know why everyone thinks AI's going to be so amazing. I started in industry by fixing problems with computers that people didn't realize computers had. For instance when you have a system, a lot of bugs, we all have bug reports that we've probably submitted. I mean really it's nowhere near the point where it's going to start dominating our lives and taking over all the jobs. Because frankly it's not that advanced. It's still run by people, still fixed by people, still managed by people. I think with ethics, you know a lot of it has to do with the regulations, what the laws say. That's really going to be what's involved in terms of what people are willing to do. A lot of businesses, they want to make money. If there's no rules that say they can't do certain things to make money, then there's no restriction. I think the other thing to think about is we as consumers, like everyday in our lives, we shouldn't separate the idea of data as a business. We think of it as a business person, from our day to day consumer lives. Meaning, yes I work with data. Incidentally I also always opt out of my credit card, you know when they send you that information, they make you actually mail them, like old school mail, snail mail like a document that says, okay I don't want to be part of this data collection process. Which I always do. It's a little bit more work, but I go through that step of doing that. Now if more people did that, perhaps companies would feel more incentivized to pay you for your data, or give you more control of your data. Or at least you know, if a company's going to collect information, I'd want there to be certain processes in place to ensure that it doesn't just get sold, right? For instance if a startup gets acquired what happens with that data they have on you? You agree to give it to the startup. But I mean what are the rules on that? 
So I think we have to really think about the ethics from not just, you know, someone who's going to implement something but as consumers what control we have for our own data. 'Cause that's going to directly impact what businesses can do with our data. >> You know you mentioned data collection. So slightly on that subject. All these great new capabilities we have coming. We talked about what's going to happen with media in the future and what 5G technology's going to do to mobile and these great bandwidth opportunities. The internet of things and the internet of everywhere. And all these great inputs, right? Do we have an arms race like are we keeping up with the capabilities to make sense of all the new data that's going to be coming in? And how do those things square up in this? Because the potential is fantastic, right? But are we keeping up with the ability to make it make sense and to put it to use, Joe? >> So I think data ingestion and data integration is probably one of the biggest challenges. I think, especially as the world is starting to become more dependent on data. I think you know, just because we're dependent on numbers we've come up with GAAP, which is generally accepted accounting principles that can be audited and proven whether it's true or false. I think in our lifetime we will see something similar to that, where we will have formal checks and balances of data that we use that can be audited. Getting back to you know what Dave was saying earlier about, I personally would rather trust a machine that was programmed to do the right thing than trust a politician or some leader that may have their own agenda. And I think the other thing about machines is that they are auditable. You know you can look at the code and see exactly what it's doing and how it's doing it. Human beings not so much. So I think getting to the truth, even if the truth isn't the answer that we want, I think is a positive thing. 
It's something that we can't do today that once we start relying on machines to do we'll be able to get there. >> Yeah I was just going to add that we live in exponential times. And the challenge is that the way that we're structured traditionally as organizations is not allowing us to absorb advances exponentially, it's linear at best. Everyone talks about change management and how are we going to do digital transformation. Evidence shows that technology's forcing the leaders and the laggards apart. There's a few leading organizations that are eating the world and they seem to be somehow rolling out new things. I don't know how Amazon rolls out all this stuff. There's all this artificial intelligence and the IOT devices, Alexa, natural language processing and that's just a fraction, it's just a tip of what they're releasing. So it just shows that there are some organizations that have path found the way. Most of the Fortune 500 from the year 2000 are gone already, right? The disruption is happening. And so we have to find some way to adopt these new capabilities and deploy them effectively or the writing is on the wall. I spent a lot of time exploring this topic, how are we going to get there, and all of us have a lot of hard work is the short answer. >> I read that there's going to be more data, or it was predicted, more data created in this year than in the past, I think it was five, 5,000 years. >> Forever. (laughs) >> And then, to mix the statistics, that we're currently analyzing less than 1% of the data. Taking those numbers and hearing what you're all saying it's like, we're not keeping up, it seems like we're, it's not even linear. I mean that gap is just going to grow and grow and grow. How do we close that? >> There's a guy out there named Chris Dancy, he's known as the human cyborg. He has 700 sensors all over his body. And his theory is that data's not new, having access to the data is new. 
You know we've always had a blood pressure, we've always had a sugar level. But we were never able to actually capture it in real time before. So now that we can capture and harness it, now we can be smarter about it. So I think that being able to use this information is really incredible like, this is something that over our lifetime we've never had and now we can do it. Hence the big explosion in data. But I think how we use it and have it governed I think is the challenge right now. It's kind of cowboys and indians out there right now. And without proper governance and without rigorous regulation I think we are going to have some bumps in the road along the way. >> The data's the oil, the question is how are we actually going to operationalize around it? >> Or find it. Go ahead. >> I will say the other side of it is, so if you think about information, we always have the same amount of information right? What we choose to record however, is a different story. Now if you wanted to know things about the Olympics, but you decide to collect information every day for years instead of just the Olympic year, yes you have a lot of data, but did you need all of that data? For that question about the Olympics, you don't need to collect data during years there are no Olympics, right? Unless of course you're comparing it relative. But I think that's another thing to think about. Just 'cause you collect more data does not mean that data will produce more statistically significant results, it does not mean it'll improve your model. You can be collecting data about your shoe size trying to get information about your hair. I mean it really does depend on what you're trying to measure, what your goals are, and what the data's going to be used for. If you don't factor the real world context into it, then yeah you can collect data, you know an infinite amount of data, but you'll never process it. Because you have no question to ask, you're not looking to model anything. 
There is no universal truth about everything, that just doesn't exist out there. >> I think she's spot on. It comes down to what kind of questions are you trying to ask of your data? You can have one given database that has 100 variables in it, right? And you can ask it five different questions, all valid questions and that data may have those variables that'll tell you what's the best predictor of Churn, what's the best predictor of cancer treatment outcome. And if you can ask the right question of the data you have then that'll give you some insight. Just data for data's sake, that's just hype. We have a lot of data but it may not lead to anything if we don't ask it the right questions. >> Joe. >> I agree but I just want to add one thing. This is where the science in data science comes in. Scientists often will look at data that's already been in existence for years, weather forecasts, weather data, climate change data for example that go back to data charts and so forth going back centuries if that data is available. And they reformat, they reconfigure it, they get new uses out of it. And the potential I see with the data we're collecting is it may not be of use to us today, because we haven't thought of ways to use it, but maybe 10, 20, even 100 years from now someone's going to think of a way to leverage the data, to look at it in new ways and to come up with new ideas. That's just my thought on the science aspect. >> Knowing what you know about data science, why did Facebook miss Russia and the fake news trend? They came out and admitted it. You know, we miss it, why? Could they have, is it because they were focused elsewhere? Could they have solved that problem? (crosstalk) >> It's what you said which is are you asking the right questions and if you're not looking for that problem in exactly the way that it occurred you might not be able to find it. >> I thought the ads were paid in rubles. 
Shouldn't that be your first clue (panelists laugh) that something's amiss? >> You know red flag, so to speak. >> Yes. >> I mean Bitcoin maybe it could have hidden it. >> Bob: Right, exactly. >> I would think too that what happened last year actually was the end of an age of optimism. I'll bring up the Soviet Union again, (chuckles). It collapsed back in 1991, 1990, 1991, Russia was reborn. And I think there was a general feeling of optimism in the '90s through the 2000s that Russia is now being well integrated into the world economy as other nations all over the globe, all continents are being integrated into the global economy thanks to technology. And technology is lifting entire continents out of poverty and ensuring more connectedness for people. Across Africa, India, Asia, we're seeing those economies that are very different countries than 20 years ago and that extended into Russia as well. Russia is part of the global economy. We're able to communicate as a global, a global network. I think as a result we kind of overlook the dark side that occurred. >> John: Joe? >> Again, the foolish optimist here. But I think that... It shouldn't be the question like how did we miss it? It's do we have the ability now to catch it? And I think without data science without machine learning, without being able to train machines to look for patterns that involve corruption or result in corruption, I think we'd be out of luck. But now we have those tools. And now hopefully, optimistically, by the next election we'll be able to detect these things before they become public. >> It's a loaded question because my premise was Facebook had the ability and the tools and the knowledge and the data science expertise if in fact they wanted to solve that problem, but they were focused on other problems, which is how do I get people to click on ads? >> Right they had the ability to train the machines, but they were giving the machines the wrong training. >> Looking under the wrong rock. 
>> (laughs) That's right. >> It is easy to play armchair quarterback. Another topic I wanted to ask the panel about is, IBM Watson. You guys spend time in the Valley, I spend time in the Valley. People in the Valley pooh-pooh Watson. Ah, Google, Facebook, Amazon they've got the best AI. Watson, and some of that's fair criticism. Watson's a heavy lift, very services oriented, you just got to apply it in a very focused way. At the same time Google's trying to get you to click on Ads, as is Facebook, Amazon's trying to get you to buy stuff. IBM's trying to solve cancer. Your thoughts on that sort of juxtaposition of the different AI suppliers and there may be others. Oh, nobody wants to touch this one, come on. I told you elephant in the room questions. >> Well I mean you're looking at two different, very different types of organizations. One which has really spent decades in applying technology to business and these other companies are ones that are primarily into the consumer, right? When we talk about things like IBM Watson you're looking at a very different type of solution. You used to be able to buy IT and once you installed it you pretty much could get it to work and store your records or you know, do whatever it is you needed it to do. But these types of tools, like Watson actually tries to learn your business. And it needs to spend time doing that watching the data and having its models tuned. And so you don't get the results right away. And I think that's been kind of the challenge that organizations like IBM have had. Like this is a different type of technology solution, one that has to actually learn first before it can provide value. And so I think you know you have organizations like IBM that are much better at applying technology to business, and then they have the further hurdle of having to try to apply these tools that work in very different ways. There's education too on the side of the buyer. 
>> I'd have to say that you know I think there's plenty of businesses out there also trying to solve very significant, meaningful problems. You know with Microsoft AI and Google AI and IBM Watson, I think it's not really the tool that matters, like we were saying earlier. A fool with a tool is still a fool. And regardless of who the manufacturer of that tool is. And I think you know having, a thoughtful, intelligent, trained, educated data scientist using any of these tools can be equally effective. >> So do you not see core AI competence and I left out Microsoft, as a strategic advantage for these companies? Is it going to be so ubiquitous and available that virtually anybody can apply it? Or is all the investment in R&D and AI going to pay off for these guys? >> Yeah, so I think there's different levels of AI, right? So there's AI where you can actually improve the model. I remember when I was invited when Watson was kind of first out by IBM to a private, sort of presentation. And my question was, "Okay, so when do I get to access the corpus?" The corpus being sort of the foundation of NLP, which is natural language processing. So it's what you use as almost like a dictionary. Like how you're actually going to measure things, or look things up. And they said, "Oh you can't." "What do you mean I can't?" It's like, "We do that." "So you're telling me as a data scientist you're expecting me to rely on the fact that you did it better than me and I should rely on that." I think over the years after that IBM started opening it up and offering different ways of being able to access the corpus and work with that data. But I remember at the first Watson hackathon there was only two corpus available. It was either the travel or medicine. There was no other foundational data available. 
So I think one of the difficulties was, you know IBM being a little bit more on the forefront of it they kind of had that burden of having to develop these systems and learning kind of the hard way that if you don't have the right models and you don't have the right data and you don't have the right access, that's going to be a huge limiter. I think with things like medical, medical information that's an extremely difficult data set to start with. Partly because you know anything that you do find or don't find, the impact is significant. If I'm looking at things like what people clicked on the impact of using that data wrong, it's minimal. You might lose some money. If you do that with healthcare data, if you do that with medical data, people may die, like this is a much more difficult data set to start with. So I think from a scientific standpoint it's great to have any information about a new technology, new process. That's the nice thing, that IBM's obviously invested in it and collected information. I think the difficulty there though is just 'cause you have it you can't solve everything. And I feel like from someone who works in technology, I think in general when you appeal to developers you try not to market. And with Watson it's very heavily marketed, which tends to turn off people who are more from the technical side. Because I think they don't like it when it's gimmicky in part because they do the opposite of that. They're always trying to build up the technical components of it. They don't like it when you're trying to convince them that you're selling them something when you could just give them the specs and look at it. So it could be something as simple as communication. But I do think it is valuable to have had a company who leads on the forefront of that and try to do so we can actually learn from what IBM has learned from this process. >> But you're an optimist. (John laughs) All right, good. >> Just one more thought. >> Joe go ahead first. 
>> Joe: I want to see how Alexa or Siri do on Jeopardy. (panelists laugh) >> All right. Going to go around a final thought, give you a second. Let's just think about like your 12 month crystal ball. In terms of either challenges that need to be met in the near term or opportunities you think will be realized. 12, 18 month horizon. Bob you've got the microphone headed up, so I'll let you lead off and let's just go around. >> I think a big challenge for business, for society is getting people educated on data and analytics. There's a study that was just released I think last month by ServiceNow, I think, or some vendor, or Qlik. They found that only 17% of the employees in Europe have the ability to use data in their job. Think about that. >> 17. >> 17. Less than 20%. So these people don't have the ability to understand or use data intelligently to improve their work performance. That says a lot about the state we're in today. And that's Europe. It's probably a lot worse in the United States. So that's a big challenge I think. To educate the masses. >> John: Joe. >> I think we probably have a better chance of improving technology over training people. I think using data needs to be iPhone easy. And I think, you know which means that a lot of innovation is in the years to come. I do think that a keyboard is going to be a thing of the past for the average user. We are going to start using voice a lot more. I think augmented reality is going to be a thing that becomes a real reality. Where we can hold our phone in front of an object and it will have an overlay of prices where it's available, if it's a person. I think that we will see within an organization holding a camera up to someone and being able to see what is their salary, what sales did they do last year, some key performance indicators. I hope that we are beyond the days of everyone around the world walking around like this and we start actually becoming more social as human beings through augmented reality. 
I think, it has to happen. I think we're going through kind of foolish times at the moment in order to get to the greater good. And I think the greater good is using technology in a very, very smart way. Which means that you shouldn't have to be, sorry to contradict, but maybe it's good to counterpoint. I don't think you need to have a PhD in SQL to use data. Like I think that's 1990. I think as we evolve it's going to become easier for the average person. Which means people like the brain trust here needs to get smarter and start innovating. I think the innovation around data is really at the tip of the iceberg, we're going to see a lot more of it in the years to come. >> Dion why don't you go ahead, then we'll come down the line here. >> Yeah so I think over that time frame two things are likely to happen. One is somebody's going to crack the consumerization of machine learning and AI, such that it really is available to the masses and we can do much more advanced things than we could. We see the industries tend to reach an inflection point and then there's an explosion. No one's quite cracked the code on how to really bring this to everyone, but somebody will. And that could happen in that time frame. And then the other thing that I think that almost has to happen is that the forces for openness, open data, data sharing, open data initiatives things like Block Chain are going to run headlong into data protection, data privacy, customer privacy laws and regulations that have to come down and protect us. Because the industry's not doing it, the government is stepping in and it's going to re-silo a lot of our data. It's going to make it recede and make it less accessible, making data science harder for a lot of the most meaningful types of activities. Patient data for example is already all locked down. We could do so much more with it, but health start ups are really constrained about what they can do. 'Cause they can't access the data. 
We can't even access our own health care records, right? So I think that's the challenge is we have to have that battle next to be able to go and take the next step. >> Well I see, with the growth of data a lot of it's coming through IOT, internet of things. I think that's a big source. And we're going to see a lot of innovation. New types of Ubers or Airbnbs. Uber's so 2013 though, right? We're going to see new companies with new ideas, new innovations, they're going to be looking at the ways this data can be leveraged all this big data. Or data coming in from the IOT can be leveraged. You know there's some examples out there. There's a company for example that is outfitting tools, putting sensors in the tools. Industrial sites can therefore track where the tools are at any given time. This is an expensive, time consuming process, constantly losing tools, trying to locate tools. Assessing whether the tool's being applied to the production line or the right tool is at the right torque and so forth. With the sensors implanted in these tools, it's now possible to be more efficient. And there's going to be innovations like that. Maybe small start up type things or smaller innovations. We're going to see a lot of new ideas and new types of approaches to handling all this data. There's going to be new business ideas. The next Uber, we may be hearing about it a year from now whatever that may be. And that Uber is going to be applying data, probably IOT type data in some, new innovative way. >> Jennifer, final word. >> Yeah so I think with data, you know it's interesting, right, for one thing I think one of the things that's made data more available and just people being open to the idea, has been start ups. But what's interesting about this is a lot of start ups have been acquired. And a lot of people at start ups that got acquired now these people work at bigger corporations. 
Which was the way it was maybe 10 years ago, data wasn't available and open, companies kept it very proprietary, you had to sign NDAs. It was like within the last 10 years that open source, all of those initiatives, became much more popular, much more open, an acceptable sort of way to look at data. I think that what I'm kind of interested in seeing is what people do within the corporate environment. Right, 'cause they have resources. They have funding that start ups don't have. And they have backing, right? Presumably if you're acquired you went in at a higher title in the corporate structure whereas if you had started there you probably wouldn't be at that title at that point. So I think you have an opportunity where people who have done innovative things and have proven that they can build really cool stuff, can now be in that corporate environment. I think part of it's going to be whether or not they can really adjust to sort of the corporate, you know the corporate landscape, the politics of it or the bureaucracy. I think every organization has that. Being able to navigate that is a difficult thing in part 'cause it's a human skill set, it's a people skill, it's a soft skill. It's not the same thing as just being able to code something and sell it. So you know it's going to really come down to people. I think if people can figure out for instance, what people want to buy, what people think, in general that's where the money comes from. You know you make money 'cause someone gave you money. So if you can find a way to look at data or even look at technology and understand what people are doing, aren't doing, what they're happy about, unhappy about, there's always opportunity in collecting the data in that way and being able to leverage that. So you build cooler things, and offer things that haven't been thought of yet. So it's a very interesting time I think with the corporate resources available if you can do that. 
You know who knows what we'll have in like a year. >> I'll add one. >> Please. >> The majority of companies in the S&P 500 have a market cap that's greater than their revenue. The reason is 'cause they have IP related to data that's of value. But most of those companies, most companies, the vast majority of companies don't have any way to measure the value of that data. There's no GAAP accounting standard. So they don't understand the value contribution of their data in terms of how it helps them monetize. Not the data itself necessarily, but how it contributes to the monetization of the company. And I think that's a big gap. If you don't understand the value of the data that means you don't understand how to refine it, if data is the new oil and how to protect it and so forth and secure it. So that to me is a big gap that needs to get closed before we can actually say we live in a data driven world. >> So you're saying I've got an asset, I don't know if it's worth this or this. And they're missing that great opportunity. >> So devolve to what I know best. >> Great discussion. Really, really enjoyed the, the time has flown by. Joe if you get that augmented reality thing to work on the salary, point it toward that guy not this guy, okay? (everyone laughs) It's much more impressive if you point it over there. But Joe thank you, Dion, Joe and Jennifer and Batman. We appreciate and Bob Hayes, thanks for being with us. >> Thanks, you guys. >> Really enjoyed >> Great stuff. >> the conversation. >> And a reminder coming up at the top of the hour, six o'clock Eastern time, IBMgo.com featuring the live keynote which is being set up just about 50 feet from us right now. Nick Silver is one of the headliners there, John Thomas as well, or rather Rob Thomas. John Thomas we had on earlier on The Cube. But a panel discussion as well coming up at six o'clock on IBMgo.com, six to 7:15. Be sure to join that live stream. That's it from The Cube. We certainly appreciate the time. 
Glad to have you along here in New York. And until the next time, take care. (bright digital music)
SUMMARY :
Brought to you by IBM. Welcome back to data science for all. So it is a new game-- Have a swing at the pitch. Thanks for taking the time to be with us. from the academic side to continue data science And there's lot to be said is there not, ask the questions, you can't not think about it. of the customer and how we were going to be more anticipatory And I think, you know as the tools mature, So it's still too hard. I think that, you know, that's where it's headed. So Bob if you would, so you've got this Batman shirt on. to be a data scientist, but these tools will help you I was just going to add that, you know I think it's important to point out as well that And the data scientists on the panel And the only difference is that you can build it's an accomplishment and for less, So I think you have to think about the fact that I get the point of it and I think and become easier to use, you know like Bob was saying, So how at the end of the day, Dion? or bots that go off and run the hypotheses So you know people who are using the applications are now then people can speak really slowly to you in French, But the day to day operations was they ran some data, That's really the question. You know it's been said that the data doesn't lie, the access to the truth through looking at the numbers of the organization where you have the routine I tend to be a foolish optimist You do. I think as we start relying more on data and trusting data There's a couple elephant in the room topics Before you go to market you've got to test And also have the ability for a human to intervene to click on ads. And I forget the last criteria, but like we need I think with ethics, you know a lot of it has to do of all the new data that's going to be coming in? Getting back to you know what Dave was saying earlier about, organizations that have path found the way. than in the past, I think it was (laughs) I mean that gap is just going to grow and grow and grow. 
So I think that being able to use this information Or find it. But I think that's another thing to think about. And if you can ask the right question of the data you have And the potential I see with the data we're collecting is Knowing what you know about data science, for that problem in exactly the way that it occurred I thought the ads were paid in rubles. I think as a result we kind of overlook And I think without data science without machine learning, Right they had the ability to train the machines, At the same time Google's trying to get you And so I think you know And I think you know having, I think in general when you appeal to developers But you're an optimist. Joe: I want to see how Alexa or Siri do on Jeopardy. in the near term or opportunities you think have the ability to use data in their job. That says a lot about the state we're in today. I don't think you need to have a PhD in SQL to use data. Dion why don't you go ahead, We see the industries tend to reach an inflection point And that Uber is going to be applying data, I think part of it's going to be whether or not if data is the new oil and how to protect it I don't know if it's worth this or this. Joe if you get that augmented reality thing Glad to have you along here in New York.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeff Hammerbacher | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dion Hinchcliffe | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Jennifer | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
Comcast | ORGANIZATION | 0.99+ |
Chris Dancy | PERSON | 0.99+ |
Jennifer Shin | PERSON | 0.99+ |
Cathy O'Neil | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Stanislav Petrov | PERSON | 0.99+ |
Joe McKendrick | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Nick Silver | PERSON | 0.99+ |
John Thomas | PERSON | 0.99+ |
100 variables | QUANTITY | 0.99+ |
John Walls | PERSON | 0.99+ |
1990 | DATE | 0.99+ |
Joe Caserta | PERSON | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
UC Berkeley | ORGANIZATION | 0.99+ |
1983 | DATE | 0.99+ |
1991 | DATE | 0.99+ |
2013 | DATE | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
Bob | PERSON | 0.99+ |
Bob Hayes | PERSON | 0.99+ |
United States | LOCATION | 0.99+ |
360 degree | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
New York | LOCATION | 0.99+ |
Benjamin Israeli | PERSON | 0.99+ |
France | LOCATION | 0.99+ |
Africa | LOCATION | 0.99+ |
12 month | QUANTITY | 0.99+ |
Soviet Union | LOCATION | 0.99+ |
Batman | PERSON | 0.99+ |
New York City | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
Olympics | EVENT | 0.99+ |
Meredith Whittaker | PERSON | 0.99+ |
iPhone | COMMERCIAL_ITEM | 0.99+ |
Moscow | LOCATION | 0.99+ |
Ubers | ORGANIZATION | 0.99+ |
20 years | QUANTITY | 0.99+ |
Joe C. | PERSON | 0.99+ |