ON DEMAND BUILDING MULTI CLUSTER CONTAINER PLATFORM SPG FINAL 2
>> Hello, everyone. I'm Khalil Ahmad, Senior Director of Architecture at S&P Global. I have been with S&P Global for six years now. Previously, I worked for Citigroup and Prudential. Overall, I have been part of the IT industry for 30 years, and most of my professional career has been in the financial sector in the New York City metro area. I live in New Jersey with my wife and son, Daniel Khalil. I have a Master's degree in software engineering from the University of Scranton and a Master's in mathematics from the University of Punjab, Lahore. Currently, I am pursuing the TRIUM Global Executive MBA, a joint program from NYU Stern, LSE, and HEC Paris. Today, I'm going to talk about building a multi-cluster, scalable container platform supporting on-prem, hybrid, and multicloud use cases: how we leveraged it within S&P Global and what our success story was. As far as the agenda is concerned, I will quickly go over the problem statement. Then I will walk through our core requirements, how we approached the solution, and how Docker Enterprise helped us. At the end, I will go over the pilot deployment we used as a proof of concept. So, as far as the problem statement is concerned: containers, as you all know, are becoming mainstream in the enterprise, but expertise remains limited, and challenges mount as containers enter production. Some companies are building skills internally, some are looking for partners that can help catalyze success, and some are choosing more integrated solutions that accelerate deployments and simplify the container environment. To overcome these challenges, we at S&P Global started our journey a few years back, taking advantage of both options. First of all, we met with all the stakeholders, the application teams, and the Product Managers, and we defined our core requirements: what we wanted out of this container platform, which had to support multicloud and hybrid scenarios as well as on-prem. 
So, for our core requirements, we decided that, first of all, we needed a roadmap, a container strategy, providing guidelines on standards and specifications. Secondly, within S&P Global, we decided to introduce a Platform-as-a-Service approach, where we build the container platform and provide it as a service internally to all our application teams and Product Managers, hosting multiple applications on-prem as well as in multiple clouds. The third requirement was Linux and Windows container support. In addition to that, we required a hosted, secure image registry with role-based access control and image security scanning. We had also started our DevOps journey, so we wanted full support for CI/CD pipelines. Whatever solution we recommended from the architecture group, it had to integrate easily with the developer workstation, and a developer workstation could be Windows, Mac, or Linux. Orchestration, performance, and control were a few other parameters we wanted to keep in mind. And most important was dynamic scaling of container clusters; that was something we also wanted to achieve when we introduced this Platform as a Service. As far as standards and specifications are concerned, we turned to the Open Container Initiative, the OCI. OCI was established in June 2015 by Docker and other leaders in the technology industry. It operates under the Linux Foundation and currently maintains two specifications: the runtime specification and the image specification. At that time, it was a no-brainer to stick with OCI, so we are following the industry standards and specifications. The next step was the container platform itself: what would be our runtime engine, what would be our orchestration, and how would we support both our on-prem and our multicloud infrastructure? When it came to the runtime engine, we decided to go with Docker, which at the time was the default runtime engine for Kubernetes. 
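To make the OCI image specification mentioned above concrete: among other things, it defines the JSON manifest that describes an image's config and layers. The sketch below is a minimal manifest of that shape; the digests and sizes are placeholders for illustration, not values from the talk.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
    "size": 7023
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111",
      "size": 32654
    }
  ]
}
```

Because registries and runtimes that follow the specification agree on this format, an image built once can run on any OCI-compliant engine, which is exactly the portability the strategy above is after.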
And if I may mention, DataDog, in one of their public reports, said Docker was probably the most talked-about infrastructure technology of the past few years. So sticking with the Docker runtime engine was another win, and we did not see it bringing any challenges or issues in the future. When it came to orchestration, we preferred Kubernetes, but at that time there was a challenge: Kubernetes did not support Windows containers. We wanted something that worked with Linux containers and also had the ability to orchestrate Windows containers. So even though, long term, we wanted to stick with Kubernetes, we also wanted to have Docker Swarm. When it comes to on-prem and multicloud, technically you can only support both, as of now (technology may change in the future), if you bring your own orchestration tool. In our case, having control over orchestration and not being locked in with one cloud provider was the ideal situation. With all that research and R&D, we found Docker Enterprise, which lets you securely build, share, and run modern applications anywhere. When we came across Docker Enterprise, we were pleased to see that it met most of our core requirements: whether it was integrating with the developer workstation and building the application, sharing those applications in a secure way and collaborating through our pipeline, or, lastly, running them. Whether you run in hybrid, multicloud, or edge, on Kubernetes, Docker Enterprise supports it all the way. So there are three areas I would call out for Docker Enterprise: choice, flexibility, and security. I'm sure there are a lot more features in Docker Enterprise as a suite, but when we looked at these three words: first, simplified hybrid orchestration. You define application-centric policies and boundaries, and once you define them, you're all set; you just maintain those policies. 
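The mixed-orchestrator requirement above, Swarm for Windows alongside Linux, is expressed in Swarm through placement constraints on the node's operating system. A hedged sketch of a stack file follows; the service and image names are invented for illustration, not from S&P Global's deployment.

```yaml
version: "3.7"
services:
  win-app:
    image: example/win-app:latest    # hypothetical Windows container image
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.platform.os == windows   # schedule only on Windows nodes
  linux-app:
    image: example/linux-app:latest  # hypothetical Linux container image
    deploy:
      replicas: 4
      placement:
        constraints:
          - node.platform.os == linux     # schedule only on Linux nodes
```

Deployed with `docker stack deploy`, Swarm places each service only on nodes whose operating system matches the constraint, so one cluster can carry both workload types.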
You manage diverse applications across mixed infrastructure, with secure segmentation. Then comes the secure software supply chain: provenance across the entire lifecycle of apps and infrastructure through enforceable policy, and consistent management of all apps and infrastructure. And lastly, infrastructure independence. Lift and shift was easy, and at the same time our cloud journey was in flight; we were moving from on-prem to the cloud. So support for lift-and-shift applications was on our wishlist, and Docker Enterprise did not disappoint us. It supported both traditional and microservices apps on any infrastructure. So here we are: why Docker Enterprise? Some of the reasons I mentioned in previous slides. But in addition to being an industry-leading platform, simplifying IT operations for running modern applications at scale, anywhere, Docker Enterprise also has developer tools, so the integration, as I mentioned earlier, was smooth. Beyond all these tools, the two main components, the Universal Control Plane and the Docker Trusted Registry, solved a lot of our problems. When it comes to orchestration, we have our own Universal Control Plane, which, under the hood, manages both Kubernetes and Docker Swarm clusters. So guess what? We had Windows support through Docker Swarm and Linux support through Kubernetes. That paradigm has since changed; as of today, Kubernetes supports Windows containers. So we are well positioned with UCP: because we have our own orchestration tool, we started managing Kubernetes clusters on Linux and have now introduced Windows as well. Then comes the Docker Trusted Registry. Integrated security and role-based access control made for a very smooth transition from our RT storage to DTR. In addition, binary-level scanning was another good feature from the security point of view. 
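Since Kubernetes now supports Windows containers, as noted above, the equivalent targeting on the Kubernetes side is done with a node selector on the well-known OS label. This is a minimal sketch; the pod name is invented, and the image is just a commonly cited Windows example, not one from the talk.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: win-pod                      # hypothetical pod name
spec:
  nodeSelector:
    kubernetes.io/os: windows        # well-known label set by the kubelet
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore/iis   # example Windows image
```

With this selector, the scheduler only considers Windows nodes, so Linux and Windows workloads can share one Kubernetes cluster the same way they shared the Swarm cluster before.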
So all of those options and our R&D led us to conclude that Docker Enterprise was the way to go. With Docker Enterprise, we can spin up multiple clusters on-prem and in the cloud, and we have one centralized location to manage those clusters. >> Khalil: So, with all that, let's talk about our pilot deployment for the proof of concept. In this diagram, on the left side is our on-prem data center, and on the right side is AWS US East. We picked one region with three zones. On-prem, we picked one of our data centers in the United States, and we started the POC. Our Universal Control Plane had a five-node cluster, and the Docker Trusted Registry also had a five-node cluster; both were in our on-prem data center. When it comes to the worker nodes, we started with an 18-node cluster on the Linux side and a four-node cluster on the Windows side, because our major footprint was on the Linux side and the Windows use cases were pretty small. Also, this was just a proof of concept. In AWS, we mimicked the same worker-node setup, similar to what we had on-prem: a 13-node cluster on Linux, and we started with a four-node cluster for Windows containers. And having Direct Connect from our data center to AWS, which already existed, we did not have any connectivity or latency issues. Now, if you look at this diagram, you have a centralized Universal Control Plane and Trusted Registry, and we were able to spin up clusters on-prem as well as in the cloud. We made this happen, end to end, in record time. Later, when we deployed this in production, we also added another cloud provider: the box you see on the right side, we simply duplicated on another cloud platform. So now one orchestration tool manages on-prem and multicloud clusters. Now, in your use case, you may find this a little more in favor of on-prem. 
But that fit our use case. Later, we expanded the Universal Control Plane and DTR clusters into the cloud as well, and the clusters have grown to hundreds and thousands of worker nodes spanning two cloud providers, with a third being discussed. This solution has been working very well so far; we have not seen any downtime, not a single incident. And we were able to provide a multicloud container Platform as a Service for S&P Global. Thank you for your time. If you have any questions, I have put up my LinkedIn and Twitter handles; you're welcome to ask.
Machine Learning Panel | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE. Covering machine learning everywhere. Build your ladder to AI. Brought to you by IBM. Welcome back to New York City. Along with Dave Vellante, I'm John Walls. We continue our coverage here on theCUBE of machine learning everywhere. Build your ladder to AI; IBM is our host here today. We put together, occasionally at these events, a panel of esteemed experts with deep perspectives on a particular subject. Today our influencer panel comprises three well-known and respected authorities in this space. Glad to have Colin Sumpter here with us. He's the man with the mic, by the way. He's going to talk first. But, Colin is an IT architect with CrowdMole. Thank you for being with us, Colin. Jennifer Shin, those of you on theCUBE, you're very familiar with Jennifer, a long-time Cuber. She founded 8 Path Solutions and is on the faculty at NYU and Cal Berkeley. And also with us is Craig Brown, a big data consultant. And a home game for all of you guys, right, more or less here we are in the city. So, thanks for having us, we appreciate the time. First off, let's just talk about the title of the event, Build Your Path... Or Your Ladder, excuse me, to AI. What are those steps on that ladder, Colin? The fundamental steps that you've got to jump on, or step on, in order to get to that true AI environment? >> Getting to that true AI environment, John, is a matter of mastering and organizing your information well enough to perform analytics. That gives you two choices: linear regression or supervised classification. And then you actually have enough organized data to talk to your team, organize your team around that data, and begin climbing that ladder to successively benefit from your data science program. 
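Colin's two options, linear regression or supervised classification, both presuppose data organized into features and labels. As a minimal, library-free sketch of the first option, ordinary least squares for a single feature can be written in a few lines; the data below is invented for illustration, not from the panel.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Invented toy data: a roughly linear relationship with a little noise.
slope, intercept = fit_linear([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```

On this toy data the fitted slope comes out near 2, which is the point of the exercise: once the data is organized into paired observations, the regression itself is the easy step on the ladder.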
You need to have the right processing, or at least the ability to scale out, to be able to process the algorithm fast enough to find value in your data. I think the other thing is, of course, the data source itself. Do you have the right data to answer the questions you want to answer? So, I think, without those two things, you'll either have a lot of great data that you can't process in time, or you'll have a great process or a great algorithm that has no real information, so your output is useless. I think those are the fundamental things you really do need to have any sort of AI solution built. >> I'll take a stab at it from the business side. They have to adopt it first. They have to believe that this is going to benefit them and that the effort necessary to build out the various aspects of algorithms and data is there. So I think adopting the concept of machine learning, and the development aspects that it takes to do that, is a key component to building the ladder. >> So this just isn't toe in the water, right? You've got to dive in the deep end, right? >> Craig: Right. >> It gets to culture. If you look at most organizations, not the big five market-cap companies, but most organizations, data is not at their core. Humans are at their core, human expertise, and data is sort of bolted on, but that has to change, or they're going to get disrupted. Data has to be at the core; maybe the human expertise leverages that data. What are you guys seeing with end customers in terms of their readiness for this transformation? >> What I'm seeing customers spend time on right now is getting out of the silos. So, when you speak of culture, that's primarily what the culture surrounds. They develop applications with functionality as a silo, and data specific to that functionality is the lens through which they look at data. 
They have to get out of that mindset and look at the data holistically, and ultimately, in essence, look at it as an asset. >> The data is a shared resource. >> Craig: Right, correct. >> Okay, and again, with the exception of the... Whether it's Google, Facebook, obviously, but the Ubers, the Airbnbs, etc... With the exception of those guys, most customers aren't there. Still, the data is in silos; they've got myriad infrastructure. Your thoughts, Jennifer? >> I'm also seeing sort of a disconnect between the operationalizing team, the team that runs this code or has a real business need for it, and the research side. Sometimes you'll see corporations with research teams, and there's sort of a disconnect between what the researchers do and what these operations, or marketing, whatever domain it is, are doing in terms of day-to-day operation. So, for instance, a researcher will look really deep into these algorithms, and may know a lot about deep learning in theory, in the theoretical world, and might publish a paper that's really interesting. But in that application part, where they're actually being used every day, there's this difference there, where you really shouldn't have that difference. There should be more alignment. I think actually aligning those resources... I think companies are struggling with that. >> So, Colin, we were talking off camera about RPA, Robotic Process Automation. Where's the play for machine intelligence and RPA? Maybe, first of all, you could explain RPA. >> David, RPA stands for Robotic Process Automation. That's going to enable you to grow and scale a digital workforce. Typically, it's done in the cloud. 
The way RPA plays into machine learning and data science is that it allows you to outsource business processes to compensate for the lack of human expertise available in the marketplace, because you need competency to enable the technology to take advantage of these new benefits coming into the market. And, when you start automating some of these processes, you can keep pace with the innovation in the marketplace and allow the human expertise to gradually grow into these new data science technologies. >> So, I was mentioning some of the big guys before. Top five market-cap companies: Google, Amazon, Apple, Facebook, Microsoft, all digital. Microsoft you can argue, but still, pretty digital, pretty data oriented. My question is about closing that gap. In your view, can companies close that gap? How can they close that gap? Are you guys helping companies close that gap? It's a wide chasm, it seems. Thoughts? >> The thought on closing the chasm is... presenting the technology to the decision-makers. What we've learned is that... you don't know what you don't know, so it's impossible to find the new technologies if you don't have the vocabulary to even begin simple research into these new technologies. And, to close that gap, it really comes down to awareness: events like theCUBE, webinars, different educational opportunities that are available to line-of-business owners, directors, VPs of systems and services, to begin that awareness process, finding consultants... beginning that pipeline enablement to allow the business to take advantage of and harness data science, machine learning, and what's coming. >> One of the things I've noticed is that there's a lot of information out there, like everyone has a webinar, everyone has tutorials, but there's a lot of overlap. There aren't that many very sophisticated documents you can find about how to implement it in real-world conditions. 
They all tend to use the same core data set, a lot of these machine learning tutorials you'll find, which is hilarious because the data set's actually very small. And I know where it comes from, just from having the expertise, but it's not something I'd ever use in the real world. It doesn't reflect the level of skill you need to be able to apply any of these methodologies. But that's what's out there. So, there's a lot of information, but it's kind of at a rudimentary level. It's not really at that sophisticated level where you're going to learn enough to deploy in real-world conditions. One of the things I'm noticing is, with the technical teams, the data science teams, the machine learning teams, they're kind of using the same methodologies I used maybe 10 years ago, because the management who manage these teams are not technical enough. They're business people, so they don't understand how to guide them, how to explain, "Hey, maybe you shouldn't do that with your code, because that's actually going to cause a problem. You should use parallel code; you should make sure everything is running in parallel so compute's faster." But, if these younger teams are actually learning for the first time, they make the same mistakes you made 10 years ago. So, I think, what I'm noticing is that lack of leadership is partly one of the reasons, and also the assumption that a non-technical person can lead the technical team. >> So, it's not just skillset on the worker level, if you will. It's also knowledge base on the decision-maker level. That's a bad place to be, right? So, how do you get in the door at a business like that? Obviously, and we've talked about this a little bit today, some companies say, "We're not data companies, we're not digital companies, we sell widgets." Well, yeah, but you sell widgets and you need this to sell more widgets. And so, how do you get in the door and talk about this problem that Jennifer just cited? You're signing the checks, man. 
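Jennifer's point about running everything in parallel so compute is faster can be sketched with Python's standard library. This is an assumed illustration, not code from any team mentioned on the panel; the helper name and toy scoring function are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def score_batch(records, score_fn, workers=4):
    """Apply score_fn to each record concurrently instead of in a serial loop.

    Note: for CPU-bound models, a process pool or vectorized code is the
    better fit, since Python threads share one interpreter lock; the
    structure of the call is identical either way.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_fn, records))

# Toy usage: score eight records with a made-up scoring function.
scores = score_batch(range(8), lambda x: x * x)
```

The design choice the panel is pointing at is exactly this: the serial loop and the parallel map produce the same results, so a technical lead who knows to ask for the latter gets the speedup without changing what the model computes.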
You're going to have to get up to speed on this; otherwise you're not going to have checks to sign in three to five years, you're done! >> I think that speaks to use cases. I think that, and what I'm actually seeing at customers, is that there's a disconnect in understanding between the executive teams and the low-level technical teams on what the use case actually means to the business. Some of the use cases are operational in nature. Some of the use cases are data-oriented in nature. There's no real conformity on what the use case means across the organization, and that understanding isn't there. And so, the CIOs, the CEOs, and the CTOs think, "Okay, we're going to achieve a certain level of capability if we do a variety of technological things," and the business is looking to effectively improve or bring some efficiency to business processes. At each level within the organization, the understanding is at the level at which the discussions are being made. And so, I'm in these meetings with senior executives and we have lots of ideas on how we can bring efficiencies and some operational productivity with technology. And then we get in a meeting with the data stewards and it's, "What are these guys talking about? They don't understand what's going on at the data level and what data we have." And then that's where the data quality challenges come into the conversation. So I think that, to close that chasm, we have to figure out who needs to be in the room to effectively help us build the right understanding around the use cases, then bring the technology to those use cases, and then actually see within the organization how we're affecting that. 
Although you want to go next decade, we're still faced with some of the challenges today in terms of, again, that adoption, the use case scenarios, and then what my colleagues are saying here about the various data challenges and dev ops and things. So, there's a number of things that we have to overcome, but if we can get past those areas in the next decade, I don't think there's going to be much of a limit, in my opinion, as to what the technology can do and what we can ask the machines to produce for us. As Colin mentioned, with RPA, I think that the capability is there, right? But, can we also ultimately, as humans, leverage that capability effectively? >> I get this question a lot. People are really worried about AI and robots taking over, and all of that. And I go... Well, let's think about the example. We've all been online, probably over the weekend, maybe it's 3 or 4 AM, checking your bank account, and you get an error message your password is wrong. And we swear... And I've been there where I'm like, "No, no my password's right." And it keeps saying that the password is wrong. Of course, then I change it, and it's still wrong. Then, the next day when I login, I can login, same password, because they didn't put a great error message there. They just defaulted to wrong password when it's probably a server that's down. So, there are these basics or processes that we could be improving which no one's improving. So you think in that example, how many customer service reps are going to be contacted to try to address that? How many IT teams? So, for every one of these bad technologies that are out there, or technologies that are not being run efficiently or run in a way that makes sense, you actually have maybe three people that are going to be contacted to try to resolve an issue that actually maybe could have been avoided to begin with. 
I feel like it's optimistic to say that robots are going to take over, because you're probably going to need more people to put band-aids on bad technology and bad engineering, frankly. And I think that's the reality of it. If we had hoverboards, that would be great, you know? For a while, we thought we did, right? But we found out, oh it's not quite hoverboards. I feel like that might be what happens with AI. We might think we have it, and then go oh wait, it's not really what we thought it was. >> So there are real limits, certainly in the near to mid to maybe even long term, that are imposed. But you're an optimist. >> Yeah. Well, not so much with AI but everything else, sure. (laughing) AI, I'm a little bit like, "Well, it would be great, but I'd like basic things to be taken care of every day." So, I think the usefulness of technology is not something anyone's talking about. They're talking about this advancement, that advancement, things people don't understand, don't know even how to use in their life. Great, great is an idea. But, what about useful things we can actually use in our real life? >> So block and tackle first, and then put some reverses in later, if you will, to switch over to football. We were talking about it earlier, just about basics. Fundamentals, get your fundamentals right and then you can complement on that with supplementary technologies. Craig, Colin? >> Jen made some really good points and brought up some very good points, and so has... >> John: Craig. >> Craig, I'm sorry. (laughing) >> Craig: It's alright. >> 10 years out, Jen and Craig spoke to false positives. And false positives create a lot of inefficiency in businesses. So, when you start using machine learning and AI 10 years from now, maybe there's reduced false positives that have been scored in real time, allowing teams not to have their time consumed and their business resources consumed trying to resolve false positives. 
These false positives have a business cost that, today, some businesses might not be able to quantify. In financial services, banks count money not lent. But, in everyday business, a lot of businesses aren't counting the monetary consequences of false positives and the drag they have on their operational ability and capacity. >> I want to ask you guys about disruption. If you look at where the disruption, the digital disruptions, have taken place, obviously retail, certainly advertising, certainly content businesses... There are some industries that haven't been highly disrupted: financial services, insurance, we were talking earlier about aerospace, defense rather. Is any business, any industry, safe from digital disruption? >> There are. Certain industries are just highly regulated: healthcare, financial services, real estate, transactional law... These are extremely regulated businesses that are... I don't want to say susceptible to technology, but they can be disrupted at a basic level, operational efficiency, to make these business processes happen more rapidly, more accurately. >> So you guys buy that? There's some... I'd like to get a little debate going here. >> So, I work with the government, and the government's trying to change things. I feel like that's kind of a sign, because they tend to be a little bit slower than, say, private companies. They have data, and they're trying to actually put it into a system, meaning like if they have files... I think that, at some point, I got contacted about putting files that they found, like birth records, right, marriage records, that they found from 100-plus years ago, and trying to put that into the system. By the way, I did look into it; there was no way to use AI for that, because there was no standardization across these files. So they have half a million files, but someone's probably going to have to enter that in manually. 
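Returning to the false positives point above: the monetary consequences the panel describes can be made measurable with a few lines of code. This is a hedged sketch; the per-alert handling cost stands in for the "three people contacted per bad alert" example and is an invented number, as are the labels.

```python
def false_positive_cost(y_true, y_pred, cost_per_alert):
    """Count predictions flagged positive that were actually negative,
    and price them at a fixed per-alert handling cost."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return fp, fp * cost_per_alert

# Invented labels: 1 = real issue, 0 = benign; 50 = assumed cost per bad alert.
fp, cost = false_positive_cost([0, 0, 1, 0, 1], [1, 0, 1, 1, 1], cost_per_alert=50)
```

Scoring alerts this way, even crudely, is one way a business can start counting the drag on operational capacity that the panel says mostly goes unrecorded today.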
The reality is, I think because there's a demand for having things be digital, we aren't likely to see a decrease in that. We're not going to have one industry that goes, "Oh, your files aren't digital." Probably because they also want to be digital. The companies themselves, the employees themselves, want to see that change. So, I think there's going to be this continuous move toward it, but there's the question of, "Are we doing it better?" It is better than, say, having it on paper sometimes? Because sometimes I just feel like it's easier on paper than to have to look through my phone, look through the app. There's so many apps now! >> (laughing) I got my index cards cards still, Jennifer! Dave's got his notebook! >> I'm not sure I want my ledger to be on paper... >> Right! So I think that's going to be an interesting thing when people take a step back and go like, "Is this really better? Is this actually an improvement?" Because I don't think all things are better digital. >> That's a great question. Will the world be a better, more prosperous place... Uncertain. Your thoughts? >> I think the competition is probably the driver as to who has to this now, who's not safe. The organizations that are heavily regulated or compliance-driven can actually use that as the reasoning for not jumping into the barrel right now, and letting it happen in other areas first, watching the technology mature-- >> Dave: Let's wait. >> Yeah, let's wait, because that's traditionally how they-- >> Dave: Good strategy in your opinion? >> It depends on the entity but I think there's nothing wrong with being safe. There's nothing wrong with waiting for a variety of innovations to mature. What level of maturity, I think, is the perspective that probably is another discussion for another day, but I think that it's okay. I don't think that everyone should jump in. Get some lessons learned, watch how the other guys do it. I think that safety is in the eyes of the beholder, right? 
But some organizations are just competition fierce and they need a competitive edge and this is where they get it. >> When you say safety, do you mean safety in making decisions, or do you mean safety in protecting data? How are you defining safety? >> Safety in terms of when they need to launch, and look into these new technologies as a basis for change within the organization. >> What about the other side of that point? There's so much more data about it, so much more behavior about it, so many more attitudes, so on and so forth. And there is privacy issues and security issues and all that... Those are real challenges for any company, and becoming exponentially more important as more is at stake. So, how do companies address that? That's got to be absolutely part of their equation, as they decide what these future deployments are, because they're going to have great, vast reams of data, but that's a lot of vulnerability too, isn't it? >> It's as vulnerable as they... So, from an organizational standpoint, they're accustomed to these... These challenges aren't new, right? We still see data breaches. >> They're bigger now, right? >> They're bigger, but we still see occasionally data breaches in organizations where we don't expect to see them. I think that, from that perspective, it's the experiences of the organizations that determine the risks they want to take on, to a certain degree. And then, based on those risks, and how they handle adversity within those risks, from an experience standpoint they know ultimately how to handle it, and get themselves to a place where they can figure out what happened and then fix the issues. And then the others watch while these risk-takers take on these types of scenarios. >> I want to underscore this whole disruption thing and ask... We don't have much time, I know we're going a little over. I want to ask you to pull out your Hubble telescopes. 
Let's make a 20 to 30 year view, so we're safe, because we know we're going to be wrong. I want a sort of scale of 1 to 10, high likelihood being 10, low being 1. Maybe sort of rapid fire. Do you think large retail stores are going to mostly disappear? What do you guys think? >> I think the way that they are structured, the way that they interact with their customers might change, but you're still going to need them because there are going to be times where you need to buy something. >> So, six, seven, something like that? Is that kind of consensus, or do you feel differently Colin? >> I feel retail's going to be around, especially fashion because certain people, and myself included, I need to try my clothes on. So, you need a location to go to, a physical location to actually feel the material, experience the material. >> Alright, so we kind of have a consensus there. It's probably no. How about driving-- >> I was going to say, Amazon opened a book store. Just saying, it's kind of funny because they went and opened the book store, so you know, I think what happens is people forget over time, they go, "It's a new idea." It's not so much a new idea. >> I heard a rumor the other day that their next big acquisition was going to be, not Neiman Marcus. What's the other high end retailer? >> Nordstrom? >> Nordstrom, yeah. And my wife said, "Bad idea, they'll ruin it." Will driving and owning your own car become an exception? >> Driving and owning your own car... >> Dave: 30 years now, we're talking. >> 30 years... Sure, I think the concept is there. I think that we're looking at that. IoT is moving us in that direction. 5G is around the corner. So, I think the makings of it are there. So, since I can dare to be wrong, yeah I think-- >> We'll be on 10G by then anyway, so-- >> Automobiles really haven't been disrupted, the car industry. But you're forecasting, I would tend to agree. Do you guys agree or no, or do you think that culturally I want to drive my own car? 
>> Yeah, I think people, I think a couple of things. How well engineered is it? Because if it's badly engineered, people are not going to want to use it. For instance, there are people who could take public transportation. It's the same idea, right? Everything's autonomous, you'd have to follow in line. There's going to be some system, some order to it. And you might go-- >> Dave: Good example, yeah. >> You might go, "Oh, I want it to be faster. I don't want to be in line with that autonomous vehicle. I want to get there faster, get there sooner." And there are people who want to have that control over their lives, but they're not subject to things like schedules all the time and that's their constraint. So, I think if the engineering is bad, you're going to have more problems and people are probably going to go away from wanting to be autonomous. >> Alright, Colin, one for you. Will robots and maybe 3D printing, for example RPA, will it reverse the trend toward offshore manufacturing? >> 30 years from now, yes. I think with robotic process automation, eventually you're going to be at your cubicle or your desk, or whatever it is, and you're going to be able to print office supplies. >> Do you guys think machines will make better diagnoses than doctors? Ohhhhh. >> I'll take that one. >> Alright, alright. >> I think yes, to a certain degree, because if you look at the... problems with diagnosis, right now they miss it and I don't know how people, even 30 years from now, will be different from that perspective, where machines can look at quite a bit of data about a patient in split seconds and say, "Hey, the likelihood of you recurring this disease is nil to none, because here's what I'm basing it on." I don't think doctors will be able to do that. Now, again, daring to be wrong! (laughing) >> Jennifer: Yeah so-- >> Don't tell your own doctor either. (laughing) >> That's true. If anything happens, we know, we all know. I think it depends. 
So maybe 80%, some middle percentage might be the case. I think extreme outliers, maybe not so much. You think about anything that's programmed into an algorithm, someone probably identified that disease, a human being identified that as a disease, made that connection, and then it gets put into the algorithm. I think what will happen is that, for the 20% that isn't being done well by machine, you'll have people who are more specialized being able to identify the outlier cases from, say, the standard. Normally, if you have certain symptoms, you have a cold, those are kind of standard ones. If you have this weird sort of thing where there's new variables, environmental variables for instance, your environment can actually lead to you having cancer. So, there's other factors other than just your body and your health that's going to actually be important to think about when diagnosing someone. >> John: Colin, go ahead. >> I think machines aren't going to out-decision doctors. I think doctors are going to work well with machine learning. For instance, there's a published document of Watson doing the research of a team of four in 10 minutes, when it normally takes a month. So, those doctors, to bring up Jen and Craig's point, are going to have more time to focus in on what the actual symptoms are, to resolve the outcome of patient care and patient services in a way that benefits humanity. >> I just wish that, Dave, that you would have picked a shorter horizon that... 30 years, 20 I feel good about our chances of seeing that. 30 I'm just not so sure, I mean... For the two old guys on the panel here. >> The consensus is 20 years, not so much. But beyond 10 years, a lot's going to change. >> Well, thank you all for joining this. I always enjoy the discussions. Craig, Jennifer and Colin, thanks for being here with us here on theCUBE, we appreciate the time. Back with more here from New York right after this. You're watching theCUBE. (upbeat digital music)
Data Science: Present and Future | IBM Data Science For All
>> Announcer: Live from New York City it's The Cube, covering IBM data science for all. Brought to you by IBM. (light digital music) >> Welcome back to data science for all. It's a whole new game. And it is a whole new game. >> Dave Vellante, John Walls here. We've got quite a distinguished panel. So it is a new game-- >> Well we're in the game, I'm just happy to be-- (both laugh) Have a swing at the pitch. >> Well let's see what we have here. Five distinguished members of our panel. It'll take me a minute to get through the introductions, but believe me they're worth it. Jennifer Shin joins us. Jennifer's the founder of 8 Path Solutions, the director of data science at Comcast and part of the faculty at UC Berkeley and NYU. Jennifer, nice to have you with us, we appreciate the time. Joe McKendrick, an analyst and contributor to Forbes and ZDNet, Joe, thank you for being here as well. Another ZDNetter next to him, Dion Hinchcliffe, who is a vice president and principal analyst at Constellation Research and also contributes to ZDNet. Good to see you, sir. To the back row, but that doesn't mean anything about the quality of the participation here. Bob Hayes with a killer Batman shirt on by the way, which we'll get to explain in just a little bit. He runs Business over Broadway. And Joe Caserta, the founder of Caserta Concepts. Welcome to all of you. Thanks for taking the time to be with us. Jennifer, let me just begin with you. Obviously as a practitioner you're very involved in the industry, you're on the academic side as well. We mentioned Berkeley, NYU, deep experience. So I want you to kind of take your foot in both worlds and tell me about data science. I mean where do we stand now from those two perspectives? How have we evolved to where we are? And how would you describe, I guess the state of data science? >> Yeah so I think that's a really interesting question. There are a lot of changes happening. 
In part because data science has now become much more established, both on the academic side as well as in industry. So now you see some of the bigger problems coming out. People have managed to have data pipelines set up. But now there are these questions about models and accuracy and data integration. So the really cool stuff from the data science standpoint. We get to get really into the details of the data. And I think on the academic side you now see undergraduate programs, not just graduate programs, but undergraduate programs being involved. UC Berkeley just did a big initiative that they're going to offer data science to undergrads. So that's huge news for the university. So I think there's a lot of interest from the academic side to continue data science as a major, as a field. But I think in industry one of the difficulties you're now having is businesses are now asking that question of ROI, right? What do I actually get in return in the initial years? So I think there's a lot of work to be done and just a lot of opportunity. It's great because people now understand data science better, but I think data scientists have to really take that seriously and think about how am I actually getting a return, or adding value to the business? 
Then you have to understand it conceptually. You have to be able to model with it, you have to be able to explain it. There are a lot of aspects that you're not going to pick up overnight. So I think part of it is endurance. Like are people going to feel motivated enough and dedicate enough time to it to get very good at that skill set? And also of course, you know in terms of industry, will there be enough interest in the long term that there will be a financial motivation for people to stay in the field, right? So I think there's definitely a lot of opportunity. But that's always been there. Like I tell people I think of myself as a scientist and data science happens to be my day job. That's just the job title. But if you are a scientist and you work with data you'll always want to work with data. I think that's just an inherent need. It's kind of a compulsion, you just kind of can't help yourself, but dig a little bit deeper, ask the questions, you can't not think about it. So I think that will always exist. Whether or not it's an industry job in the way that we see it today, and like five years from now, or 10 years from now. I think that's something that's up for debate. >> So all of you have watched the evolution of data and how it effects organizations for a number of years now. If you go back to the days when data warehouse was king, we had a lot of promises about 360 degree views of the customer and how we were going to be more anticipatory in terms and more responsive. In many ways the decision support systems and the data warehousing world didn't live up to those promises. They solved other problems for sure. And so everybody was looking for big data to solve those problems. And they've begun to attack many of them. We talked earlier in The Cube today about fraud detection, it's gotten much, much better. Certainly retargeting of advertising has gotten better. But I wonder if you could comment, you know maybe start with Joe. 
As to the effect that data and data science have had on organizations in terms of fulfilling that vision of a 360 degree view of customers and anticipating customer needs. >> So. Data warehousing, I wouldn't say failed. But I think it was unfinished in order to achieve what we need done today. At the time I think it did a pretty good job. I think it was the only place where we were able to collect data from all these different systems, have it in a single place for analytics. The big difference between data warehousing and data science is data warehouses were primarily made for consumption by human beings. To be able to have people look through some tool and be able to analyze data manually. That really doesn't work anymore, there's just too much data to do that. So that's why we need to build a science around it so that we can actually have machines doing the analytics for us. And I think that's the biggest stride in the evolution over the past couple of years, that now we're actually able to do that, right? It used to be very, you know you go back to when data warehouses started, you had to be a deep technologist in order to be able to collect the data, write the programs to clean the data. But now your average casual IT person can do that. Right now I think we're back in data science where you have to be a fairly sophisticated programmer, analyst, scientist, statistician, engineer, in order to do what we need to do, in order to make machines actually understand the data. But I think part of the evolution, we're just in the forefront. We're going to see over the next, not even years, within the next year I think a lot of new innovation where the average person within business and definitely the average person within IT will be able to as easily say, "What are my sales going to be next year?" As easy as it is to say, "What were my sales last year." Where now it's a big deal. 
Right now in order to do that you have to build some algorithms, you have to be a specialist on predictive analytics. And I think, you know as the tools mature, as people's use of data matures, and as the technology ecosystem for data matures, it's going to be easier and more accessible. >> So it's still too hard. (laughs) That's something-- >> Joe C.: Today it is yes. >> You've written about and talked about. >> Yeah no question about it. We see this citizen data scientist. You know we talked about the democratization of data science, but the way we talk about analytics and warehousing and all the tools we had before, they generated a lot of insights and views on the information, but they didn't really give us the science part. And that's, I think, what's missing: the forming of the hypothesis, the closing of the loop. We now have use of this data, but are we changing, are we thinking about it strategically? Are we learning from it and then feeding that back into the process? I think that's the big difference between data science and the analytics side. But, you know just like Google made search available to everyone, not just people who had highly specialized indexers or crawlers. Now we can have tools that make these capabilities available to anyone. You know going back to what Joe said I think the key thing is we now have tools that can look at all the data and ask all the questions. 'Cause we can't possibly do it all ourselves. Our organizations are increasingly awash in data. Which is the life blood of our organizations, but we're not using it, you know, this whole concept of dark data. And so I think the concept, or the promise of opening these tools up for everyone to be able to access those insights and activate them, I think that, you know, that's where it's headed. >> This is kind of where the T shirt comes in right? So Bob if you would, so you've got this Batman shirt on. 
We talked a little bit about it earlier, but it plays right into what Dion's talking about. About tools and, I don't want to spoil it, but you go ahead (laughs) and tell me about it. >> Right, so. Batman is a super hero, but he doesn't have any supernatural powers, right? He can't fly on his own, he can't become invisible on his own. But the thing is he has the utility belt and he has these tools he can use to help him solve problems. For example he has the bat ring when he's confronted with a building that he wants to get over, right? So he pulls it out and uses that. So as data professionals we have all these tools now that these vendors are making. We have IBM SPSS, we have Data Science Experience, IBM Watson, that these data pros can now use as part of their utility belt and solve problems that they're confronted with. So if you're ever confronted with like a churn problem and you have somebody who has access to that data they can put that into IBM Watson, ask a question and it'll tell you what's the key driver of churn. So it's not that you have to be a superhuman to be a data scientist, but these tools will help you solve certain problems and help your business go forward. >> Joe McKendrick, do you have a comment? >> Does that make the Batmobile the Watson? (everyone laughs) Analogy? >> I was just going to add that, you know all of the billionaires in the world today and none of them decided to become Batman yet. It's very disappointing. >> Yeah. (Joe laughs) >> Go ahead Joe. >> And I just want to add some thoughts to our discussion about what happened with data warehousing. I think it's important to point out as well that data warehousing, as it existed, was fairly successful but for larger companies. Data warehousing was, and remains, a very expensive proposition. Something that's in the domain of the Fortune 500. But today's economy is based on a very entrepreneurial model. The Fortune 500 is out there, of course, ever shifting. 
But you have a lot of smaller companies, a lot of people with start-ups. You have people within divisions of larger companies that want to innovate and not be tied to the corporate balance sheet. They want to be able to innovate and experiment without having to go through the finance department. So there's all these open source tools available. There's cloud resources as well as open source tools. Hadoop of course being a prime example where you can work with the data and experiment with the data and practice data science at a very low cost. >> Dion mentioned the C word, citizen data scientist, last year at the panel. We had a conversation about that. And the data scientists on the panel generally were like, "Stop." Okay, we're not all of a sudden going to turn everybody into data scientists however, what we want to do is get people thinking about data, more focused on data, becoming a data driven organization. I mean as a data scientist I wonder if you could comment on that. >> Well I think so the other side of that is, you know there are also many people who maybe didn't, you know follow through with science, 'cause it's also expensive. A PhD takes a lot of time. And you know if you don't get funding it's a lot of money. And for very little security if you think about how hard it is to get a teaching job that's going to give you enough of a payoff to pay that back. Right, the time that you took off, the investment that you made. So I think the other side of that is by making data more accessible, you allow people who could have been great in science, have an opportunity to be great data scientists. And so I think for me the idea of citizen data scientist, that's where the opportunity is. I think in terms of democratizing data and making it available for everyone, I feel as though it's something similar to the way we didn't really know what KPIs were, maybe 20 years ago. 
People didn't use it as readily, didn't teach it in schools. I think maybe 10, 20 years from now, some of the things that we're building today from data science, hopefully more people will understand how to use these tools. They'll have a better understanding of working with data and what that means, and just data literacy right? Just being able to use these tools and be able to understand what data's saying and actually what it's not saying. Which is the thing that most people don't think about. But you can also say that data doesn't say anything. There's a lot of noise in it. There's too much noise to be able to say that there is a result. So I think that's the other side of it. So yeah I guess, for me, in terms of the citizen data scientist, I think it's a great idea to have that, right? But at the same time of course everyone kind of emphasized you don't want everyone out there going, "I can be a data scientist without education, "without statistics, without math," without understanding of how to implement the process. I've seen a lot of companies implement the same sort of process from 10, 20 years ago just on Hadoop instead of SQL. Right and it's very inefficient. And the only difference is that you can build more tables wrong than they could before. (everyone laughs) Which is I guess-- >> For less. >> It's an accomplishment, and for less, it's cheaper, yeah. >> It is cheaper. >> Otherwise we're like I'm not a data scientist but I did stay at a Holiday Inn Express last night, right? >> Yeah. (panelists laugh) And there's like a little bit of pride that like they used 2,000, you know they used 2,000 computers to do it. Like a little bit of pride about that, but you know of course maybe not a great way to go. I think 20 years ago we couldn't do that, right? One computer was already an accomplishment to have that resource. 
So I think you have to think about the fact that if you're doing it wrong, you're going to just make that mistake bigger, which is also the other side of working with data. >> Sure, Bob. >> Yeah I have a comment about that. I've never liked the term citizen data scientist or citizen scientist. I get the point of it and I think employees within companies can help in the data analytics problem by maybe being a data collector or something. I mean I would never have just somebody become a scientist based on a few classes he or she takes. It's like saying like, "Oh I'm going to be a citizen lawyer." And so you come to me with your legal problems, or a citizen surgeon. Like you need training to be good at something. You can't just be good at something just 'cause you want to be. >> John: Joe you wanted to say something too on that. >> Since we're in New York City I'd like to use the analogy of a real scientist versus a data scientist. So a real scientist requires tools, right? And the tools are not new, like microscopes and a laboratory and a clean room. And these tools have evolved over years and years, and since we're in New York we could walk within a 10 block radius and buy any of those tools. It doesn't make us a scientist because we use those tools. I think with data, you know, making the tools evolve and become easier to use, you know like Bob was saying, it doesn't make you a better data scientist, it just makes the data more accessible. You know we can go buy a microscope, we can go buy Hadoop, we can buy any kind of tool in a data ecosystem, but it doesn't really make you a scientist. I'm very involved in the NYU data science program and the Columbia data science program, like these kids are brilliant. You know these kids are not someone who is, you know just trying to run a day to day job, you know in corporate America. I think the people who are running the day to day job in corporate America are going to be the recipients of data science. 
Just like people who take drugs, right? As a result of a smart data scientist coming up with a formula that can help people, I think we're going to make it easier to distribute the data that can help people with all the new tools. But it doesn't really make it, you know the access to the data and tools available doesn't really make you a better data scientist. Without, like Bob was saying, without better training and education. >> So how-- I'm sorry, how do you then, if it's not for everybody, but yet I'm the user at the end of the day at my company and I've got these reams of data before me, how do you make it make better sense to me then? So that's where machine learning comes in or artificial intelligence and all this stuff. So how at the end of the day, Dion? How do you make it relevant and usable, actionable to somebody who might not be as practiced as you would like? >> I agree with Joe that many of us will be the recipients of data science. Just like you had to be a computer scientist at one point to develop programs for a computer, now we can get the programs. You don't need to be a computer scientist to get a lot of value out of our IT systems. The same thing's going to happen with data science. There's far more demand for data science than could ever be produced by, you know, an ivory tower filled with data scientists. Which we need those guys, too, don't get me wrong. But we need to productize it and make it available in packages such that it can be consumed. The outputs and even some of the inputs can be provided by mere mortals, whether that's machine learning or artificial intelligence or bots that go off and run the hypotheses and select the algorithms maybe with some human help. We have to productize it. This is the concept of data science as a service, which is becoming a thing now. It's, "I need this, I need this capability at scale. "I need it fast and I need it cheap." The commoditization of data science is going to happen. 
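Bob's earlier churn example (feed in the data, ask what drives churn) doesn't strictly need Watson to get the idea across. Here is a minimal sketch in plain Python that ranks candidate features by how strongly each correlates with the churn flag; the customer records and feature names are invented purely for illustration:

```python
import statistics

# Hypothetical customer records: (monthly_fee, support_calls, churned).
# All numbers are invented purely for illustration.
customers = [
    (20, 0, 0), (25, 1, 0), (30, 0, 0), (80, 4, 1),
    (75, 5, 1), (22, 1, 0), (90, 6, 1), (28, 0, 0),
]

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

churn = [c[2] for c in customers]
drivers = {
    "monthly_fee": correlation([c[0] for c in customers], churn),
    "support_calls": correlation([c[1] for c in customers], churn),
}

# Rank candidate features by strength of association with churn.
for name, r in sorted(drivers.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:.2f}")
```

A real driver analysis would use a proper model (logistic regression, decision trees) and far more data; the point is only that a "key driver" question reduces to a measurable association, which is what the packaged tools compute for you.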
That goes back to what I was saying about, the recipient of data science is also machines, right? Because I think the other thing that's happening now in the evolution of data is that, you know, the data is so tightly coupled. Back when you were talking about data warehousing you have all the business transactions, then you take the data out of those systems, you put them in a warehouse for analysis, right? Maybe they'll make a decision to change that system at some point. Now the analytics platform and the business application is very tightly coupled. They become dependent upon one another. So you know people who are using the applications are now able to take advantage of the insights of data analytics and data science, just through the app. Which never really existed before. >> I have one comment on that. You were talking about how do you get the end user more involved, well like we said earlier data science is not easy, right? As an end user, I encourage you to take a stats course, just a basic stats course, understanding what a mean is, variability, regression analysis, just basic stuff. So you as an end user can get more, or glean more insight from the reports that you're given, right? If you go to France and don't know French, then people can speak really slowly to you in French, you're not going to get it. You need to understand the language of data to get value from the technology we have available to us. >> Incidentally French is one of the languages that you have the option of learning if you're a mathematician. So math PhDs are required to learn a second language. France being the country of algebra, that's one of the languages you could actually learn. Anyway, tangent. But going back to the point. So statistics courses, definitely encourage it. I teach statistics. And one of the things that I'm finding as I go through the process of teaching it I'm actually bringing in my experience. 
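Bob's basic-stats suggestion (mean, variability, regression) fits in a few lines. A minimal Python sketch over invented monthly sales figures, assuming nothing beyond the standard library:

```python
import statistics

# Invented monthly sales figures, purely for illustration.
sales = [12, 15, 14, 18, 21, 19, 24, 26]

mean = statistics.mean(sales)     # central tendency
spread = statistics.stdev(sales)  # variability (sample standard deviation)

# Simple linear regression of sales against month number 0..7:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
months = list(range(len(sales)))
mx = statistics.mean(months)
slope = sum((x - mx) * (y - mean) for x, y in zip(months, sales)) / \
        sum((x - mx) ** 2 for x in months)
intercept = mean - slope * mx

print(f"mean={mean:.1f}, stdev={spread:.1f}, trend={slope:.2f} per month")
```

A report line like "sales trend up roughly two units per month" is exactly this computation; knowing that the variability matters as much as the trend is the data literacy Bob is describing.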
And by bringing in my experience I'm actually kind of making the students think about the data differently. So the other thing people don't think about is the fact that statisticians typically were expected to do, you know, just basic sorts of tasks. In a sense their knowledge is specialized, right? But the day to day operations was they ran some data, you know they ran a test on some data, looked at the results, interpreted the results based on what they were taught in school. They didn't develop that model a lot of times, they just understood what the tests were saying, especially in the medical field. So when you think about things like, we have words like population, census. Which is when you take data from every single, you have every single data point versus a sample, which is a subset. It's a very different story now that we're collecting data faster than we used to. It used to be the idea that you could collect information from everyone. Like it happens once every 10 years, we built that in. But nowadays you know, you hear about Facebook, for instance, I think they claimed earlier this year that their data was more accurate than the census data. So now there are these claims being made about which data source is more accurate. And I think the other side of this is now statisticians are expected to know data in a different way than they were before. So it's not just changing as a field in data science, but I think the sciences that are using data are also changing their fields as well. >> Dave: So is sampling dead? >> Well no, because-- >> Should it be? (laughs) >> Well if you're sampling wrong, yes. That's really the question. 
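The census-versus-sample point, and Dave's "is sampling dead?" question, can be made concrete. A minimal Python sketch with a synthetic population (generated just for illustration) showing that sample means scatter around the "census" mean, and that the scatter shrinks as the sample grows:

```python
import random
import statistics

random.seed(42)  # deterministic for reproducibility

# Invented "population": 10,000 values we pretend to know completely.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)  # the "census" answer

# A census uses every data point; a sample uses a subset.
# Small samples estimate the population mean with visible error.
for n in (10, 100, 1000):
    sample = random.sample(population, n)
    estimate = statistics.mean(sample)
    print(f"n={n:4d}  sample mean={estimate:.2f}  error={abs(estimate - true_mean):.2f}")
```

Sampling done right still works; sampling done wrong, too small or biased, is what produces the misleading answers the panel is warning about.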
So, some strong P&L manager, say, gets data and crafts it in a way that he or she can advance their agenda. Or maybe they'll attack a data set that probably should drive them in a different direction, but might be antithetical to their agenda. Are you seeing data, you know we talked about democratizing data, are you seeing that reduce the politics inside of organizations? >> So you know, we've always used data to tell stories at the top level of an organization, that's what it's all about. And I still see very much that no matter how much data science, or access to the truth through looking at the numbers, storytelling is still the political filter through which all that data passes, right? But with the advent of things like blockchain, more and more corporate records and corporate information are going to end up in these open and shared repositories where there is no alternate truth. It'll come back to whoever tells the best stories at the end of the day. So I still see that organizations are very political. We are seeing more open data though. Open data initiatives are a big thing, both in government and in the private sector. It is having an effect, but it's slow and steady. So that's what I see. >> Um, um, go ahead. >> I was just going to say as well, ultimately I think data driven decision making is a great thing. And it's especially useful at the lower tiers of the organization, where you have the routine day to day decisions that could be automated through machine learning and deep learning. The algorithms can be improved on a constant basis. On the upper levels, you know, that's why you pay executives the big bucks, to make the strategic decisions. And data can help them, but ultimately data, IT, technology alone will not create new markets, it will not drive new businesses, it's up to human beings to do that. The technology is the tool to help them make those decisions. 
But creating businesses, growing businesses, is very much a human activity. And that's something I don't see ever getting replaced. Technology might replace many other parts of the organization, but not that part. >> I tend to be a foolish optimist when it comes to this stuff. >> You do. (laughs) >> I do believe that data will make the world better. I do believe that data doesn't lie, people lie. You know, I'm already seeing trends in industries, all different industries, where conventional wisdom is starting to get trumped by analytics. I think it's still up to the human being today to ignore the facts and go with what they think in their gut, and sometimes they win, sometimes they lose. But generally if they lose, the data will tell them that they should have gone the other way. I think as we start relying more on data and trusting data through artificial intelligence, as we start making our lives a little bit easier, as we start using smart cars for safety before replacement of humans, as we start using data and analytics and data science really as the bumpers, instead of the vehicle, eventually we're going to start to trust it as the vehicle itself. And then it's going to make lying a little bit harder. >> Okay, so great, excellent. Optimism, I love it. (John laughs) So I'm going to play devil's advocate here a little bit. There's a couple of elephant in the room topics that I want to explore a little bit. >> Here it comes. >> There was an article today in Wired. And it was called, Why AI Is Still Waiting for Its Ethics Transplant. And I will just read a little segment from there. It says, new ethical frameworks for AI need to move beyond individual responsibility to hold powerful industrial, government and military interests accountable as they design and employ AI. 
When tech giants build AI products, too often user consent, privacy and transparency are overlooked in favor of frictionless functionality that supports profit driven business models based on aggregate data profiles. This is from Kate Crawford and Meredith Whittaker, who founded AI Now. And they're calling for sort of, almost clinical trials on AI, if I could use that analogy. Before you go to market you've got to test the human impact, the social impact. Thoughts. >> And also have the ability for a human to intervene at some point in the process. This goes way back. Is everybody familiar with the name Stanislav Petrov? He's the Soviet officer who back in 1983 was in the control room, I guess somewhere outside of Moscow, the control room which detected a nuclear missile attack against the Soviet Union coming out of the United States. Ordinarily, I think if this was an entirely AI driven process we wouldn't be sitting here right now talking about it. But this gentleman looked at what was going on on the screen, and, I'm sure he was accountable to his authorities in the Soviet Union, he probably got in a lot of trouble for this, but he decided to ignore the signals, ignore the data coming from the Soviet satellites. And as it turned out, of course, he was right. The Soviet satellites were seeing glints of the sun and they were interpreting those glints as missile launches. And I think that's a great example why, you know, not every situation of course means the end of the world, (laughs) as it did in this case. But it's a great example of why there needs to be a human component, a human ability for human intervention at some point in the process. >> So, other thoughts. I mean organizations are driving AI hard for profit. The best minds of our generation are trying to figure out how to get people to click on ads. Jeff Hammerbacher is famous for saying it. >> You can use data for a lot of things, data analytics, you can solve problems, you can cure cancer. 
You can make customers click on more ads. It depends on what your goal is. But there are ethical considerations we need to think about. When we have data that has a racial bias against blacks, giving them higher prison sentences or worse credit scores and so forth, that has an impact on a broad group of people. And as a society we need to address that. And as scientists we need to consider, how are we going to fix that problem? Cathy O'Neil, in her book Weapons of Math Destruction, excellent book, I highly recommend that your listeners read that book. And she talks about these issues, about if algorithms have a widespread impact, if they adversely impact a protected group. And I forget the last criterion, but we need to really think about these things as a people, as a country. >> So I always think the idea of ethics is interesting. I have this conversation come up a lot of times when I talk to data scientists. I think as a concept, right, as an idea, yes you want things to be ethical. The question I always pose to them is, "Well in the business setting, how are you actually going to do this?" 'Cause I find the most difficult thing working as a data scientist is to be able to make the day to day decision, when someone says, "I don't like that number," of how you actually get around that, whether that's the right data to be showing someone, or whether it's accurate. And say the business decides, "Well we don't like that number." Many people feel pressured to then change the data, or change what the data shows. So I think it's about being able to educate people, to be able to find ways to say what the data is saying, but not going past some line where it's a lie, where it's unethical. 'Cause you can also say what the data doesn't say. You don't always have to say what the data does say. You can leave it as, "Here's what we do know, but here's what we don't know." There's a don't-know part that many people will omit when they talk about data. 
So I think, you know, especially when it comes to things like AI it's tricky, right? Because I always tell people, I don't know, everyone thinks AI's going to be so amazing. I started in industry by fixing problems with computers that people didn't realize computers had. For instance when you have a system, a lot of bugs, we all have bug reports that we've probably submitted. I mean really it's nowhere near the point where it's going to start dominating our lives and taking over all the jobs. Because frankly it's not that advanced. It's still run by people, still fixed by people, still managed by people. I think with ethics, you know, a lot of it has to do with the regulations, what the laws say. That's really going to be what's involved in terms of what people are willing to do. A lot of businesses, they want to make money. If there's no rule that says they can't do certain things to make money, then there's no restriction. I think the other thing to think about is we as consumers, in our everyday lives, shouldn't separate the idea of data as a business from our day to day consumer lives. Meaning, yes, I work with data. Incidentally I also always opt out of my credit card, you know when they send you that information, they make you actually mail them, like old school snail mail, a document that says, okay, I don't want to be part of this data collection process. Which I always do. It's a little bit more work, but I go through that step of doing that. Now if more people did that, perhaps companies would feel more incentivized to pay you for your data, or give you more control of your data. Or at least, you know, if a company's going to collect information, I'd want there to be certain processes in place to ensure that it doesn't just get sold, right? For instance, if a start up gets acquired, what happens with the data they have on you? You agreed to give it to the start up. But I mean, what are the rules on that? 
So I think we have to really think about the ethics from not just, you know, the perspective of someone who's going to implement something, but as consumers, what control we have over our own data. 'Cause that's going to directly impact what businesses can do with our data. >> You know, you mentioned data collection. So slightly on that subject. All these great new capabilities we have coming. We talked about what's going to happen with media in the future and what 5G technology's going to do to mobile, and these great bandwidth opportunities. The internet of things and the internet of everywhere. And all these great inputs, right? Do we have an arms race, like are we keeping up with the capabilities to make sense of all the new data that's going to be coming in? And how do those things square up in this? Because the potential is fantastic, right? But are we keeping up with the ability to make it make sense and to put it to use, Joe? >> So I think data ingestion and data integration is probably one of the biggest challenges, especially as the world is starting to become more dependent on data. I think, you know, just because we're dependent on numbers, we've come up with GAAP, which is generally accepted accounting principles, which can be audited and proven whether it's true or false. I think in our lifetime we will see something similar for data, where we have formal checks and balances on the data that we use, that can be audited. Getting back to what Dave was saying earlier, I personally would trust a machine that was programmed to do the right thing more than I would trust a politician or some leader that may have their own agenda. And I think the other thing about machines is that they are auditable. You know, you can look at the code and see exactly what it's doing and how it's doing it. Human beings, not so much. So I think getting to the truth, even if the truth isn't the answer that we want, I think is a positive thing. 
It's something that we can't do today that, once we start relying on machines to do it, we'll be able to get there. >> Yeah, I was just going to add that we live in exponential times. And the challenge is that the way that we're structured traditionally as organizations is not allowing us to absorb advances exponentially, it's linear at best. Everyone talks about change management and how are we going to do digital transformation. Evidence shows that technology's forcing the leaders and the laggards apart. There's a few leading organizations that are eating the world, and they seem to be somehow rolling out new things. I don't know how Amazon rolls out all this stuff. There's all this artificial intelligence and the IOT devices, Alexa, natural language processing, and that's just a fraction, just the tip of what they're releasing. So it just shows that there are some organizations that have found the path. Most of the Fortune 500 from the year 2000 are gone already, right? The disruption is happening. And so we have to find some way to adopt these new capabilities and deploy them effectively, or the writing is on the wall. I spent a lot of time exploring this topic; how are we going to get there, and all of us have a lot of hard work, is the short answer. >> I read that it was predicted there's going to be more data created in this year than in the past, I think it was, 5,000 years. >> Forever. (laughs) >> And to mix in another statistic, we're currently analyzing less than 1% of the data. Taking those numbers and hearing what you're all saying, it's like, we're not keeping up; it seems like it's not even linear. I mean that gap is just going to grow and grow and grow. How do we close that? >> There's a guy out there named Chris Dancy, he's known as the human cyborg. He has 700 sensors all over his body. And his theory is that data's not new, having access to the data is new. 
You know, we've always had a blood pressure, we've always had a sugar level. But we were never able to actually capture it in real time before. So now that we can capture and harness it, now we can be smarter about it. So I think that being able to use this information is really incredible; this is something that over our lifetime we've never had and now we can do it. Hence the big explosion in data. But I think how we use it and have it governed is the challenge right now. It's kind of cowboys and Indians out there right now. And without proper governance and without rigorous regulation I think we are going to have some bumps in the road along the way. >> The data's the new oil; the question is how are we actually going to operationalize around it? >> Or find it. Go ahead. >> I will say the other side of it is, if you think about information, we always have the same amount of information, right? What we choose to record, however, is a different story. Now if you wanted to know things about the Olympics, but you decide to collect information every day for years instead of just the Olympic year, yes you have a lot of data, but did you need all of that data? For that question about the Olympics, you don't need to collect data during years there are no Olympics, right? Unless of course you're comparing it relatively. But I think that's another thing to think about. Just 'cause you collect more data does not mean that data will produce more statistically significant results, it does not mean it'll improve your model. You can be collecting data about your shoe size trying to get information about your hair. I mean it really does depend on what you're trying to measure, what your goals are, and what the data's going to be used for. If you don't factor the real world context into it, then yeah, you can collect an infinite amount of data, but you'll never process it, because you have no question to ask, you're not looking to model anything. 
There is no universal truth about everything, that just doesn't exist out there. >> I think she's spot on. It comes down to what kind of questions you're trying to ask of your data. You can have one given database that has 100 variables in it, right? And you can ask it five different questions, all valid questions, and that data may have the variables that'll tell you what's the best predictor of churn, what's the best predictor of cancer treatment outcome. And if you can ask the right question of the data you have, then that'll give you some insight. Just data for data's sake, that's just hype. We have a lot of data but it may not lead to anything if we don't ask it the right questions. >> Joe. >> I agree, but I just want to add one thing. This is where the science in data science comes in. Scientists often will look at data that's already been in existence for years, weather forecasts, weather data, climate change data for example, going back to data charts and so forth from centuries ago, if that data is available. And they reformat it, they reconfigure it, they get new uses out of it. And the potential I see with the data we're collecting is, it may not be of use to us today, because we haven't thought of ways to use it, but maybe 10, 20, even 100 years from now someone's going to think of a way to leverage the data, to look at it in new ways and to come up with new ideas. That's just my thought on the science aspect. >> Knowing what you know about data science, why did Facebook miss Russia and the fake news trend? They came out and admitted it. You know, "we missed it," why? Could they have, is it because they were focused elsewhere? (crosstalk) >> It's what you said, which is, are you asking the right questions, and if you're not looking for that problem in exactly the way that it occurred you might not be able to find it. >> I thought the ads were paid in rubles. 
Shouldn't that be your first clue (panelists laugh) that something's amiss? >> You know, red flag, so to speak. >> Yes. >> I mean with Bitcoin maybe they could have hidden it. >> Bob: Right, exactly. >> I would think too that what happened last year was actually the end of an age of optimism. I'll bring up the Soviet Union again. (chuckles) It collapsed back in 1990, 1991, and Russia was reborn. And I think there was a general feeling of optimism in the '90s through the 2000s that Russia was now being well integrated into the world economy, as other nations all over the globe, all continents, were being integrated into the global economy thanks to technology. And technology is lifting entire continents out of poverty and ensuring more connectedness for people. Across Africa, India, Asia, we're seeing economies in countries that are very different than they were 20 years ago, and that extended into Russia as well. Russia is part of the global economy. We're able to communicate as a global network. I think as a result we kind of overlooked the dark side that occurred. >> John: Joe? >> Again, the foolish optimist here. But I think that it shouldn't be the question, how did we miss it? It's, do we have the ability now to catch it? And I think without data science, without machine learning, without being able to train machines to look for patterns that involve corruption or result in corruption, I think we'd be out of luck. But now we have those tools. And now hopefully, optimistically, by the next election we'll be able to detect these things before they become public. >> It's a loaded question, because my premise was Facebook had the ability and the tools and the knowledge and the data science expertise if in fact they wanted to solve that problem, but they were focused on other problems, which is how do I get people to click on ads? >> Right, they had the ability to train the machines, but they were giving the machines the wrong training. >> Looking under the wrong rock. 
>> (laughs) That's right. >> It is easy to play armchair quarterback. Another topic I wanted to ask the panel about is IBM Watson. You guys spend time in the Valley, I spend time in the Valley. People in the Valley poo-poo Watson. Ah, Google, Facebook, Amazon, they've got the best AI. Watson, and some of that's fair criticism, Watson's a heavy lift, very services oriented, you've just got to apply it in a very focused way. At the same time, Google's trying to get you to click on ads, as is Facebook, Amazon's trying to get you to buy stuff. IBM's trying to solve cancer. Your thoughts on that sort of juxtaposition of the different AI suppliers, and there may be others. Oh, nobody wants to touch this one, come on. I told you, elephant in the room questions. >> Well I mean you're looking at two very different types of organizations. One which has really spent decades applying technology to business, and these other companies are ones that are primarily into the consumer, right? When we talk about things like IBM Watson you're looking at a very different type of solution. You used to be able to buy IT, and once you installed it you pretty much could get it to work and store your records, or you know, do whatever it is you needed it to do. But these types of tools, like Watson, actually try to learn your business. And they need to spend time doing that, watching the data and having their models tuned. And so you don't get the results right away. And I think that's been kind of the challenge that organizations like IBM have had. This is a different type of technology solution, one that has to actually learn first before it can provide value. And so I think you have organizations like IBM that are much better at applying technology to business, and then they have the further hurdle of having to try to apply these tools that work in very different ways. There's education too on the side of the buyer. 
>> I'd have to say that, you know, I think there's plenty of businesses out there also trying to solve very significant, meaningful problems. You know, with Microsoft AI and Google AI and IBM Watson, I think it's not really the tool that matters, like we were saying earlier. A fool with a tool is still a fool, regardless of who the manufacturer of that tool is. And I think having a thoughtful, intelligent, trained, educated data scientist using any of these tools can be equally effective. >> So do you not see core AI competence, and I left out Microsoft, as a strategic advantage for these companies? Is it going to be so ubiquitous and available that virtually anybody can apply it? Or is all the investment in R&D and AI going to pay off for these guys? >> Yeah, so I think there's different levels of AI, right? So there's AI where you can actually improve the model. I remember when Watson was kind of first out, I was invited by IBM to a private sort of presentation. And my question was, "Okay, so when do I get to access the corpus?" The corpus being sort of the foundation of NLP, which is natural language processing. So it's what you use as almost like a dictionary. Like how you're actually going to measure things, or look things up. And they said, "Oh, you can't." "What do you mean I can't?" It's like, "We do that." "So you're telling me as a data scientist you're expecting me to rely on the fact that you did it better than me, and I should rely on that?" I think over the years after that, IBM started opening it up and offering different ways of being able to access the corpus and work with that data. But I remember at the first Watson hackathon there were only two corpora available. It was either travel or medicine. There was no other foundational data available. 
So I think one of the difficulties was, you know, IBM being a little bit more on the forefront of it, they kind of had that burden of having to develop these systems and learn, kind of the hard way, that if you don't have the right models and you don't have the right data and you don't have the right access, that's going to be a huge limiter. I think with things like medical information, that's an extremely difficult data set to start with. Partly because anything that you do find or don't find, the impact is significant. If I'm looking at things like what people clicked on, the impact of using that data wrong is minimal. You might lose some money. If you do that with healthcare data, if you do that with medical data, people may die. This is a much more difficult data set to start with. So I think from a scientific standpoint it's great to have any information about a new technology, a new process. That's the nice thing, that IBM's obviously invested in it and collected information. I think the difficulty there, though, is just 'cause you have it, you can't solve everything. And I feel like, as someone who works in technology, I think in general when you appeal to developers you try not to market. And with Watson it's very heavily marketed, which tends to turn off people who are more from the technical side. Because I think they don't like it when it's gimmicky, in part because they do the opposite of that. They're always trying to build up the technical components of it. They don't like it when you're trying to convince them that you're selling them something, when you could just give them the specs and let them look at it. So it could be something as simple as communication. But I do think it is valuable to have had a company lead on the forefront of that and try to do it, so we can actually learn from what IBM has learned from this process. >> But you're an optimist. (John laughs) All right, good. >> Just one more thought. >> Joe, go ahead first. 
>> Joe: I want to see how Alexa or Siri do on Jeopardy. (panelists laugh) >> All right. Going to go around for a final thought, give you a second. Let's just think about, like, your 12 month crystal ball. In terms of either challenges that need to be met in the near term, or opportunities you think will be realized. A 12, 18 month horizon. Bob, you've got the microphone, so I'll let you lead off, and let's just go around. >> I think a big challenge for business, for society, is getting people educated on data and analytics. There's a study that was just released, I think last month, by ServiceNow, I think, or some vendor, or Qlik. They found that only 17% of the employees in Europe have the ability to use data in their job. Think about that. >> 17. >> 17. Less than 20%. So these people don't have the ability to understand or use data intelligently to improve their work performance. That says a lot about the state we're in today. And that's Europe. It's probably a lot worse in the United States. So that's a big challenge I think. To educate the masses. >> John: Joe. >> I think we probably have a better chance of improving technology than of training people. I think using data needs to be iPhone easy. And that means, you know, that a lot of innovation is in the years to come. I do think that the keyboard is going to be a thing of the past for the average user. We are going to start using voice a lot more. I think augmented reality is going to become a real reality. Where we can hold our phone in front of an object and it will have an overlay of prices, where it's available. If it's a person, I think we will see, within an organization, holding a camera up to someone and being able to see what their salary is, what sales they did last year, some key performance indicators. I hope that we are beyond the days of everyone around the world walking around like this, and we start actually becoming more social as human beings through augmented reality. 
I think it has to happen. I think we're going through kind of foolish times at the moment in order to get to the greater good. And I think the greater good is using technology in a very, very smart way. Which means, sorry to contradict, but maybe it's good to counterpoint, I don't think you need to have a PhD in SQL to use data. I think that's 1990. I think as we evolve it's going to become easier for the average person. Which means people like the brain trust here need to get smarter and start innovating. I think the innovation around data is really at the tip of the iceberg; we're going to see a lot more of it in the years to come. >> Dion, why don't you go ahead, then we'll come down the line here. >> Yeah, so I think over that time frame two things are likely to happen. One is somebody's going to crack the consumerization of machine learning and AI, such that it really is available to the masses and we can do much more advanced things than we could before. We see that industries tend to reach an inflection point and then there's an explosion. No one's quite cracked the code on how to really bring this to everyone, but somebody will. And that could happen in that time frame. And then the other thing that I think almost has to happen is that the forces for openness, open data, data sharing, open data initiatives, things like blockchain, are going to run headlong into data protection, data privacy, customer privacy laws and regulations that have to come down and protect us. Because the industry's not doing it, the government is stepping in, and it's going to re-silo a lot of our data. It's going to make it recede and make it less accessible, making data science harder for a lot of the most meaningful types of activities. Patient data for example is already all locked down. We could do so much more with it, but health start ups are really constrained about what they can do, 'cause they can't access the data. 
We can't even access our own health care records, right? So I think that's the challenge, that we have to have that battle next to be able to go and take the next step. >> Well, I see a lot of the growth of data coming through IOT, the internet of things. I think that's a big source. And we're going to see a lot of innovation. New types of Ubers or Airbnbs. Uber's so 2013 though, right? We're going to see new companies with new ideas, new innovations, and they're going to be looking at the ways all this big data, or data coming in from the IOT, can be leveraged. You know, there's some examples out there. There's a company, for example, that is outfitting tools, putting sensors in the tools. Industrial sites can therefore track where the tools are at any given time. Constantly losing tools and trying to locate them is an expensive, time consuming process. Assessing whether the tool's being applied to the production line, or the right tool is at the right torque, and so forth. With the sensors implanted in these tools, it's now possible to be more efficient. And there's going to be innovations like that. Maybe small start up type things or smaller innovations. We're going to see a lot of new ideas and new types of approaches to handling all this data. There's going to be new business ideas. The next Uber, we may be hearing about it a year from now, whatever that may be. And that Uber is going to be applying data, probably IOT type data, in some new innovative way. >> Jennifer, final word. >> Yeah, so with data, you know, it's interesting, right? For one thing, I think one of the things that's made data more available, and just made people more open to the idea, has been start ups. But what's interesting about this is a lot of start ups have been acquired. And a lot of people at start ups that got acquired, these people now work at bigger corporations. 
Which is not the way it was maybe 10 years ago, when data wasn't available and open; companies kept it very proprietary, you had to sign NDAs. It was within the last 10 years that open source, and all of those initiatives, became much more popular, much more open, an acceptable sort of way to look at data. I think what I'm kind of interested in seeing is what people do within the corporate environment. Right, 'cause they have resources. They have funding that start ups don't have. And they have backing, right? Presumably if you're acquired you went in at a higher title in the corporate structure, whereas if you had started there you probably wouldn't be at that title at that point. So I think you have an opportunity where people who have done innovative things, and have proven that they can build really cool stuff, can now be in that corporate environment. I think part of it's going to be whether or not they can really adjust to the corporate landscape, the politics of it, or the bureaucracy. I think every organization has that. Being able to navigate that is a difficult thing, in part 'cause it's a human skill set, it's a people skill, it's a soft skill. It's not the same thing as just being able to code something and sell it. So you know, it's going to really come down to people. I think if people can figure out, for instance, what people want to buy, what people think, in general that's where the money comes from. You know, you make money 'cause someone gave you money. So if you can find a way to look at data, or even look at technology, and understand what people are doing, aren't doing, what they're happy about, unhappy about, there's always opportunity in collecting the data in that way and being able to leverage that. So you build cooler things, and offer things that haven't been thought of yet. So it's a very interesting time, I think, with the corporate resources available, if you can do that. 
You know who knows what we'll have in like a year. >> I'll add one. >> Please. >> The majority of companies in the S&P 500 have a market cap that's greater than their revenue. The reason is 'cause they have IP related to data that's of value. But most of those companies, most companies, the vast majority of companies, don't have any way to measure the value of that data. There's no GAAP accounting standard. So they don't understand the value contribution of their data in terms of how it helps them monetize. Not the data itself necessarily, but how it contributes to the monetization of the company. And I think that's a big gap. If you don't understand the value of the data, that means you don't understand how to refine it, if data is the new oil, and how to protect it and so forth and secure it. So that to me is a big gap that needs to get closed before we can actually say we live in a data driven world. >> So you're saying I've got an asset, I don't know if it's worth this or this. And they're missing that great opportunity. >> So devolve to what I know best. >> Great discussion. Really, really enjoyed it, the time has flown by. Joe if you get that augmented reality thing to work on the salary, point it toward that guy, not this guy, okay? (everyone laughs) It's much more impressive if you point it over there. But Joe thank you, Dion, Joe and Jennifer and Batman. We appreciate it, and Bob Hayes, thanks for being with us. >> Thanks, you guys. >> Really enjoyed >> Great stuff. >> the conversation. >> And a reminder, coming up at the top of the hour, six o'clock Eastern time, IBMgo.com featuring the live keynote which is being set up just about 50 feet from us right now. Nate Silver is one of the headliners there, John Thomas as well, or rather Rob Thomas. John Thomas we had on earlier on The Cube. But a panel discussion as well coming up at six o'clock on IBMgo.com, six to 7:15. Be sure to join that live stream. That's it from The Cube. We certainly appreciate the time.
Glad to have you along here in New York. And until the next time, take care. (bright digital music)
Anjali Menon, Morgan Stanley | Grace Hopper 2017
(techno music) >> Narrator: Live from Orlando, Florida. It's the Cube. Covering Grace Hopper's Celebration of Women in Computing. Brought to you by Silicon Angle Media. >> Welcome back to the Cube's coverage of the Grace Hopper Conference here in Orlando, Florida. I'm your host Rebecca Knight. We're joined now by Anjali Menon. She is the VP of Technology at Morgan Stanley. Thanks so much for coming on the show. >> My pleasure to be here. >> So I'd love for you to just tell our viewers a little about your journey as a woman in technology who now works at an investment bank. >> Yes, absolutely. I think it's a very long journey, if you will. It started when I was seven years old. Back in my school we had an extracurricular computer science course, so I signed up for it. And I remember starting out as, you know, someone who was coding in BASIC. And, you know, it was just very simple things. You draw a line, draw a kite, watch it move across the screen. It was just so exciting for someone of that age. So, you know, I kept at it. I continued to enroll in the same course over the years. So, middle school, high school, and then I did my undergraduate in computer science and engineering. And then in 2011 I graduated from NYU with a Masters in Computer Science. And, you know, Morgan Stanley was one of those companies that had shown up during on campus recruitment. And just the feedback that I had heard from my other peers who were already in the company, just, you know, about the work culture at Morgan Stanley. It was just really, really good. So, you know, I joined Morgan Stanley and right now I'm, you know, Assistant Owner. I own the Equities and Options order entry application. So I'm responsible for, you know, the overall design and development. So it's been a really exciting journey. To, uh, you know, Morgan Stanley, yep. >> So you as a woman in technology and now working in finance. >> Yes. >> I mean these are two very male dominated industries.
>> Mm-hmm (affirmative) >> That have come together to provide your jobs. >> Yeah. >> How, what is it like to be a woman on the front lines? >> So, you know it's interesting, I feel like a lot of people have, you know, misconceptions about that. You know, about being a woman in tech. But we have a very diverse and inclusive culture at Morgan Stanley. Like I mentioned, I am Assistant Owner for the Equities and Options Order Entry Application. So, you know, when I'm sitting at a table with senior managers, because I'm the subject matter expert, it's great to, you know, look at them sit and listen to me talk, because, you know, I'm the one who's bringing in the information. So it doesn't really matter if you're a woman or a man. What matters is, are you the one with the expertise? Are you the one with the talent, right? And they're going to sit up and listen to you irrespective of your gender. So, you know, that's just the culture at Morgan Stanley. So, uh, yep. >> So now, talking about the culture. And you are here, obviously, trying to recruit bright, young talent at the Grace Hopper Conference. >> Yes, yes. >> What are you hearing from potential employees? What are they looking for in a company? >> What are we looking for in students, or? >> I'm interested in both what Morgan Stanley wants to see out of prospective candidates. >> Mm-hmm (affirmative) >> But also what you're hearing from the recruits themselves in terms of how they want their job to fit into their lives. >> Absolutely, a lot of, one of the recurring questions that I do get when I'm interviewing students is, you know, how do you maintain the whole work life balance? Like you said, finance and tech. It's a very grueling industry, right? So how do you keep that balance? And what's really wonderful is that, you know, you don't have to sacrifice your personal life, or your passion projects, for your work.
Me personally, for the last year I've been taking a lot of extracurricular courses. Non-credit courses at NYU in filmmaking and photography. Because that's just my passion project. I love telling stories, and I used to be a writer, and I was just looking to explore other mediums for telling stories. So in the last year, since the summer of 2016, I've been taking courses at NYU and it's just been such a great experience, and I think Morgan Stanley sort of allows you to have that culture. Right? You have your nine to five job, and during those hours you're very focused on what you're doing, but, you know, they do give you time outside of that to just, like, work on your passion projects. And it's great that I can find that balance between the two. >> So Morgan Stanley could be a choice employer for a young woman looking for a work life balance. >> Oh, yeah, absolutely, absolutely. >> And now what are you looking for in a potential recruit? What are you telling the young women here at Grace Hopper? >> We are looking for women who are bright and very confident. I feel like in all of the interviews that I've done in the last few days, I've met such wonderful young women. And it's really difficult to choose because everyone has their own area of expertise. And you can tell they're very, very intelligent. They love challenges, right? A lot of the questions that I ask are typically around, like, problem solving, and puzzles. And it's great to see how they can approach it, and deconstruct it. So, it's been really difficult trying to choose one over the other, because everyone is just so equally bright, yeah. >> So, how are you going about this recruitment process? How are you assembling a diverse team? >> So we've been doing a lot of on the spot quizzes. So like once a day we have two problems that are presented. We have students stopping by and they're working it out.
We're helping them through the process of, you know, figuring out the solution. And, you know, anyone who stands out, we're pulling them aside, scheduling interviews with them. We are actually also making offers on the spot as well. >> Oh wow, okay. >> So, that's, that's been a new experience, so, yeah. It's been, we have a lot of interviews already scheduled as well, so, yeah. >> So when you're, in terms of your job, what are the things you are most excited about that you're working on? In terms of the real technical challenges that you're facing? >> Absolutely, so, I work within the capital markets space and wealth management. Our clients are financial advisors, right, so, my job, when I came in three or four years ago, was, I wanted to enhance the order entry experience for the equities and options product. And essentially what we were looking to do was enable the FAs with a tool that would enable them to do their jobs efficiently and quickly. So over the last couple of years, we've been building an equities trading platform that would enable them to do just that. And it's just really exciting to see what the legacy system did and what the new system does, and the progress that we've made. And we just hear really good feedback from the field as well. Like, our clients, the FAs, Financial Advisors, who are using the new system. It's great to hear things like, "Oh, I love that I can do my job so quickly. It's just like one or two clicks and I can do so much more than the legacy system." So it's really exciting. >> So what is the difference there? What are you enabling to happen so much more speedily than happened in the legacy system? >> So, our legacy system was a single order entry application. While the new system allows them to submit multiple orders across securities, across accounts, in a single operation. So what would have taken, you know, minutes to submit, say, ten orders, now takes just a few seconds.
So, it's just a faster enhanced order entry experience. And I love that I was a part of that, that journey, yup. >> So, so speed is one thing. What are some other priorities that you have going forward in terms of enhancing the products that you provide to financial advisors? >> Just be able to efficiently, you know, submit orders as well. So with respect to, you know, just submitting multiple orders going across securities. Or even like quickly creating tickets. With the legacy system it was a lot of like form filling. You start, you entered the account, you entered the security and you fill out all the other details. But we've enabled them with quick ways to create tickets. So, in just a few keystrokes, with, like, semantic based entries, they can create like, multiple tickets and submit the orders. So, just being able to efficiently do their job as well was one of the key things that we were looking to deliver. >> And are you focused at all on the user, the sort of the design user experience element too? >> So we do have a dedicated user experience team. But since I started off as a front end developer, I did work very closely with them, to help, like build out that interface. So, yeah, we do have a dedicated team. It was great to actually work with them to help build that out, yup. >> Great. And finally, I just am curious about your thoughts about this Grace Hopper Conference. This is, is this your first time? >> It's my first time at Grace Hopper. >> A newbie here. >> It's been overwhelming. I remember walking in yesterday and I could see a sea of people and it's been wonderful, yeah. >> Great, great. So we'll see you here next year? >> Absolutely. >> Excellent. Well Anjali thank you so much, it's been a pleasure talking to you, having you on the show. >> Thank you. >> I'm Rebecca Knight. We'll have more from the Grace Hopper Conference in just a little bit. (techno music)
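The single-versus-batch order entry difference Menon describes above can be sketched in miniature. Everything below is hypothetical, the function names, the order fields, and the simulated round-trip latency are invented for illustration; this is not Morgan Stanley's actual system or any real brokerage API. The point it demonstrates is simply that N orders submitted one round trip at a time cost N network latencies, while a batch submission costs one:

```python
import time

# Hypothetical sketch: single vs. batch order submission.
# Names and the 0.05 s simulated round trip are invented for illustration.

ROUND_TRIP = 0.05  # pretend network latency per request, in seconds

def submit_one(order):
    """Legacy style: one order per network round trip."""
    time.sleep(ROUND_TRIP)
    return {"order": order, "status": "accepted"}

def submit_batch(orders):
    """Batch style: many orders, across accounts and securities, in one trip."""
    time.sleep(ROUND_TRIP)
    return [{"order": o, "status": "accepted"} for o in orders]

orders = [
    {"account": f"ACCT-{i}", "symbol": s, "qty": 100}
    for i, s in enumerate(["IBM", "MSFT", "AAPL", "ORCL", "INTC"] * 2)
]

start = time.perf_counter()
legacy_results = [submit_one(o) for o in orders]  # ten round trips
legacy_elapsed = time.perf_counter() - start

start = time.perf_counter()
batch_results = submit_batch(orders)              # one round trip
batch_elapsed = time.perf_counter() - start

print(len(legacy_results), len(batch_results))  # 10 10
print(batch_elapsed < legacy_elapsed)           # True
```

With ten orders the batch path does one round trip instead of ten, which is the "minutes down to seconds" effect described in the interview, only at toy scale.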
Dr. Dawn Nafus | SXSW 2017
>> Announcer: Live from Austin, Texas it's the Cube. Covering South by Southwest 2017. Brought to you by Intel. Now here's John Furrier. Okay we're back live here at the South by Southwest Intel AI Lounge, this is The Cube's special coverage of South by Southwest with Intel, #IntelAI where amazing starts with Intel. Our next guest is Dr. Dawn Nafus, who's with Intel, and you are a senior research scientist. Welcome to The Cube. >> Thank you. >> So you've got a panel coming up and you also have a book, AI For Everything. And looking at the democratization of AI, we had a quote yesterday that "AI is the bulldozer for data." What bulldozers were in the real world, AI will be that bulldozer for data, surfacing new experiences. >> Right. >> This is the subject of your book, kind of. What's your take on this and what's your premise? >> Right, well the book actually takes a step way back, it's actually called Self Tracking, the panel is AI For Everyone. But the book is on self tracking. And it's really about actually getting some meaning out of data before we start talking about bulldozers. So right now we've got this situation where there's a lot of talk about AI's going to sort of solve all of our problems in health, and there's a lot that can get accomplished, whoops. But the fact of the matter is that people are still struggling with, gees, like, "What does my Fitbit actually mean, right?" So there's this, there's a real big gap. And I think probably part of what the industry has to do is not just sort of build new great technologies, which we've got to do, but also start to fill that gap in sort of data education, data literacy, all that sort of stuff. >> So we're kind of in this first generation of AI data, you mentioned wearables, Fitbits. >> Dawn: Yup. >> So people are now getting used to this, so it sounds like this integration into lifestyle becomes kind of a dynamic. >> Yeah. >> Why are people grappling >> John: with this, what's your research say about that?
Well right now with wearables frankly we're in the classic trough of disillusionment. (laughs) You know, for those of you listening, I don't know if you have sort of wearables in drawers right now, right? But a lot of people do. And it turns out that folks tend to use it for, you know, maybe about three or four weeks, and either they've learned something really interesting and helpful or they haven't. And so there's actually a lot of people who do really interesting stuff to kind of combine it with symptom tracking, location, right, other sorts of things, to actually really reveal the sorts of triggers for medical issues that you can't find in a clinical setting. It's all about being out in the real world and figuring out what's going on with you. Right, so then when we start to think about adding more complexity into that, which is the thing that AI's good at, we've got this problem of there's only so many data sets that AI's actually any good at handling. And so I think there's going to have to be a moment where sort of people themselves actually start to say, "Okay you know what? This is how I define my problem. This is what I'm going to choose to keep track of." And some of that's going to be on a sensor and some of it isn't. Right, and sort of really intervening a little bit more strongly in what this stuff's actually doing. >> You mentioned the Fitbit, and we're seeing a lot of disruption in these areas, innovation and disruption, same thing, good and bad potentially. But I'd say autonomous vehicles is pretty clear, and everyone knows what Tesla's tracking with their hot trend. But you mentioned Fitbit, that's a healthcare kind of thing. AI might seem to be a perfect fit into healthcare because there's always alarms going off and all this data flying around. Is that a low hanging fruit for AI? Healthcare? >> Well I don't know if there's any such thing as low hanging fruit (John laughs) in this space.
(laughs) But certainly if you're talking about, like, actual human benefit, right? That absolutely comes to the top of the list. And we can see that in both formal healthcare in clinical settings and sort of imaging for diagnosis. Again I think there's areas to be cautious about, right? You know, making sure that there's also an appropriate human check, and there's also mechanisms for transparency, right? So that doctors, when there is a discrepancy between what the doctor believes and what the machine says, you can actually go back and figure out what's actually going on. The other thing I'm particularly excited about, and this is why I'm so interested in democratization, is that health is not just about, you know, what goes on in clinical care. There are right now environmental health groups who are looking at a slew of air quality data that they don't know what to do with, right? And a certain amount of machine assistance to sort of figure out, you know, signatures of sort of point-source polluters, for example, is a really great use of AI. It's not going to make anybody any money anytime soon, but that's the kind of society that we want to live in, right? >> You're on the social good angle for sure, but I'd like to get your thoughts 'cause you mentioned democratization, and it's kind of a nuance depending upon what you're looking at. Democratization with news and media is what you saw with social media; now you've got healthcare. So how do you define democratization in your context, and what are you excited about? Is it more freedom of information and data, is it getting around gatekeepers and siloed stacks? I mean how do you look at democratization? >> All of the above. (laughs) (John laughs) I'd say there are two real elements to that.
The first is making sure that, you know, people are going to use this for more than just business, have the ability to actually do it and have access to the right sorts of infrastructures, whether it's the environmental health case, or there are actually artists now who use natural language processing to create artwork. And people ask them, "Why are you using deep learning?" I said, "Well there's a real access issue frankly." It's also, on the side of, if you're not the person who's going to be directly using data, a kind of sense of, you know... Democratization to me means being able to ask questions of how the stuff's actually behaving. So that means building in mechanisms for transparency, building in mechanisms to allow journalists to do the work that they do. >> Sharing potentially? >> I'm sorry? >> And sharing as well, more data? >> Very, very good. Right absolutely, I mean frankly we still have a problem right now in the wearable space of people even getting access to their own data. There's a guy I work with named Hugo Campos who has an implanted cardiac defibrillator and he's still fighting to get access to the very data that's coming out of his heart. Right? (laughs) >> Is it on SSD, in the cloud? I mean where is it? >> It is in the cloud. It's going back to the manufacturer. And there are very robust conversations about where it should be. >> That's super sad. So this brings up the whole thing that we've been talking about yesterday, when we had a mini segment on The Cube, is that there are all these new societal use cases that are just springing up that we've never seen before. Self-driving cars with transportation, healthcare access to data, all these things. What are some of the things that you see emerging, tools or approaches, that could help either scientists or practitioners or citizens deal with this new critical problem solving that needs technology applied to it?
I was talking just last week at Stanford with folks that are looking at gender bias in algorithms. >> Right, uh-huh, it's real. >> Something I would never have thought of, that's an outlier. Like hey, what? >> Oh no, it's happened. >> But it's one of those things where okay, let's put that on the table. There's all this new stuff coming on the table. >> Yeah, yeah absolutely. >> What do you see? >> So they're-- >> How do we solve that, >> John: what approaches? >> Yeah there are a couple of mechanisms, and I would encourage listeners and folks in the audience to have a look at a really great report that just came out from the Obama Administration and NYU School of Law. It's called AI Now, and they actually propose a couple of pathways to sort of making sure we get this right. So, you know, a couple of things. You know, one is frankly making sure that women and people of color are in the room when the stuff's getting built, right? That helps. You know, as I said earlier, making sure that, you know, things will go awry. Like it just will, we can't predict how these things are going to work, and catching it after the fact, and building in mechanisms to be able to do that, really matter. So there was a great effort by ProPublica to look at a system that was predicting criminal recidivism. And what they did was they said, "Look, you know, it is true that the thing has the same failure rate for both blacks and whites." But some hefty data journalism and data scraping and all the rest of it actually revealed that it was producing false positives for blacks and false negatives for whites. Meaning that black people were predicted to commit more crime than white people, right? So you know, we can catch that, right? And when we build in more systems of people who have the skills to do it, then we can build stuff that we can live with. >> This is exactly to your point of democratization I think that fascinates me, that I get so excited about.
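The ProPublica result Nafus summarizes, the same overall failure rate hiding very different kinds of failures, comes down to the distinction between accuracy and per-group false positive and false negative rates. The sketch below illustrates the arithmetic on invented toy numbers; they are not the actual recidivism data, just a demonstration that two groups can share one accuracy while their error types diverge completely:

```python
# Toy illustration: equal accuracy across two groups can mask opposite
# error profiles. All labels and predictions here are invented.

def error_profile(labels, predictions):
    """Return (accuracy, false_positive_rate, false_negative_rate)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr

# Group A: its errors are all false positives (flagged, did not reoffend).
group_a_labels = [0, 0, 0, 0, 1, 1, 1, 1]
group_a_preds  = [1, 1, 0, 0, 1, 1, 1, 1]   # 2 FP, 0 FN

# Group B: its errors are all false negatives (not flagged, did reoffend).
group_b_labels = [0, 0, 0, 0, 1, 1, 1, 1]
group_b_preds  = [0, 0, 0, 0, 1, 1, 0, 0]   # 0 FP, 2 FN

acc_a, fpr_a, fnr_a = error_profile(group_a_labels, group_a_preds)
acc_b, fpr_b, fnr_b = error_profile(group_b_labels, group_b_preds)

print(acc_a, acc_b)   # 0.75 0.75 -- identical "failure rate"
print(fpr_a, fnr_a)   # 0.5 0.0  -- group A is over-flagged
print(fpr_b, fnr_b)   # 0.0 0.5  -- group B is under-flagged
```

This is why auditing only the headline accuracy, as the vendor's framing did, misses exactly the disparity the journalists found: the mechanism for catching it is computing the error rates separately per group.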
It's almost intoxicating when you think about it technically and also societal that there's all these new things that are emerging and the community has to work together. Because it's one of those things where there's no, there may be a board of governors out there. I mean who is the board of governors for this stuff? It really has to be community driven. >> Yeah, yeah. >> And NYU's got one, any other examples of communities that are out there that people can participate in or? >> Yup, absolutely. So I think that you know, they're certainly collaborating on projects that you actually care about and sort of asking good questions about, is this appropriate for AI or not, right? Is a great place to start of reaching out to people who have those technical skills. There are also the Engineering Professional Association actually just came out a couple months ago with a set of guidelines for developers to be able to... The kinds of things you have to think about if you're going to build an ethical AI system. So they came out with some very high level principles. Operationalizing those principles is going to be a real tough job and we're all going to have to pitch in. And I'm certainly involved in that. But yeah, there are actually systems of governance that are cohering, but it's early days. >> It's great way to get involved. So I got to ask you the personal question. In your efforts with the research and the book and all of your travels, what's some of the most amazing things that you've seen with AI that are out there that people may know about or may not know about that they should know about? >> Oh gosh. I'm going to reserve judgment, I don't know yet. I think we're too early on the curve to be able to talk about, you know, sort of the magic of it. 
What I can say is that there is real power when ordinary people who have no coding skills whatsoever and frankly don't even know what the heck machine learning is, get their heads around data that is collected about them personally. That opens up, you can teach five year olds statistical concepts that are learned in college with a wearable because the data applies to them. So they know how it's been collected. >> It's personal. >> Yeah they know what it is already. You don't have to tell them what a outlier effect is because they know because they wear that outlier. You know what I mean. >> They're immersed in the data. >> Absolutely and I think that's where the real social change is going to come from. >> I love immersion as a great way to teach kids. But the data's key. So I got to ask you with the big pillars of change going on and at Mobile World Congress I saw you, Intel in particular, talking about autonomous vehicles heavily, smart cities, media entertainment and the smart home. I'm just trying to get a peg a comparable of how big this shift will be. These will be, I mean the '60s revolution when chips started coming out, the PC revolution and server revolution and now we're kind of in this new wave. How big is it? I mean in order of magnitude, is it super huge with all of the other ships combined? Are we going to see radical >> I don't know. >> configuration changes? >> You know. You know I'm an anthropologist, right? (John laughs) You know everything changes and nothing changes at the same time, right? We're still going to wake up, we're still going to put on our shoes in the morning, right? We're still going to have a lot of the same values and social structures and all the rest of it that we've always had, right. So I don't think in terms of plonk, here's a bunch of technology now. Now that's a revolution. There's like a dialogue. And we are just at the very, very baby steps of having that dialogue. 
But when we do, people in my field call it domestication, right? These become tame, they become part of our lives, we shape them and they shape us. And that's not radical change, that's the change we always have. >> That's evolution. So I've got to ask you a question, because I have four kids and I have this conversation with my wife and friends all the time, because we have kids, digital natives, growing up. And we see a lot of workplace domestication too, people kind of getting domesticated with the new technologies. What's your advice, whether it's parents to their kids, kids growing up in this world, whether it's education? How should people approach the technology that's coming at them so heavily? In the age of social media where all our voices are equal right now, more filters are coming out. It's pretty intense. >> Yeah, yeah. I think it's an occasion where people have to think a lot more deliberately than they ever have about the sources of information that they want exposure to. The kinds of interaction, the mechanisms that actually do and don't matter. And thinking very clearly about what's noise and what's not is a fine thing to do. (laughs) (John laughs) So yeah, probably the filtering mechanisms have to get a bit stronger. I would say too there's a whole set of practices, there are ways that you can scrutinize new devices for, you know, where the data goes. And often, kind of the higher-bar companies will give you access back, right? So if you can't get your data out again, I would start asking questions. >> All right, final two questions for you. What has your experience been like so far at South by Southwest? >> Yup. >> And where is the world going to take you next in terms of your research and your focus? >> Well, this is my second year at South by Southwest. It's hugely fun, I am so pleased to see just a rip roaring crowd here at the Intel facility, which is just amazing. I think this is our first time as Intel proper. I'm having a really good time. 
The Self Tracking book is on the bookshelf over in the convention center if you're interested. And what's next is we are going to get real about how to make these ethical principles actually work at an engineering level. >> Computer science meets social science, happening right now. >> Absolutely. >> Intel powering amazing here at South by Southwest. I'm John Furrier, you're watching The Cube. We've got a great set of people here on The Cube. Also a great AI Lounge experience, great demos, great technologists, all about AI for social change, with Dr. Dawn Nafus of Intel. We'll be right back with more coverage after this short break. (upbeat digital beats)
Claudia Perlich, Dstillery - Women in Data Science 2017 - #WiDS2017 - #theCUBE
>> Narrator: Live from Stanford University, it's theCUBE covering the Women in Data Science Conference 2017. >> Hi, welcome back to theCUBE, I'm Lisa Martin and we are live at Stanford University at the second annual Women in Data Science one-day tech conference. We are joined by one of the speakers for the event today, Claudia Perlich, the Chief Scientist at Dstillery. Claudia, welcome to theCUBE. >> Claudia: Thank you so much for having me. It's exciting. >> It is exciting! It's great to have you here. You are quite the prolific author, you've won data mining competitions and awards, you speak at conferences all around the world. Talk to us about what you're currently doing as the Chief Scientist for Dstillery. Who's Dstillery? What's the Chief Scientist's role, and how are you really leveraging data and science to be a change agent for your clients? >> I joined Dstillery when it was still called Media6Degrees, as a very small startup in the New York ad tech space. It was very exciting. I came out of the IBM Watson Research Lab and really found this a new, challenging application area for my skills. What does a Chief Scientist do? It's a good question, I think it actually took the CEO about two years to finally give me a job description, (laughter) and the conclusion at that point was something like, okay, there is technical contribution, so I sit down and actually code things and I build prototypes and I play around with data. I also am referred to as Intellectual Leadership, so I work a lot with the teams just kind of scoping problems, brainstorming what may work or doesn't, and finally, and that's what I'm here for today, what they consider an Ambassador for the company, so being the face to talk about the more scientific aspects of what's happening now in ad tech, which brings me to what we actually do, right. 
One of the things that happened over the recent past in advertising is it became an incredible playground for data science, because the available data is incomparable to many other fields that I have seen. And so Dstillery was a pioneer in that space, starting to look initially at social data, things that people shared, but over the years it has really grown into getting a sense of the digital footprint of what people do. And our primary business model was to bring this to marketers to help them, on a much more individualized basis, identify who their current as well as future customers are. Really get a very different understanding than these broad middle-aged-soccer-mom kind of categories, to honor the individual tastes and preferences and actions that really truly reflect the variety of what people do. I'm many things, as you mentioned: I publish, I'm a mom, and I have a horse, so there are many different parts to me. I don't think any single description fully captures that, and we felt that advertising is a great space to explore how you can translate that and help both sides, the people that are being interacted with, as well as the brands that want to make sure that they reach the right individuals. >> Lisa: Very interesting. Well, as the buyer's journey has changed to mostly online, >> Exactly. >> You're right, it's an incredibly rich opportunity for companies to harness more of that behavioral information and probably see things that they wouldn't have predicted. We were talking to Walmart Labs earlier and one of the interesting insights that they shared was that, especially in Silicon Valley where people spend too much time in the car commuting-- (laughter) You have a long commute as well by train. >> Yes. 
>> And you'd think that people would want, I want my groceries to show up on my doorstep, I don't want to have to go into the store, and they actually found the opposite that people in such a cosmopolitan area as Silicon Valley actually want to go into the store and pick up-- >> Claudia: Yep. >> Their groceries, so it's very interesting how the data actually can sometimes really change. It's really the scientific method on a very different scale >> Claudia: Much smaller. >> But really using the behavior insights to change the shopping experience, but also to change the experience of companies that are looking to sell their products. >> I think that the last part of the puzzle is, the question is no longer what is the right video for the Super Bowl, I mean we have the Super Bowl coming up, right? >> Lisa: Right. Right. >> They did a study like when do people pay attention to the Super Bowl. You can actually tell, cuz you know what people don't do when they pay attention to the Super Bowl? >> Lisa: Mm,hmm. >> They're not playing around with their phones. They're actually not playing-- >> Lisa: Of course. >> Candy Crush and all these things, so what we see in the ad tech environment, we actually see that the demand for the digital ads go down when people really focus on what's going on on the big screen. But that was a diversion ... >> Lisa: It's very interesting (laughter) though cuz it's something that's very tangible and very ... It's a real world applications. Question for you about data science and your background. You mentioned that you worked with IBM Watson. Forbes has just said that Data Scientist is the best job to apply for in 2017. What is your vision? Talk to us about your team, how you've grown that up, how you're using big data and science to really optimize the products that you deliver to your customers. >> Data Science is really many, many different flavors and in some sense I became a Data Scientist long before the term really existed. 
Back then I was just a particularly weird kind of geek. (laughter) You know, all of a sudden it's-- >> Now it has a name. (laughter) >> Right, and the reputation of being fun, and so you see really many different application areas demanding very different skillsets. The focus of our company has always been around, can we predict what people are going to do? That was always the primary focus, and now you see that it's very nicely reflected at the event too. All of a sudden, communicating this becomes a much bigger part of the puzzle, where people say, "Okay, I realize that you're really good at predicting, but can you tell me why, what is it, these nuggets of insight-- >> Interpretation, right. >> That you mentioned. Can you visualize what's going on?" And so we grew the team initially from a small group of really focused machine learning and predictive skills over to the broader: can you communicate it? Can you explain to the customers, a.k.a. the brands, what happened here? Can you visualize data? That's kind of the broader shift, and I think the most challenging part, the bit of a shortcoming in skillset that I can tell in the broader picture: we have a lot of people who are really good today at analyzing data and coding, so that part has caught up. There are so many Data Science programs. What I am still looking for is how do you bring management and corporate culture to the place where they can truly take advantage of it. 
>> Before we went live here, you mentioned that you teach at NYU, but you're also teaching Data Science to the business folks. I would love for you to expand a little bit more upon that and how you are helping to educate these people to understand the impact. Cuz that's really, really a change agent within the company. That's a cultural change, which is really challenging-- >> Claudia: Very much so. >> Lisa: What's their perception? What's their interest in understanding how this can really drive value? >> What you see, I've been teaching this course for almost six years now, and originally it was really kind of the hardcore coders, who also happened to be getting a PhD on the side, who came to the course. Now you increasingly have a very broad collection of business-minded people. I typically teach in the part-time program, meaning they all have day jobs, and they've realized in their day jobs: I need this. I need that. That skill. That knowledge. We're trying to get to the ground where, without having to teach them Python and R or whatever the new toys are, how can you identify opportunities? How do you know which of the many different flavors of Data Science, from prediction to visualization to just analyzing historical data to maybe even causality, which of these tools is appropriate for the task at hand? And then being able to evaluate whether the level of support that a machine can bring is even sufficient. Because just because you can analyze data doesn't mean that the reliability of the model is truly sufficient to then support a downstream business project. Being able to really understand those trade-offs without necessarily being able to sit down and code it yourself, that knowledge has become a lot more valuable, and I really enjoy the brainstorming when we're just trying to scope a project, when they come with problems from their day job and say, "Hey, we're trying to do that." And saying, "Are you really trying to do that?" 
"What are you actually able to execute? What kind of decisions can you make?" This is almost like the brainstorming in my own company, now brought out to much broader audiences, people working in hospitals, people working in banking, so I get exposed to all of these kinds of problem sets, and that makes it really exciting for me. >> Lisa: Interesting. When Dstillery is talking to a customer or prospective customers, is this now something that you're finding is a board-level conversation within businesses? >> Claudia: No, I never get bored of that, so there is a part of the business that is pretty well understood and executed. You come to us, you give us money, and we will execute a digital campaign, either on mobile phones or on video, and you tell me what it is that you want me to optimize for. Do you want people to click on your ad? Please don't say yes, that's the worst possible thing you may ask me to do-- (laughter) But let's talk about what you're going to measure, whether you want people to show up in your store, whether you really care about signing up for a test drive, and then the system automatically will build all the models that then do all the real-time bidding. Advertising, I'm not sure how many people are aware, as your New York Times page loads, every single ad slot on that page is sold in a real-time auction. About 50 billion times a day, we receive a request asking whether we want to bid on the opportunity to show somebody an ad. >> Lisa: Wow. >> So for that piece, I can't make 50 billion decisions a day. >> Lisa: Right. >> It is entirely automated. There's this fully automated machine learning that just serves that purpose. What makes it interesting for me now is that this is kind of standard fare, so you want to move over to the more interesting parts. Well, can you for instance predict which of the 15 different creatives I have for Jobani I should show you? >> Lisa: Mm,hmm. 
>> The one with the woman running, or the one with the kid opening it, so there are nuances to it, and exploring these new challenges or going into totally new areas, talking about, for instance, churn prediction. I know an awful lot about people, I can predict very many things, and a lot of them go far beyond just how you interact with ads; that's almost the most boring part. We can see people researching diabetes. We can provide snapshots to pharma, telling them here's really where we see a rise of activity on a certain topic, and maybe this is something of interest to understand which population is driving those changes. These kinds of conversations really make it exciting for me to bring the knowledge of what I see back to many different constituents and see what kind of problems we can possibly support with that. >> Lisa: It's interesting too. It sounds like more, not just providing ad technology to customers-- >> Claudia: Yeah. 
And then you realize that no matter what the campaign, no matter what the product, the model always chooses to show the ad on the flashlight app. Yeah, because that's when people fumble in the dark. The model's really, really good at predicting when people are likely to click on an ad, except that's really not what you intended-- >> Right. >> When you asked me to do that. >> Right. >> So these models are almost at their best and most powerful when they move off into a sidetracked direction you didn't even know existed. Something similar happened with one of these competitions that I won. For Siemens Medical, where you had to identify in MRI images of breasts which of these regions are most likely benign and which ones have cancer. With both models we did really, really well, all was good. Until we realized that the patient ID was by far the most predictive feature. Now this really shouldn't happen. Your social security number shouldn't be able to predict-- >> Lisa: Right. >> Anything really. It wasn't the social security number, but when we started looking a little bit deeper, we realized what had happened is the data set was a sample from different sources, and one was a treatment center, and one was a screening center, and they had certain ranges of patient IDs, so the model had learned where the machine stood, not what the image actually contained about the probability of having cancer. Whoever assembled the data set possibly didn't think about the downstream effect this can have on modeling-- >> Right. >> Which brings us back to the data science skill set as really comprehensive, starting all the way from the beginning, where the data is collected, all the way down to being extremely skeptical about your own work and really making sure that it truly reflects what you want it to do. You asked earlier what makes really good Data Scientists. 
The intuition to feel when something is wrong, and to be able to pinpoint and trace it back, with the curiosity of really needing to understand everything about the whole process. >> Lisa: And also not only being able to communicate it, but probably being willing to fail. >> Claudia: That is the number one requirement, really. If you want to have a data-driven culture, you have to embrace failure, because otherwise you will fail. >> Lisa: How do you find the reception (laughter) to that fact by your business students? Is that something that they're used to hearing, or does it sound like a foreign language to them? >> I think the majority of them are in junior enough positions that they-- >> Lisa: Okay. >> Truly embrace that, and if at all, they have come across the fact that they weren't allowed to fail as often as they had wanted to. I think it changes once you go into the higher levels of conversation, and we see that a lot in the ad tech industry, where you have incentive problems. We see a lot of fraud being targeted. At the end of the day, the ad agency doesn't want to confess to the client that yeah, they just wasted five million dollars-- >> Lisa: Right. >> Of ad spend on bots, and even the CMO might not be feeling very comfortable confessing that to the CO-- >> Right. >> Claudia: Being willing to truly face up to the truth that data sometimes forces into your face, that can be quite difficult for a company or even an industry. >> Lisa: Yes, it can. It's quite revolutionary. As is this event, so Claudia Perlich, we thank you so much for joining us-- >> My pleasure. >> Lisa: On theCUBE today, and we know that you're going to be mentoring a lot of people that are here. We thank you for watching theCUBE. We are live at Stanford University from the Women in Data Science Conference. I am Lisa Martin and we'll be right back. (upbeat music)
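The patient-ID leak Perlich describes, where the model learned which imaging center produced a scan rather than anything about the tumor, can be sketched with a quick sanity check. Everything below is synthetic and illustrative, not the actual Siemens data: two hypothetical sources get disjoint patient-ID ranges and very different base rates, and a rule that looks only at the ID then predicts the label far above chance, which is exactly the red flag for this kind of leakage.

```python
import random

random.seed(0)

# Synthetic reconstruction of the leak: two data sources with disjoint
# patient-ID ranges and very different base rates of malignancy.
# "Treatment center": IDs 0-4999, mostly positive cases.
# "Screening center": IDs 5000-9999, mostly negative cases.
rows = []
for _ in range(2000):
    if random.random() < 0.5:
        pid = random.randrange(0, 5000)      # treatment center
        label = 1 if random.random() < 0.8 else 0
    else:
        pid = random.randrange(5000, 10000)  # screening center
        label = 1 if random.random() < 0.1 else 0
    rows.append((pid, label))

# A "model" that only looks at the patient ID: predict malignant
# exactly when the ID falls in the treatment-center range.
correct = sum((pid < 5000) == (label == 1) for pid, label in rows)
accuracy = correct / len(rows)

# An ID should carry no medical signal, so accuracy far above the
# 50% of a meaningless feature is the leak showing itself.
print(f"accuracy from patient ID alone: {accuracy:.2f}")
```

In practice the same check falls out of inspecting a trained model's feature importances: an identifier, row number, or timestamp ranking among the top features usually means the model is predicting the pipeline, not the patients.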