Computer Science & Space Exploration | Exascale Day

>>from around the globe. It's the Q. With digital coverage >>of exa scale day made possible by Hewlett Packard Enterprise. We're back at the celebration of Exa Scale Day. This is Dave Volant, and I'm pleased to welcome to great guests Brian Dance Berries Here. Here's what The ISS Program Science office at the Johnson Space Center. And Dr Mark Fernandez is back. He's the Americas HPC technology officer at Hewlett Packard Enterprise. Gentlemen, welcome. >>Thank you. Yeah, >>well, thanks for coming on. And, Mark, Good to see you again. And, Brian, I wonder if we could start with you and talk a little bit about your role. A T. I s s program Science office as a scientist. What's happening these days? What are you working on? >>Well, it's been my privilege the last few years to be working in the, uh, research integration area of of the space station office. And that's where we're looking at all of the different sponsors NASA, the other international partners, all the sponsors within NASA, and, uh, prioritizing what research gets to go up to station. What research gets conducted in that regard. And to give you a feel for the magnitude of the task, but we're coming up now on November 2nd for the 20th anniversary of continuous human presence on station. So we've been a space faring society now for coming up on 20 years, and I would like to point out because, you know, as an old guy myself, it impresses me. That's, you know, that's 25% of the US population. Everybody under the age of 20 has never had a moment when they were alive and we didn't have people living and working in space. So Okay, I got off on a tangent there. We'll move on in that 20 years we've done 3000 experiments on station and the station has really made ah, miraculously sort of evolution from, ah, basic platform, what is now really fully functioning national lab up there with, um, commercially run research facilities all the time. I think you can think of it as the world's largest satellite bus. We have, you know, four or five instruments looking down, measuring all kinds of things in the atmosphere during Earth observation data, looking out, doing astrophysics, research, measuring cosmic rays, X ray observatory, all kinds of things, plus inside the station you've got racks and racks of experiments going on typically scores, you know, if not more than 50 experiments going on at any one time. So, you know, the topic of this event is really important. Doesn't NASA, you know, data transmission Up and down, all of the cameras going on on on station the experiments. Um, you know, one of one of those astrophysics observatory's you know, it has collected over 15 billion um uh, impact data of cosmic rays. And so the massive amounts of data that that needs to be collected and transferred for all of these experiments to go on really hits to the core. And I'm glad I'm able toe be here and and speak with you today on this. This topic. >>Well, thank you for that, Bryan. A baby boomer, right? Grew up with the national pride of the moon landing. And of course, we've we've seen we saw the space shuttle. We've seen international collaboration, and it's just always been something, you know, part of our lives. So thank you for the great work that you guys were doing their mark. You and I had a great discussion about exa scale and kind of what it means for society and some of the innovations that we could maybe expect over the coming years. Now I wonder if you could talk about some of the collaboration between what you guys were doing and Brian's team. >>Uh, yeah, so yes, indeed. Thank you for having me early. Appreciate it. That was a great introduction. Brian, Uh, I'm the principal investigator on Space Born computer, too. And as the two implies, where there was one before it. And so we worked with Bryant and his team extensively over the past few years again high performance computing on board the International Space Station. Brian mentioned the thousands of experiments that have been done to date and that there are currently 50 orm or going on at any one time. And those experiments collect data. And up until recently, you've had to transmit that data down to Earth for processing. And that's a significant amount of bandwidth. Yeah, so with baseball and computer to we're inviting hello developers and others to take advantage of that onboard computational capability you mentioned exa scale. We plan to get the extra scale next year. We're currently in the era that's called PETA scale on. We've been in the past scale era since 2000 and seven, so it's taken us a while to make it that next lead. Well, 10 years after Earth had a PETA scale system in 2017 were able to put ah teraflop system on the International space station to prove that we could do a trillion calculations a second in space. That's where the data is originating. That's where it might be best to process it. So we want to be able to take those capabilities with us. And with H. P. E. Acting as a wonderful partner with Brian and NASA and the space station, we think we're able to do that for many of these experiments. >>It's mind boggling you were talking about. I was talking about the moon landing earlier and the limited power of computing power. Now we've got, you know, water, cool supercomputers in space. I'm interested. I'd love to explore this notion of private industry developing space capable computers. I think it's an interesting model where you have computer companies can repurpose technology that they're selling obviously greater scale for space exploration and apply that supercomputing technology instead of having government fund, proprietary purpose built systems that air. Essentially, you use case, if you will. So, Brian, what are the benefits of that model? The perhaps you wouldn't achieve with governments or maybe contractors, you know, kind of building these proprietary systems. >>Well, first of all, you know, any any tool, your using any, any new technology that has, you know, multiple users is going to mature quicker. You're gonna have, you know, greater features, greater capabilities, you know, not even talking about computers. Anything you're doing. So moving from, you know, governor government is a single, um, you know, user to off the shelf type products gives you that opportunity to have things that have been proven, have the technology is fully matured. Now, what had to happen is we had to mature the space station so that we had a platform where we could test these things and make sure they're gonna work in the high radiation environments, you know, And they're gonna be reliable, because first, you've got to make sure that that safety and reliability or taken care of so that that's that's why in the space program you're gonna you're gonna be behind the times in terms of the computing power of the equipment up there because, first of all and foremost, you needed to make sure that it was reliable and say, Now, my undergraduate degree was in aerospace engineering and what we care about is aerospace engineers is how heavy is it, how big and bulky is it because you know it z expensive? You know, every pound I once visited Gulfstream Aerospace, and they would pay their employees $1000 that they could come up with a way saving £1 in building that aircraft. That means you have more capacity for flying. It's on the orders of magnitude. More important to do that when you're taking payloads to space. So you know, particularly with space born computer, the opportunity there to use software and and check the reliability that way, Uh, without having to make the computer, you know, radiation resistance, if you will, with heavy, you know, bulky, um, packaging to protect it from that radiation is a really important thing, and it's gonna be a huge advantage moving forward as we go to the moon and on to Mars. >>Yeah, that's interesting. I mean, your point about cots commercial off the shelf technology. I mean, that's something that obviously governments have wanted to leverage for a long, long time for many, many decades. But but But Mark the issue was always the is. Brian was just saying the very stringent and difficult requirements of space. Well, you're obviously with space Born one. You got to the point where you had visibility of the economics made sense. It made commercial sense for companies like Hewlett Packard Enterprise. And now we've sort of closed that gap to the point where you're sort of now on that innovation curve. What if you could talk about that a little bit? >>Yeah, absolutely. Brian has some excellent points, you know, he said, anything we do today and requires computers, and that's absolutely correct. So I tell people that when you go to the moon and when you go to the Mars, you probably want to go with the iPhone 10 or 11 and not a flip phone. So before space born was sent up, you went with 2000 early two thousands computing technology there which, like you said many of the people born today weren't even around when the space station began and has been occupied so they don't even know how to program or use that type of computing. Power was based on one. We sent the exact same products that we were shipping to customers today, so they are current state of the art, and we had a mandate. Don't touch the hardware, have all the protection that you can via software. So that's what we've done. We've got several philosophical ways to do that. We've implemented those in software. They've been successful improving in the space for one, and now it's space born to. We're going to begin the experiments so that the rest of the community so that the rest of the community can figure out that it is economically viable, and it will accelerate their research and progress in space. I'm most excited about that. Every venture into space as Brian mentioned will require some computational capability, and HP has figured out that the economics air there we need to bring the customers through space ball into in order for them to learn that we are reliable but current state of the art, and that we could benefit them and all of humanity. >>Guys, I wanna ask you kind of a two part question. And, Brian, I'll start with you and it z somewhat philosophical. Uh, I mean, my understanding was and I want to say this was probably around the time of the Bush administration w two on and maybe certainly before that, but as technology progress, there was a debate about all right, Should we put our resource is on moon because of the proximity to Earth? Or should we, you know, go where no man has gone before and or woman and get to Mars? Where What's the thinking today, Brian? On that? That balance between Moon and Mars? >>Well, you know, our plans today are are to get back to the moon by 2024. That's the Artemus program. Uh, it's exciting. It makes sense from, you know, an engineering standpoint. You take, you know, you take baby steps as you continue to move forward. And so you have that opportunity, um, to to learn while you're still, you know, relatively close to home. You can get there in days, not months. If you're going to Mars, for example, toe have everything line up properly. You're looking at a multi year mission you know, it may take you nine months to get there. Then you have to wait for the Earth and Mars to get back in the right position to come back on that same kind of trajectory. So you have toe be there for more than a year before you can turn around and come back. So, you know, he was talking about the computing power. You know, right now that the beautiful thing about the space station is, it's right there. It's it's orbiting above us. It's only 250 miles away. Uh, so you can test out all of these technologies. You can rely on the ground to keep track of systems. There's not that much of a delay in terms of telemetry coming back. But as you get to the moon and then definitely is, you get get out to Mars. You know, there are enough minutes delay out there that you've got to take the computing power with you. You've got to take everything you need to be able to make those decisions you need to make because there's not time to, um, you know, get that information back on the ground, get back get it back to Earth, have people analyze the situation and then tell you what the next step is to do. That may be too late. So you've got to think the computing power with you. >>So extra scale bring some new possibilities. Both both for, you know, the moon and Mars. I know Space Born one did some simulations relative. Tomorrow we'll talk about that. But But, Brian, what are the things that you hope to get out of excess scale computing that maybe you couldn't do with previous generations? >>Well, you know, you know, market on a key point. You know, bandwidth up and down is, of course, always a limitation. In the more computing data analysis you can do on site, the more efficient you could be with parsing out that that bandwidth and to give you ah, feel for just that kind of think about those those observatory's earth observing and an astronomical I was talking about collecting data. Think about the hours of video that are being recorded daily as the astronauts work on various things to document what they're doing. They many of the biological experiments, one of the key key pieces of data that's coming back. Is that video of the the microbes growing or the plants growing or whatever fluid physics experiments going on? We do a lot of colloids research, which is suspended particles inside ah liquid. And that, of course, high speed video. Is he Thio doing that kind of research? Right now? We've got something called the I s s experience going on in there, which is basically recording and will eventually put out a syriza of basically a movie on virtual reality recording. That kind of data is so huge when you have a 360 degree camera up there recording all of that data, great virtual reality, they There's still a lot of times bringing that back on higher hard drives when the space six vehicles come back to the Earth. That's a lot of data going on. We recorded videos all the time, tremendous amount of bandwidth going on. And as you get to the moon and as you get further out, you can a man imagine how much more limiting that bandwidth it. >>Yeah, We used to joke in the old mainframe days that the fastest way to get data from point a to Point B was called C Tam, the Chevy truck access method. Just load >>up a >>truck, whatever it was, tapes or hard drive. So eso and mark, of course space born to was coming on. Spaceport one really was a pilot, but it proved that the commercial computers could actually work for long durations in space, and the economics were feasible. Thinking about, you know, future missions and space born to What are you hoping to accomplish? >>I'm hoping to bring. I'm hoping to bring that success from space born one to the rest of the community with space born to so that they can realize they can do. They're processing at the edge. The purpose of exploration is insight, not data collection. So all of these experiments begin with data collection. Whether that's videos or samples are mold growing, etcetera, collecting that data, we must process it to turn it into information and insight. And the faster we can do that, the faster we get. Our results and the better things are. I often talk Thio College in high school and sometimes grammar school students about this need to process at the edge and how the communication issues can prevent you from doing that. For example, many of us remember the communications with the moon. The moon is about 250,000 miles away, if I remember correctly, and the speed of light is 186,000 miles a second. So even if the speed of light it takes more than a second for the communications to get to the moon and back. So I can remember being stressed out when Houston will to make a statement. And we were wondering if the astronauts could answer Well, they answered as soon as possible. But that 1 to 2 second delay that was natural was what drove us crazy, which made us nervous. We were worried about them in the success of the mission. So Mars is millions of miles away. So flip it around. If you're a Mars explorer and you look out the window and there's a big red cloud coming at you that looks like a tornado and you might want to do some Mars dust storm modeling right then and there to figure out what's the safest thing to do. You don't have the time literally get that back to earth have been processing and get you the answer back. You've got to take those computational capabilities with you. And we're hoping that of these 52 thousands of experiments that are on board, the SS can show that in order to better accomplish their missions on the moon. And Omar, >>I'm so glad you brought that up because I was gonna ask you guys in the commercial world everybody talks about real time. Of course, we talk about the real time edge and AI influencing and and the time value of data I was gonna ask, you know, the real time, Nous, How do you handle that? I think Mark, you just answered that. But at the same time, people will say, you know, the commercial would like, for instance, in advertising. You know, the joke the best. It's not kind of a joke, but the best minds of our generation tryingto get people to click on ads. And it's somewhat true, unfortunately, but at any rate, the value of data diminishes over time. I would imagine in space exploration where where you're dealing and things like light years, that actually there's quite a bit of value in the historical data. But, Mark, you just You just gave a great example of where you need real time, compute capabilities on the ground. But but But, Brian, I wonder if I could ask you the value of this historic historical data, as you just described collecting so much data. Are you? Do you see that the value of that data actually persists over time, you could go back with better modeling and better a i and computing and actually learn from all that data. What are your thoughts on that, Brian? >>Definitely. I think the answer is yes to that. And, you know, as part of the evolution from from basically a platform to a station, we're also learning to make use of the experiments in the data that we have there. NASA has set up. Um, you know, unopened data access sites for some of our physical science experiments that taking place there and and gene lab for looking at some of the biological genomic experiments that have gone on. And I've seen papers already beginning to be generated not from the original experimenters and principal investigators, but from that data set that has been collected. And, you know, when you're sending something up to space and it to the space station and volume for cargo is so limited, you want to get the most you can out of that. So you you want to be is efficient as possible. And one of the ways you do that is you collect. You take these earth observing, uh, instruments. Then you take that data. And, sure, the principal investigators air using it for the key thing that they designed it for. But if that data is available, others will come along and make use of it in different ways. >>Yeah, So I wanna remind the audience and these these these air supercomputers, the space born computers, they're they're solar powered, obviously, and and they're mounted overhead, right? Is that is that correct? >>Yeah. Yes. Space borne computer was mounted in the overhead. I jokingly say that as soon as someone could figure out how to get a data center in orbit, they will have a 50 per cent denser data station that we could have down here instead of two robes side by side. You can also have one overhead on. The power is free. If you can drive it off a solar, and the cooling is free because it's pretty cold out there in space, so it's gonna be very efficient. Uh, space borne computer is the most energy efficient computer in existence. Uh, free electricity and free cooling. And now we're offering free cycles through all the experimenters on goal >>Eso Space born one exceeded its mission timeframe. You were able to run as it was mentioned before some simulations for future Mars missions. And, um and you talked a little bit about what you want to get out of, uh, space born to. I mean, are there other, like, wish list items, bucket bucket list items that people are talking about? >>Yeah, two of them. And these air kind of hypothetical. And Brian kind of alluded to them. Uh, one is having the data on board. So an example that halo developers talk to us about is Hey, I'm on Mars and I see this mold growing on my potatoes. That's not good. So let me let me sample that mold, do a gene sequencing, and then I've got stored all the historical data on space borne computer of all the bad molds out there and let me do a comparison right then and there before I have dinner with my fried potato. So that's that's one. That's very interesting. A second one closely related to it is we have offered up the storage on space borne computer to for all of your raw data that we process. So, Mr Scientist, if if you need the raw data and you need it now, of course, you can have it sent down. But if you don't let us just hold it there as long as they have space. And when we returned to Earth like you mentioned, Patrick will ship that solid state disk back to them so they could have a new person, but again, reserving that network bandwidth, uh, keeping all that raw data available for the entire duration of the mission so that it may have value later on. >>Great. Thank you for that. I want to end on just sort of talking about come back to the collaboration between I S s National Labs and Hewlett Packard Enterprise, and you've got your inviting project ideas using space Bourne to during the upcoming mission. Maybe you could talk about what that's about, and we have A We have a graphic we're gonna put up on DSM information that you can you can access. But please, mark share with us what you're planning there. >>So again, the collaboration has been outstanding. There. There's been a mention off How much savings is, uh, if you can reduce the weight by a pound. Well, our partners ice s national lab and NASA have taken on that cost of delivering baseball in computer to the international space station as part of their collaboration and powering and cooling us and giving us the technical support in return on our side, we're offering up space borne computer to for all the onboard experiments and all those that think they might be wanting doing experiments on space born on the S s in the future to take advantage of that. So we're very, very excited about that. >>Yeah, and you could go toe just email space born at hp dot com on just float some ideas. I'm sure at some point there'll be a website so you can email them or you can email me david dot volonte at at silicon angle dot com and I'll shoot you that that email one or that website once we get it. But, Brian, I wanna end with you. You've been so gracious with your time. Uh, yeah. Give us your final thoughts on on exa scale. Maybe how you're celebrating exa scale day? I was joking with Mark. Maybe we got a special exa scale drink for 10. 18 but, uh, what's your final thoughts, Brian? >>Uh, I'm going to digress just a little bit. I think I think I have a unique perspective to celebrate eggs a scale day because as an undergraduate student, I was interning at Langley Research Center in the wind tunnels and the wind tunnel. I was then, um, they they were very excited that they had a new state of the art giant room size computer to take that data we way worked on unsteady, um, aerodynamic forces. So you need a lot of computation, and you need to be ableto take data at a high bandwidth. To be able to do that, they'd always, you know, run their their wind tunnel for four or five hours. Almost the whole shift. Like that data and maybe a week later, been ableto look at the data to decide if they got what they were looking for? Well, at the time in the in the early eighties, this is definitely the before times that I got there. They had they had that computer in place. Yes, it was a punchcard computer. It was the one time in my life I got to put my hands on the punch cards and was told not to drop them there. Any trouble if I did that. But I was able thio immediately after, uh, actually, during their run, take that data, reduce it down, grabbed my colored pencils and graph paper and graph out coefficient lift coefficient of drag. Other things that they were measuring. Take it back to them. And they were so excited to have data two hours after they had taken it analyzed and looked at it just pickled them. Think that they could make decisions now on what they wanted to do for their next run. Well, we've come a long way since then. You know, extra scale day really, really emphasizes that point, you know? So it really brings it home to me. Yeah. >>Please, no, please carry on. >>Well, I was just gonna say, you know, you talked about the opportunities that that space borne computer provides and and Mark mentioned our colleagues at the I S s national lab. You know, um, the space station has been declared a national laboratory, and so about half of the, uh, capabilities we have for doing research is a portion to the national lab so that commercial entities so that HP can can do these sorts of projects and universities can access station and and other government agencies. And then NASA can focus in on those things we want to do purely to push our exploration programs. So the opportunities to take advantage of that are there marks opening up the door for a lot of opportunities. But others can just Google S s national laboratory and find some information on how to get in the way. Mark did originally using s national lab to maybe get a good experiment up there. >>Well, it's just astounding to see the progress that this industry is made when you go back and look, you know, the early days of supercomputing to imagine that they actually can be space born is just tremendous. Not only the impacts that it can have on Space six exploration, but also society in general. Mark Wayne talked about that. Guys, thanks so much for coming on the Cube and celebrating Exa scale day and helping expand the community. Great work. And, uh, thank you very much for all that you guys dio >>Thank you very much for having me on and everybody out there. Let's get the XO scale as quick as we can. Appreciate everything you all are >>doing. Let's do it. >>I've got a I've got a similar story. Humanity saw the first trillion calculations per second. Like I said in 1997. And it was over 100 racks of computer equipment. Well, space borne one is less than fourth of Iraq in only 20 years. So I'm gonna be celebrating exa scale day in anticipation off exa scale computers on earth and soon following within the national lab that exists in 20 plus years And being on Mars. >>That's awesome. That mark. Thank you for that. And and thank you for watching everybody. We're celebrating Exa scale day with the community. The supercomputing community on the Cube Right back

Published Date : Oct 16 2020

SUMMARY :

It's the Q. With digital coverage We're back at the celebration of Exa Scale Day. Thank you. And, Mark, Good to see you again. And to give you a feel for the magnitude of the task, of the collaboration between what you guys were doing and Brian's team. developers and others to take advantage of that onboard computational capability you with governments or maybe contractors, you know, kind of building these proprietary off the shelf type products gives you that opportunity to have things that have been proven, have the technology You got to the point where you had visibility of the economics made sense. So I tell people that when you go to the moon Or should we, you know, go where no man has gone before and or woman and You've got to take everything you need to be able to make those decisions you need to make because there's not time to, for, you know, the moon and Mars. the more efficient you could be with parsing out that that bandwidth and to give you ah, B was called C Tam, the Chevy truck access method. future missions and space born to What are you hoping to accomplish? get that back to earth have been processing and get you the answer back. the time value of data I was gonna ask, you know, the real time, And one of the ways you do that is you collect. If you can drive it off a solar, and the cooling is free because it's pretty cold about what you want to get out of, uh, space born to. So, Mr Scientist, if if you need the raw data and you need it now, that's about, and we have A We have a graphic we're gonna put up on DSM information that you can is, uh, if you can reduce the weight by a pound. so you can email them or you can email me david dot volonte at at silicon angle dot com and I'll shoot you that state of the art giant room size computer to take that data we way Well, I was just gonna say, you know, you talked about the opportunities that that space borne computer provides And, uh, thank you very much for all that you guys dio Thank you very much for having me on and everybody out there. Let's do it. Humanity saw the first trillion calculations And and thank you for watching everybody.

ENTITIES

Entity	Category	Confidence
Brian	PERSON	0.99+
Mark	PERSON	0.99+
Mark Wayne	PERSON	0.99+
Bryan	PERSON	0.99+
NASA	ORGANIZATION	0.99+
1997	DATE	0.99+
Mars	LOCATION	0.99+
Bryant	PERSON	0.99+
Earth	LOCATION	0.99+
Dave Volant	PERSON	0.99+
£1	QUANTITY	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
360 degree	QUANTITY	0.99+
3000 experiments	QUANTITY	0.99+
2017	DATE	0.99+
two	QUANTITY	0.99+
Patrick	PERSON	0.99+
five hours	QUANTITY	0.99+
nine months	QUANTITY	0.99+
November 2nd	DATE	0.99+
HP	ORGANIZATION	0.99+
25%	QUANTITY	0.99+
Tomorrow	DATE	0.99+
I S s National Labs	ORGANIZATION	0.99+
50 per cent	QUANTITY	0.99+
next year	DATE	0.99+
20 years	QUANTITY	0.99+
iPhone 10	COMMERCIAL_ITEM	0.99+
four	QUANTITY	0.99+
2024	DATE	0.99+
1	QUANTITY	0.99+
today	DATE	0.99+
earth	LOCATION	0.99+
a week later	DATE	0.99+
two part	QUANTITY	0.99+
Omar	PERSON	0.99+
2000	DATE	0.99+
Thio College	ORGANIZATION	0.99+
11	COMMERCIAL_ITEM	0.99+
more than a second	QUANTITY	0.99+
10. 18	QUANTITY	0.99+
one time	QUANTITY	0.99+
2 second	QUANTITY	0.99+
Both	QUANTITY	0.99+
over 100 racks	QUANTITY	0.98+

Leicester Clinical Data Science Initiative

>>Hello. I'm Professor Toru Suzuki Cherif cardiovascular medicine on associate dean of the College of Life Sciences at the University of Leicester in the United Kingdom, where I'm also director of the Lester Life Sciences accelerator. I'm also honorary consultant cardiologist within our university hospitals. It's part of the national health system NHS Trust. Today, I'd like to talk to you about our Lester Clinical Data Science Initiative. Now brief background on Lester. It's university in hospitals. Lester is in the center of England. The national health system is divided depending on the countries. The United Kingdom, which is comprised of, uh, England, Scotland to the north, whales to the west and Northern Ireland is another part in a different island. But national health system of England is what will be predominantly be discussed. Today has a history of about 70 years now, owing to the fact that we're basically in the center of England. Although this is only about one hour north of London, we have a catchment of about 100 miles, which takes us from the eastern coast of England, bordering with Birmingham to the west north just south of Liverpool, Manchester and just south to the tip of London. We have one of the busiest national health system trust in the United Kingdom, with a catchment about 100 miles and one million patients a year. Our main hospital, the General Hospital, which is actually called the Royal Infirmary, which can has an accident and emergency, which means Emergency Department is that has one of the busiest emergency departments in the nation. I work at Glen Field Hospital, which is one of the main cardiovascular hospitals of the United Kingdom and Europe. Academically, the Medical School of the University of Leicester is ranked 20th in the world on Lee, behind Cambridge, Oxford Imperial College and University College London. For the UK, this is very research. Waited, uh, ranking is Therefore we are very research focused universities as well for the cardiovascular research groups, with it mainly within Glenn Field Hospital, we are ranked as the 29th Independent research institution in the world which places us. A Suffield waited within our group. As you can see those their top ranked this is regardless of cardiology, include institutes like the Broad Institute and Whitehead Institute. Mitt Welcome Trust Sanger, Howard Hughes Medical Institute, Kemble, Cold Spring Harbor and as a hospital we rank within ah in this field in a relatively competitive manner as well. Therefore, we're very research focused. Hospital is well now to give you the unique selling points of Leicester. We're we're the largest and busiest national health system trust in the United Kingdom, but we also have a very large and stable as well as ethnically diverse population. The population ranges often into three generations, which allows us to do a lot of cohort based studies which allows us for the primary and secondary care cohorts, lot of which are well characterized and focused on genomics. In the past. We also have a biomedical research center focusing on chronic diseases, which is funded by the National Institutes of Health Research, which funds clinical research the hospitals of United Kingdom on we also have a very rich regional life science cluster, including med techs and small and medium sized enterprises. Now for this, the bottom line is that I am the director of the letter site left Sciences accelerator, >>which is tasked with industrial engagement in the local national sectors but not excluding the international sectors as well. Broadly, we have academics and clinicians with interest in health care, which includes science and engineering as well as non clinical researchers. And prior to the cove it outbreak, the government announced the £450 million investment into our university hospitals, which I hope will be going forward now to give you a brief background on where the scientific strategy the United Kingdom lies. Three industrial strategy was brought out a za part of the process which involved exiting the European Union, and part of that was the life science sector deal. And among this, as you will see, there were four grand challenges that were put in place a I and data economy, future of mobility, clean growth and aging society and as a medical research institute. A lot of the focus that we have been transitioning with within my group are projects are focused on using data and analytics using artificial intelligence, but also understanding how chronic diseases evolved as part of the aging society, and therefore we will be able to address these grand challenges for the country. Additionally, the national health system also has its long term plans, which we align to. One of those is digitally enabled care and that this hope you're going mainstream over the next 10 years. And to do this, what is envision will be The clinicians will be able to access and interact with patient records and care plants wherever they are with ready access to decision support and artificial intelligence, and that this will enable predictive techniques, which include linking with clinical genomic as well as other data supports, such as image ing a new medical breakthroughs. There has been what's called the Topol Review that discusses the future of health care in the United Kingdom and preparing the health care workforce for the delivery of the digital future, which clearly discusses in the end that we would be using automated image interpretation. Is using artificial intelligence predictive analytics using artificial intelligence as mentioned in the long term plans. That is part of that. We will also be engaging natural language processing speech recognition. I'm reading the genome amusing. Genomic announced this as well. We are in what is called the Midland's. As I mentioned previously, the Midland's comprised the East Midlands, where we are as Lester, other places such as Nottingham. We're here. The West Midland involves Birmingham, and here is ah collective. We are the Midlands. Here we comprise what is called the Midlands engine on the Midland's engine focuses on transport, accelerating innovation, trading with the world as well as the ultra connected region. And therefore our work will also involve connectivity moving forward. And it's part of that. It's part of our health care plans. We hope to also enable total digital connectivity moving forward and that will allow us to embrace digital data as well as collectivity. These three key words will ah Linkous our health care systems for the future. Now, to give you a vision for the future of medicine vision that there will be a very complex data set that we will need to work on, which will involve genomics Phanom ICS image ing which will called, uh oh mix analysis. But this is just meaning that is, uh complex data sets that we need to work on. This will integrate with our clinical data Platforms are bioinformatics, and we'll also get real time information of physiology through interfaces and wearables. Important for this is that we have computing, uh, processes that will now allow this kind of complex data analysis in real time using artificial intelligence and machine learning based applications to allow visualization Analytics, which could be out, put it through various user interfaces to the clinician and others. One of the characteristics of the United Kingdom is that the NHS is that we embrace data and captured data from when most citizens have been born from the cradle toe when they die to the grave. And it's important that we were able to link this data up to understand the journey of that patient. Over time. When they come to hospital, which is secondary care data, we will get disease data when they go to their primary care general practitioner, we will be able to get early check up data is Paula's follow monitoring monitoring, but also social care data. If this could be linked, allow us to understand how aging and deterioration as well as frailty, uh, encompasses thes patients. And to do this, we have many, many numerous data sets available, including clinical letters, blood tests, more advanced tests, which is genetics and imaging, which we can possibly, um, integrate into a patient journey which will allow us to understand the digital journey of that patient. I have called this the digital twin patient cohort to do a digital simulation of patient health journeys using data integration and analytics. This is a technique that has often been used in industrial manufacturing to understand the maintenance and service points for hardware and instruments. But we would be using this to stratify predict diseases. This'll would also be monitored and refined, using wearables and other types of complex data analysis to allow for, in the end, preemptive intervention to allow paradigm shifting. How we undertake medicine at this time, which is more reactive rather than proactive as infrastructure we are presently working on putting together what's it called the Data Safe haven or trusted research environment? One which with in the clinical environment, the university hospitals and curated and data manner, which allows us to enable data mining off the databases or, I should say, the trusted research environment within the clinical environment. Hopefully, we will then be able to anonymous that to allow ah used by academics and possibly also, uh, partnering industry to do further data mining and tool development, which we could then further field test again using our real world data base of patients that will be continually, uh, updating in our system. In the cardiovascular group, we have what's called the bricks cohort, which means biomedical research. Informatics Center for Cardiovascular Science, which was done, started long time even before I joined, uh, in 2010 which has today almost captured about 10,000 patients arm or who come through to Glenn Field Hospital for various treatments or and even those who have not on. We asked for their consent to their blood for genetics, but also for blood tests, uh, genomics testing, but also image ing as well as other consent. Hable medical information s so far there about 10,000 patients and we've been trying to extract and curate their data accordingly. Again, a za reminder of what the strengths of Leicester are. We have one of the largest and busiest trust with the very large, uh, patient cohort Ah, focused dr at the university, which allows for chronic diseases such as heart disease. I just mentioned our efforts on heart disease, uh which are about 10,000 patients ongoing right now. But we would wish thio include further chronic diseases such as diabetes, respiratory diseases, renal disease and further to understand the multi modality between these diseases so that we can understand how they >>interact as well. Finally, I like to talk about the lesser life science accelerator as well. This is a new project that was funded by >>the U started this January for three years. I'm the director for this and all the groups within the College of Life Sciences that are involved with healthcare but also clinical work are involved. And through this we hope to support innovative industrial partnerships and collaborations in the region, a swells nationally and further on into internationally as well. I realized that today is a talked to um, or business and commercial oriented audience. And we would welcome interest from your companies and partners to come to Leicester toe work with us on, uh, clinical health care data and to drive our agenda forward for this so that we can enable innovative research but also product development in partnership with you moving forward. Thank you for your time.

Published Date : Sep 21 2020

SUMMARY :

We have one of the busiest national health system trust in the United Kingdom, with a catchment as part of the aging society, and therefore we will be able to address these grand challenges for Finally, I like to talk about the lesser the U started this January for three years.

ENTITIES

Entity	Category	Confidence
National Institutes of Health Research	ORGANIZATION	0.99+
Howard Hughes Medical Institute	ORGANIZATION	0.99+
Birmingham	LOCATION	0.99+
2010	DATE	0.99+
Broad Institute	ORGANIZATION	0.99+
England	LOCATION	0.99+
College of Life Sciences	ORGANIZATION	0.99+
Whitehead Institute	ORGANIZATION	0.99+
United Kingdom	LOCATION	0.99+
Toru Suzuki Cherif	PERSON	0.99+
Europe	LOCATION	0.99+
London	LOCATION	0.99+
£450 million	QUANTITY	0.99+
Lester	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
Oxford Imperial College	ORGANIZATION	0.99+
Leicester	LOCATION	0.99+
European Union	ORGANIZATION	0.99+
Informatics Center for Cardiovascular Science	ORGANIZATION	0.99+
Scotland	LOCATION	0.99+
Glenn Field Hospital	ORGANIZATION	0.99+
Manchester	LOCATION	0.99+
Today	DATE	0.99+
Nottingham	LOCATION	0.99+
Cold Spring Harbor	ORGANIZATION	0.99+
today	DATE	0.99+
General Hospital	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Glen Field Hospital	ORGANIZATION	0.99+
Kemble	ORGANIZATION	0.99+
Royal Infirmary	ORGANIZATION	0.99+
about 100 miles	QUANTITY	0.99+
Northern Ireland	LOCATION	0.99+
Lester Life Sciences	ORGANIZATION	0.99+
Liverpool	LOCATION	0.99+
UK	LOCATION	0.98+
about 70 years	QUANTITY	0.98+
Midland	LOCATION	0.98+
about 10,000 patients	QUANTITY	0.98+
University of Leicester	ORGANIZATION	0.98+
NHS Trust	ORGANIZATION	0.98+
Mitt Welcome Trust Sanger	ORGANIZATION	0.98+
Paula	PERSON	0.98+
West Midland	LOCATION	0.98+
about 10,000 patients	QUANTITY	0.97+
East Midlands	LOCATION	0.97+
about one hour	QUANTITY	0.97+
NHS	ORGANIZATION	0.97+
20th	QUANTITY	0.97+
United Kingdom	LOCATION	0.96+
University College London	ORGANIZATION	0.96+
One	QUANTITY	0.95+
one million patients a year	QUANTITY	0.93+
Suffield	ORGANIZATION	0.92+
Three industrial strategy	QUANTITY	0.92+
three generations	QUANTITY	0.92+
Lester Clinical Data Science Initiative	ORGANIZATION	0.89+
Lee	LOCATION	0.88+
January	DATE	0.88+
Medical School of the	ORGANIZATION	0.87+
University of Leicester	ORGANIZATION	0.87+
Midlands	LOCATION	0.87+
Lester	LOCATION	0.87+
three key words	QUANTITY	0.86+
Topol Review	TITLE	0.85+
Leicester	ORGANIZATION	0.83+
Leicester Clinical Data Science Initiative	ORGANIZATION	0.82+
four grand challenges	QUANTITY	0.82+
Emergency Department	ORGANIZATION	0.8+
twin patient	QUANTITY	0.73+
29th Independent research	QUANTITY	0.69+
next 10 years	DATE	0.66+

Daphne Koller, insitro | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in data science 2020. Brought to you by Silicon Angle Media. >>Hi! And welcome to the Cube. I'm your host, Sonia, to guard. And we're live at Stanford University covering Woods Women in Data Science Conference The fifth annual one And joining us today is Daphne Koller, who is the co founder who sorry is the CEO and founder of In Citro that Daphne. Welcome to the Cube. >>Nice to be here, Sonia. Thank you for having me. So >>tell us a little bit about in Citro how you how you got founded and more about your >>role. So I've been working in the intersection of machine learning and biology and health for quite a while, and it was always a bit of an interesting journey and that the data sets were quite small and limited. We're now in a different world where there's tools that are allowing us to create massive biological data sense that I think can help us solve really significant societal problems. And one of those problems that I think is really important is drug discovery and development, where despite many important advancements, the costs just keep going up and up and up. And the question is, can we use machine learning to solve that problem >>better? And you talk about this more in your keynote, so give us a few highlights of what you talked about. So in the last, you can think of >>drug discovery development in the last 50 to 70 years as being a bit of a glass half full glass, half empty. The glass half full is the fact that there's diseases that used to be a death sentence or of sentenced, a lifelong of pain and suffering that >>are now >>addressed by some of the modern day medicines. And I think that's absolutely amazing. The >>other side of >>it is that the cost of developing new drugs has been growing exponentially and what's come to be known as the Rooms law being the inverse of Moore's law, which is the one we're all familiar with because the number of drugs approved per 1,000,000,000 U. S. Dollars just keeps going down exponentially. So the question is, can we change that curve? >>And you talked in your keynote about the interdisciplinary culture to tell us more about that? I think in >>order to address some of the critical problems that we're facing. One needs to really build a culture of people who work together at from different disciplines, each bringing their own insights and their own ideas into the mix. So and in Citro, we actually have a company. That's half life scientists, many of whom are producing data for the purpose of driving machine learning models and the other Halford machine learning people in data scientists who are working on those. But it's not a handoff where one group produces that they then the other one consumes and interpreted. But really, they start from the very beginning to understand. What are the problems that one could solve together? How do you design the experiment? How do you build the model and how do you derive insights from that that can help us make better medicines for people? >>And, um, I also wanted to ask you the you co founded coursera, so tell us a little bit more about that platform. So I found that >>coursera as a result of work that I've been doing at Stanford, working on how technology can make education better and more accessible. This was a project that I did here, number of my colleagues as well. And at some point in the fall of 2011 there was an experiment of Let's take some of the content that we've been we've been developing within within Stanford and put it out there for people to just benefit from, and we didn't know what would happen. Would it be a few 1000 people, but within a matter of weeks with minimal advertising Other than one New York Times article that went viral, we had 100,000 people in each of those courses. And that was a moment in time where, you know, we looked at it at this and said, Can we just go back to writing more papers or is there an incredible opportunity to transform access to education to people all over the world? And so I ended up taking a what was supposed to be to really absence from Stanford to go and co found coursera, and I thought I'd go back after two years, but the But at the end of that two year period, the there was just so much more to be done and so much more impact that we could bring to people all over the world, people of both genders, people of different social economic status, every single country around the world. We just felt like this was something that I couldn't not dio. >>And how did you Why did you decide to go from an educational platform to then going into machine learning and biomedicine? >>So I've been doing Corsair for about five years in 2016 and the company was on a great trajectory. But it's primarily >>a >>a content company, and around me, machine learning was transforming the world, and I wanted to come back and be part of that. And when I looked around, I saw machine learning being applied to e commerce and the natural language and to self driving cars. But there really wasn't a lot of impact being made on the life science area. I wanted to be part of making that happen, partly because I felt like coming back to your earlier comment that in order to really have that impact, you need to have someone who speaks both languages. And while there's a new generation of researchers who are bilingual in biology and machine learning, there's still a small group in there, very few of those in kind of my age cohort and I thought that I would be able to have a real impact by bullying company in the space. >>So it sounds like your background is pretty varied. What advice would you give to women who are just starting college now who may be interested in the similar field? Would you tell them they have to major in math? Or or do you think that maybe, like there's some other majors that may be influential as well? I think >>there is a lot of ways to get into data science. Math is one of them. But there's also statistics or physics. And I would say that especially for the field that I'm currently in, which is at the intersection of machine learning data science on the one hand, and biology and health on the other one can, um, get there from biology or medicine as well. But what I think is important is not to shy away from the more mathematically oriented courses in whatever major you're in, because that foundation is a really strong one. There is ah lot of people out there who are basically lightweight consumers of data science, and they don't really understand how the methods that they're deploying, how they work and that limits thumb in their ability to advance the field and come up with new methods that are better suited, perhaps, of the problems of their tackling. So I think it's totally fine. And in fact, there's a lot of value to coming into data science from fields other than now third computer science. But I think taking courses in those fields, even while you're majoring in whatever field you're interested in, is going to make you a much better person who lives at that intersection. >>And how do you think having a technology background has helped you in in founding your companies and has helped you become a successful CEO in companies >>that are very strongly R and D, focused like like in Citro and others? Having a technical co founder is absolutely essential because it's fine to have and understanding of whatever the user needs and so on and come from the business side of it. And a lot of companies have a business co founder. But not understanding what the technology can actually do is highly limiting because you end up hallucinating. Oh, if we could only do this and that would be great. But you can't and people end up often times making ridiculous promises about what's technology will or will not do because they just don't understand where the land mines sit. And, um, and where you're going to hit reels, obstacles in the path. So I think it's really important to have a strong technical foundation in these companies. >>And that being said, Where do you see in Teacher in the future? And how do you see it solving, Say, Nash, that you talked about in your keynote. >>So we hope that in Citro will be a fully integrated drug discovery and development company that is based on a completely different foundation than a traditional pharma company where they grew up. In the old approach of that is very much a bespoke scientific um, analysis of the biology of different diseases and then going after targets are ways of dealing with the disease that are driven by human intuition. Where I think we have the opportunity to go today is to build a very data driven approach that collects massive amounts of data and then let analysis of those data really reveal new hypotheses that might not be the ones that accord with people's preconceptions of what matters and what doesn't. And so hopefully we'll be able to overtime create enough data and applying machine learning to address key bottlenecks in the drug discovery development process that we can bring better drugs to people, and we can do it faster and hopefully it much lower cost. >>That's great. And you also mention in your keynote that you think the 20 twenties is like a digital biology era, so tell us more about that. So I think if >>you look, if you take a historical perspective on science and think back, you realize that there's periods in history where one discipline has made a tremendous amount of progress in relatively short amount of time because of a new technology or a new way of looking at things in the 18 seventies, that discipline was chemistry with the understanding of the periodic table, and that you actually couldn't turn lead into gold in the 19 hundreds. That was physics with understanding the connection between matter and energy in between space and time. In the 19 fifties that was computing where silicon chips were suddenly able to perform calculations that up until that point, only people have been able to >>dio. And then in 19 nineties, >>there was an interesting bifurcation. One was three era of data, which is related to computing but also involves elements, statistics and optimization of neuroscience. And the other one was quantitative biology. In which file do you move from a descriptive signs of taxonomy izing phenomenon to really probing and measuring biology in a very detailed on high throughput way, using techniques like micro arrays that measure the activity of 20,000 genes at once, or the human genome sequencing of the human genome and many others. But >>these two fields kind of >>evolved in parallel, and what I think is coming now, 30 years later, is the convergence of those two fields into one field that I like to think of a digital biology where we are able using the tools that have and continue to be developed, measure biology, an entirely new levels of detail, of fidelity of scale. We can use the techniques of machine learning and data signs to interpret what we're seeing and then use some of the technologies that are also emerging to engineer biology to do things that it otherwise wouldn't do. And that will have implications and bio materials in energy and the environment in agriculture. And I think also in human health. And it's a incredibly exciting space toe to be in right now, because just so much is happening in the opportunities to make a difference and make the world a better place or just so large. >>That sounds awesome. Stephanie. Thank you for your insight. And thanks for being on the Cube. Thank you. I'm Sonia. Taqueria. Thanks for watching. Stay tuned for more. Okay? Great. Yeah, yeah, yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media. And we're live at Stanford University covering Thank you for having me. And the question is, can we use machine learning to solve that problem So in the last, you can think of drug discovery development in the last 50 to 70 years as being a bit of a glass half full glass, And I think that's absolutely amazing. it is that the cost of developing new drugs has been growing exponentially and the other Halford machine learning people in data scientists who are working And, um, I also wanted to ask you the you co founded coursera, so tell us a little bit more about And at some point in the fall of 2011 there was an experiment the company was on a great trajectory. comment that in order to really have that impact, you need to have someone who speaks both languages. What advice would you give to women who are just starting methods that are better suited, perhaps, of the problems of their tackling. So I think it's really important to have a strong technical And that being said, Where do you see in Teacher in the future? key bottlenecks in the drug discovery development process that we can bring better drugs to people, And you also mention in your keynote that you think the 20 twenties is like the understanding of the periodic table, and that you actually couldn't turn lead into gold in And then in 19 nineties, And the other one was quantitative biology. is the convergence of those two fields into one field that I like to think of a digital biology And thanks for being on the Cube.

ENTITIES

Entity	Category	Confidence
Sonia	PERSON	0.99+
Daphne Koller	PERSON	0.99+
Stephanie	PERSON	0.99+
2016	DATE	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
20,000 genes	QUANTITY	0.99+
100,000 people	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
18 seventies	DATE	0.99+
Corsair	ORGANIZATION	0.99+
19 fifties	DATE	0.99+
one field	QUANTITY	0.99+
two fields	QUANTITY	0.99+
Moore	PERSON	0.99+
Daphne	PERSON	0.99+
fall of 2011	DATE	0.99+
20 twenties	DATE	0.99+
one	QUANTITY	0.99+
both genders	QUANTITY	0.99+
each	QUANTITY	0.98+
both languages	QUANTITY	0.98+
30 years later	DATE	0.97+
Taqueria	PERSON	0.97+
One	QUANTITY	0.97+
today	DATE	0.97+
Nash	PERSON	0.97+
two year	QUANTITY	0.97+
third	QUANTITY	0.97+
Stanford	ORGANIZATION	0.96+
Woods Women in Data Science Conference	EVENT	0.96+
19 hundreds	DATE	0.96+
one discipline	QUANTITY	0.96+
Halford	ORGANIZATION	0.95+
2020	DATE	0.95+
New York Times	ORGANIZATION	0.94+
about five years	QUANTITY	0.94+
Citro	ORGANIZATION	0.94+
70 years	QUANTITY	0.93+
1000 people	QUANTITY	0.93+
Stanford Women in Data Science	EVENT	0.89+
19 nineties	DATE	0.86+
one group	QUANTITY	0.77+
fifth annual one	QUANTITY	0.76+
Citro	TITLE	0.72+
WiDS) Conference 2020	EVENT	0.69+
three	QUANTITY	0.66+
single country	QUANTITY	0.65+
50	QUANTITY	0.64+
half full	QUANTITY	0.62+
two years	QUANTITY	0.6+
1,000,000,000 U. S. Dollars	QUANTITY	0.59+
in Citro	ORGANIZATION	0.53+
Rooms	TITLE	0.52+
In	ORGANIZATION	0.51+
Cube	ORGANIZATION	0.47+

Talithia Williams, Harvey Mudd College | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in Data Science 2020. Brought to you by Silicon Angle Media >>and welcome to the Cube. I'm your host Sonia category, and we're live at Stanford University, covering the fifth annual Woods Women in Data Science conference. Joining us today is Tilapia Williams, who's the associate professor of mathematics at Harvey Mudd College and host of Nova Wonders at PBS to leave a welcome to the Cappy to be here. Thanks for having me. So you have a lot of rules. So let's first tell us about being an associate professor at Harvey Mudd. >>Yeah, I've been at Harvey Mudd now for 11 years, so it's been really a lot of fun in the math department, but I'm a statistician by training, so I teach a lot of courses and statistics and data science and things like that. >>Very cool. And you're also a host of API s show called Novo Wonders. >>Yeah, that came about a couple of years ago. Folks at PBS reached out they had seen my Ted talk, and they said, Hey, it looks like you could be fund host of this science documentary shows So, Nova Wonders, is a six episode Siri's. It kind of takes viewers on a journey of what the cutting edge questions and science are. Um, so I got to host the show with a couple other co host and really think about like, you know, what are what are the animals saying? And so we've got some really fun episodes to do. What's the universe made of? Was one of them what's living inside of us. That was definitely a gross win. Todo figure out all the different micro organisms that live inside our body. So, yeah, it's been funded in hopes that show as well. >>And you talk about data science and AI and all that stuff on >>Yeah. Oh, yeah, yeah, one of the episodes. Can we build a Brain was dealt with a lot of data, big data and artificial intelligence, and you know, how good can we get? How good can computers get and really sort of compared to what we see in the movies? We're a long way away from that, but it seems like you know we're getting better every year, building technology that is truly intelligent, >>and you gave a talk today about mining for your own personal data. So give us some highlights from your talk. Yeah, >>so that talks sort of stemmed out of the Ted talk that I gave on owning your body's data. And it's really challenging people to think about how they can use data that they collect about their bodies to help make better health decisions on DSO ways that you can use, like your temperature data or your heart rate. Dina. Or what is data say over time? What does it say about your body's health and really challenging the audience to get excited about looking at that data? We have so many devices that collect data automatically for us, and often we don't pause on enough to actually look at that historical data. And so that was what the talk was about today, like, here's what you can find when you actually sit down and look at that data. >>What's the most important data you think people should be collecting about themselves? >>Well, definitely not. Your weight is. I don't >>want to know what that >>is. Um, it depends, you know, I think for women who are in the fertile years of life taking your daily waking temperature can tell you when your body's fertile. When you're ovulating, it can. So that information could give women during that time period really critical information. But in general, I think it's just a matter of being aware of of how your body is changing. So for some people, maybe it's your blood pressure or your blood sugar. You have high blood pressure or high blood sugar. Those things become really critical to keep an eye on. And, um, and I really encourage people whatever data they take, too, the active in the understanding of an interpretation of the data. It's not like if you take this data, you'll be healthy radio. You live to 100. It's really a matter of challenging people to own the data that they have and get excited about understanding the data that they are taking. So >>absolutely put putting people in charge of their >>own bodies. That's >>right. >>And actually speaking about that in your Ted talk, you mentioned how you were. Your doctor told you to have a C section and you looked at the data and he said, No, I'm gonna have this baby naturally. So tell us more about that. >>Yes, you should always listen to your medical pressures. But in this case, I will say that it was It was definitely more of a dialogue. And so I wasn't just sort of trying to lean on the fact that, like, I have a PhD in statistics and I know data, he was really kind of objectively with the on call doctor at the time, looking at the data >>and talking about it. >>And this doctor was this is his first time seeing me. And so I think it would have been different had my personal midwife or my doctor been telling me that. But this person would have only looked at this one chart and was it was making a decision without thinking about my historical data. And so I tried to bring that to the conversation and say, like, let me tell you more about you know, my body and this is pregnancy number three like, here's how my body works. And I think this person in particular just wasn't really hearing any of that. It was like, Here's my advice. We just need to do this. I'm like, >>Oh, >>you know, and so is gently as possible. I tried to really share that data. Um, and then it got to the point where it was sort of like either you're gonna do what I say or you're gonna have to sign a waiver. And we were like, Well, to sign the waiver that cost quite a buzz in the hospital that day. But we came back and had a very successful labor and delivery. And so, yeah, >>I think >>that at the time, >>But, >>you know, with that caveat that you should listen to what, your doctors >>Yeah. I mean, there's really interesting, like, what's the boundary between, Like what the numbers tell you and what professional >>tells me Because I don't have an MD. Right. And so, you know, I'm cautious not to overstep that, but I felt like in that case, the doctor wasn't really even considering the data that I was bringing. Um, I was we were actually induced with our first son, but again, that was more of a conversation, more of a dialogue. Here's what's happening here is what we're concerned about and the data to really back it up. And so I felt like in that case, like Yeah, I'm happy to go with your suggestion, but I could number three. It was just like, No, this isn't really >>great. Um, so you also wrote a book called Power In Numbers. The Rebel Women of Mathematics. So what inspired you to write this book? And what do you hope readers take away from it? >>A couple different things. I remember when I saw the movie hidden figures. And, um, I spent three summers at NASA working at JPL, the Jet Propulsion Laboratory. And so I had this very fun connection toe, you know, having worked at NASA. And, um, when this movie came out and I'm sitting there watching it and I'm, like ball in just crying, like I didn't know that there were black women who worked at NASA like, before me, you know, um and so it felt it felt it was just so transformative for me to see these stories just sort of unfold. And I thought, like, Well, why didn't I learn about these women growing up? Like imagine, Had I known about Katherine Johnsons of the world? Maybe that would have really inspired Not just me, but, you know, thinking of all the women of color who aren't in mathematics or who don't see themselves working at at NASA. And so for me, the book was really a way to leave that legacy to the generation that's coming up and say, like, there have been women who've done mathematics, um, and statistics and data science for years, and they're women who are doing it now. So a lot of the about 1/3 of the book are women who were still here and, like, active in the field and doing great things. And so I really wanted to highlight sort of where we've been, where we've been, but also where we're going and the amazing women that are doing work in it. And it's very visual. So some things like, Oh my gosh, >>women in math >>It is really like a very picturesque book of showing this beautiful images of the women and their mathematics and their work. And yes, I'm really proud of it. >>That's awesome. And even though there is like greater diversity now in the tech industry, there's still very few African American women, especially who are part of this industry. So what advice would you give to those women who who feel like they don't belong. >>Yeah, well, a they really do belong. Um, and I think it's also incumbent of people in the industry to sort of recognize ways that they could be advocate for women, and especially for women of color, because often it takes someone who's already at the table to invite other people to the table. And I can't just walk up like move over, get out the way I'm here now. But really being thoughtful about who's not representative, how do we get those voices here? And so I think the onus is often mawr on. People who occupy those spaces are ready to think about how they can be more intentional in bringing diversity in other spaces >>and going back to your talk a little bit. Um uh, how how should people use their data? >>Yeah, so I mean, I think, um, the ways that we've used our data, um, have been to change our lifestyle practices. And so, for example, when I first got a Fitbit, um, it wasn't really that I was like, Oh, I have a goal. It was just like I want something to keep track of my steps And then I look at him and I feel like, Oh, gosh, I didn't even do anything today. And so I think having sort of even that baseline data gave me a place to say, Okay, let me see if I hit 10 stuff, you know, 10,000 >>steps in a day or >>and so, in some ways, having the data allows you to set goals. Some people come in knowing, like, I've got this goal. I want to hit it. But for me, it was just sort of like, um and so I think that's also how I've started to use additional data. So when I take my heart rate data or my pulse, I'm really trying to see if I can get lower than how it was before. So the push is really like, how is my exercise and my diet changing so that I can bring my resting heart rate down? And so having the data gives me a gold up, restore it, and it also gives me that historical information to see like, Oh, this is how far I've come. Like I can't stop there, you know, >>that's a great social impact. >>That's right. Yeah, absolutely. >>and, um, Do you think that so in terms of, like, a security and privacy point of view, like if you're recording all your personal data on these devices, how do you navigate that? >>Yeah, that's a tough one. I mean, because you are giving up that data privacy. Um, I usually make sure that the data that I'm allowing access to this sort of data that I wouldn't care if it got published on the cover of you know, the New York Times. Maybe I wouldn't want everyone to see what my weight is, but, um, and so in some ways, while it is my personal data, there's something that's a bit abstract from it. Like it could be anyone's data as opposed to, say, my DNA. Like I'm not going to do a DNA test. You know, I don't want my data to be mapped it out there for the world. Um, but I think that that's increasingly become a concern because people are giving access to of their information to different companies. It's not clear how companies would use that information, so if they're using my data to build a product will make a product better. You know we don't see any world from that way. We don't have the benefit of it, but they have access to our data. And so I think in terms of data, privacy and data ethics, there's a huge conversation to have around that. We're only kind >>of at the beginning of understanding what that is. Yeah, >>well, thank you so much for being on the Cube. Really having you here. Thank you. Thanks. So I'm Sonia to Gary. Thanks so much for watching the cube and stay tuned for more. Yeah, yeah, yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media So you have a lot of rules. the math department, but I'm a statistician by training, so I teach a lot of courses and statistics and data And you're also a host of API s show called Novo Wonders. so I got to host the show with a couple other co host and really think about like, with a lot of data, big data and artificial intelligence, and you know, how good can we get? and you gave a talk today about mining for your own personal data. And so that was what the talk was about today, like, here's what you can find when you actually sit down and look at that data. I don't is. Um, it depends, you know, I think for women who are in That's And actually speaking about that in your Ted talk, you mentioned how you were. And so I wasn't just bring that to the conversation and say, like, let me tell you more about you know, my body and this is pregnancy number Um, and then it got to the point where it was sort of like either you're gonna do what I say or you're gonna have you and what professional And so I felt like in that case, like Yeah, I'm happy to go with your suggestion, And what do you hope readers take away from it? And so I had this very fun connection toe, you know, having worked at NASA. And yes, I'm really proud of it. So what advice would you give to those women who who feel like they don't belong. And so I think the onus and going back to your talk a little bit. me a place to say, Okay, let me see if I hit 10 stuff, you know, 10,000 so I think that's also how I've started to use additional data. Yeah, absolutely. And so I think in terms of data, of at the beginning of understanding what that is. well, thank you so much for being on the Cube.

ENTITIES

Entity	Category	Confidence
Tilapia Williams	PERSON	0.99+
Sonia	PERSON	0.99+
Talithia Williams	PERSON	0.99+
PBS	ORGANIZATION	0.99+
Gary	PERSON	0.99+
11 years	QUANTITY	0.99+
NASA	ORGANIZATION	0.99+
10,000	QUANTITY	0.99+
Siri	TITLE	0.99+
100	QUANTITY	0.99+
Novo Wonders	TITLE	0.99+
Jet Propulsion Laboratory	ORGANIZATION	0.99+
Power In Numbers	TITLE	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
Katherine Johnsons	PERSON	0.99+
Stanford University	ORGANIZATION	0.99+
first son	QUANTITY	0.99+
today	DATE	0.99+
Harvey Mudd College	ORGANIZATION	0.99+
first time	QUANTITY	0.99+
Dina	PERSON	0.99+
first	QUANTITY	0.99+
JPL	ORGANIZATION	0.99+
three summers	QUANTITY	0.98+
six episode	QUANTITY	0.98+
Harvey Mudd	ORGANIZATION	0.97+
So, Nova Wonders	TITLE	0.97+
one	QUANTITY	0.96+
The Rebel Women of Mathematics	TITLE	0.96+
10 stuff	QUANTITY	0.94+
New York Times	ORGANIZATION	0.94+
couple of years ago	DATE	0.93+
Stanford	ORGANIZATION	0.93+
Stanford Women in Data Science	EVENT	0.92+
Woods Women in Data Science conference	EVENT	0.92+
a day	QUANTITY	0.92+
one chart	QUANTITY	0.91+
about 1/3	QUANTITY	0.88+
Fitbit	ORGANIZATION	0.86+
pregnancy	QUANTITY	0.81+
Ted	TITLE	0.8+
hidden figures	TITLE	0.79+
fifth	QUANTITY	0.77+
Ted talk	TITLE	0.71+
African American	OTHER	0.7+
couple	QUANTITY	0.7+
WiDS) Conference 2020	EVENT	0.68+
three	QUANTITY	0.68+
number three	QUANTITY	0.67+
Nova Wonders	TITLE	0.63+
co	QUANTITY	0.63+
2020	DATE	0.5+
Data	EVENT	0.46+
Science	TITLE	0.42+
Cappy	ORGANIZATION	0.37+

Newsha Ajami, Stanford University | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in data science 2020. Brought to you by Silicon Angle Media. >>Yeah, yeah, and welcome to the Cube. I'm your host Sonia Category and we're live at Stanford University, covering the fifth annual Woods Women in Data Science Conference. Joining us today is new Sha Ajami, who's the director of urban water policy for Stanford. You should welcome to the Cube. Thank you for having me. Absolutely. So tell us a little bit about your role. So >>I directed around water policy program at Stanford. We focused on building solutions for resilient cities to try to use data science and also the mathematical models to better understand how water use is changing and how we can build a future cities and infrastructure to address the needs of the people in the US, in California and across the world. >>That's great. And you're gonna give a talk today about how to build water security using big data. So give us a preview of your talk. >>Sure. So the 20th century water infrastructure model was very much of a >>top down model, >>so we built solutions or infrastructure to bring water to people, but people were not part of the loop. They were not the way that they behaved their decision making process. What they used, how they use it wasn't necessarily part of the process and the assume. There's enough water out there to bring water to people, and they can do whatever they want with it. So what we're trying to do is you want to change this paradigm and try to make it more bottom up at to engage people's decision making process and the uncertainty associated with that as part of the infrastructure planning process. Until I'll be talking, I'll talk a little bit about that. >>And where is the most water usage coming from? So, >>interestingly enough, in developed world, especially in the in the western United States, 50% of our water is used outdoors for grass and outdoor spacing, which we don't necessarily are dependent on. Our lives depend on it. I'll talk about the statistics and my talk, but grass is the biggest club you're going in the US while you're not really needing it for food consumption and also uses four times more water >>than than >>corn, which is which is a lot of water. And in California alone, if you just think about some of the spaces that we have grass or green spaces, we have our doors in the in. The in the malls are institutional buildings or different outdoor spaces. We have some of that water. If we can save, it can provide water for about a 1,000,000 or two million people a year. So that's a lot of water that we can be able to we can save and use, or you are actually a repurpose for needs that you really half. >>So does that also boil down to like people of watering their own lawns? Or is the problem for a much bigger grass message? >>Actually, interestingly enough, that's only 10% of that water out the water use. The rest of it is actually the residential water use, which is what you and I, the grass you and I have in our backyard and watering it so that water is even more than that amount that I mentioned. So we use a lot of water outdoors and again. Some of these green spaces are important for community building for making sure everybody has access to green spaces and people. Kids can play soccer or play outdoors, but really our individual lawns and outdoor spaces. If there are not really a native you know landscaping, it's not something that views enough to justify the amount of water you use for that purpose. >>So taking longer showers and all the stuff is very minimal compared to no, not >>at all. Sure, those are also very, very important. That's another 50% of our water. They're using that urban areas. It is important to be mindful the baby wash dishes. Maybe take shower the baby brush rt. They're not wasting water while you're doing that. And a lot of other individual decisions that we make that can impact water use on a daily basis. >>Right, So So tell us a little bit more about right now in California, We just had a dry February was the 1st 150 years, and you know, this is a huge issue for cities, agriculture and for potential wildfires. So tell us about your opinion about that. So, >>um, the 20th century's infrastructure model I mentioned at the beginning One of the flaws in that system is that it assumes that we will have enough snow in the mountains that would melt during the spring and summer time and would provide us water. The problem is, climate change has really, really impacted that assumption, and now you're not getting as much snow, which is comes back to the fact that this February we have not received any snow. We're still in the winter and we have spring weather and we don't really have much snow on the mountain. Which means that's going to impact the amount of water we have for summer and spring time this year. We had a great last year. We got enough water in our reservoirs, which means that you can potentially make it through. But then you have consecutive years that are dry and they don't receive a lot of water precipitation in form of snow or rain. That will become a very problematic issue to meet future water demands in California. >>And do you think this issue is along with not having enough rainfall, but also about how we store water, or do you think there should be a change in that policy? >>Sure, I think that it definitely has something also in the way we store water and be definitely you're in the 21st century. We have different problems and challenges. It's good to think about alternative ways off a storing water, including using groundwater sources. Groundwater as a way off, storing excess water or moving water around faster and making sure we use every drop of water that falls on the ground and also protecting our water supplies from contamination or pollution. >>And you see it's ever going to desalination or to get clean water. So, interestingly >>enough, I think desalination definitely has worth in other parts of the world, and then they have. Then you have smaller population or you have already tapped out of all the other options that are available to you. Desalination is expensive. Solution costs a lot of money to build this infrastructure and also again depends on you know, this centralized approach that we will build something and provide resources to people from from that location. So it's very costly to build this kind of solutions. I think for for California we still have plenty of water that we can save and repurpose, I would say, and also we still can do recycling and reuse. We can capture our stone water and reuse it, so there's so many other, cheaper, more accessible options available before you go ahead and build a desalination plants >>and you're gonna be talking about sustainable water resource management. So tell us a little bit more about that, too. So the thing with >>water mismanagement and occasionally I use also the word like building resilient water. Future is all about diversifying our water supply and being mindful of how they use our water, every drop of water that use its degraded on. It needs to be cleaned up and put back in the environment, so it always starts from the bottom. The more you save, the less impact you have on the environment. The second thing is you want to make sure every trouble wanted have used. We can use it as many times possible and not make it not not. Take it, use it, lose its right away, but actually be able to use it multiple times for different purposes. Another point that's very important, as actually majority of the water they've used on a daily basis is it doesn't need to be extremely clean drinking water quality. For example, if you tell someone that you're flushing down our toilets. Drinkable water would surprise you that we would spend this much time and resources and money and energy to clean that water to flush it down the toilet video using it. So So basically rethinking the way we built this infrastructure model is very important, being able to tailor water to the needs that we have and also being mindful of Have you use that resource? >>So is your research focus mainly on California or the local community? We actually >>are solutions that we built on our California focus. Actually, we try to build solutions that can be easily applied to different places. Having said that, because you're working from the bottom up, wavy approach water from the bottom up, you need to have a local collaboration and local perspective to bring to their to this picture on. A lot of our collaborators have been so far in California, we have had data from them. We were able to sort of demonstrate some of the assumptions we had in California. But we work actually all over the world. We have collaborators in Europe in Asia and they're all trying to do the same thing that we dio on. You're trying to sort of collaborate with them on some of the projects in other parts of the world. >>That's awesome. So going forward, what do you hope to see with sustainable water management? So, to >>be honest with you, I would often we think about technology as a way that would solve all our problems and move us out of the challenges we have. I would say technology is great, but we need to really rethink the way we manager resource is on the institutions that we have on there. We manage our data and information that we have. And I really hope that became revolutionized that part of the water sector and disrupt that part because as we disrupt this institutional part >>on the >>system, provide more system level thinking to the water sector, I'm hoping that that would change the way we manage our water and then actually opens up space for some of these technologies to come into play as >>we go forward. That's awesome. So before we leave here, you're originally from Tehran. Um and and now you're in this data science industry. What would you say to a kid who's abroad, who wants to maybe move here and have a career in data science? >>I would say Study hard, Don't let anything to disk or do you know we're all equal? Our brains are all made the same way. Doesn't matter what's on the surface. So, um so I and encourage all the girls study hard and not get discouraged and fail as many times as you can, because failing is an opportunity to become more resilient and learn how to grow. And, um and I have, and I really hope to see more girls and women in this in these engineering and stem fields, to be more active on, become more prominent. >>Have you seen a large growth within the past few years? Definitely, >>the conversation is definitely there, and there are a lot more women, and I love how Margot and her team are sort of trying to highlight the number of people who are out there. And working on these issues because that demonstrates that the field wasn't necessarily empty was just not not highlighted as much. So for sure, it's very encouraging to see how much growth you have seen over the years for sure >>you shed. Thank you so much. It's really inspiring all the work you do. Thank you for having me. So no, Absolutely nice to meet you. I'm Senator Gary. Thanks for watching the Cube and stay tuned for more. Yeah, yeah, yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media. Thank you for having me. models to better understand how water use is changing So give us a preview of your talk. to do is you want to change this paradigm and try to make it more bottom up at and my talk, but grass is the biggest club you're going in the US So that's a lot of water that we can be able to we can save and use, The rest of it is actually the residential water use, which is what you and I, They're not wasting water while you're doing that. We just had a dry February was the 1st 150 years, and you know, Which means that's going to impact the amount of water we have for summer and spring time this year. Sure, I think that it definitely has something also in the way we store water and be definitely you're And you see it's ever going to desalination or to get clean water. I think for for California we still have plenty of water that we can save and repurpose, So the thing with the needs that we have and also being mindful of Have you use that resource? the bottom up, you need to have a local collaboration and local So going forward, what do you hope to see with sustainable that part of the water sector and disrupt that part because as we disrupt this institutional So before we leave here, you're originally from Tehran. and fail as many times as you can, because failing is an opportunity to become more resilient it's very encouraging to see how much growth you have seen over the years for sure It's really inspiring all the work you do.

ENTITIES

Entity	Category	Confidence
Europe	LOCATION	0.99+
California	LOCATION	0.99+
US	LOCATION	0.99+
Sha Ajami	PERSON	0.99+
Tehran	LOCATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
Margot	PERSON	0.99+
20th century	DATE	0.99+
50%	QUANTITY	0.99+
21st century	DATE	0.99+
Newsha Ajami	PERSON	0.99+
Stanford University	ORGANIZATION	0.99+
last year	DATE	0.99+
February	DATE	0.99+
Sonia	PERSON	0.98+
second thing	QUANTITY	0.98+
10%	QUANTITY	0.98+
Asia	LOCATION	0.98+
today	DATE	0.98+
Gary	PERSON	0.97+
Stanford	ORGANIZATION	0.96+
Woods Women in Data Science Conference	EVENT	0.96+
four times	QUANTITY	0.95+
Senator	PERSON	0.94+
western United States	LOCATION	0.93+
1st 150 years	QUANTITY	0.93+
2020	DATE	0.92+
Stanford Women in Data Science (	EVENT	0.9+
this year	DATE	0.86+
two million people a year	QUANTITY	0.85+
Cube	ORGANIZATION	0.82+
about a 1,000,000	QUANTITY	0.8+
WiDS) Conference 2020	EVENT	0.77+
this February	DATE	0.75+
One	QUANTITY	0.74+
Cube	TITLE	0.63+
past	DATE	0.55+
fifth	EVENT	0.54+
data	TITLE	0.52+
drop	QUANTITY	0.51+
years	DATE	0.49+
annual	QUANTITY	0.41+

Emily Glassberg Sands, Coursera | Stanford Women in Data Science (WiDS) Conference 2020

>> Reporter: Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by SiliconANGLE media. >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDs, Women in Data Science conference. Joining us today is Emily Glassberg Sands, the Head of Data Science at Coursera, Emily, welcome to theCUBE. >> Thanks, so great to be on. >> So, tell us a little bit more about what you do at Coursera. >> Yeah, absolutely, so Coursera is the world's largest platform for higher education. We partner with about 160 universities and 20 industry partners and we provide top learning content from data science to child nutrition to about 50 million learners around the world. I lead the end to end data team so spanning data engineering, data science and machine learning. >> Wow, and we just had Daphne Koller on earlier this morning who is the co-founder of Coursera and she's also the one who hired you. >> Yeah. >> So tell us more about that relationship. >> Well, I love Daphne, I think the world of her, as I will talk about shortly, she actually didn't hire me from the start. The first answer I got one from Coursera was a no, that the company wasn't quite ready for someone who wasn't a full blown coder. But I eventually talked to her into bringing me on board, and she's been an inspiration ever since. I think one of my first memories of Daphne was when she was painting the vision of what's possible with online education, and she said, "think about the first movie." The first movie was literally just filming a play on stage. You'll appreciate this, given your background in film, and then fast forward to today and think about what's possible in movies that could never be possible on the brick-and-mortar stage. And the analog she was creating was the first MOOC, the first Massive Open Online Course was very simply filming a professor in a classroom. But she was thinking forward to today and tomorrow and five years from now, and what's possible in terms of how data and technology can transform, how educators teach and how learners learn. >> That's very cool. So, how has Coursera changed from when she started it to now? >> So, it's evolved a lot. So, I've been at Coursera about six years, when I joined the company, it had less than 50 people. Today we're 10 times that size, we have 500. I think there have been obviously dramatic growth in the platform over all the three main changes to our business model. The first is we've moved from partnering exclusively with universities to recognizing that actually, a lot of the most important education for folks in the labor market is being taught within companies. So, Google is super incentivized to train people in Google Cloud, Amazon and AWS. Folks need to learn Tableau and a whole host of other software's. So, we've expanded to including education that's provided not just by top institutions like Stanford, but also by top institutions that are companies like Amazon and Google. The second big change is we've recognized that while for many learners and individual course or a MOOC is sufficient, some learners need access to full degree, a diploma bearing credential. So we've moved to the degree space we now have 14 degrees live on the platform masters in computer science and data science but also in business, accounting, and so on. And the third major changes, I think just sort of as the world has evolved to recognize that folks need to be learning throughout their lives. There's also general consensus that it's not just on the individuals to learn, but also on their companies to train them and governments as well, and so we launched Coursera enterprise, which is about providing learning content through employers and through governments so we can reach a wider swath of individuals who might not be able to afford it themselves. >> And how are you able to use data science to track individual, user preferences and user behavior? >> Yeah, that's a great question so you can imagine right? 50 million learners, they're from almost every country in the world from a range of different backgrounds have a bunch of different goals, And so I think what you're getting out is that so much of creating the right learning experience for each person is about personalizing that experience. And we personalized throughout the learner journey so in discovery up-front, when you first joined the platform, we ask you, what's your career goal? What role are you in today? And then we help you find the right content to close the gap. As you're moving through courses we predict whether or not you need some additional support. Whether it's a fully automated intervention like a behavioral nudge, emphasizing growth mindset, or a pedagogical nudge like recommending the right review material and provide it to you, and then we also do the same to accelerate support staff on campus. So, we identify for each individual what type of human touch might they need, and we serve up to support staff recommendations for who they should reach out to, whether it's a counselor reaching out to degree student who hasn't logged in for a while, or a TA reaching out to a degree student who's struggling with an assignment. So, data really powers all of that, understanding someone's goals, their backgrounds, the content that's going to close the gap, as well as understanding where they need additional support and what type of help we can provide. >> And how are you able to track this data, are you using AV testing? >> Yeah, great question, so the, we call it a venting level data, which basically tracks what every learner is doing as they're moving through the platform. And then we use AV testing to understand the influence of kind of our big feature. So, say we roll out a new search ranking algorithm or a new learning experience we would AV-Test that, yes to understand how learners in the new variant compared to learners in the old variant. But for many of our machine learn systems, we're actually doing more of a multi-armed bandit approach where on the margin, we're changing a little bit the experience people have to understand what effect that has on their downstream behavior, separate from this mass hold-in or hold-out AV-Test. >> And so today, you're giving a talk about Coursera's latest data products so give us a little insight about that. >> So, I'm covering three data products that we've launched over the last couple of years. The first two are oriented around really helping learners be successful in the learning experience. So the first is predicting when learners are going to need additional nudges and intervening in fully automated ways to get them back on track. The second is about identifying learners who need human support and serving up really easily interpretable insights to support staff so they can reach out to the right learner with the right help. And then the third is a little bit different. It's about once learners are out in the labor market, how can they credibly signal what they know, so that they can be rewarded for that learning on the job. And this is a product called skill scoring, where we're actually measuring what skills each learner has up to what level so I can for example, compare that to the skills required in my target career or show it to my employer so I can be rewarded for what I know. >> That can be really helpful when people are creating resumes, by ranking how much of a skill that they have. >> Absolutely. So, it's really interesting when you talk about resumes, so many of what, so much of what's shown on resumes are traditional credentials, things like What school did you go to? what did you major in? what jobs have you had? And as you and I both know, there's unequal access to the school you go to or the early jobs you get. And so, part of the motivation behind skill scoring is to create more equitable or fair or accessible signals for the labor market. So, we're really excited about that direction. >> And do you think companies are taking that into consideration when they're hiring people who say have like a five out of five skills in computer science, but they didn't go to Stanford? >> Yeah. >> Think they're taking that >> Absolutely, I think companies are hungry to find more diverse talent and the biggest challenge is, when you look at people from diverse backgrounds, it's hard to know who has what skills. And so skill scoring provides a really valuable input, we're actually seeing it in use already by many of our enterprise customers who are using it to identify who have their internal employees is well positioned for new opportunities or new roles. For example, I may have a bunch of backend engineers, if I know who's good in math and machine learning and statistics, I can actually tap those folks to transition over to machine learning roles. And so it's used both as an external signal and external labor market, as well as an internal signal within companies. >> And just our last question here, what advice would you give to young women who are either out of college or just starting college who are interested in data science? Who maybe, don't haven't majored in a typical data science major? What advice would you give to them? >> So, I love that you asked you haven't made it, majored in a typical data science major. I'm actually an economist by training. And I think that's probably the reason why I was at first rejected from Coursera because an economist is a very strange background to go into data science. I think my primary advice to those young women would be to really not get too lost in the data science, in the math, in the algorithms and instead to remember that those are a means to an end, and the end is impact. So, think about the problems in the world that you care about. For me, it's education. For others, it's health care, or personal finance or a range of other issues. And remember that data science provides this vast set of tools that you can use to solve the problems you care about most. >> That's great, thank you so much for being on theCUBE. >> Thank you. I'm Sonia Tagare, thank you so much for watching theCUBE and stay tuned for more. (upbeat music)

Published Date : Mar 3 2020

SUMMARY :

Brought to you by SiliconANGLE media. covering the fifth annual WiDs, about what you do at Coursera. I lead the end to end data team and she's also the one who hired you. and then fast forward to today So, how has Coursera changed that it's not just on the individuals to learn, And then we help you find the right content the experience people have to understand what effect And so today, you're giving a talk about Coursera's compare that to the skills required in my target career resumes, by ranking how much of a skill that they have. to the school you go to or the early jobs you get. and statistics, I can actually tap those folks to transition and instead to remember that those are a means to an end, I'm Sonia Tagare, thank you so much for watching theCUBE

ENTITIES

Entity	Category	Confidence
Sonia Tagare	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Daphne	PERSON	0.99+
Daphne Koller	PERSON	0.99+
Stanford	ORGANIZATION	0.99+
10 times	QUANTITY	0.99+
Coursera	ORGANIZATION	0.99+
14 degrees	QUANTITY	0.99+
Emily	PERSON	0.99+
five	QUANTITY	0.99+
first movie	QUANTITY	0.99+
tomorrow	DATE	0.99+
500	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
third	QUANTITY	0.99+
first	QUANTITY	0.99+
Today	DATE	0.99+
second	QUANTITY	0.99+
20 industry partners	QUANTITY	0.99+
Emily Glassberg Sands	PERSON	0.99+
Stanford University	ORGANIZATION	0.99+
less than 50 people	QUANTITY	0.99+
each person	QUANTITY	0.98+
SiliconANGLE	ORGANIZATION	0.98+
today	DATE	0.98+
theCUBE	ORGANIZATION	0.98+
both	QUANTITY	0.98+
about 160 universities	QUANTITY	0.97+
first two	QUANTITY	0.96+
first answer	QUANTITY	0.95+
first MOOC	QUANTITY	0.95+
50 million learners	QUANTITY	0.95+
about 50 million learners	QUANTITY	0.94+
Tableau	TITLE	0.93+
about six years	QUANTITY	0.93+
three	QUANTITY	0.92+
each individual	QUANTITY	0.92+
WiDs, Women in Data Science conference	EVENT	0.91+
third major	QUANTITY	0.9+
each learner	QUANTITY	0.89+
one	QUANTITY	0.89+
WiDS	EVENT	0.88+
earlier this morning	DATE	0.87+
Conference 2020	EVENT	0.85+
last couple of years	DATE	0.85+
first memories	QUANTITY	0.85+
five skills	QUANTITY	0.83+
three data products	QUANTITY	0.83+
Stanford Women in Data Science	EVENT	0.82+
Google Cloud	TITLE	0.81+
five years	QUANTITY	0.77+
first Massive	QUANTITY	0.72+
Stanford Women in Data Science 2020	EVENT	0.69+
fifth	QUANTITY	0.54+

Ya Xu, LinkedIn | Stanford Women in Data Science (WiDS) Conference 2020

>> Narrator: Live from Stanford University, it's theCUBE! Covering Stanford Women in Data Science 2020, brought to you by SiliconAngle Media. >> Hi, and welcome to the cube, I'm your host, Sonia Tagare. And we're live at Stanford University, covering the fifth annual WiDS, Women in Data Science Conference. Joining us today is Ya XU, the head of data science at LinkedIn. Ya Welcome to the cube. >> Thank you for having me. >> So tell us a little bit about your role and about LinkedIn. >> So LinkedIn is, first of all, the biggest professional social network, where we have a massive economic graph that we have been creating with millions actually close to 700 million members and millions of companies and jobs and of course, you know, with students of skills and also schools as well as part of it. And, and I lead the data science team at LinkedIn. And my team really spans across the global presence that LinkedIn offices have. And yeah really working on various different areas. That's both thinking about how we can iterate and understand and improve our products, that we deliver to our members and our customers. And also at the same time thinking about how we can make our infrast6ructure more efficient, and thinking about how we can make our sales and marketing more efficient as well, so we really span across. >> And how has the use of data science evolved to deliver a better user experience for users of LinkedIn? >> Yeah, so first of all, I think we LinkedIn in general, we truly believe that everybody can benefit from better data, better data access, in general. So we're certainly using data to continuously understand better of what our members are looking for. As a simple example, is that whenever we launch new feature, we're not just blindly deciding ourselves what is the better feature for our members, but we actually understand how our users are reacting to it. Right? So we use data to understand that, and then certainly making decisions, and whether we should be eventually launching this feature to all members or not. So that's a very prominent way for us to use data. And obviously, we also use data to understand and just even before we build certain features. Is this sort of feature that's right feature to build. We do both survey and understand the survey data, but also at the same time understanding just user behavior data for us to be able to come up with better features for users. >> And do you use AB testing as well? >> Oh absolutely, Yeah. So we do a lot of AV experiments. That's what, I was not trying to use that word by that like that terminology, but this is what we use to have an understanding of user features that we are developing, that we are putting in front of our users. Is that what they enjoy as much as we think they will enjoy? >> Right, so you had a talk today about creating global economic opportunities with responsible data. So give us some highlights from your talk. >> So, first of all, at LinkedIn we we truly believe in the vision that we are working towards, which is really creating economic opportunity for every member of the global workforce. And if you're kind of starting from that, and thinking about that is our sort of the axiom that we're working towards, and then thinking about how you can do that, and obviously, the sort of the table stake or just the fundamental thing that we have to start with is to be able to preserve the privacy of our members as we are leveraging the data that our members entrust with us. Right, so how can we do that? We have some early effort in using and developing differential privacy as a technique for us to do a lot better. Always regarding preserving their privacy as we're leveraging the data, but also at the same time, it doesn't ends there, right? Because you're thinking about creating opportunity. It's not just about to preserve their privacy, but also, when we are leveraging the data, how can we leverage the data in a way that is able to create opportunity in a fair way? So here is also a lot of effort that we're having with regarding, how can we do that? And what does fairest mean? What are the ways we can actually turn some of the key concepts that we have into action that is really able to drive the way we develop product, the way that we think about responsible design, and the way that we build our algorithms, the way that we measure in every single dimension. >> And and speaking about that bias, at the opening address, they mentioned that diversity is really great because it provides many perspectives, and also helps reduce this bias. So how have you at LinkedIn been able to create a more diverse team? >> So first of all, I think it's certain we all believe that diversity is certainly better as we building product. Thinking about if you have a diverse team that is really a representation of the customer and some members that you're serving, then definitely you're able to come up with better features that is able to serve the needs of the population of our members. But also at the same time, that's just the right thing to do as well. Right, thinking about we all have had experiences we may not you know, feel as much belonging when we walk into a room that we are the only person that we identify with to be in that room. And, we certainly wanted to be able to create that environment for all the employees as well. And and thinking about, I think there is also studies that has done as what makes a high performing team. Some of the studies has done I google with the psychological safety aspects of it, which is really there's a lot of brain science that says when you make people feel they belong, that they will actually be so much more creative and innovative and everything right. So we have that belief. But tactically, there are many things that we're doing from all the divs aspect, right? How can you bring diversity, inclusion and belonging? Starting from and hiring, right? So we certainly are very much emphasized how can we increase the diversity of individuals that we're bringing to LinkedIn? And when they are at LinkedIn, can we make them feel more belonging, and feel more included in every aspects? We have different inclusion groups, right? We have I mean, obviously, I'm very much involved in Women tech. At LinkedIn we have both money efforts that we do to help women at LinkedIn in engineering, and in other groups as well to feel they belong to this community. At the same time, there is concrete actions that we're taking too. Right, that we are helping women to have a much better understanding, and aware of some of the ways that we operate that is slightly different from maybe our male colleagues will operate, right? There are certain things that we're doing to change the current processes, hiring processes, promotion process, that we are able to bring more equal footing to the way that we're thinking about gender gap and gender diversity. >> Right, that's great. And what advice would you give to women who are just starting college or who are just out of college who are interested in going into data science. >> So I want to say the biggest learning for me, is just have that can do attitude. I, you know, the woman biologically and all just like in every way, we're not any less than men. And that you certainly have seen many strong and very talented women that we have in the field. So don't let people's perceptions or biases around you to bring you down. And then thinking about what you wanted, and then just go for it, and then go for the the advice that you can get from people. And then there are so many as you can see in the conference today, so many talented women that you can reach out to who are winning and very willing to help you as well. >> And in this age of AI and ML, where do you see data science going in the future? >> That's a really interesting question. So in the way that, you know, data science I want to say is a field that is really broad, right? So if you're thinking about things that I would consider to be part of data science may not necessarily part of AI, but some of the course of influence that is extremely popular and important. And then I think the fields will continue to evolve, there are going to be and then the fields are continually overlapping with each other as well. You cannot do data science without understanding or have a strong skill in AI and machine learning. And you also can't do great machine learning without understanding the data science either. Right? So thinking about some of the talk that definitely colder earlier was sharing, as in you know, you can blind in the wrong algorithm and without realizing the bias. That all the algorithm is really just detecting the machines that's using the images versus you know, actually detecting the difference between broken bones or not right, like so. So I think having, I do see there is a continuously big overlap and I think the individuals who are involved in both communities should continue to be very comfortable being in that way too. >> Right, great. Thank you so much for being on theCUBE and thank you for your insight. >> Of course, thank you for having me. >> I'm your host, Sonia Takari. Thank you for watching theCUBE and stay tuned for more. (Upbeat music)

Published Date : Mar 3 2020

SUMMARY :

brought to you by SiliconAngle Media. Hi, and welcome to the cube, and about LinkedIn. and thinking about how we can make our sales and marketing and just even before we build certain features. that we are putting in front of our users. Right, so you had a talk today and the way that we build our algorithms, And and speaking about that bias, at the opening address, and aware of some of the ways that we operate And what advice would you give to women And that you certainly have seen many strong So in the way that, you know, data science and thank you for your insight. Thank you for watching theCUBE

ENTITIES

Entity	Category	Confidence
Sonia Takari	PERSON	0.99+
Sonia Tagare	PERSON	0.99+
LinkedIn	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
today	DATE	0.99+
both	QUANTITY	0.98+
SiliconAngle Media	ORGANIZATION	0.98+
Stanford University	ORGANIZATION	0.97+
Ya Xu	PERSON	0.95+
Stanford Women in Data Science	EVENT	0.95+
WiDS, Women in Data Science Conference	EVENT	0.93+
both communities	QUANTITY	0.9+
700 million members	QUANTITY	0.89+
WiDS) Conference 2020	EVENT	0.79+
Stanford Women in Data Science 2020	EVENT	0.78+
millions of companies	QUANTITY	0.77+
single dimension	QUANTITY	0.7+
XU	PERSON	0.63+
first	QUANTITY	0.62+
fifth annual	QUANTITY	0.56+
theCUBE	TITLE	0.42+

Nhung Ho, Intuit | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in data science 2020. Brought to you by Silicon Angle Media. Yeah. >>Hi. And welcome to the Cube. I'm your host Sonia Category. And we're live at Stanford University for the fifth annual Woods Women in Data Science Conference. Joining us today is none. Ho, the director of data Science at Intuit None. Welcome to the Cube. >>Thank you for having me here, so yeah, >>so tell us a little bit about your role at Intuit. So I leave the >>applied Machine Learning teams for our QuickBooks product lines and also for our customer success organization within my team. We do applied machine learning. So what? We specialize in building machine learning products and delivering them into our products for >>our users. Great. Today. Today you're giving a talk. You talked about how organizations want to achieve greater flexibility, speed and cost efficiencies on. And you're giving it a technical vision. Talk today about data science in the cloud world. So what should data scientists know about data science in a cloud world? >>Well, I'll just give you a little bit of a preview into my talk later because I don't want to spoil anything. Yeah, but I think one of the most important things being a data scientist in a cloud world is that you have to fundamentally change the way you work a lot of a start on our laptops or a server and do our work. But when you move to the cloud, it's like all bets are off. All the limiters are off. And so how do you fully take advantage of that? How do you change your workflow? What are some of the things that are available to you that you may not know about? And in addition to that, some some things that you have to rewire in your brain to operate in this new environment. And I'm going to share some experiences that I learned firsthand and also from my team in into its cloud migration over the past six years. >>That's great. Excited to hear that on DSO you were getting into it into it has sponsored Woods for many years now. Last year we spoke with could be the San Juan from Intuit. So tell us about this Intuit's sponsorship. Yeah, >>so into it. We are a champion of gender diversity and also all sorts of diversity. And when we first learned about which we said, We need to be a champion of the women in data science conference because for me personally, often times when I'm in a room, um, going over technical details I'm often the only woman and not just I'm often the only woman executive and so part of the sponsorship is to create this community of women, very technical women in this field, to share our work together to build this community and also to show the great diversity of work that's going on across the field of data science. >>And so Intuit has always been really great for embracing diversity. Tell us a little bit about about bad experience, about being part of Intuit and also about the tech women part. Yeah, >>so one of the things that into it that I really appreciate is we have employees groups around specific interests, and one of those employees groups is tech women at Intuit and Tech women at Intuit. The goal is to create a community of women who can provide coaching, mentorship, technical development, leadership development and I think one of the unique things about it is that it's not just focused on the technical development side, but on helping women develop into leadership positions. For me, When I first started out, there were very few women in executive positions in our field and data science is a brand new field, and so it takes time to get there. Now that I'm on the other side, one of the things that I want to do is be able to give back and coach the next generation. And so the tech women at Intuit Group allows me to do that through a very strong mentorship program that matches me and early career mentees across multiple different fields so that I can provide that coaching in that leadership development >>and speaking about like diversity. In the opening address, we heard that diversity creates perspectives, and it also takes away bias. So why gender diversity is so important into it, and how does it help take away that bias? Yeah, >>so one of the important things that I think a lot of people don't realize is when you go and you build your products, you bring in a lot of biases and how you build the product and ultimately the people who use your products are the general population for us. We serve consumer, small businesses and self employed. And if you take a look at the diversity of our customers, it mirrors the general population. And so when you think about building products, you need to bring in those diverse perspectives so you could build the best products possible because of people who are using those products come from a diverse background as well, >>right? And so now at Intuit like instead of going from a desktop based application, we're at a cloud based application, which is a big part of your talk. How do you use data Teoh for a B testing and why is it important? >>Yeah, a B testing That is a personal passion of mine, actually, because as a scientist, what we like to do is run a lot of experiments and say, Okay, what is the best thing out there so that ultimately, when you ship a new product or feature, you send the best thing possible that's verified by data, and you know exactly how users are going to react to it. When we were on desktop, they made it incredibly difficult because those were back in the days. And I don't know if you remember those put back in the days when you had a floppy disk, right or even a CD ROM's. That's how we shipped our products. And so all the changes that you wanted to make had to be contained. In the end, you really only ship it once per year. So if there's any type of testing that we did, we're bringing our users and have them use our products a little bit and then say Okay, we know exactly what we need to dio ship that out. So you only get one chance now that we're in the cloud. What that allows us to do is to test continuously via a B, testing every new feature that comes out. We have a champion Challenger model, and we can say Okay, the new version that we're shipping out is this much better than the previous one. We know it performs in this way, and then we got to make the decision. Is this the best thing to do for a customer? And so you turn what was once a one time process, a one time change management process. So one that's distributed throughout the entire year and at any one time we're running hundreds of tests to make sure that we're shipping exactly the best things for our customers. >>That's awesome. Um, so, um, what advice would you give to the next generation of women who are interested in stem but maybe feel like, Oh, I might be the only woman. I don't know if I should do this. Yeah, I think that the biggest >>thing for me was finding men's ownership, and initially, when I was very early career and even when I was doing my graduate studies for me, a mentor with someone who was in my field. But when I first joined into it, an executive in another group who is a female, said, Hey, I'd like to take your side, provide you some feedback, and this is some coaching I want to give you, And that was when I realized you don't actually need to have that person be in your field to actually guide you through to the next up. And so, for women who are going through their journey and early on, I recommend finding a mentor who is at a stage where you want to go, regardless of which field there in, because everybody has diverse perspectives and things that they can teach you as you go along. >>And how do you think Woods is helping women feel like they can do data science and be a part of the community? Yeah, I think >>what you'll see in the program today is a huge diversity of our speakers, our Panelists through all different stages of their career and all different fields. And so what we get to see is not only the time baseline of women who are in their PhDs all the way to very, very well established women. The provost of Stanford University was here today, which is amazing to see someone at the very top of the career who's been around the block. But the other thing is also the diversity and fields. When you think about data science, a lot of us think about just the tech industry. But you see it in healthcare. You see it in academia and there's a scene that wide diversity of where data science and where women who are practicing data science come from. I think it's really empowering because you can see yourself in the representation does matter quite a bit. >>Absolutely. And where do you see data science going forward? >>Oh, that is a, uh, tough and interesting question, actually. And I think that in the current environment today, we could talk about where it could go wrong or where it could actually open the doors. And for me, I'm an eternal optimist on one of the things that I think is really, really exciting for the future is we're getting to a stage where we're building models, not just for the general population. We have enough data and we have enough compute where we can build a model. Taylor just for you, for all of your life's on for me. I think that that is really, really powerful because we can build exactly the right solution to help our customers and our users succeed. Specifically, me working in the personal friend, Small business finance lease. That means I can hope that cupcake shop owner actually manage her cash flow and help her succeed to me that I think that's really powerful. And that's where data science is headed. >>None. Thank you so much for being on the Cube and thank you for your insight. Thank you so much. I'm so sorry. Thanks for watching the Cube. Stay tuned for more. Yeah, Yeah, yeah, yeah, yeah, yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media. And we're live at Stanford University for the fifth so tell us a little bit about your role at Intuit. We do applied machine learning. And you're giving it a technical vision. What are some of the things that are available to you that you may not know about? Excited to hear that on DSO you were getting into it into it has sponsored We need to be a champion of the women in data science conference because And so Intuit has always been really great for embracing diversity. And so the tech women at Intuit Group allows me to do that through a very strong mentorship program that In the opening address, we heard that diversity creates And so when you think about building products, you need to bring in those diverse How do you use data Teoh for a B testing and And so all the changes that you wanted to make had to be contained. Um, so, um, what advice would you give to the next generation of women I recommend finding a mentor who is at a stage where you want to go, And so what we get to see is not only the time baseline of women who are in their PhDs all And where do you see data science going forward? And for me, I'm an eternal optimist on one of the things that I think is really, Thank you so much.

ENTITIES

Entity	Category	Confidence
Intuit	ORGANIZATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
Today	DATE	0.99+
Last year	DATE	0.99+
today	DATE	0.99+
Intuit Group	ORGANIZATION	0.99+
one time	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Sonia	PERSON	0.99+
Nhung Ho	PERSON	0.99+
one chance	QUANTITY	0.99+
Taylor	PERSON	0.98+
first	QUANTITY	0.98+
Ho	PERSON	0.97+
QuickBooks	TITLE	0.97+
Intuit None	ORGANIZATION	0.95+
Woods Women in Data Science Conference	EVENT	0.94+
Stanford	ORGANIZATION	0.93+
hundreds of tests	QUANTITY	0.93+
2020	DATE	0.93+
past six years	DATE	0.88+
Stanford Women in Data Science (	EVENT	0.88+
DSO	ORGANIZATION	0.86+
one time process	QUANTITY	0.86+
once per year	QUANTITY	0.86+
Woods	PERSON	0.83+
Cube	COMMERCIAL_ITEM	0.77+
WiDS) Conference 2020	EVENT	0.75+
Woods	EVENT	0.66+
once	QUANTITY	0.61+
fifth	EVENT	0.55+
Cube	ORGANIZATION	0.51+
San Juan	LOCATION	0.46+
annual	QUANTITY	0.37+

Lillian Carrasquillo, Spotify | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in data science 2020. Brought to you by Silicon Angle Media. >>Yeah, yeah. Hi. And welcome to the Cube. I'm your host, Sonia Atari. And we're live at Stanford University, covering the fifth annual Woods Women in Data Science Conference. Joining us today is Lillian Kearse. Keo, who's the Insights manager at Spotify. Slowly and welcome to the Cube. Thank you so much for having me. So tell us a little bit about your role at a Spotify. >>Yeah, So I'm actually one of the few insights managers in the personalization team. Um, and within my little group, we think about data and algorithms that help power the larger personalization experiences throughout Spotify. So, from your limits to discover weekly to your year and wrap stories to your experience on home and the search results, that's >>awesome. Can you tell us a little bit more about the personalization? Um, team? >>Yes. We actually have a variety of different product areas that come together to form the personalization mission, which is the mission is like the term that we use for a big department at Spotify, and we collaborate across different product areas to understand what are the foundational data sets and the foundational machine learning tools that are needed to be able to create features that a user can actually experience in the app? >>Great. Um, and so you're going to be on the career panel today? How do you feel about that? I'm >>really excited. Yeah, Yeah, the would seem is in a great job of bringing together Diverse is very, uh, it's overused term. Sometimes they're a very diverse group of people with lots of different types of experiences, which I think is core. So how I think about data science, it's a wide definition. And so I think it's great to show younger and mid career women all of the different career paths that we can all take. >>And what advice would you would you give to? Women were coming out of college right now about data science. >>Yeah, so my my big advice is to follow your interests. So there's so many different types of data science problems. You don't have to just go into a title that says data scientists or a team that says Data scientist, You can follow your interest into your data science. Use your data science skills in ways that might require a lot of collaboration or mixed methods, or work within a team where there are different types of different different types of expertise coming together to work on problems. >>And speaking of mixed methods, insights is a team that's a mixed methods research groups. So tell us more about that. Yes, I >>personally manage a data scientist, Um, user researcher and the three of us collaborate highly together across their disciplines. We also collaborate across research science, the research science team right into the product and engineering teams that are actually delivering the different products that users get to see. So it's highly collaborative, and the idea is to understand the problem. Space deeply together, be able to understand. What is it that we're trying to even just form in our head is like the need that a user work and human and user human has, um, in bringing in research from research scientists and the product side to be able to understand those needs and then actually have insights that another human, you know, a product owner you can really think through and understand the current space and like the product opportunities >>and to understand that user insight do use a B testing. >>We use a lot of >>a B testing, so that's core to how we think about our users at Spotify. So we use a lot of a B testing. We do a lot of offline experiments to understand the potential consequences or impact that certain interventions can have. But I think a B testing, you know, there's so much to learn about best practices there and where you're talking about a team that does foundational data and foundational features. You also have to think about unintended or second order effects of algorithmic a B test. So it's been just like a huge area of learning in a huge area of just very interesting outcomes. And like every test that we run, we learn a lot about not just the individual thing. We're testing with just the process overall. >>And, um, what are some features of Spotify that customers really love anything? Anything >>that's like we know use a daily mix people absolutely love every time that I make a new friend and I saw them what they work on there like I was just listening to my daily makes this morning discover weekly for people who really want >>to stay, >>you know, open to new music is also very popular. But I think the one that really takes it is any of the end of year wrapped campaigns that we have just the nostalgia that people have, even just for the last year. But in 2019 we were actually able to do 10 years, and that amount of nostalgia just went through the roof like people were just like, Oh my goodness, you captured the time that I broke up with that, you >>know, the 1st 5 years ago, or just like when I discovered that I love Taylor Swift, even though I didn't think I like their or something like that, you know? >>Are there any surprises or interesting stories that you have about, um, interesting user experiences? Yeah. >>I mean, I could give I >>can give you an example from my experience. So recently, A few a few months ago, I was scrolling through my home feed, and I noticed that one of the highly rated things for me was women in >>country, and I was like, Oh, that's kind of weird. I don't consider >>myself a country fan, right? And I was like having this moment where I went through this path of Wait, That's weird. Why would Why would this recommend? Why would the home screen recommend women in country, country music to me? And then when I click through it, um, it would show you a little bit of information about it because it had, you know, Dolly Parton. It had Margo Price and it had the high women and those were all artistes. And I've been listening to a lot, but I just had not formed an identity as a country music. And then I click through It was like, Oh, this is a great play list and I listen to it and it got me to the point where I was realizing I really actually do like country music when the stories were centered around women, that it was really fun to discover other artists that I wouldn't have otherwise jumped into as well. Based on the fact that I love the story writing and the song, writing these other country acts that >>so quickly discovered that so you have a degree in industrial mathematics, went to a liberal arts college on purpose because you want to try out different classes. So how is that diversity of education really helped >>you in your Yes, in my undergrad is from Smith College, which is a liberal arts school, very strong liberal arts foundation. And when I went to visit, one of the math professors that I met told me that he, you know, he considers studying math, not just to make you better at math, but that it makes you a better thinker. And you can take in much more information and sort of question assumptions and try to build a foundation for what? The problem that you're trying to think through is. And I just found that extremely interesting. And I also, you know, I haven't undeclared major in Latin American studies, and I studied like neuroscience and quantum physics for non experts and film class and all of these other things that I don't know if I would have had the same opportunity at a more technical school, and I just found it really challenging and satisfying to be able to push myself to think in different ways. I even took a poetry writing class I did not write good poetry, but the experience really stuck with me because it was about pushing myself outside of my own boundaries. >>And would you recommend having this kind of like diverse education to young women now who are looking >>and I absolutely love it? I mean, I think, you know, there's some people believe that instead of thinking about steam, we should be talking instead of thinking about stem. Rather, we should be talking about steam, which adds the arts education in there, and liberal arts is one of them. And I think that now, in these conversations that we have about biases in data and ML and AI and understanding, fairness and accountability, accountability bitterly, it's a hardware. Apparently, I think that a strong, uh, cross disciplinary collaborative and even on an individual level, cross disciplinary education is really the only way that we're gonna be able to make those connections to understand what kind of second order effects for having based on the decisions of parameters for a model. In a local sense, we're optimizing and doing a great job. But what are the global consequences of those decisions? And I think that that kind of interdisciplinary approach to education as an individual and collaboration as a team is really the only way. >>And speaking about bias. Earlier, we heard that diversity is great because it brings out new perspectives, and it also helps to reduce that unfair bias. So how it Spotify have you managed? Or has Spotify managed to create a more diverse team? >>Yeah, so I mean, it starts with recruiting. It starts with what kind of messaging we put out there, and there's a great team that thinks about that exclusively. And they're really pushing all of us as managers. As I seizes leaders to really think about the decisions in the way that we talk about things and all of these micro decisions that we make and how that creates an inclusive environments, it's not just about diversity. It's also about making people feel like this is where they should be. On a personal level, you know, I talk a lot with younger folks and people who are trying to just figure out what their place is in technology, whether it be because they come from a different culture, >>there are, >>you know, they might be gender, non binary. They might be women who feel like there is in a place for them. It's really about, You know, the things that I think about is because you're different. Your voice is needed even more. You know, like your voice matters and we need to figure out. And I always ask, How can I highlight your voice more? You know, how can I help? I have a tiny, tiny bit of power and influence. You know, more than some other folks. How can I help other people acquire that as well? >>Lilian, thank you so much for your insight. Thank you for being on the Cube. Thank you. I'm your host, Sonia today. Ari. Thank you for watching and stay tuned for more. Yeah, yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media. Thank you so much for having me. that help power the larger personalization experiences throughout Spotify. Can you tell us a little bit more about the personalization? and we collaborate across different product areas to understand what are the foundational data sets and How do you feel about that? And so I think it's great to show younger And what advice would you would you give to? Yeah, so my my big advice is to follow your interests. And speaking of mixed methods, insights is a team that's a mixed methods research groups. in bringing in research from research scientists and the product side to be able to understand those needs And like every test that we run, we learn a lot about not just the individual thing. you know, open to new music is also very popular. Are there any surprises or interesting stories that you have about, um, interesting user experiences? can give you an example from my experience. I don't consider And I was like having this moment where I went through this path of Wait, so quickly discovered that so you have a degree in industrial mathematics, And I also, you know, I haven't undeclared major in Latin American studies, I mean, I think, you know, there's some people believe that So how it Spotify have you managed? As I seizes leaders to really think about the decisions in the way that we talk And I always ask, How can I highlight your voice more? Lilian, thank you so much for your insight.

ENTITIES

Entity	Category	Confidence
Lillian Carrasquillo	PERSON	0.99+
Lillian Kearse	PERSON	0.99+
Lilian	PERSON	0.99+
Sonia	PERSON	0.99+
Spotify	ORGANIZATION	0.99+
2019	DATE	0.99+
Ari	PERSON	0.99+
Sonia Atari	PERSON	0.99+
three	QUANTITY	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
today	DATE	0.99+
Stanford University	ORGANIZATION	0.99+
Smith College	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
Keo	PERSON	0.98+
last year	DATE	0.98+
one	QUANTITY	0.98+
Dolly Parton	PERSON	0.98+
Margo Price	PERSON	0.97+
Stanford Women in Data Science	EVENT	0.97+
1st 5 years ago	DATE	0.95+
Woods Women in Data Science Conference	EVENT	0.94+
Latin American	OTHER	0.9+
Taylor Swift	PERSON	0.88+
second order	QUANTITY	0.82+
Stanford	ORGANIZATION	0.82+
2020	DATE	0.81+
WiDS) Conference 2020	EVENT	0.8+
a few months ago	DATE	0.77+
end	DATE	0.61+
this morning	DATE	0.6+
fifth	EVENT	0.5+
data	TITLE	0.5+
Cube	COMMERCIAL_ITEM	0.5+
annual	QUANTITY	0.4+

Lucy Bernholz, Stanford University | Stanford Women in Data Science (WiDS) Conference 2020

>> Announcer: Live from Stanford University. It's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. (upbeat music) >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare. And we're live at Stanford University covering the fifth annual WiDS Women in Data Science Conference. Joining us today is Lucy Bernholz, who is the Senior Research Scholar at Stanford University. Lucy, welcome to theCUBE. >> Thanks for having me. >> So you've led the Digital Civil Society Lab at Stanford for the past 11 years. So tell us more about that. >> Sure, so the Digital Civil Society Lab actually exists because we don't think digital civil society exists. So let me take that apart for you. Civil society is that weird third space outside of markets and outside of government. So it's where we associate together, it's where we as people get together and do things that help other people could be the nonprofit sector, it might be political action, it might be the eight of us just getting together and cleaning up a park or protesting something we don't like. So that's civil society. But what's happened over the last 30 years really is that everything we use to do that work has become dependent on digital systems and those digital systems, some tier, I'm talking gadgets, from our phones, to the infrastructure over which data is exchanged. That entire digital system is built by companies and surveilled by governments. So where do we as people get to go digitally? Where we could have a private conversation to say, "Hey, let's go meet downtown and protest x and y, or let's get together and create an alternative educational opportunity 'cause we feel our kids are being overlooked, whatever." All of that information that get exchanged, all of that associating that we might do in the digital world, it's all being watched. It's all being captured (laughs). And that's a problem because both history and political science, history and democracy theory show us that when there's no space for people to get together voluntarily, take collective action, and do that kind of thinking and planning and communicating it just between the people they want involved in that when that space no longer exists, democracies fall. So the lab exists to try to recreate that space. And in order to do that, we have to first of all recognize that it's being closed in. Secondly, we have to make real technological process, we need a whole set of different kind of different digital devices and norms. We need different kinds of organizations, and we need different laws. So that's what the lab does. >> And how does ethics play into that. >> It's all about ethics. And it's a word I try to avoid actually, because especially in the tech industry, I'll be completely blunt here. It's an empty term. It means nothing the companies are using it to avoid being regulated. People are trying to talk about ethics, but they don't want to talk about values. But you can't do that. Ethics is a code of practice built on a set of articulated values. And if you don't want to talk about values, you don't really having conversation about ethics, you're not having a conversation about the choices you're going to make in a difficult situation. You're not having a conversation over whether one life is worth 5000 lives or everybody's lives are equal. Or if you should shift the playing field to account for the millennia of systemic and structural biases that have been built into our system. There's no conversation about ethics, if you're not talking about that thing and those things. As long as we're just talking about ethics, we're not talking about anything. >> And you were actually on the ethics panel just now. So tell us a little bit about what you guys talked about and what were some highlights. >> So I think one of the key things about the ethics panel here at WiDS this morning was that first of all started the day, which is a good sign. It shouldn't be a separate topic of discussion. We need this conversation about values about what we're trying to build for, who we're trying to protect, how we're trying to recognize individual human agency that has to be built in throughout data science. So it's a good start to have a panel about it, the beginning of the conference, but I'm hopeful that the rest of the conversation will not leave it behind. We talked about the fact that just as civil society is now dependent on these digital systems that it doesn't control. Data scientists are building data sets and algorithmic forms of analysis, that are both of those two things are just coated sets of values. And if you try to have a conversation about that, at just the math level, you're going to miss the social level, you're going to miss the fact that that's humanity you're talking about. So it needs to really be integrated throughout the process. Talking about the values of what you're manipulating, and the values of the world that you're releasing these tools into. >> And what are some key issues today regarding ethics and data science? And what are some solutions? >> So I mean, this is the Women and Data Science Conference that happens because five years ago or whenever it was, the organizers realize, "Hey, women are really underrepresented in data science and maybe we should do something about that." That's true across the board. It's great to see hundreds of women here and around the world participating in the live stream, right? But as women, we need to make sure that as you're thinking about, again, the data and the algorithm, the data and the analysis that we're thinking about all of the people, all of the different kinds of people, all of the different kinds of languages, all of the different abilities, all of the different races, languages, ages, you name it that are represented in that data set and understand those people in context. In your data set, they may look like they're just two different points of data. But in the world writ large, we know perfectly well that women of color face a different environment than white men, right? They don't work, walk through the world in the same way. And it's ridiculous to assume that your shopping algorithm isn't going to affect that difference that they experience to the real world that isn't going to affect that in some way. It's fantasy, to imagine that is not going to work that way. So we need different kinds of people involved in creating the algorithms, different kinds of people in power in the companies who can say we shouldn't build that, we shouldn't use it. We need a different set of teaching mechanisms where people are actually trained to consider from the beginning, what's the intended positive, what's the intended negative, and what is some likely negatives, and then decide how far they go down that path? >> Right and we actually had on Dr. Rumman Chowdhury, from Accenture. And she's really big in data ethics. And she brought up the idea that just because we can doesn't mean that we should. So can you elaborate more on that? >> Yeah well, just because we can analyze massive datasets and possibly make some kind of mathematical model that based on a set of value statements might say, this person is more likely to get this disease or this person is more likely to excel in school in this dynamic or this person's more likely to commit a crime. Those are human experiences. And while analyzing large data sets, that in the best scenario might actually take into account the societal creation that those actual people are living in. Trying to extract that kind of analysis from that social setting, first of all is absurd. Second of all, it's going to accelerate the existing systemic problems. So you've got to use that kind of calculation over just because we could maybe do some things faster or with larger numbers, are the externalities that are going to be caused by doing it that way, the actual harm to living human beings? Or should those just be ignored, just so you can meet your shipping deadline? Because if we expanded our time horizon a little bit, if you expand your time horizon and look at some of the big companies out there now, they're now facing those externalities, and they're doing everything they possibly can to pretend that they didn't create them. And that loop needs to be shortened, so that you can actually sit down at some way through the process before you release some of these things and say, in the short term, it might look like we'd make x profit, but spread out that time horizon I don't know two x. And you face an election and the world's largest, longest lasting, stable democracy that people are losing faith in. Set up the right price to pay for a single company to meet its quarterly profit goals? I don't think so. So we need to reconnect those externalities back to the processes and the organizations that are causing those larger problems. >> Because essentially, having externalities just means that your data is biased. >> Data are biased, data about people are biased because people collect the data. There's this idea that there's some magic debias data set is science fiction. It doesn't exist. It certainly doesn't exist for more than two purposes, right? If we could, and I don't think we can debias a data set to then create an algorithm to do A, that same data set is not going to be debiased for creating algorithm B. Humans are biased. Let's get past this idea that we can strip that bias out of human created tools. What we're doing is we're embedding them in systems that accelerate them and expand them, they make them worse (laughs) right? They make them worse. So I'd spend a whole lot of time figuring out how to improve the systems and structures that we've already encoded with those biases. And using that then to try to inform the data science we're going about, in my opinion, we're going about this backwards. We're building the biases into the data science, and then exporting those tools into bias systems. And guess what problems are getting worse. That so let's stop doing that (laughs). >> Thank you so much for your insight Lucy. Thank you for being on theCUBE. >> Oh, thanks for having me. >> I'm Sonia Tagare, thanks for watching theCUBE. Stay tuned for more. (upbeat music)

Published Date : Mar 3 2020

SUMMARY :

brought to you by SiliconANGLE Media. covering the fifth annual WiDS for the past 11 years. So the lab exists to try to recreate that space. for the millennia of systemic and structural biases So tell us a little bit about what you guys talked about but I'm hopeful that the rest of the conversation that they experience to the real world doesn't mean that we should. And that loop needs to be shortened, just means that your data is biased. that same data set is not going to be debiased Thank you so much for your insight Lucy. I'm Sonia Tagare, thanks for watching theCUBE.

ENTITIES

Entity	Category	Confidence
Lucy Bernholz	PERSON	0.99+
Sonia Tagare	PERSON	0.99+
Lucy	PERSON	0.99+
Digital Civil Society Lab	ORGANIZATION	0.99+
5000 lives	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
Rumman Chowdhury	PERSON	0.99+
one life	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
both	QUANTITY	0.98+
five years ago	DATE	0.98+
two things	QUANTITY	0.98+
eight	QUANTITY	0.98+
Stanford University	ORGANIZATION	0.97+
one	QUANTITY	0.97+
theCUBE	ORGANIZATION	0.96+
single company	QUANTITY	0.96+
WiDS Women in Data Science Conference	EVENT	0.96+
today	DATE	0.95+
two different points	QUANTITY	0.95+
Stanford Women in Data Science	EVENT	0.95+
Stanford	LOCATION	0.95+
Secondly	QUANTITY	0.94+
more than two purposes	QUANTITY	0.93+
Women and Data Science Conference	EVENT	0.93+
last 30 years	DATE	0.92+
hundreds of women	QUANTITY	0.91+
Second	QUANTITY	0.91+
first	QUANTITY	0.87+
third space	QUANTITY	0.81+
this morning	DATE	0.81+
Stanford Women in Data Science 2020	EVENT	0.76+
two	QUANTITY	0.73+
past 11 years	DATE	0.71+
Conference 2020	EVENT	0.69+
WiDS)	EVENT	0.67+
WiDS	EVENT	0.62+
fifth annual	QUANTITY	0.58+

John Hoegger, Microsoft | Stanford Women in Data Science (WiDS) Conference 2020

>>live from Stanford University. It's the queue covering Stanford women in data Science 2020. Brought to you by Silicon Angle Media. >>Hi, and welcome to the Cube. I'm your host, Sonia today, Ari. And we're live at Stanford University covering wigs, Women in Data Science Conference 2020 And this is the fifth annual one. Joining us today is John Hoegger, who is the principal data scientist manager at Microsoft. John. Welcome to the Cube. Thanks. So tell us a little bit about your role at Microsoft. >>I manage a central data science team for myself. 3 65 >>And tell us more about what you do on a daily basis. >>Yeah, so we look at it across all the different myself. 365 products Office Windows security products has really try and drive growth, whether it's trying to provide recommendations to customers to end uses to drive more engagement with the products that they use every day. >>And you're also on the Weeds Conference Planning Committee. So tell us about how you joined and how that experience has been like, >>Yeah, actually, I was at Stanford about a week after the very first conference on. I got talking to Karen, one of this co organizers of that that conference and I found out there was only one sponsor very first year, which was WalMart Labs >>on. >>The more that she talked about it, the more that I wanted to be involved on. I thought that makes it really should be a sponsor, this initiative. And so I got details. I went back and my assessment sponsor. Ever since I've been on the committee trying it help with. I didn't find speakers on and review and the different speakers that we have each year. And it's it's amazing just to see how this event has grown over the four years. >>Yeah, that's awesome. So when you first started, how many people attended in the beginning? >>So it started off as we're in this conference with 400 people and just a few other regional events, and so was live streamed but just ready to a few universities. And ever since then it's gone with the words ambassadors and people around the world. >>Yes, and outwits has is over 60 countries on every continent except Antarctica has told them in the Kino a swell as has 400 plus attendees here and his life stream. So how do you think would has evolved over the years? >>Uh, it's it's term from just a conference to a movement. Now it's Ah, there's all these new Our regional events have been set up every year and just people coming together, I'm working together. So, Mike, self hosting different events. We had events in Redmond. I had office and also in New York and Boston and other places as well. >>So as a as a data scientist manager for many years at Microsoft, I'm I'm sure you've seen it increase in women taking technical roles. Tell us a little bit about that. >>Yeah, And for any sort of company you have to try and provide that environment. And part of that is even from recruiting and ensuring that you've got a diverse into s. So we make sure that we have women on every set of interviews to be able to really answer the question. What's it like to be a woman on this team and your old men contents of that question on? So you know that helps as faras we try, encourage more were parented some of these things demos on. I've now got a team of 30 data scientists, and half of them are women, which is great. >>That's also, um So, uh, um, what advice would you give to young professional women who are just coming out of college or who just starting college or interested in a stem field? But maybe think, Oh, I don't know if they'll be anyone like me in the room. >>Uh, you ask the questions when you interview I go for those interviews and asked, like Like, say, What's it like to be a woman on the team? All right. You're really ensuring that the teams that you're joining the companies you joined in a inclusive on and really value diversity in the workforce >>and talking about that as we heard in the opening address that diversity brings more perspectives, and it also helps take away bias from data science. How have you noticed that that bias becoming more fair, especially at your time at Microsoft? >>Yeah, and that's what the rest is about. Is just having those diverse set of perspectives on opinions in heaven. More people just looking like a data and thinking through your holiday to come. Views on and ensure has been used in the right way. >>Right. Um and so, um, what do you going forward? Do you plan to still be on the woods committee? What do you see with is going how DC woods in five years? >>Ah, yeah. I live in for this conference I've been on the committee on. I just expected to continue to grow. I think it's just going right beyond a conference. Dossevi in the podcasts on all the other initiatives that occurring from that. >>Great. >>John, Thank you so much for being on the Cube. It was great having >>you here. Thank you. >>Thanks for watching the Cube. I'm your host, Sonia, to worry and stay tuned for more. Yeah.

Published Date : Mar 3 2020

SUMMARY :

Brought to you by Silicon Angle Media. So tell us a little bit about your role at Microsoft. I manage a central data science team for myself. Yeah, so we look at it across all the different myself. you joined and how that experience has been like, I got talking to Karen, one of this co organizers of that that conference And it's it's amazing just to see how this event has grown over So when you first started, how many people attended in the beginning? So it started off as we're in this conference with 400 people and just a So how do you think would has evolved over the years? Uh, it's it's term from just a conference to a movement. Tell us a little bit about that. So you know that helps as faras we That's also, um So, uh, um, what advice would you give to Uh, you ask the questions when you interview I go for those interviews and asked, and talking about that as we heard in the opening address that diversity brings more perspectives, Yeah, and that's what the rest is about. Um and so, um, what do you going forward? I just expected to continue to grow. John, Thank you so much for being on the Cube. you here. I'm your host, Sonia, to worry and stay tuned for more.

ENTITIES

Entity	Category	Confidence
Karen	PERSON	0.99+
John Hoegger	PERSON	0.99+
Sonia	PERSON	0.99+
Redmond	LOCATION	0.99+
New York	LOCATION	0.99+
Mike	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
John	PERSON	0.99+
Ari	PERSON	0.99+
400 people	QUANTITY	0.99+
Dossevi	PERSON	0.99+
Stanford University	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
WalMart Labs	ORGANIZATION	0.99+
30 data scientists	QUANTITY	0.99+
each year	QUANTITY	0.99+
today	DATE	0.98+
Office	TITLE	0.98+
Weeds Conference Planning Committee	ORGANIZATION	0.98+
one	QUANTITY	0.98+
first conference	QUANTITY	0.97+
five years	QUANTITY	0.97+
one sponsor	QUANTITY	0.97+
over 60 countries	QUANTITY	0.97+
first	QUANTITY	0.96+
400 plus attendees	QUANTITY	0.96+
first year	QUANTITY	0.95+
half	QUANTITY	0.94+
DC	LOCATION	0.94+
Stanford	ORGANIZATION	0.94+
fifth annual	QUANTITY	0.93+
Stanford Women in Data Science (	EVENT	0.88+
Women in Data Science Conference 2020	EVENT	0.87+
Stanford	LOCATION	0.86+
Antarctica	LOCATION	0.85+
four years	QUANTITY	0.79+
3	OTHER	0.78+
WiDS) Conference 2020	EVENT	0.75+
Cube	COMMERCIAL_ITEM	0.74+
365	QUANTITY	0.71+
in data Science 2020	EVENT	0.65+
about a week	DATE	0.64+
Kino	LOCATION	0.63+
Windows	TITLE	0.6+

Daphne Koller, insitro | WiDS Women in Data Science Conference 2020

live from Stanford University it's the hue covering Stanford women in data science 2020 brought to you by Silicon angle media hi and welcome to the cube I'm your host Sonia - Garrett and we're live at Stanford University covering wigs women in data science conference the fifth annual one and joining us today is Daphne Koller who is the co-founder who sari is the CEO and founder of in seat row that Daphne welcome to the cube nice to be here Sonia thank you for having me so tell us a little bit about in seat row how you how it you got it founded and more about your role so I've been working in the intersection of machine learning and biology and health for quite a while and it was always a bit of a an interesting journey in that the data sets were quite small and limited we're now in a different world where there's tools that are allowing us to create massive biological data sets that I think can help us solve really significant societal problems and one of those problems that I think is really important is drug discovery development where despite many important advancements the costs just keep going up and up and up and the question is can we use machine learning to solve that problem better and you talk about this more in your keynote so give us a few highlights of what you talked about so in the last you can think of drug discovery and development in the last 50 to 70 years as being a bit of a glass half-full glass half-empty the glass half-full is the fact that there's diseases that used to be a death sentence or of the sentence still a life long of pain and suffering that are now addressed by some of the modern-day medicines and I think that's absolutely amazing the other side of it is that the cost of developing new drugs has been growing exponentially in what's come to be known as Arun was law being the inverse of Moore's Law which is the one we're all familiar with because the number of drugs approved per billion u.s. dollars just keeps going down exponentially so the question is can we change that curve and you talked in your keynote about the interdisciplinary cold to tell us more about that I think in order to address some of the critical problems that were facing one needs to really build a culture of people who work together at from different disciplines each bringing their own insights and their own ideas into the mix so and in seat row we actually have a company that's half-life scientists many of whom are producing data for the purpose of driving machine learning models and the other half are machine learning people and data scientists who are working on those but it's not a handoff where one group produces the data and the other one consumes and interpreted but really they start from the very beginning to understand what are the problems that one could solve together how do you design the experiment how do you build the model and how do you derive insights from that that can help us make better medicines for people and I also wanted to ask you you co-founded Coursera so tell us a little bit more about that platform so I founded Coursera as a result of work that I'd been doing at Stanford working on how technology can make education better and more accessible this was a project that I did here a number of my colleagues as well and at some point in the fall of 2011 there was an experiment let's take some of the content that we've been we've been developing within it's within Stanford and put it out there for people to just benefit from and we didn't know what would happen would it be a few thousand people but within a matter of weeks with minimal advertising other than one New York Times article that went viral we had a hundred thousand people in each of those courses and that was a moment in time where you know we looked at this and said can we just go back to writing more papers or is there an incredible opportunity to transform access to education to people all over the world and so I ended up taking a what was supposed to be a teary leave of absence from Stanford to go and co-found Coursera and I thought I'd go back after two years but the but at the end of that two-year period the there was just so much more to be done and so much more impact that we could bring to people all over the world people of both genders people of the different social economic status every single country around the world we I just felt like this was something that I couldn't not do and how did you why did you decide to go from an educational platform to then going into machine learning and biomedicine so I've been doing Coursera for about five years in 2016 and the company was on a great trajectory but it's primarily a Content company and around me machine learning was transforming the world and I wanted to come back and be part of that and when I looked around I saw machine learning being applied to ecommerce and the natural language and to self-driving cars but there really wasn't a lot of impact being made on the life science area and I wanted to be part of making that happen partly because I felt like coming back to our earlier comment that in order to really have that impact you need to have someone who speaks both languages and while there's a new generation of researchers who are bilingual in biology and in machine learning there's still a small group and there very few of those in kind of my age cohort and I thought that I would be able to have a real impact by building and company in the space so it sounds like your background is pretty varied what advice would you give to women who are just starting college now who may be interested in a similar field would you tell them they have to major in math or or do you think that maybe like there are some other majors that may be influential as well I think there's a lot of ways to get into data science math is one of them but there's also statistics or physics and I would say that especially for the field that I'm currently in which is at the intersection of machine learning data science on the one hand and biology and health on the other one can get there from biology or medicine as well but what I think is important is not to shy away from the more mathematically oriented courses in whatever major you're in because that found the is a really strong one there's a lot of people out there who are basically lightweight consumers of data science and they don't really understand how the methods that they're deploying how they work and that limits them in their ability to advance the field and come up with new methods that are better suited perhaps to the problems that they're tackling so I think it's totally fine and in fact there's a lot of value to coming into data science from fields other than a third computer science but I think taking courses in those fields even while you're majoring in whatever field you're interested in is going to make you a much better person who lives at that intersection and how do you think having a technology background has helped you in in founding your companies and has helped you become a successful CEO in companies that are very strongly Rd focused like like in C tro and others having a technical co-founder is absolutely essential because it's fine to have an understanding of whatever the user needs and so on and come from the business side of it and a lot of companies have a business co-founder but not understanding what the technology can actually do is highly limiting because you end up hallucinating oh if we could only do this and yet that would be great but you can't and people end up oftentimes making ridiculous promises about what technology will or will not do because they just don't understand where the land mines sit and and where you're gonna hit real obstacles and in the path so I think it's really important to have a strong technical foundation in these companies and that being said where do you see an teacher in the future and and how do you see it solving say Nash that you talked about in your keynote so we hope that in seat row we'll be a fully integrated drug discovery and development company that is based on a slightly different foundation than a traditional pharma company where they grew up in the old approach of that is very much bespoke scientific analysis of the biology of different diseases and then going after targets or our ways of dealing with the disease that are driven by human intuition where I think we have the opportunity to go today is to build a very data-driven approach that collects massive amounts of data and then let analysis of those data really reveal new hypotheses that might not be the ones that the cord with people's preconceptions of what matters and what doesn't and so hopefully we'll be able to over time create enough data and apply machine learning to address key bottlenecks in the drug discovery development process so we can bring better drugs to people and we can do it faster and hopefully at much lower cost that's great and you also mentioned in your keynote that you think that 2020s is like a digital biology era so tell us more about that so I think if you look if you take a historical perspective on science and think back you realize that there's periods in history where one discipline has made a tremendous amount of progress in a relatively short amount of time because of a new technology or a new way of looking at things in the 1870s that discipline was chemistry was the understanding of the periodic table and that you actually couldn't turn lead into gold in the 1900s that was physics with understanding the connection between matter and energy and between space and time in the 1950s that was computing where silicon chips were suddenly able to perform calculations that up until that point only people have been able to do and then in 1990s there was an interesting bifurcation one was the era of data which is related to computing but also involves elements statistics and optimization of neuroscience and the other one was quantitative biology in which biology moved from a descriptive science of techsan amaizing phenomena to really probing and measuring biology in a very detailed and a high-throughput way using techniques like microarrays that measure the activity of 20,000 genes at once Oh the human genome sequencing of the human genome and many others but these two feels kind of evolved in parallel and what I think is coming now 30 years later is the convergence of those two fields into one field that I like to think of as digital biology where we are able using the tools that have and continue to be developed measure biology in entirely new levels of detail of fidelity of scale we can use the techniques of machine learning and data science to interpret what we're seeing and then use some of the technologies that are also emerging to engineer biology to do things that it otherwise wouldn't do and that will have implications in biomaterials in energy in the environment in agriculture and I think also in human health and it's an incredibly exciting space to be in right now because just so much is happening and the opportunities to make a difference and make the world a better place are just so large that sounds awesome Daphne thank you for your insight and thank you for being on cute thank you I'm so neat agario thanks for watching stay tuned for more great

Published Date : Mar 3 2020

SUMMARY :

in the last you can think of drug

ENTITIES

Entity	Category	Confidence
Daphne Koller	PERSON	0.99+
Sonia	PERSON	0.99+
Daphne	PERSON	0.99+
1950s	DATE	0.99+
1990s	DATE	0.99+
Sonia - Garrett	PERSON	0.99+
2016	DATE	0.99+
20,000 genes	QUANTITY	0.99+
1900s	DATE	0.99+
1870s	DATE	0.99+
two fields	QUANTITY	0.99+
one field	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
Stanford	ORGANIZATION	0.99+
Coursera	ORGANIZATION	0.98+
2020s	DATE	0.98+
both languages	QUANTITY	0.98+
both genders	QUANTITY	0.98+
two	QUANTITY	0.98+
fall of 2011	DATE	0.98+
two-year	QUANTITY	0.98+
today	DATE	0.97+
about five years	QUANTITY	0.96+
30 years later	DATE	0.93+
every single country	QUANTITY	0.93+
WiDS Women in Data Science Conference 2020	EVENT	0.93+
one	QUANTITY	0.91+
one discipline	QUANTITY	0.9+
a hundred thousand people	QUANTITY	0.9+
Nash	PERSON	0.89+
sari	PERSON	0.89+
each	QUANTITY	0.84+
Silicon angle media	ORGANIZATION	0.83+
few thousand people	QUANTITY	0.83+
billion u.s. dollars	QUANTITY	0.83+
two years	QUANTITY	0.82+
New York Times	ORGANIZATION	0.8+
one of those problems	QUANTITY	0.79+
Moore's Law	TITLE	0.79+
one group	QUANTITY	0.79+
Coursera	TITLE	0.78+
2020	DATE	0.77+
70 years	QUANTITY	0.76+
third computer	QUANTITY	0.74+
fifth annual one	QUANTITY	0.68+
each of those courses	QUANTITY	0.68+
science	EVENT	0.68+
lot of people	QUANTITY	0.66+
half	QUANTITY	0.64+
per	QUANTITY	0.49+
last 50	DATE	0.46+
Arun	TITLE	0.4+

Latanya Sweeney, Harvard University | Women in Data Science (WiDS) 2018

>> Narrator: Live from Stanford University in Palo Alto, California. It's theCUBE. Covering Women in Data Science Conference 2018. Brought to you by Stanford. (upbeat music) >> Welcome back to theCUBE. We are live at Stanford University for the Third Annual Women in Data Science WiDS Conference. I'm Lisa Marten and we've had a great morning so far talking with a lot the speakers and participants at this event here at Stanford, which of course is going on globally as well. Very excited to be joined by one of the Keynotes this morning at WiDS, Latanya Sweeney, the Professor of Government and Technology from Harvard. Latanya, thank you so much for stopping by theCUBE. >> Well thank you for having me. >> Absolutely. So you are a computer scientist by training. WiDS as a mentioned is in its third year, they're expecting a 100,000 people to engage. There's a 177 I think, Margot said, regional WiDS events going on right now. In 53 countries. >> Isn't that amazing? >> It is! >> It's so exciting. >> Incredible in such a short period of time. What is it about WiDS that was attraction to you saying, "Yes, I want to participate in this event." >> Well one of the issues is just simply the idea the data science represents this sort of wave of change, of how do I analyze data? How do I make it different? And the conference itself celebrating the fact that women are taking the step, is hugely important. I mean, when I was a graduate student at MIT, I was the first black woman to get a PhD in Computer Science from MIT. And sort of, no women you really just didn't see women in this area at all. So when I come to a conference like WiDS, it's huge. It's just huge to see all these walls broken down. >> I love that walls breaking down, barriers kind of evaporating. In your time though at MIT, I'd love to understand a little bit more. Were you very conscience, "Hey I'm one of the very "few females here?" (Latanya laughs) Did it bother you or were you just, "You know what, "this is my passion, and I don't care. "I'm going to keep going forward." What was that experience like? >> Well, at first I was very naive, in a belief that you know all that really mattered was the work I did. And, I never had problems with the students, but I did have lots of problems with the professors, with this idea that you had to be like them in ways that was beyond your brain or your work, in order to really be exalted by them. And so, so whether I wanted to admit it, or whether I just wanted to ignore it, it just sort of came crashing down. >> Did you have mentors at that time, or did you think, "You know what, I'm not finding anybody "that I can really follow. "I've got to by my own mentor right now." >> Right, I mean I don't think my experience is really that uncommon for women in my generation. Very difficult to find mentors who would be complete mentors, complete see themselves in you and really try to exalt you and navigate you. What women often have found is that they can find a partial person here, and a partial person there. One who can help them in this regard, or that regard, but not the same kind of idea that you would be the superstar of one of these mentors. And it's not to take away from the fact that there have been these angels in my life, who made a big difference, and so I don't want to take away from that that somehow I did this all by myself. That's not true. >> So with the conference today, one of the things that Maria Klawe said in her welcome remarks was encouraging this generation, "Don't be worried if there's something "that you're not good at." So I loved how she was sort of encouraging people to sort of, women sort of, let go of maybe some of those preconceived notions that, "I can't do this. "I'm not good at that." I think that it's very liberating and still in 2018 with the fact there is such a diversity gap, it's still so needed. What were maybe some of the three takeaways, if you will, of your Keynote this morning that you imparted on the audience? >> Was that technology design is the new policy maker. That they're making policy, the design itself is making policy, but nobody's like monitoring it. But we could in fact use data science to monitor, to show the unforeseen consequences, and in the examples that we've done that, we've had big impact on the world. >> So share some of that with us, because that's your focus. You're in... What department in Harvard? You said government? >> So I sit in the government department. >> Unforeseen consequences of technology? >> Yes. >> Tell us about that. >> Well, you know, so in the Keynote, I talked about examples where technology is basically challenging every democratic value that we have. And sort of like no one's really aware, we kind of think about it here and there, but by doing simple data science experiments, we can quantify that. We can demonstrate it, and by doing that we shore up sort of those who can help us the most; the advocates, the regulators, and journalists. And so I gave examples from my own work and from the work of my students. >> Tell me a little bit about your students actually. Are they undergrads? Do you also have graduate students as well? >> I have both. >> You have both. >> Both. The talk was about, I teach a class called Data Science to Save the World, and we tackle three to four real world problems within the semester, that we solve. And then the students love to do their own independent projects, and at the end many of those go on to be published papers. >> Wow! I feel like you need to have a cape or some sort of superhero emblem. We can work on that later. But tell me about the diversity within the student body at Harvard in your classes. Are you finding, what's maybe the ratio of men to women, for example? >> Well you know many of the universities from my time have really changed. So when I was an undergraduate the typical classroom of Harvard undergrads would be all white men, or mostly all white men. >> Lisa: Sounds like a lot of STEM's still. (Latanya laughs) >> Yeah, but now if you walk into Harvard we see a lot more diversity within the university. I'm also a faculty dean at one of the residential houses, and so the diversity is huge. However, when you start getting into computer science, you start seeing, you don't see as much diversity. But in the Data Sciences of the World course, we get students from all over. They come from different backgrounds. They come in different colors, shapes, and sizes. Each with a skillset and a desire to learn how to have impact. >> I think that desire is key. How do you help them sort of build their own confidence in terms of, regardless of what color, flavor, you know my peer group is, I like this. I want to be in this. How do you help ignite that confidence within someone that's quite new into this? >> So if you're 20 something or almost 20, and you do something that a regulator changes their laws, or a newspaper article picks up, or you're on the Today Show, that pretty much changes the course of your life, and that's what we found with the students. That some of them have done just some remarkable work that's really been picked up and exalted, and it's stayed with them. It would change the direction in which they've gone. So what we do in the course, is we teach them that there's just so many problems that are low hanging, and how to spot a problem, an issue that they can solve, and how to solve it in a way that can be have impact. And that's really what the course focus is on. >> That impact is so important to just continue to fuel someones fire, and for that person to then be empowered to be able to ignite a fire under somebody else. I think one of the things that you mentioned sort of speaks to some of the things that we're seeing in these boundaries and lines are blurring. Not just so much even on from a gender perspective, but even career path A, B, C, D, now it's data is fueling the world. Every company is becoming a company because they have to be, right, to make consumer demands and just grow and be profitable as a business. But I also I like the parallel there that these rigid maybe, more rigid lines of careers are now opening up, because like you're saying, you can make impact being a data scientist. In every sector you can influence policy and wow, what a huge opportunity. It's almost like it's infinite, right? >> Yeah. I mean if you look at even the range of talks in the conference today, you get a great sense of not only new tools in different areas, but just the sheer spectrum of areas in which data science is playing. And that these women are already working it, already have the impact. >> So, speaking of the conference today, one of the things that I think is that we're hearing, is it's not just about inspiring, I think, Maria Klawe had said in theCUBE previous to today, that she found that young women in their first semester of university college courses, are probably like the right age and time in their lives to really ignite a spark, but I think there's also sort of a reinvigoration of the women that have been in technology and STEM fields for a while. Are you feeling and hearing kind of some of the same things from your peers and colleagues here? >> Definitely. We see it at the two levels. It's really important to try to get them in freshman year before they have a discipline defined for themselves, or how they see themselves. So that you can sort of ignite that spark and keep that spark alive. But then later women who, women or others, who are already in a field and looking for a way to sort of release and redefine themselves, data science is definitely giving them that opportunity. >> It really is. So what are some of the things that you're looking forward to for your career at Harvard as 2018 moves forward? >> Well, we, you know, the students we try to tackle the big problems. Election vulnerabilities has been a big one for us, on our agenda. The privacy of publicly available data is another big one that we've been working on. Well I think that's enough for awhile. (laughs) >> Lisa: That's pretty big. >> Yeah. >> I think so. >> Yeah, we'll get those done! >> Well that and you know, designing the logo for the t-shirt cause you definitely need to have a superpower t-shirt. So last question for you, if you could give young Latanya advice, when you were just starting out college, not knowing any of this was going to happen in terms of this movement that is WiDS and 2018, what would some of those key advice points for you, for your younger self be? >> To believe in yourself. To believe in yourself and that it's going to work out. One of the things that I grew to learn was how to turn lemons into lemonade, and that turns out to be very, very powerful, because it's a way to bounce back when you're faced with things that you can't control, that people are trying to put obstacles in your way, you just sort of find another way to keep going. And the world sort of bended towards me, so that was really cool. >> And also that failure is not a bad F word, right? (Latanya laughs) >> That's absolutely correct. >> It's part of a natural course and I think any leader and whatever and just you're in whatever, country whatever ethnicity, gender, everybody has I wouldn't even say missteps, it's just part of life, but I think... >> Yeah it's just part of the what... And Harvard like I said, I am the dean in one of the faculty houses, and one of the main things that we do each, throughout the year, is invite speakers and who're accomplished in whatever area they're in, but the one thing that they all have in common is they took this really roundabout way to get where they are. And a lot of that was because failures and blocks came in the way, and that's really important I think for young adults to really understand. >> I agree. Well, Latanya, thank you so much for carving out some time to stop by and chat with us on theCUBE. We are excited to have your wisdom shared to our audience and we wish you a great rest of the conference. >> Alright, thank you very much. >> We'll see you next time on theCUBE. >> Okay. >> We want to thank you for watching theCUBE. I'm Lisa Marten. We are live from the Third Annual Women in Data Science Conference at Stanford University. Stick around after this short break, I'll be back with my next guest. (upbeat music)

Published Date : Mar 5 2018

SUMMARY :

Brought to you by Stanford. Latanya, thank you so much for stopping by theCUBE. So you are a computer scientist by training. What is it about WiDS that was attraction to you saying, And sort of, no women you really just didn't Did it bother you or were you just, "You know what, in order to really be exalted by them. Did you have mentors at that time, or did you but not the same kind of idea that you would be the What were maybe some of the three takeaways, if you will, Was that technology design is the new policy maker. So share some of that with us, because that's your focus. and from the work of my students. Do you also have graduate students as well? And then the students love to do their own I feel like you need to have a cape Well you know many of the universities from my time Lisa: Sounds like a lot of STEM's still. But in the Data Sciences of the World course, How do you help ignite that confidence within someone that pretty much changes the course of your life, But I also I like the parallel there that these rigid in the conference today, you get a great sense sort of a reinvigoration of the women that have been So that you can sort of ignite that spark to for your career at Harvard as 2018 moves forward? Well, we, you know, the students Well that and you know, One of the things that I grew to learn was how to It's part of a natural course and I think And a lot of that was because failures and blocks We are excited to have your wisdom shared to our We want to thank you for watching theCUBE.

ENTITIES

Entity	Category	Confidence
Lisa Marten	PERSON	0.99+
Latanya	PERSON	0.99+
Margot	PERSON	0.99+
Latanya Sweeney	PERSON	0.99+
Lisa	PERSON	0.99+
Maria Klawe	PERSON	0.99+
2018	DATE	0.99+
20	QUANTITY	0.99+
Both	QUANTITY	0.99+
three	QUANTITY	0.99+
both	QUANTITY	0.99+
three takeaways	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
first semester	QUANTITY	0.99+
100,000 people	QUANTITY	0.99+
first	QUANTITY	0.99+
today	DATE	0.99+
one	QUANTITY	0.98+
Harvard University	ORGANIZATION	0.98+
WiDS	EVENT	0.98+
two levels	QUANTITY	0.98+
53 countries	QUANTITY	0.98+
Each	QUANTITY	0.98+
third year	QUANTITY	0.98+
MIT	ORGANIZATION	0.97+
four	QUANTITY	0.97+
Stanford	LOCATION	0.97+
Third Annual Women in Data Science WiDS Conference	EVENT	0.97+
Today Show	TITLE	0.97+
Stanford	ORGANIZATION	0.97+
Harvard	ORGANIZATION	0.96+
Third Annual Women in Data Science Conference	EVENT	0.96+
One	QUANTITY	0.95+
one thing	QUANTITY	0.95+
each	QUANTITY	0.94+
Stanford University	ORGANIZATION	0.93+
Covering Women in Data Science Conference 2018	EVENT	0.92+
theCUBE	ORGANIZATION	0.91+
177	QUANTITY	0.89+
Women in Data Science	ORGANIZATION	0.89+
this morning	DATE	0.89+
Data Science to Save the World	TITLE	0.87+
Narrator	TITLE	0.81+
Harvard	LOCATION	0.77+
one of	QUANTITY	0.74+
Professor of Government and Technology	PERSON	0.69+
almost	QUANTITY	0.66+
black	OTHER	0.63+
Stanford University	LOCATION	0.6+
Keynote	TITLE	0.57+
world	QUANTITY	0.5+
WiDS	ORGANIZATION	0.49+
theCUBE	TITLE	0.46+

Data Science for All: It's a Whole New Game

>> There's a movement that's sweeping across businesses everywhere here in this country and around the world. And it's all about data. Today businesses are being inundated with data. To the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization extending far more widely than ever before. Today, data science is being democratized. >> Data Sciences for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert writer and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12. Received my networking certs by 18 and a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a sure passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways which mobile technology can help rebuild the Philippine's economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour and as we explore the new phenomenon that is taking businesses around the world by storm. And data science continues to become democratized and extends beyond the domain of the data scientist. And why there's also a mandate for all of us to become data literate. Now that data science for all drives our AI culture. And we're going to be able to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data. And putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight. Who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join that conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI means to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host General Manager IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it. You've been talking about this for years. You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in the world of unmanageable. Crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part but I think we're all starting to hear more about AI. And it's incredible that this buzz word is happening. How do businesses adopt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts both machine learning into practice and applies it to the business. So this is really about how getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though in terms of how businesses can really adopt AI and get started. >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you got to expand job roles in the organization. >> Data acquisition. First pillar in this you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens. And suddenly you've got structured and non-structured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with. And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue, if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzz word to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And I think about machine learning. It's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles is about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing though because we have several organizations who saying is that highly specialized roles, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But, our view is data science has to for everybody. It's about bring data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal. But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone though across the board when it comes to data literacy. And I think people that are coming into the work force don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an offering in MS in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the placement behind those jobs, people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business from marketing campaigns to promotions to scheduling to ads. And their goal was transform data into business insights and really take the burden off of their IT team that was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM analytics system where it's a data warehouse, data science tools are built in. It has in memory data processing. And just like that, they were ready for AI. And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable that everybody, you're either going to be playing, you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring data to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models and it's more efficient. And we think that leads to better outcomes. Other organization will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path, they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And a Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of we talk data science for all. Where do all the data science, is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software super powers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But, we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types. And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientists role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kind of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody that, my background is in IT, the question is really is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item to the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud was. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you even personally being effected by how important the analysis is in working in pediatric cancer for the last seven years. I personally implement virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But the phase two of this project is putting in little metrics in the hardware that gather the breathing, the heart rate to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun hands on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right. Thank you Rob and now we've been joined by Siva Anne who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. So he's going to play the role of a financial adviser. Who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And, I have upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position for Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's gotten an affinity for auto stocks, and I also see that Ferrari has got some terminus gains, I want to lock in these gains by hedging. And I want to do that by picking a auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda. And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools at the hand of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources. Db2 Warehouse in the cloud, IBM's Integrated Analytic System, and Hortonworks powered Hadoop platform for the new speeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance. Closer to where the data resides to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should be give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But in first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new goal that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that? Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science train company. All right guys. Shall we get right to it? >> All right. >> So data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thing? >> Yeah, for example like in Galvanize we train many Fortune 500 companies. And just by looking at the demand of companies that wants us to help them go through this digital transformation is mind-blowing. Data point by itself. >> Okay. Well what we're seeing what's going on is that data science like as a theme, is that it's actually for everyone now. But what's happening is that it's actually meeting non technical people. But what we're seeing is that when non technical people are implementing these tools or coming at these tools without a base line of data literacy, they're often times using it in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is that we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing. Where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level at a greater scale before. But we have to do this with basic data literacy. And this has to involve technical and non technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No absolutely. I mean, I think that when you look at the huge explosion in data, that comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we have sort of run a fellowship where you help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up that data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies. And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that. Or not utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi cloud world. How do organizations start to become driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground in to a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off from how, it's going to not feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is that getting non technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model? So that actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room. They're consulted, but they're not a stakeholder. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the RY behind the data, you also have to have a creative fluid conversation between non technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of like create a new, kind of like environment where technical people also talks seamlessly with non technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is sort of data engineers. People who are more on the software engineering side learning more about the stats of math. And then people who are sort of traditionally on the stat side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because often times we talk about machine learning and AI, and data and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neuro net stuff and trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses and it's not because neuro nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we sort of saw. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And like another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kind of rules that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But, it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, you're using the classic example would be zip code, you're using zip code as a variable. But when you look at it, zip codes actually highly correlated with race. And you can't do that. So you may inadvertently by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. >> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And the people here it is. And the people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have less of these kind of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look about how to enable machine intelligence in organization there are the pillars that I have in my head which is the culture, the talent and the technology infrastructure. And I believe and I saw in working very closely with the Fortune 100 and 200 companies that the talent piece is actually the most important crucial hard to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is take the thousand actuaries and statisticians that we have and get all of them trained up to become a data scientist and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so that companies kind of like start to look how to upscale and look for talent within the organization. So they can actually move them to become more literate and navigate 'em from analyst to data scientist. And from data scientist to machine learner. So this is actually a trend that is happening already for a year or so. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills. We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non technical people. Especially from marketing, which is now, the CMO's are much more responsible for driving growth in their companies now. But often times it's so hard to change the old way of marketing, which is still like very segmentation. You know, demographic variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that comes up quite often in especially in large, Fortune 500 enterprises, is that they are very, they're not very comfortable with sort of example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. So security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kind of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work. Where I explain it was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to, anything you want to add to this conversation? >> I think one thing to conclude is to say that companies that are not data driven is about time to hit refresh and figure how they transition the organization to become data driven. To become agile and nimble so they can actually see what opportunities from this important industrial revolution. Otherwise, unfortunately they will have hard time to survive. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great guys? And thank you for everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using data science experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data no matter where it is and what it is. How you could use your favorite tools from open source. And finally how you could build models anywhere and employ them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new Github integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects. And take advantage of Github's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macro economic data, which she was able to find in the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses a need of security and governance controls within the cluster so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the mean administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributive file system. To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relation of data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This and the federation capabilities that Big SQL offers dramatically simplifies data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data. To identify the features that have some predictive power. Ultimately, we're trying to assess what ultimately is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you already invested in. Like our very own SPSS as well as SAS. Through new import feature, you can easily import those models created with those tools. This helps you avoid vendor lock-in, and simplify the development, training, deployment, and management of all your models. To build the models we used in app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytic System. Used the Auto Data Preparation to cleanse our data. Choose the binary classification algorithms. Let the Data Science Experience evaluate between logistic regression and gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they built. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as is necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models. Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else with just a few clicks. So here we're publishing our model to the Watson machine learning service. It happens to be in the IBM cloud. And also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that our stock affinity and model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you with no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You got to break it down for me cause I think we zoom out and see the big picture. What better data science can deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientist some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing about how we get to data science for all. Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developer, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytic System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources and all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with a line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise was named by Amazon.com as the number one best non-fiction book of 2012. He's currently the Editor in Chief of the award winning website, FiveThirtyEight and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which it really interesting, is I started my career at ESPN. And I started as a production assistant, then later back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There have been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. It's amazing to talk. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. Because there's so many teams in front offices putting such an, just a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where's this guy hit the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams have realized that actually having a guy, across all sports pretty much, realizing the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really effects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in NFL, we're having the analysis and having wearables where it can now showcase if they wanted to on screen, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. >> If you have a mid-fielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that actually some in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy, that kind of old school mentality, you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Others models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparce data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek back the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are things we have to say about that. In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of seven predictions or say seven things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? >> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization that never had any mistakes, never had any corrections, that's raw, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model to say it's not just the final number, here's a lot of detail about how that's calculated. In some case we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. But quite often we're saying we want you to trust us and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data or you share your thinking at least in lieu of the data, and you can definitely do both is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human being selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game where they lost to Troy or something and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning where you have more, when you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know if you have so to speak, 1,000 lines of code and they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. In a defensible position and a indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Serious and business, how organizations in the last three to five years have just evolved with this data boom. How are you seeing it as from a consultant point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. That so FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that had I think frankly been a problem for a long time, I'm optimistic based on the young people in our office, it's a little anecdotal but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book or I guess people have a copy of the book. In some ways it's saying hey, there are lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. But that doesn't mean that getting invested in it now, getting invested both in technology and the human capital side is important. >> Final question for you as we run out of time. 2018 beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify for example how fast a fast break is, for example. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now more oversee things than I once did myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Test one, two. One, two, three. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits on the back of the room on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017

SUMMARY :

Today the ability to extract value from data is becoming a shared mission. And for all of you during the program, I want to remind you to join that conversation on And when you and I chatted about it. And the scale and complexity of the data that organizations are having to deal with has It's challenging in the world of unmanageable. And they have to find a way. AI. And it's incredible that this buzz word is happening. And to get to an AI future, you have to lay a data foundation today. And four is you got to expand job roles in the organization. First pillar in this you just discussed. And now you get to where we are today. And if you don't have a strategy for how you acquire that and manage it, you're not going And the way I think about that is it's really about moving from static data repositories And we continue with the architecture. So you need a way to federate data across different environments. So we've laid out what you need for driving automation. And so when you think about the real use cases that are driving return on investment today, Let's go ahead and come back to something that you mentioned earlier because it's fascinating And so the new job roles is about how does everybody have data first in their mind? Everybody in the company has to be data literate. So overall, group effort, has to be a common goal, and we all need to be data literate But at the end of the day, it's kind of not an easy task. It's not easy but it's maybe not as big of a shift as you would think. It's interesting to hear you say essentially you need to train everyone though across the And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. And I've heard that the placement behind those jobs, people graduating with the MS is high. Let me get back to something else you touched on earlier because you mentioned that a number They produce a lot of the shows that I'm sure you watch Katie. And this is a good example. So they have to optimize every aspect of their business from marketing campaigns to promotions And so, as we talk to clients we think about how do you start down this path now, even It's analytics first to the data, not the other way around. We as a practice, we say you want to bring data to where the data sits. And a Harvard Business Review even dubbed it the sexiest job of the 21st century. Female preferred, on the cover of Vogue. And how does it change everything? And while it's important to recognize this critical skill set, you can't just limit it And we call it clickers and coders. [Katie] I like that. And there's not a lot of things available today that do that. Because I hear you talking about the data scientists role and how it's critical to success, And my view is if you have the right platform, it enables the organization to collaborate. And every organization needs to think about what are the skills that are critical? Use this as your chance to reinvent IT. And I can tell you even personally being effected by how important the analysis is in working And think about if you don't do something. And now we're going to get to the fun hands on part of our story. And then how do you move analytics closer to your data? And in here I can see that JP Morgan is calling for a US dollar rebound in the second half But then where it gets interesting is you go to the bottom. data, his stock portfolios, and browsing behavior to build a model which can predict his affinity And so, as a financial adviser, you look at this and you say, all right, we know he loves And I want to do that by picking a auto stock which has got negative correlation with Ferrari. Cause you start clicking that and immediately we're getting instant answers of what's happening. And what I see here instantly is that Honda has got a negative correlation with Ferrari, As a financial adviser, you wouldn't think about federating data, machine learning, pretty And drive the machine learning into the appliance. And even score hundreds of customers for their affinities on a daily basis. And then you see when you deploy analytics next to your data, even a financial adviser, And as a data science leader or data scientist, you have a lot of the same concerns. But you guys each have so many unique roles in your business life. And just by looking at the demand of companies that wants us to help them go through this And I think the whole ROI of data is that you can now understand people's relationships Well you can have all the data in the world, and I think it speaks to, if you're not doing And I think that that's one of the things that customers are coming to us for, right? And Nir, this is something you work with a lot. And the companies that are not like that. Tricia, companies have to deal with data behind the firewall and in the new multi cloud And so that's why I think it's really important to understand that when you implement big And how are the clients, how are the users actually interacting with the system? And right now the way I see teams being set up inside companies is that they're creating But in order to actually see all of the RY behind the data, you also have to have a creative That's one of the things that we see a lot. So a lot of the training we do is sort of data engineers. And I think that's a very strong point when it comes to the data analysis side. And that's where you need the human element to come back in and say okay, look, you're And the people who are really great at providing that human intelligence are social scientists. the talent piece is actually the most important crucial hard to get. It may be to take folks internally who have a lot of that domain knowledge that you have And from data scientist to machine learner. And what I explain to them is look, you're still making decisions in the same way. And I mean, just to give you an example, we are partnering with one of the major cloud And what you're talking about with culture is really where I think we're talking about And I think that communication between the technical stakeholders and management You guys made this way too easy. I want to leave you with an opportunity to, anything you want to add to this conversation? I think one thing to conclude is to say that companies that are not data driven is And thank you guys again for joining us. And we're going to turn our attention to how you can deliver on what they're talking about And finally how you could build models anywhere and employ them close to where your data is. And thanks to Siva for taking us through it. You got to break it down for me cause I think we zoom out and see the big picture. And we saw some new capabilities that help companies avoid lock-in, where you can import And as a data scientist, you stop feeling like you're falling behind. We met backstage. And I go to you to talk about sports because-- And what it brings. And the reason being that sports consists of problems that have rules. And I was going to save the baseball question for later. Probably one of the best of all time. FiveThirtyEight has the Dodgers with a 60% chance of winning. So you have two teams that are about equal. It's like the first World Series in I think 56 years or something where you have two 100 And that you can be the best pitcher in the world, but guess what? And when does it ruin the sport? So we can talk at great length about what tools do you then apply when you have those And the reason being that A) he kind of knows how to position himself in the first place. And I imagine they're all different as well. But you really have seen a lot of breakthroughs in the last couple of years. You're known for your work in politics though. What was the most notable thing that came out of any of your predictions? And so, being aware of the limitations to some extent intrinsically in elections when It would be interesting to kind of peek back the curtain, understand how you operate but But you don't want to be inaccurate because that's your credibility. I think on average, speed is a little bit overrated in journalism. And there's got to be more time spent on stories if I can speak subjectively. And so we have people that come in, we hire most of our people actually from journalism. And so the kind of combination of needing, not having that much tolerance for mistakes, Because you do have to hit this balance. And so you try to hire well. And your perspective on that in general. But by the way, one thing that happens when you share your data or you share your thinking And you have a good intuition for hey, this looks a little bit out of line to me. And I think kind of what you learn is like, hey if there's something that bothers me, It's like oh, if I cross the street in .2-- I mean, I'm like-- But no, part of it's like you don't want to waste time on unimportant decisions, right? We want better. It's like both the chicken and the pasta are going to be really darn good, right? Serious and business, how organizations in the last three to five years have just And man, the quality of the interns we get has improved so much in four years. But when you're seeing these big organizations, ESPN as perfect example, moving more towards But the point is that the reason to be out in front of the problem is so you give yourself Final question for you as we run out of time. And so you're parsing through many, many, many lines of code. You can do better than that. You know, we tried to figure out where the best burrito in America was a few years Nate, thank you so much for joining us. I thought we were going to chat World Series, you know. And also the gallery is open until eight p.m. with demos and activations. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your

ENTITIES

Entity	Category	Confidence
Tricia Wang	PERSON	0.99+
Katie	PERSON	0.99+
Katie Linendoll	PERSON	0.99+
Rob	PERSON	0.99+
Google	ORGANIZATION	0.99+
Joane	PERSON	0.99+
Daniel	PERSON	0.99+
Michael Li	PERSON	0.99+
Nate Silver	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Trump	PERSON	0.99+
Nate	PERSON	0.99+
Honda	ORGANIZATION	0.99+
Siva	PERSON	0.99+
McKinsey	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Larry Bird	PERSON	0.99+
2017	DATE	0.99+
Rob Thomas	PERSON	0.99+
Michigan	LOCATION	0.99+
Yankees	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Clinton	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Michael	PERSON	0.99+
America	LOCATION	0.99+
Leo	PERSON	0.99+
four years	QUANTITY	0.99+
five	QUANTITY	0.99+
30%	QUANTITY	0.99+
Astros	ORGANIZATION	0.99+
Trish	PERSON	0.99+
Sudden Compass	ORGANIZATION	0.99+
Leo Messi	PERSON	0.99+
two teams	QUANTITY	0.99+
1,000 lines	QUANTITY	0.99+
one year	QUANTITY	0.99+
10 investments	QUANTITY	0.99+
NASDAQ	ORGANIZATION	0.99+
The Signal and the Noise	TITLE	0.99+
Tricia	PERSON	0.99+
Nir Kaldero	PERSON	0.99+
80%	QUANTITY	0.99+
BCG	ORGANIZATION	0.99+
Daniel Hernandez	PERSON	0.99+
ESPN	ORGANIZATION	0.99+
H2O	ORGANIZATION	0.99+
Ferrari	ORGANIZATION	0.99+
last year	DATE	0.99+
18	QUANTITY	0.99+
three	QUANTITY	0.99+
Data Incubator	ORGANIZATION	0.99+
Patriots	ORGANIZATION	0.99+

Data Science: Present and Future | IBM Data Science For All

>> Announcer: Live from New York City it's The Cube, covering IBM data science for all. Brought to you by IBM. (light digital music) >> Welcome back to data science for all. It's a whole new game. And it is a whole new game. >> Dave Vellante, John Walls here. We've got quite a distinguished panel. So it is a new game-- >> Well we're in the game, I'm just happy to be-- (both laugh) Have a swing at the pitch. >> Well let's what we have here. Five distinguished members of our panel. It'll take me a minute to get through the introductions, but believe me they're worth it. Jennifer Shin joins us. Jennifer's the founder of 8 Path Solutions, the director of the data science of Comcast and part of the faculty at UC Berkeley and NYU. Jennifer, nice to have you with us, we appreciate the time. Joe McKendrick an analyst and contributor of Forbes and ZDNet, Joe, thank you for being here at well. Another ZDNetter next to him, Dion Hinchcliffe, who is a vice president and principal analyst of Constellation Research and also contributes to ZDNet. Good to see you, sir. To the back row, but that doesn't mean anything about the quality of the participation here. Bob Hayes with a killer Batman shirt on by the way, which we'll get to explain in just a little bit. He runs the Business over Broadway. And Joe Caserta, who the founder of Caserta Concepts. Welcome to all of you. Thanks for taking the time to be with us. Jennifer, let me just begin with you. Obviously as a practitioner you're very involved in the industry, you're on the academic side as well. We mentioned Berkeley, NYU, steep experience. So I want you to kind of take your foot in both worlds and tell me about data science. I mean where do we stand now from those two perspectives? How have we evolved to where we are? And how would you describe, I guess the state of data science? >> Yeah so I think that's a really interesting question. There's a lot of changes happening. In part because data science has now become much more established, both in the academic side as well as in industry. So now you see some of the bigger problems coming out. People have managed to have data pipelines set up. But now there are these questions about models and accuracy and data integration. So the really cool stuff from the data science standpoint. We get to get really into the details of the data. And I think on the academic side you now see undergraduate programs, not just graduate programs, but undergraduate programs being involved. UC Berkeley just did a big initiative that they're going to offer data science to undergrads. So that's a huge news for the university. So I think there's a lot of interest from the academic side to continue data science as a major, as a field. But I think in industry one of the difficulties you're now having is businesses are now asking that question of ROI, right? What do I actually get in return in the initial years? So I think there's a lot of work to be done and just a lot of opportunity. It's great because people now understand better with data sciences, but I think data sciences have to really think about that seriously and take it seriously and really think about how am I actually getting a return, or adding a value to the business? >> And there's lot to be said is there not, just in terms of increasing the workforce, the acumen, the training that's required now. It's a still relatively new discipline. So is there a shortage issue? Or is there just a great need? Is the opportunity there? I mean how would you look at that? >> Well I always think there's opportunity to be smart. If you can be smarter, you know it's always better. It gives you advantages in the workplace, it gets you an advantage in academia. The question is, can you actually do the work? The work's really hard, right? You have to learn all these different disciplines, you have to be able to technically understand data. Then you have to understand it conceptually. You have to be able to model with it, you have to be able to explain it. There's a lot of aspects that you're not going to pick up overnight. So I think part of it is endurance. Like are people going to feel motivated enough and dedicate enough time to it to get very good at that skill set. And also of course, you know in terms of industry, will there be enough interest in the long term that there will be a financial motivation. For people to keep staying in the field, right? So I think it's definitely a lot of opportunity. But that's always been there. Like I tell people I think of myself as a scientist and data science happens to be my day job. That's just the job title. But if you are a scientist and you work with data you'll always want to work with data. I think that's just an inherent need. It's kind of a compulsion, you just kind of can't help yourself, but dig a little bit deeper, ask the questions, you can't not think about it. So I think that will always exist. Whether or not it's an industry job in the way that we see it today, and like five years from now, or 10 years from now. I think that's something that's up for debate. >> So all of you have watched the evolution of data and how it effects organizations for a number of years now. If you go back to the days when data warehouse was king, we had a lot of promises about 360 degree views of the customer and how we were going to be more anticipatory in terms and more responsive. In many ways the decision support systems and the data warehousing world didn't live up to those promises. They solved other problems for sure. And so everybody was looking for big data to solve those problems. And they've begun to attack many of them. We talked earlier in The Cube today about fraud detection, it's gotten much, much better. Certainly retargeting of advertising has gotten better. But I wonder if you could comment, you know maybe start with Joe. As to the effect that data and data sciences had on organizations in terms of fulfilling that vision of a 360 degree view of customers and anticipating customer needs. >> So. Data warehousing, I wouldn't say failed. But I think it was unfinished in order to achieve what we need done today. At the time I think it did a pretty good job. I think it was the only place where we were able to collect data from all these different systems, have it in a single place for analytics. The big difference between what I think, between data warehousing and data science is data warehouses were primarily made for the consumer to human beings. To be able to have people look through some tool and be able to analyze data manually. That really doesn't work anymore, there's just too much data to do that. So that's why we need to build a science around it so that we can actually have machines actually doing the analytics for us. And I think that's the biggest stride in the evolution over the past couple of years, that now we're actually able to do that, right? It used to be very, you know you go back to when data warehouses started, you had to be a deep technologist in order to be able to collect the data, write the programs to clean the data. But now you're average causal IT person can do that. Right now I think we're back in data science where you have to be a fairly sophisticated programmer, analyst, scientist, statistician, engineer, in order to do what we need to do, in order to make machines actually understand the data. But I think part of the evolution, we're just in the forefront. We're going to see over the next, not even years, within the next year I think a lot of new innovation where the average person within business and definitely the average person within IT will be able to do as easily say, "What are my sales going to be next year?" As easy as it is to say, "What were my sales last year." Where now it's a big deal. Right now in order to do that you have to build some algorithms, you have to be a specialist on predictive analytics. And I think, you know as the tools mature, as people using data matures, and as the technology ecosystem for data matures, it's going to be easier and more accessible. >> So it's still too hard. (laughs) That's something-- >> Joe C.: Today it is yes. >> You've written about and talked about. >> Yeah no question about it. We see this citizen data scientist. You know we talked about the democratization of data science but the way we talk about analytics and warehousing and all the tools we had before, they generated a lot of insights and views on the information, but they didn't really give us the science part. And that's, I think that what's missing is the forming of the hypothesis, the closing of the loop of. We now have use of this data, but are are changing, are we thinking about it strategically? Are we learning from it and then feeding that back into the process. I think that's the big difference between data science and the analytics side. But, you know just like Google made search available to everyone, not just people who had highly specialized indexers or crawlers. Now we can have tools that make these capabilities available to anyone. You know going back to what Joe said I think the key thing is we now have tools that can look at all the data and ask all the questions. 'Cause we can't possibly do it all ourselves. Our organizations are increasingly awash in data. Which is the life blood of our organizations, but we're not using it, you know this a whole concept of dark data. And so I think the concept, or the promise of opening these tools up for everyone to be able to access those insights and activate them, I think that, you know, that's where it's headed. >> This is kind of where the T shirt comes in right? So Bob if you would, so you've got this Batman shirt on. We talked a little bit about it earlier, but it plays right into what Dion's talking about. About tools and, I don't want to spoil it, but you go ahead (laughs) and tell me about it. >> Right, so. Batman is a super hero, but he doesn't have any supernatural powers, right? He can't fly on his own, he can't become invisible on his own. But the thing is he has the utility belt and he has these tools he can use to help him solve problems. For example he as the bat ring when he's confronted with a building that he wants to get over, right? So he pulls it out and uses that. So as data professionals we have all these tools now that these vendors are making. We have IBM SPSS, we have data science experience. IMB Watson that these data pros can now use it as part of their utility belt and solve problems that they're confronted with. So if you''re ever confronted with like a Churn problem and you have somebody who has access to that data they can put that into IBM Watson, ask a question and it'll tell you what's the key driver of Churn. So it's not that you have to be a superhuman to be a data scientist, but these tools will help you solve certain problems and help your business go forward. >> Joe McKendrick, do you have a comment? >> Does that make the Batmobile the Watson? (everyone laughs) Analogy? >> I was just going to add that, you know all of the billionaires in the world today and none of them decided to become Batman yet. It's very disappointing. >> Yeah. (Joe laughs) >> Go ahead Joe. >> And I just want to add some thoughts to our discussion about what happened with data warehousing. I think it's important to point out as well that data warehousing, as it existed, was fairly successful but for larger companies. Data warehousing is a very expensive proposition it remains a expensive proposition. Something that's in the domain of the Fortune 500. But today's economy is based on a very entrepreneurial model. The Fortune 500s are out there of course it's ever shifting. But you have a lot of smaller companies a lot of people with start ups. You have people within divisions of larger companies that want to innovate and not be tied to the corporate balance sheet. They want to be able to go through, they want to innovate and experiment without having to go through finance and the finance department. So there's all these open source tools available. There's cloud resources as well as open source tools. Hadoop of course being a prime example where you can work with the data and experiment with the data and practice data science at a very low cost. >> Dion mentioned the C word, citizen data scientist last year at the panel. We had a conversation about that. And the data scientists on the panel generally were like, "Stop." Okay, we're not all of a sudden going to turn everybody into data scientists however, what we want to do is get people thinking about data, more focused on data, becoming a data driven organization. I mean as a data scientist I wonder if you could comment on that. >> Well I think so the other side of that is, you know there are also many people who maybe didn't, you know follow through with science, 'cause it's also expensive. A PhD takes a lot of time. And you know if you don't get funding it's a lot of money. And for very little security if you think about how hard it is to get a teaching job that's going to give you enough of a pay off to pay that back. Right, the time that you took off, the investment that you made. So I think the other side of that is by making data more accessible, you allow people who could have been great in science, have an opportunity to be great data scientists. And so I think for me the idea of citizen data scientist, that's where the opportunity is. I think in terms of democratizing data and making it available for everyone, I feel as though it's something similar to the way we didn't really know what KPIs were, maybe 20 years ago. People didn't use it as readily, didn't teach it in schools. I think maybe 10, 20 years from now, some of the things that we're building today from data science, hopefully more people will understand how to use these tools. They'll have a better understanding of working with data and what that means, and just data literacy right? Just being able to use these tools and be able to understand what data's saying and actually what it's not saying. Which is the thing that most people don't think about. But you can also say that data doesn't say anything. There's a lot of noise in it. There's too much noise to be able to say that there is a result. So I think that's the other side of it. So yeah I guess in terms for me, in terms of data a serious data scientist, I think it's a great idea to have that, right? But at the same time of course everyone kind of emphasized you don't want everyone out there going, "I can be a data scientist without education, "without statistics, without math," without understanding of how to implement the process. I've seen a lot of companies implement the same sort of process from 10, 20 years ago just on Hadoop instead of SQL. Right and it's very inefficient. And the only difference is that you can build more tables wrong than they could before. (everyone laughs) Which is I guess >> For less. it's an accomplishment and for less, it's cheaper, yeah. >> It is cheaper. >> Otherwise we're like I'm not a data scientist but I did stay at a Holiday Inn Express last night, right? >> Yeah. (panelists laugh) And there's like a little bit of pride that like they used 2,000, you know they used 2,000 computers to do it. Like a little bit of pride about that, but you know of course maybe not a great way to go. I think 20 years we couldn't do that, right? One computer was already an accomplishment to have that resource. So I think you have to think about the fact that if you're doing it wrong, you're going to just make that mistake bigger, which his also the other side of working with data. >> Sure, Bob. >> Yeah I have a comment about that. I've never liked the term citizen data scientist or citizen scientist. I get the point of it and I think employees within companies can help in the data analytics problem by maybe being a data collector or something. I mean I would never have just somebody become a scientist based on a few classes here she takes. It's like saying like, "Oh I'm going to be a citizen lawyer." And so you come to me with your legal problems, or a citizen surgeon. Like you need training to be good at something. You can't just be good at something just 'cause you want to be. >> John: Joe you wanted to say something too on that. >> Since we're in New York City I'd like to use the analogy of a real scientist versus a data scientist. So real scientist requires tools, right? And the tools are not new, like microscopes and a laboratory and a clean room. And these tools have evolved over years and years, and since we're in New York we could walk within a 10 block radius and buy any of those tools. It doesn't make us a scientist because we use those tools. I think with data, you know making, making the tools evolve and become easier to use, you know like Bob was saying, it doesn't make you a better data scientist, it just makes the data more accessible. You know we can go buy a microscope, we can go buy Hadoop, we can buy any kind of tool in a data ecosystem, but it doesn't really make you a scientist. I'm very involved in the NYU data science program and the Columbia data science program, like these kids are brilliant. You know these kids are not someone who is, you know just trying to run a day to day job, you know in corporate America. I think the people who are running the day to day job in corporate America are going to be the recipients of data science. Just like people who take drugs, right? As a result of a smart data scientist coming up with a formula that can help people, I think we're going to make it easier to distribute the data that can help people with all the new tools. But it doesn't really make it, you know the access to the data and tools available doesn't really make you a better data scientist. Without, like Bob was saying, without better training and education. >> So how-- I'm sorry, how do you then, if it's not for everybody, but yet I'm the user at the end of the day at my company and I've got these reams of data before me, how do you make it make better sense to me then? So that's where machine learning comes in or artificial intelligence and all this stuff. So how at the end of the day, Dion? How do you make it relevant and usable, actionable to somebody who might not be as practiced as you would like? >> I agree with Joe that many of us will be the recipients of data science. Just like you had to be a computer science at one point to develop programs for a computer, now we can get the programs. You don't need to be a computer scientist to get a lot of value out of our IT systems. The same thing's going to happen with data science. There's far more demand for data science than there ever could be produced by, you know having an ivory tower filled with data scientists. Which we need those guys, too, don't get me wrong. But we need to have, productize it and make it available in packages such that it can be consumed. The outputs and even some of the inputs can be provided by mere mortals, whether that's machine learning or artificial intelligence or bots that go off and run the hypotheses and select the algorithms maybe with some human help. We have to productize it. This is a constant of data scientist of service, which is becoming a thing now. It's, "I need this, I need this capability at scale. "I need it fast and I need it cheap." The commoditization of data science is going to happen. >> That goes back to what I was saying about, the recipient also of data science is also machines, right? Because I think the other thing that's happening now in the evolution of data is that, you know the data is, it's so tightly coupled. Back when you were talking about data warehousing you have all the business transactions then you take the data out of those systems, you put them in a warehouse for analysis, right? Maybe they'll make a decision to change that system at some point. Now the analytics platform and the business application is very tightly coupled. They become dependent upon one another. So you know people who are using the applications are now be able to take advantage of the insights of data analytics and data science, just through the app. Which never really existed before. >> I have one comment on that. You were talking about how do you get the end user more involved, well like we said earlier data science is not easy, right? As an end user, I encourage you to take a stats course, just a basic stats course, understanding what a mean is, variability, regression analysis, just basic stuff. So you as an end user can get more, or glean more insight from the reports that you're given, right? If you go to France and don't know French, then people can speak really slowly to you in French, you're not going to get it. You need to understand the language of data to get value from the technology we have available to us. >> Incidentally French is one of the languages that you have the option of learning if you're a mathematicians. So math PhDs are required to learn a second language. France being the country of algebra, that's one of the languages you could actually learn. Anyway tangent. But going back to the point. So statistics courses, definitely encourage it. I teach statistics. And one of the things that I'm finding as I go through the process of teaching it I'm actually bringing in my experience. And by bringing in my experience I'm actually kind of making the students think about the data differently. So the other thing people don't think about is the fact that like statisticians typically were expected to do, you know, just basic sort of tasks. In a sense that they're knowledge is specialized, right? But the day to day operations was they ran some data, you know they ran a test on some data, looked at the results, interpret the results based on what they were taught in school. They didn't develop that model a lot of times they just understand what the tests were saying, especially in the medical field. So when you when think about things like, we have words like population, census. Which is when you take data from every single, you have every single data point versus a sample, which is a subset. It's a very different story now that we're collecting faster than it used to be. It used to be the idea that you could collect information from everyone. Like it happens once every 10 years, we built that in. But nowadays you know, you know here about Facebook, for instance, I think they claimed earlier this year that their data was more accurate than the census data. So now there are these claims being made about which data source is more accurate. And I think the other side of this is now statisticians are expected to know data in a different way than they were before. So it's not just changing as a field in data science, but I think the sciences that are using data are also changing their fields as well. >> Dave: So is sampling dead? >> Well no, because-- >> Should it be? (laughs) >> Well if you're sampling wrong, yes. That's really the question. >> Okay. You know it's been said that the data doesn't lie, people do. Organizations are very political. Oftentimes you know, lies, damned lies and statistics, Benjamin Israeli. Are you seeing a change in the way in which organizations are using data in the context of the politics. So, some strong P&L manager say gets data and crafts it in a way that he or she can advance their agenda. Or they'll maybe attack a data set that is, probably should drive them in a different direction, but might be antithetical to their agenda. Are you seeing data, you know we talked about democratizing data, are you seeing that reduce the politics inside of organizations? >> So you know we've always used data to tell stories at the top level of an organization that's what it's all about. And I still see very much that no matter how much data science or, the access to the truth through looking at the numbers that story telling is still the political filter through which all that data still passes, right? But it's the advent of things like Block Chain, more and more corporate records and corporate information is going to end up in these open and shared repositories where there is not alternate truth. It'll come back to whoever tells the best stories at the end of the day. So I still see the organizations are very political. We are seeing now more open data though. Open data initiatives are a big thing, both in government and in the private sector. It is having an effect, but it's slow and steady. So that's what I see. >> Um, um, go ahead. >> I was just going to say as well. Ultimately I think data driven decision making is a great thing. And it's especially useful at the lower tiers of the organization where you have the routine day to day's decisions that could be automated through machine learning and deep learning. The algorithms can be improved on a constant basis. On the upper levels, you know that's why you pay executives the big bucks in the upper levels to make the strategic decisions. And data can help them, but ultimately, data, IT, technology alone will not create new markets, it will not drive new businesses, it's up to human beings to do that. The technology is the tool to help them make those decisions. But creating businesses, growing businesses, is very much a human activity. And that's something I don't see ever getting replaced. Technology might replace many other parts of the organization, but not that part. >> I tend to be a foolish optimist when it comes to this stuff. >> You do. (laughs) >> I do believe that data will make the world better. I do believe that data doesn't lie people lie. You know I think as we start, I'm already seeing trends in industries, all different industries where, you know conventional wisdom is starting to get trumped by analytics. You know I think it's still up to the human being today to ignore the facts and go with what they think in their gut and sometimes they win, sometimes they lose. But generally if they lose the data will tell them that they should have gone the other way. I think as we start relying more on data and trusting data through artificial intelligence, as we start making our lives a little bit easier, as we start using smart cars for safety, before replacement of humans. AS we start, you know, using data really and analytics and data science really as the bumpers, instead of the vehicle, eventually we're going to start to trust it as the vehicle itself. And then it's going to make lying a little bit harder. >> Okay, so great, excellent. Optimism, I love it. (John laughs) So I'm going to play devil's advocate here a little bit. There's a couple elephant in the room topics that I want to, to explore a little bit. >> Here it comes. >> There was an article today in Wired. And it was called, Why AI is Still Waiting for It's Ethics Transplant. And, I will just read a little segment from there. It says, new ethical frameworks for AI need to move beyond individual responsibility to hold powerful industrial, government and military interests accountable as they design and employ AI. When tech giants build AI products, too often user consent, privacy and transparency are overlooked in favor of frictionless functionality that supports profit driven business models based on aggregate data profiles. This is from Kate Crawford and Meredith Whittaker who founded AI Now. And they're calling for sort of, almost clinical trials on AI, if I could use that analogy. Before you go to market you've got to test the human impact, the social impact. Thoughts. >> And also have the ability for a human to intervene at some point in the process. This goes way back. Is everybody familiar with the name Stanislav Petrov? He's the Soviet officer who back in 1983, it was in the control room, I guess somewhere outside of Moscow in the control room, which detected a nuclear missile attack against the Soviet Union coming out of the United States. Ordinarily I think if this was an entirely AI driven process we wouldn't be sitting here right now talking about it. But this gentlemen looked at what was going on on the screen and, I'm sure he's accountable to his authorities in the Soviet Union. He probably got in a lot of trouble for this, but he decided to ignore the signals, ignore the data coming out of, from the Soviet satellites. And as it turned out, of course he was right. The Soviet satellites were seeing glints of the sun and they were interpreting those glints as missile launches. And I think that's a great example why, you know every situation of course doesn't mean the end of the world, (laughs) it was in this case. But it's a great example why there needs to be a human component, a human ability for human intervention at some point in the process. >> So other thoughts. I mean organizations are driving AI hard for profit. Best minds of our generation are trying to figure out how to get people to click on ads. Jeff Hammerbacher is famous for saying it. >> You can use data for a lot of things, data analytics, you can solve, you can cure cancer. You can make customers click on more ads. It depends on what you're goal is. But, there are ethical considerations we need to think about. When we have data that will have a racial bias against blacks and have them have higher prison sentences or so forth or worse credit scores, so forth. That has an impact on a broad group of people. And as a society we need to address that. And as scientists we need to consider how are we going to fix that problem? Cathy O'Neil in her book, Weapons of Math Destruction, excellent book, I highly recommend that your listeners read that book. And she talks about these issues about if AI, if algorithms have a widespread impact, if they adversely impact protected group. And I forget the last criteria, but like we need to really think about these things as a people, as a country. >> So always think the idea of ethics is interesting. So I had this conversation come up a lot of times when I talk to data scientists. I think as a concept, right as an idea, yes you want things to be ethical. The question I always pose to them is, "Well in the business setting "how are you actually going to do this?" 'Cause I find the most difficult thing working as a data scientist, is to be able to make the day to day decision of when someone says, "I don't like that number," how do you actually get around that. If that's the right data to be showing someone or if that's accurate. And say the business decides, "Well we don't like that number." Many people feel pressured to then change the data, change, or change what the data shows. So I think being able to educate people to be able to find ways to say what the data is saying, but not going past some line where it's a lie, where it's unethical. 'Cause you can also say what data doesn't say. You don't always have to say what the data does say. You can leave it as, "Here's what we do know, "but here's what we don't know." There's a don't know part that many people will omit when they talk about data. So I think, you know especially when it comes to things like AI it's tricky, right? Because I always tell people I don't know everyone thinks AI's going to be so amazing. I started an industry by fixing problems with computers that people didn't realize computers had. For instance when you have a system, a lot of bugs, we all have bug reports that we've probably submitted. I mean really it's no where near the point where it's going to start dominating our lives and taking over all the jobs. Because frankly it's not that advanced. It's still run by people, still fixed by people, still managed by people. I think with ethics, you know a lot of it has to do with the regulations, what the laws say. That's really going to be what's involved in terms of what people are willing to do. A lot of businesses, they want to make money. If there's no rules that says they can't do certain things to make money, then there's no restriction. I think the other thing to think about is we as consumers, like everyday in our lives, we shouldn't separate the idea of data as a business. We think of it as a business person, from our day to day consumer lives. Meaning, yes I work with data. Incidentally I also always opt out of my credit card, you know when they send you that information, they make you actually mail them, like old school mail, snail mail like a document that says, okay I don't want to be part of this data collection process. Which I always do. It's a little bit more work, but I go through that step of doing that. Now if more people did that, perhaps companies would feel more incentivized to pay you for your data, or give you more control of your data. Or at least you know, if a company's going to collect information, I'd want you to be certain processes in place to ensure that it doesn't just get sold, right? For instance if a start up gets acquired what happens with that data they have on you? You agree to give it to start up. But I mean what are the rules on that? So I think we have to really think about the ethics from not just, you know, someone who's going to implement something but as consumers what control we have for our own data. 'Cause that's going to directly impact what businesses can do with our data. >> You know you mentioned data collection. So slightly on that subject. All these great new capabilities we have coming. We talked about what's going to happen with media in the future and what 5G technology's going to do to mobile and these great bandwidth opportunities. The internet of things and the internet of everywhere. And all these great inputs, right? Do we have an arms race like are we keeping up with the capabilities to make sense of all the new data that's going to be coming in? And how do those things square up in this? Because the potential is fantastic, right? But are we keeping up with the ability to make it make sense and to put it to use, Joe? >> So I think data ingestion and data integration is probably one of the biggest challenges. I think, especially as the world is starting to become more dependent on data. I think you know, just because we're dependent on numbers we've come up with GAAP, which is generally accepted accounting principles that can be audited and proven whether it's true or false. I think in our lifetime we will see something similar to that we will we have formal checks and balances of data that we use that can be audited. Getting back to you know what Dave was saying earlier about, I personally would trust a machine that was programmed to do the right thing, than to trust a politician or some leader that may have their own agenda. And I think the other thing about machines is that they are auditable. You know you can look at the code and see exactly what it's doing and how it's doing it. Human beings not so much. So I think getting to the truth, even if the truth isn't the answer that we want, I think is a positive thing. It's something that we can't do today that once we start relying on machines to do we'll be able to get there. >> Yeah I was just going to add that we live in exponential times. And the challenge is that the way that we're structured traditionally as organizations is not allowing us to absorb advances exponentially, it's linear at best. Everyone talks about change management and how are we going to do digital transformation. Evidence shows that technology's forcing the leaders and the laggards apart. There's a few leading organizations that are eating the world and they seem to be somehow rolling out new things. I don't know how Amazon rolls out all this stuff. There's all this artificial intelligence and the IOT devices, Alexa, natural language processing and that's just a fraction, it's just a tip of what they're releasing. So it just shows that there are some organizations that have path found the way. Most of the Fortune 500 from the year 2000 are gone already, right? The disruption is happening. And so we are trying, have to find someway to adopt these new capabilities and deploy them effectively or the writing is on the wall. I spent a lot of time exploring this topic, how are we going to get there and all of us have a lot of hard work is the short answer. >> I read that there's going to be more data, or it was predicted, more data created in this year than in the past, I think it was five, 5,000 years. >> Forever. (laughs) >> And that to mix the statistics that we're analyzing currently less than 1% of the data. To taking those numbers and hear what you're all saying it's like, we're not keeping up, it seems like we're, it's not even linear. I mean that gap is just going to grow and grow and grow. How do we close that? >> There's a guy out there named Chris Dancy, he's known as the human cyborg. He has 700 hundred sensors all over his body. And his theory is that data's not new, having access to the data is new. You know we've always had a blood pressure, we've always had a sugar level. But we were never able to actually capture it in real time before. So now that we can capture and harness it, now we can be smarter about it. So I think that being able to use this information is really incredible like, this is something that over our lifetime we've never had and now we can do it. Which hence the big explosion in data. But I think how we use it and have it governed I think is the challenge right now. It's kind of cowboys and indians out there right now. And without proper governance and without rigorous regulation I think we are going to have some bumps in the road along the way. >> The data's in the oil is the question how are we actually going to operationalize around it? >> Or find it. Go ahead. >> I will say the other side of it is, so if you think about information, we always have the same amount of information right? What we choose to record however, is a different story. Now if you want wanted to know things about the Olympics, but you decide to collect information every day for years instead of just the Olympic year, yes you have a lot of data, but did you need all of that data? For that question about the Olympics, you don't need to collect data during years there are no Olympics, right? Unless of course you're comparing it relative. But I think that's another thing to think about. Just 'cause you collect more data does not mean that data will produce more statistically significant results, it does not mean it'll improve your model. You can be collecting data about your shoe size trying to get information about your hair. I mean it really does depend on what you're trying to measure, what your goals are, and what the data's going to be used for. If you don't factor the real world context into it, then yeah you can collect data, you know an infinite amount of data, but you'll never process it. Because you have no question to ask you're not looking to model anything. There is no universal truth about everything, that just doesn't exist out there. >> I think she's spot on. It comes down to what kind of questions are you trying to ask of your data? You can have one given database that has 100 variables in it, right? And you can ask it five different questions, all valid questions and that data may have those variables that'll tell you what's the best predictor of Churn, what's the best predictor of cancer treatment outcome. And if you can ask the right question of the data you have then that'll give you some insight. Just data for data's sake, that's just hype. We have a lot of data but it may not lead to anything if we don't ask it the right questions. >> Joe. >> I agree but I just want to add one thing. This is where the science in data science comes in. Scientists often will look at data that's already been in existence for years, weather forecasts, weather data, climate change data for example that go back to data charts and so forth going back centuries if that data is available. And they reformat, they reconfigure it, they get new uses out of it. And the potential I see with the data we're collecting is it may not be of use to us today, because we haven't thought of ways to use it, but maybe 10, 20, even 100 years from now someone's going to think of a way to leverage the data, to look at it in new ways and to come up with new ideas. That's just my thought on the science aspect. >> Knowing what you know about data science, why did Facebook miss Russia and the fake news trend? They came out and admitted it. You know, we miss it, why? Could they have, is it because they were focused elsewhere? Could they have solved that problem? (crosstalk) >> It's what you said which is are you asking the right questions and if you're not looking for that problem in exactly the way that it occurred you might not be able to find it. >> I thought the ads were paid in rubles. Shouldn't that be your first clue (panelists laugh) that something's amiss? >> You know red flag, so to speak. >> Yes. >> I mean Bitcoin maybe it could have hidden it. >> Bob: Right, exactly. >> I would think too that what happened last year is actually was the end of an age of optimism. I'll bring up the Soviet Union again, (chuckles). It collapsed back in 1991, 1990, 1991, Russia was reborn in. And think there was a general feeling of optimism in the '90s through the 2000s that Russia is now being well integrated into the world economy as other nations all over the globe, all continents are being integrated into the global economy thanks to technology. And technology is lifting entire continents out of poverty and ensuring more connectedness for people. Across Africa, India, Asia, we're seeing those economies that very different countries than 20 years ago and that extended into Russia as well. Russia is part of the global economy. We're able to communicate as a global, a global network. I think as a result we kind of overlook the dark side that occurred. >> John: Joe? >> Again, the foolish optimist here. But I think that... It shouldn't be the question like how did we miss it? It's do we have the ability now to catch it? And I think without data science without machine learning, without being able to train machines to look for patterns that involve corruption or result in corruption, I think we'd be out of luck. But now we have those tools. And now hopefully, optimistically, by the next election we'll be able to detect these things before they become public. >> It's a loaded question because my premise was Facebook had the ability and the tools and the knowledge and the data science expertise if in fact they wanted to solve that problem, but they were focused on other problems, which is how do I get people to click on ads? >> Right they had the ability to train the machines, but they were giving the machines the wrong training. >> Looking under the wrong rock. >> (laughs) That's right. >> It is easy to play armchair quarterback. Another topic I wanted to ask the panel about is, IBM Watson. You guys spend time in the Valley, I spend time in the Valley. People in the Valley poo-poo Watson. Ah, Google, Facebook, Amazon they've got the best AI. Watson, and some of that's fair criticism. Watson's a heavy lift, very services oriented, you just got to apply it in a very focused. At the same time Google's trying to get you to click on Ads, as is Facebook, Amazon's trying to get you to buy stuff. IBM's trying to solve cancer. Your thoughts on that sort of juxtaposition of the different AI suppliers and there may be others. Oh, nobody wants to touch this one, come on. I told you elephant in the room questions. >> Well I mean you're looking at two different, very different types of organizations. One which is really spent decades in applying technology to business and these other companies are ones that are primarily into the consumer, right? When we talk about things like IBM Watson you're looking at a very different type of solution. You used to be able to buy IT and once you installed it you pretty much could get it to work and store your records or you know, do whatever it is you needed it to do. But these types of tools, like Watson actually tries to learn your business. And it needs to spend time doing that watching the data and having its models tuned. And so you don't get the results right away. And I think that's been kind of the challenge that organizations like IBM has had. Like this is a different type of technology solution, one that has to actually learn first before it can provide value. And so I think you know you have organizations like IBM that are much better at applying technology to business, and then they have the further hurdle of having to try to apply these tools that work in very different ways. There's education too on the side of the buyer. >> I'd have to say that you know I think there's plenty of businesses out there also trying to solve very significant, meaningful problems. You know with Microsoft AI and Google AI and IBM Watson, I think it's not really the tool that matters, like we were saying earlier. A fool with a tool is still a fool. And regardless of who the manufacturer of that tool is. And I think you know having, a thoughtful, intelligent, trained, educated data scientist using any of these tools can be equally effective. >> So do you not see core AI competence and I left out Microsoft, as a strategic advantage for these companies? Is it going to be so ubiquitous and available that virtually anybody can apply it? Or is all the investment in R&D and AI going to pay off for these guys? >> Yeah, so I think there's different levels of AI, right? So there's AI where you can actually improve the model. I remember when I was invited when Watson was kind of first out by IBM to a private, sort of presentation. And my question was, "Okay, so when do I get "to access the corpus?" The corpus being sort of the foundation of NLP, which is natural language processing. So it's what you use as almost like a dictionary. Like how you're actually going to measure things, or things up. And they said, "Oh you can't." "What do you mean I can't?" It's like, "We do that." "So you're telling me as a data scientist "you're expecting me to rely on the fact "that you did it better than me and I should rely on that." I think over the years after that IBM started opening it up and offering different ways of being able to access the corpus and work with that data. But I remember at the first Watson hackathon there was only two corpus available. It was either the travel or medicine. There was no other foundational data available. So I think one of the difficulties was, you know IBM being a little bit more on the forefront of it they kind of had that burden of having to develop these systems and learning kind of the hard way that if you don't have the right models and you don't have the right data and you don't have the right access, that's going to be a huge limiter. I think with things like medical, medical information that's an extremely difficult data to start with. Partly because you know anything that you do find or don't find, the impact is significant. If I'm looking at things like what people clicked on the impact of using that data wrong, it's minimal. You might lose some money. If you do that with healthcare data, if you do that with medical data, people may die, like this is a much more difficult data set to start with. So I think from a scientific standpoint it's great to have any information about a new technology, new process. That's the nice that is that IBM's obviously invested in it and collected information. I think the difficulty there though is just 'cause you have it you can't solve everything. And if feel like from someone who works in technology, I think in general when you appeal to developers you try not to market. And with Watson it's very heavily marketed, which tends to turn off people who are more from the technical side. Because I think they don't like it when it's gimmicky in part because they do the opposite of that. They're always trying to build up the technical components of it. They don't like it when you're trying to convince them that you're selling them something when you could just give them the specs and look at it. So it could be something as simple as communication. But I do think it is valuable to have had a company who leads on the forefront of that and try to do so we can actually learn from what IBM has learned from this process. >> But you're an optimist. (John laughs) All right, good. >> Just one more thought. >> Joe go ahead first. >> Joe: I want to see how Alexa or Siri do on Jeopardy. (panelists laugh) >> All right. Going to go around a final thought, give you a second. Let's just think about like your 12 month crystal ball. In terms of either challenges that need to be met in the near term or opportunities you think will be realized. 12, 18 month horizon. Bob you've got the microphone headed up, so I'll let you lead off and let's just go around. >> I think a big challenge for business, for society is getting people educated on data and analytics. There's a study that was just released I think last month by Service Now, I think, or some vendor, or Click. They found that only 17% of the employees in Europe have the ability to use data in their job. Think about that. >> 17. >> 17. Less than 20%. So these people don't have the ability to understand or use data intelligently to improve their work performance. That says a lot about the state we're in today. And that's Europe. It's probably a lot worse in the United States. So that's a big challenge I think. To educate the masses. >> John: Joe. >> I think we probably have a better chance of improving technology over training people. I think using data needs to be iPhone easy. And I think, you know which means that a lot of innovation is in the years to come. I do think that a keyboard is going to be a thing of the past for the average user. We are going to start using voice a lot more. I think augmented reality is going to be things that becomes a real reality. Where we can hold our phone in front of an object and it will have an overlay of prices where it's available, if it's a person. I think that we will see within an organization holding a camera up to someone and being able to see what is their salary, what sales did they do last year, some key performance indicators. I hope that we are beyond the days of everyone around the world walking around like this and we start actually becoming more social as human beings through augmented reality. I think, it has to happen. I think we're going through kind of foolish times at the moment in order to get to the greater good. And I think the greater good is using technology in a very, very smart way. Which means that you shouldn't have to be, sorry to contradict, but maybe it's good to counterpoint. I don't think you need to have a PhD in SQL to use data. Like I think that's 1990. I think as we evolve it's going to become easier for the average person. Which means people like the brain trust here needs to get smarter and start innovating. I think the innovation around data is really at the tip of the iceberg, we're going to see a lot more of it in the years to come. >> Dion why don't you go ahead, then we'll come down the line here. >> Yeah so I think over that time frame two things are likely to happen. One is somebody's going to crack the consumerization of machine learning and AI, such that it really is available to the masses and we can do much more advanced things than we could. We see the industries tend to reach an inflection point and then there's an explosion. No one's quite cracked the code on how to really bring this to everyone, but somebody will. And that could happen in that time frame. And then the other thing that I think that almost has to happen is that the forces for openness, open data, data sharing, open data initiatives things like Block Chain are going to run headlong into data protection, data privacy, customer privacy laws and regulations that have to come down and protect us. Because the industry's not doing it, the government is stepping in and it's going to re-silo a lot of our data. It's going to make it recede and make it less accessible, making data science harder for a lot of the most meaningful types of activities. Patient data for example is already all locked down. We could do so much more with it, but health start ups are really constrained about what they can do. 'Cause they can't access the data. We can't even access our own health care records, right? So I think that's the challenge is we have to have that battle next to be able to go and take the next step. >> Well I see, with the growth of data a lot of it's coming through IOT, internet of things. I think that's a big source. And we're going to see a lot of innovation. A new types of Ubers or Air BnBs. Uber's so 2013 though, right? We're going to see new companies with new ideas, new innovations, they're going to be looking at the ways this data can be leveraged all this big data. Or data coming in from the IOT can be leveraged. You know there's some examples out there. There's a company for example that is outfitting tools, putting sensors in the tools. Industrial sites can therefore track where the tools are at any given time. This is an expensive, time consuming process, constantly loosing tools, trying to locate tools. Assessing whether the tool's being applied to the production line or the right tool is at the right torque and so forth. With the sensors implanted in these tools, it's now possible to be more efficient. And there's going to be innovations like that. Maybe small start up type things or smaller innovations. We're going to see a lot of new ideas and new types of approaches to handling all this data. There's going to be new business ideas. The next Uber, we may be hearing about it a year from now whatever that may be. And that Uber is going to be applying data, probably IOT type data in some, new innovative way. >> Jennifer, final word. >> Yeah so I think with data, you know it's interesting, right, for one thing I think on of the things that's made data more available and just people we open to the idea, has been start ups. But what's interesting about this is a lot of start ups have been acquired. And a lot of people at start ups that got acquired now these people work at bigger corporations. Which was the way it was maybe 10 years ago, data wasn't available and open, companies kept it very proprietary, you had to sign NDAs. It was like within the last 10 years that open source all of that initiatives became much more popular, much more open, a acceptable sort of way to look at data. I think that what I'm kind of interested in seeing is what people do within the corporate environment. Right, 'cause they have resources. They have funding that start ups don't have. And they have backing, right? Presumably if you're acquired you went in at a higher title in the corporate structure whereas if you had started there you probably wouldn't be at that title at that point. So I think you have an opportunity where people who have done innovative things and have proven that they can build really cool stuff, can now be in that corporate environment. I think part of it's going to be whether or not they can really adjust to sort of the corporate, you know the corporate landscape, the politics of it or the bureaucracy. I think every organization has that. Being able to navigate that is a difficult thing in part 'cause it's a human skill set, it's a people skill, it's a soft skill. It's not the same thing as just being able to code something and sell it. So you know it's going to really come down to people. I think if people can figure out for instance, what people want to buy, what people think, in general that's where the money comes from. You know you make money 'cause someone gave you money. So if you can find a way to look at a data or even look at technology and understand what people are doing, aren't doing, what they're happy about, unhappy about, there's always opportunity in collecting the data in that way and being able to leverage that. So you build cooler things, and offer things that haven't been thought of yet. So it's a very interesting time I think with the corporate resources available if you can do that. You know who knows what we'll have in like a year. >> I'll add one. >> Please. >> The majority of companies in the S&P 500 have a market cap that's greater than their revenue. The reason is 'cause they have IP related to data that's of value. But most of those companies, most companies, the vast majority of companies don't have any way to measure the value of that data. There's no GAAP accounting standard. So they don't understand the value contribution of their data in terms of how it helps them monetize. Not the data itself necessarily, but how it contributes to the monetization of the company. And I think that's a big gap. If you don't understand the value of the data that means you don't understand how to refine it, if data is the new oil and how to protect it and so forth and secure it. So that to me is a big gap that needs to get closed before we can actually say we live in a data driven world. >> So you're saying I've got an asset, I don't know if it's worth this or this. And they're missing that great opportunity. >> So devolve to what I know best. >> Great discussion. Really, really enjoyed the, the time as flown by. Joe if you get that augmented reality thing to work on the salary, point it toward that guy not this guy, okay? (everyone laughs) It's much more impressive if you point it over there. But Joe thank you, Dion, Joe and Jennifer and Batman. We appreciate and Bob Hayes, thanks for being with us. >> Thanks you guys. >> Really enjoyed >> Great stuff. >> the conversation. >> And a reminder coming up a the top of the hour, six o'clock Eastern time, IBMgo.com featuring the live keynote which is being set up just about 50 feet from us right now. Nick Silver is one of the headliners there, John Thomas is well, or rather Rob Thomas. John Thomas we had on earlier on The Cube. But a panel discussion as well coming up at six o'clock on IBMgo.com, six to 7:15. Be sure to join that live stream. That's it from The Cube. We certainly appreciate the time. Glad to have you along here in New York. And until the next time, take care. (bright digital music)

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IBM. Welcome back to data science for all. So it is a new game-- Have a swing at the pitch. Thanks for taking the time to be with us. from the academic side to continue data science And there's lot to be said is there not, ask the questions, you can't not think about it. of the customer and how we were going to be more anticipatory And I think, you know as the tools mature, So it's still too hard. I think that, you know, that's where it's headed. So Bob if you would, so you've got this Batman shirt on. to be a data scientist, but these tools will help you I was just going to add that, you know I think it's important to point out as well that And the data scientists on the panel And the only difference is that you can build it's an accomplishment and for less, So I think you have to think about the fact that I get the point of it and I think and become easier to use, you know like Bob was saying, So how at the end of the day, Dion? or bots that go off and run the hypotheses So you know people who are using the applications are now then people can speak really slowly to you in French, But the day to day operations was they ran some data, That's really the question. You know it's been said that the data doesn't lie, the access to the truth through looking at the numbers of the organization where you have the routine I tend to be a foolish optimist You do. I think as we start relying more on data and trusting data There's a couple elephant in the room topics Before you go to market you've got to test And also have the ability for a human to intervene to click on ads. And I forget the last criteria, but like we need I think with ethics, you know a lot of it has to do of all the new data that's going to be coming in? Getting back to you know what Dave was saying earlier about, organizations that have path found the way. than in the past, I think it was (laughs) I mean that gap is just going to grow and grow and grow. So I think that being able to use this information Or find it. But I think that's another thing to think about. And if you can ask the right question of the data you have And the potential I see with the data we're collecting is Knowing what you know about data science, for that problem in exactly the way that it occurred I thought the ads were paid in rubles. I think as a result we kind of overlook And I think without data science without machine learning, Right they had the ability to train the machines, At the same time Google's trying to get you And so I think you know And I think you know having, I think in general when you appeal to developers But you're an optimist. Joe: I want to see how Alexa or Siri do on Jeopardy. in the near term or opportunities you think have the ability to use data in their job. That says a lot about the state we're in today. I don't think you need to have a PhD in SQL to use data. Dion why don't you go ahead, We see the industries tend to reach an inflection point And that Uber is going to be applying data, I think part of it's going to be whether or not if data is the new oil and how to protect it I don't know if it's worth this or this. Joe if you get that augmented reality thing Glad to have you along here in New York.

ENTITIES

Entity	Category	Confidence
Jeff Hammerbacher	PERSON	0.99+
Dave	PERSON	0.99+
Dion Hinchcliffe	PERSON	0.99+
John	PERSON	0.99+
Jennifer	PERSON	0.99+
Joe	PERSON	0.99+
Comcast	ORGANIZATION	0.99+
Chris Dancy	PERSON	0.99+
Jennifer Shin	PERSON	0.99+
Cathy O'Neil	PERSON	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Stanislav Petrov	PERSON	0.99+
Joe McKendrick	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Nick Silver	PERSON	0.99+
John Thomas	PERSON	0.99+
100 variables	QUANTITY	0.99+
John Walls	PERSON	0.99+
1990	DATE	0.99+
Joe Caserta	PERSON	0.99+
Rob Thomas	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
UC Berkeley	ORGANIZATION	0.99+
1983	DATE	0.99+
1991	DATE	0.99+
2013	DATE	0.99+
Constellation Research	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
Facebook	ORGANIZATION	0.99+
Bob	PERSON	0.99+
Google	ORGANIZATION	0.99+
Bob Hayes	PERSON	0.99+
United States	LOCATION	0.99+
360 degree	QUANTITY	0.99+
one	QUANTITY	0.99+
New York	LOCATION	0.99+
Benjamin Israeli	PERSON	0.99+
France	LOCATION	0.99+
Africa	LOCATION	0.99+
12 month	QUANTITY	0.99+
Soviet Union	LOCATION	0.99+
Batman	PERSON	0.99+
New York City	LOCATION	0.99+
last year	DATE	0.99+
Olympics	EVENT	0.99+
Meredith Whittaker	PERSON	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Moscow	LOCATION	0.99+
Ubers	ORGANIZATION	0.99+
20 years	QUANTITY	0.99+
Joe C.	PERSON	0.99+

John Thomas, IBM | IBM Data Science For All

(upbeat music) >> Narrator: Live from New York City, it's the Cube, covering IBM Data Science for All. Brought to you by IMB. >> Welcome back to Data Science for All. It's a whole new game here at IBM's event, two-day event going on, 6:00 tonight the big keynote presentation on IBM.com so be sure to join the festivities there. You can watch it live stream, all that's happening. Right now, we're live here on the Cube, along with Dave Vellente, I'm John Walls and we are joined by John Thomas who is a distinguished engineer and director at IBM. John, thank you for your time, good to see you. >> Same here, John. >> Yeah, pleasure, thanks for being with us here. >> John Thomas: Sure. >> I know, in fact, you just wrote this morning about machine learning, so that's obviously very near and dear to you. Let's talk first off about IBM, >> John Thomas: Sure. >> Not a new concept by any means, but what is new with regard to machine learning in your work? >> Yeah, well, that's a good question, John. Actually, I get that question a lot. Machine learning itself is not new, companies have been doing it for decades, so exactly what is new, right? I actually wrote this in a blog today, this morning. It's really three different things, I call them democratizing machine learning, operationalizing machine learning, and hybrid machine learning, right? And we can talk through each of these if you like. But I would say hybrid machine learning is probably closest to my heart. So let me explain what that is because it's sounds fancy, right? (laughter) >> Right. It's what we need is another hybrid something, right? >> In reality, what it is is let data gravity decide where your data stays and let your performance requirements, your SLA's, dictate where your machine learning models go, right? So what do I mean by that? You might have sensitive data, customer data, which you want to keep on a certain platform, right? Instead of moving data off that platform to do machine learning, bring machine learning to that platform, whether that be the mainframe or specialized appliances or hadoop clusters, you name it, right? Bring machine learning to where the data is. Do the training, building of the model, where that is, but then have complete flexibility in terms of where you deploy that model. As an example, you might choose to build and train your model on premises behind the firewall using very sensitive data, but the model that has been built, you may choose to deploy that into a Cloud environment because you have other applications that need to consume it. That flexibility is what I mean by hybrid. Another example is, especially when you get into so many more complex machine learning, deep learning domains, you need exploration and there is hardware that provides that exploration, right? For example, GPU's provide exploration. Well, you need to have the flexibility to train and build the models on hardware that provides that kind of exploration, but then the model that has been built might go into inside of a CICS mainframe transaction for some second scoring of a credit card transaction as to whether it's fraudulent or not, right? So there's flexibility off peri, on peri, different platforms, this is what I mean by hybrid. >> What is the technical enabler to allow that to happen? Is it just a modern software architecture, microservices, containers, blah, blah, blah? Explain that in more detail. >> Yeah, that's a good question and we're not, you know, it's a couple different things. One is bringing native machine learning to these platforms themselves. So you need native machine learning on the mainframe, in the Cloud, in a hadoop cluster environment, in an appliance, right? So you need the run times, the libraries, the frameworks running native on those platforms. And that is not easy to do that, you know? You've got machine learning running native on ZOS, not even Linux on Z. It's native to ZOS on the mainframe. >> At the very primitive level you're talking about. >> Yeah. >> So you get the performance you need. >> You have the runtime environments there and then what you need is a seamless experience across all of these platforms. You need way to export models, repositories into which you can save models, the same API's to save models into a different repository and then consume from them there. So it's a bit of engineering that IBM is doing to enable this, right? Native capabilities on the platforms, the same API's to talk to repositories and consume from the repositories. >> So the other piece of that architecture is talking a lot of tooling that integrated and native. >> John Thomas: Yes. >> And the tooling, as you know, changes, I feel like daily. There's a new tool out there and everybody gloms onto it, so the architecture has to be able to absorb those. What is the enabler there? >> Yeah, so you actually bring up a very good point. There is a new language, a new framework everyday, right? I mean, we all know that, in the world of machine learning, Python and R and Scala. Frameworks like Spark and TensorFlow, they're table scapes now, you know? You have to support all of these, scikit-learning, you name it, right? Obviously, you need a way to support all these frameworks on the platforms you want to enable, right? And then you need an environment which lets you work with the tools of your choice. So you need an environment like a workbench which can allow you to work in the language, the framework that you are the most comfortable with. And that's what we are doing with data science experience. I don't know if you have thought of this, but data science experience is an enterprise ML platform, right, runs in the Cloud, on prem, on x86 machines, you can have it on a (mumbles) box. The idea here is support for a variety of open languages, frameworks, enable through a collaborative workbench kind of interface. >> And the decision to move, whether it's on-prem or in the Cloud, it's a function of many things, but let's talk about those. I mean, data volume is one. You can't just move your business into the Cloud. It's not going to work that well. >> It's a journey, yeah. >> It's too expensive. But then there's others, there's governance edicts and security edicts, not that the security in the Cloud is any worse, it might just different than what your organization requires, and the Cloud supplier might not support that. It's different Clouds, it's location, etc. When you talked about the data thing being on trend, maybe training a model, and then that model moving to the Cloud, so obviously, it's a lighter weight ... It's not as much-- >> Yeah, yeah, yeah, you're not moving the entire data. Right. >> But I have a concern. I wonder if clients as you about this. Okay, well, it's my data, my data, I'm going to keep behind my firewall. But that data trained that model and I'm really worried that that model is now my IP that's going to seep out into the industry. What do you tell a client? >> Yeah, that's a fair point. Obviously, you still need your security mechanisms, you access control mechanisms, your governance control mechanisms. So you need governance whether you are on the Cloud or on prem. And your encryption mechanisms, your version control mechanisms, your governance mechanisms, all need to be in place, regardless of where you deploy, right? And to your question of how do you decide where the model should go, as I said earlier to John, you know, let data gravity SLA's performance security requirements dictate where the model should go. >> We're talking so much about concepts, right, and theories that you have. Lets roll up our sleeves and get to the nitty-gritty a little bit here and talk about what are people really doing out there? >> John Thomas: Oh yeah, use cases. >> Yeah, just give us an idea for some of the ... Kind of the latest and greatest that you're seeing. >> Lots of very interesting, interesting use cases out there so actually, a part of what IBM calls a data science elite team. We go out and engage with customers on very interesting use cases, right? And we see a lot of these hybrid discussions happen as well. On one end of the spectrum is understanding customers better. So I call this reading the customer's mind. So can you understand what is in the customer's mind and have an interaction with the client without asking a bunch of questions, right? Can you look at his historical data, his browsing behavior, his purchasing behavior, and have an offer that he will really love? Can you really understand him and give him a celebrity experience? That's one class of use cases, right? Another class of use cases is around improving operations, improving your own internal processes. One example is fraud detection, right? I mean, that is a hot topic these days. So how do you, as the credit card is swiped, right, it's just a few milliseconds before that travels through a network and kicks you back in mainframe and a scoring is done to as to whether this should be approved or not. Well, you need to have a prediction of how likely this is to be fraudulent or not in the span of the transaction. Here's another one. I don't know if you call help desks now. I sometimes call them "helpless desks." (laughter) >> Try not to. >> Dave: Hell desks. >> Try not to helpless desks but, you know, for pretty every enterprise that I am talking to, there is a goal to optimize their help desk, their call centers. And call center optimization is good. So as the customer calls in, can you understand the intent of the customer? See, he may start off talking about something, but as the call progresses, the intent might change. Can you understand that? In fact, not just understand, but predict it and intercept with something that the client will love before the conversation takes a bad turn? (laughter) >> You must be listening in on my calls. >> Your calls, must be your calls! >> I meander, I go every which way. >> I game the system and just go really mad and go, let me get you an operator. (laughter) Agent, okay. >> You tow guys, your data is a special case. >> Dave: Yeah right, this guy's pissed. >> We are red-flagged right off the top. >> We're not even analyzing you. >> Day job, forget about, you know. What about things, you know, because they're moving so far out to the edge and now with mobile and that explosion there, and sensor data being what it is and all this is tremendous growth. Tough to manage. >> Dave: It is, it really is. >> I guess, maybe tougher to make sense of it, so how are you helping people make sense of this so they can really filter through and find the data that matters? >> Yeah, this is a lot of things rolled up into that question, right? One is just managing those devices, those endpoints in multiple thousands, tens of thousands, millions of these devices. How would you manage them? Then, are you doing the processing of the data and applying ML and DL right at the edge, or are you bringing the data back behind the firewall or into Cloud and then processing it there? If you are doing image reduction in a car, in a self-driving car, can you allow the latency of data being shipping of an image of a pedestrian jumping in front, do we ship across the Cloud for a deep-learning network to process it and give you an answer - oh, that's a pedestrian? You know, you may not have that latency now. So you may want to do some processing on the edge, so that is another interesting discussion, right? And you need exploration there as well. Another aspect now is, as you said, separating the signal from the noise, you know. It's just really, really coming down to the different industries that we go into, what are the signals that we understand now? Can we build on them and can we re-use them? That is an interesting discussion as well. But, yeah, you're right. With the world of exploding data that we are in, with all these devices, it's very important to have systematic approach to managing your data, cataloging it, understanding where to apply ML, where to apply exploration, governance. All of these things become important. >> I want to ask you about, come back to the use cases for a moment. You talk about celebrity experiences, I put that in sort of a marketing category. Fraud detection's always been one of the favorite, big data use cases, help desks, recommendation engines and so forth. Let's start with the fraud detection. About a year ago, first of all, fraud detection in the last six, seven years, has been getting immensely better, no question. And it's great. However, the number of false positives, about a year ago, it was too many. We're a small company but we buy a lot of equipment and lights and cameras and stuff. The number of false positives that I personally get was overwhelming. >> Yeah. >> They've gone down dramatically. >> Yeah. >> In the last 12 months. Is that just a coincidence, happenstance, or is it getting better? >> No, it's not that the bad guys have gone down in number. It's not that at all, no. (laughter) >> Well, that, I know. >> No, I think there is a lot of sophistication in terms of the algorithms that are available now. In terms of ... If you have tens of thousands of features that you're looking at, how do you collapse that space and how do you do that efficiently, right? There are techniques that are evolving in terms of handing that kind of information. In terms of the actual algorithms, are different types of innovations that are happening in that space. But I think, perhaps, the most important one is that things that use to take weeks or days to train and test, now can be done in days or minutes, right? The exploration that comes from GPU's, for example, allows you to test out different algorithms, different models and say, okay, well, this performs well enough for me to roll it out and try this out, right? It gives you a very quick cycle of innovation. >> The time to value is really compressed. Okay, now let's take one that's not so good. Ad recommendations, the Google ads that pop up. One in a hundred are maybe relevant, if that, right? And they pop up on the screen and they're annoying. I worry that Siri's listening somehow. I talk to my wife about Israel and then next thing I know, I'm getting ads for going to Israel. Is that a coincidence or are they listening? What's happening there? >> I don't know about what Google's doing. I can't comment on that. (laughter) I don't want to comment on that. >> Maybe just from a technology perspective. >> From a technology perspective, this notion of understanding what is in the customer's mind and really getting to a customer segment at one, this is top interest for many, many organizations. Regardless of which industry you are, insurance or banking or retail, doesn't matter, right? And it all comes down to the fundamental principles about how efficiently can you do. Now, can you identify the features that have the most predictive power? This is a level of sophistication in terms of the feature engineering, in terms of collapsing that space of features that I had talked about, and then, how do I actually go to the latest science of this? How do I do the exploratory analysis? How do I actually build and test my machine learning models quickly? Do the tools allow me to be very productive about this? Or do I spend weeks and weeks coding in lower-level formats? Or do I get help, do I get guided interfaces, which guide me through the process, right? And then, the topic of exploration we talk about, right? These things come together and then couple that with cognitive API's. For example, speech to text, the word (mumbles) have gone down dramatically now. So as you talk on the phone, with a very high accuracy, we can understand what is being talked about. Image recognition, the accuracy has gone up dramatically. You can create custom classifiers for industry-specific topics that you want to identify in pictures. Natural language processing, natural language understanding, all of these have evolved in the last few years. And all these come together. So machine learning's not an island. All these things coming together is what makes these dramatic advancements possible. >> Well, John, if you've figured out anything about the past 20 minutes or so, is that Dave and I want ads delivered that matter and we want our help desk questions answered right away. (laugher) so if you can help us with that, you're welcome back on the Cube anytime, okay? >> We will try, John. >> That's all we want, that's all we ask. >> You guys, your calls are still being screened. (laughter) >> John Thomas, thank you for joining us, we appreciate that. >> Thank you. >> Our panel discussion coming up at 4:00 Eastern time. Live here on the Cube, we're in New York City. Be back in a bit. (upbeat music)

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IMB. John, thank you for your time, good to see you. I know, in fact, you just wrote this morning And we can talk through each of these if you like. It's what we need is another hybrid something, right? of where you deploy that model. What is the technical enabler to allow that to happen? And that is not easy to do that, you know? and then what you need is a seamless experience So the other piece of that architecture is And the tooling, as you know, changes, I feel like daily. the framework that you are the most comfortable with. And the decision to move, whether it's on-prem and security edicts, not that the security in the Cloud is Yeah, yeah, yeah, you're not moving the entire data. I wonder if clients as you about this. So you need governance whether you are and theories that you have. Kind of the latest and greatest that you're seeing. I don't know if you call help desks now. So as the customer calls in, can you understand and go, let me get you an operator. What about things, you know, because they're moving the signal from the noise, you know. I want to ask you about, come back to the use cases In the last 12 months. No, it's not that the bad guys have gone down in number. and how do you do that efficiently, right? I talk to my wife about Israel and then next thing I know, I don't know about what Google's doing. So as you talk on the phone, with a very high accuracy, so if you can help us with that, You guys, your calls are still being screened. Live here on the Cube, we're in New York City.

ENTITIES

Entity	Category	Confidence
Dave Vellente	PERSON	0.99+
John	PERSON	0.99+
John Thomas	PERSON	0.99+
Dave	PERSON	0.99+
IBM	ORGANIZATION	0.99+
John Walls	PERSON	0.99+
Israel	LOCATION	0.99+
Google	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Siri	TITLE	0.99+
ZOS	TITLE	0.99+
today	DATE	0.99+
Linux	TITLE	0.99+
One example	QUANTITY	0.99+
Python	TITLE	0.99+
thousands	QUANTITY	0.99+
One	QUANTITY	0.99+
Scala	TITLE	0.99+
Spark	TITLE	0.98+
tens of thousands	QUANTITY	0.98+
this morning	DATE	0.98+
each	QUANTITY	0.98+
IMB	ORGANIZATION	0.96+
one	QUANTITY	0.96+
TensorFlow	TITLE	0.95+
millions	QUANTITY	0.95+
About a year ago	DATE	0.95+
first	QUANTITY	0.94+
one class	QUANTITY	0.92+
Z.	TITLE	0.91+
4:00 Eastern time	DATE	0.9+
decades	QUANTITY	0.9+
6:00 tonight	DATE	0.9+
CICS	ORGANIZATION	0.9+
about a year ago	DATE	0.89+
second	QUANTITY	0.88+
two-day event	QUANTITY	0.86+
three different things	QUANTITY	0.85+
last 12 months	DATE	0.84+
IBM Data Science	ORGANIZATION	0.82+
Cloud	TITLE	0.8+
R	TITLE	0.78+
past 20 minutes	DATE	0.77+
Cube	COMMERCIAL_ITEM	0.75+
a hundred	QUANTITY	0.72+
one end	QUANTITY	0.7+
seven years	QUANTITY	0.69+
features	QUANTITY	0.69+
couple	QUANTITY	0.67+
last six	DATE	0.66+
few milliseconds	QUANTITY	0.63+
last few years	DATE	0.59+
x86	QUANTITY	0.55+
IBM.com	ORGANIZATION	0.53+
SLA	ORGANIZATION	0.49+

Tricia Wang, Sudden Compass | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE covering IBM Data Science For All brought to you by IBM. >> Welcome back here on theCUBE. We are live in New York continuing our coverage here for Data Science for All where all things happen. Big things are happening. In fact, there's a huge event tonight I'm going to tell you about a little bit later on, but Tricia Wang who is our next guest is a part of that panel discussion that you'll want to tune in for live on ibmgo.com. 6 o'clock, but more on that a little bit later on. Along with Dave Vellante, John Walls here, and Tricia Wang now joins us. A first ever for us. How are you doing? >> Good. >> A global tech ethnographer. >> You said it correctly, yay! >> I learned a long time ago when you're not sure slow down. >> A plus already. >> Slow down and breathe. >> Slow down. >> You did a good job. Want to do it one more time? >> A global tech ethnographer. >> Tricia: Good job. >> Studying ethnography and putting ethnography into practice. How about that? >> Really great. >> That's taking on the challenge stretch. >> Now say it 10 times faster in a row. >> How about when we're done? Also co-founder of Sudden Compass. So first off, let's tell our viewers a little bit about Sudden Compass. Then I want to get into the ethnography and how that relates to tech. So let's go first off about Sudden Compass and the origins there. >> So Sudden Compass, we're a consulting firm based in New York City, and we help our partners embrace and understand the complexity of their customers. So whenever there are, wherever there's data and wherever there's people, we are there to help them make sure that they can understand their customers at the end of the day. And customers are really the most unpredictable, the most unknown, and the most difficult to quantify thing for any business. We see a lot of our partners really investing in big data data science tools and they're hiring the most amazing data scientists, but we saw them still struggling to make the right decisions, they still weren't getting their ROI, and they certainly weren't growing their customer base. And what we are helping them do is to say, "Look, you can't just rely only on data science. "You can't put it all into only the tool. "You have to think about how to operationalize that "and build a culture around it "and get the right skillsets in place, "and incorporate what we call the thick data, "which is the stuff that's very difficult to quantify, "the unknown, "and then you can figure out "how to best mathematically scale your data models "when it's actually based on real human behavior, "which is what the practice of ethnography is there to help "is to help you understand what do humans actually do, "what is unquantifiable. "And then once you find out those unquantifiable bits "you then have the art and science of figuring out "how do you scale it into a data model." >> Yeah, see that's what I find fascinating about this is that you've got hard and fast, right, data, objective, black and white, very clear, and then you've got people, you know? We all react differently. We have different influences, and different biases, and prejudices, and all that stuff, aptitudes. So you are meshing this art and science. >> Tricia: Absolutely. >> And what is that telling you then about how best to your clients and how to use data (mumbles)? >> Well, we tell our clients that because people are, there are biases, and people are not objective and there's emotions, that all ends up in the data set. To think that your data set, your quantitative data set, is free of biases and has some kind of been scrubbed of emotion is a total fallacy and it's something that needs to be corrected, because that means decision makers are making decisions based off of numbers thinking that they're objective when in fact they contain all the biases of the very complexity of the humans that they're serving. So, there is an art and science of making sure that when you capture that complexity ... We're saying, "Don't scrub it away." Traditional marketing wants to say, "Put your customers in boxes. "Put them in segments. "Use demographic variables like education, income. "Then you can just put everyone in a box, "figure out where you want to target, "figure out the right channels, "and you buy against that and you reach them." That's not how it works anymore. Customers now are moving faster than corporations. The new net worth customer of today has multiple identities is better understood when in relationship to other people. And we're not saying get rid of the data science. We're saying absolutely have it. You need to have scale. What is thick data going to offer you? Not scale, but it will offer you depth. So, that's why you need to combine both to be able to make effective decisions. >> So, I presume you work with a lot of big consumer brands. Is that a safe assumption? >> Absolutely. >> Okay. So, we work with a lot of big tech brands, like IBM and others, and they tend to move at the speed of the CIO, which tends to be really slow and really risk averse, and they're afraid to over rotate and get ahead over their skis. What do you tell folks like that? Is that a mistake being so cautious in this digital age? >> Well, I think the new CIO is on the cutting edge. I was just at Constellation Research Annual Conference in Half Moon Bay at-- >> Our friend Ray Wang. >> Yeah, Ray Wang. And I just spoke about this at their Constellation Connected Enterprise where they had the most, I would have to say the most amazing forward thinking collection of CIOs, CTOs, CDOs all in one room. And the conversation there was like, "We cannot afford to be slow anymore. "We have to be on the edge "of helping our companies push the ground." So, investing in tools is not enough. It is no longer enough to be the buyer, and to just have a relationship with your vendor and assume that they will help you deliver all the understanding. So, CIOs and CTOs need to ensure that their teams are diverse, multi-functional, and that they're totally integrated embedded into the business. And I don't mean just involve a business analyst as if that's cutting edge. I'm saying, "No, you need to make sure that every team "has qualitative people, "and that they're embedded and working closely together." The problem is we don't teach these skills. We're not graduating data scientists or ethnographers who even want to talk to each other. In fact, each side thinks the other side is useless. We're saying, "No, "we need to be able to have these skills "being taught within companies." And you don't need to hire a PhD data scientist or a PhD ethnographer. What we're saying is that these skills can be taught. We need to teach people to be data literate. You've hired the right experts, you have bought the right tools, but we now need to make sure that we're creating data literacy among decision makers so that we can turn these data into insights and then into action. >> Let's peel that a little bit. Data literate, you're talking about creativity, visualization, combining different perspectives? Where should the educational focus be? >> The educational focus should be on one storytelling. Right now, you cannot just be assuming that you can have a decision maker make a decision based on a number or some long PowerPoint report. We have to teach people how to tell compelling stories with data. And when I say data I'm talking about it needs the human component and it needs the numbers. And so one of the things that I saw, this is really close to my heart, was when I was at Nokia, and I remember I spent a decade understanding China. I really understood China. And when I finally had the insight where I was like, "Look, after spending 10 years there, "following 100 to 200 families around, "I had the insight back in 2009 that look, "your company is about to go out of business because "people don't want to buy your feature phones anymore. "They're going to want to buy smartphones." But, I only had qualitative data, and I needed to work alongside the business analysts and the data scientists. I needed access to their data sets, but I needed us to play together and to be on a team together so that I could scale my insights into quantitative models. And the problem was that, your question is, "What does that look like?" That looks like sitting on a team, having a mandate to say, "You have to play together, "and be able to tell an effective story "to the management and to leadership." But back then they were saying, "No, "we don't even consider your data set "to be worthwhile to even look at." >> We love our candy bar phone, right? It's a killer. >> Tricia: And we love our numbers. We love our surveys that tell us-- >> Market share was great. >> Market share is great. We've done all of the analysis. >> Forget the razor. >> Exactly. I'm like, "Look, of course your market share was great, "because your surveys were optimized "for your existing business model." So, big data is great if you want to optimize your supply chain or in systems that are very contained and quantifiable that's more or less fine. You can get optimization. You can get that one to two to five percent. But if you really want to grow your company and you want to ensure its longevity, you cannot just rely on your quantitative data to tell you how to do that. You actually need thick data for discovery, because you need to find the unknown. >> One of the things you talk about your passion is to understand how human perspectives shape the technology we build and how we use it. >> Tricia: Yes, you're speaking my language. >> Okay, so when you think about the development of the iPhone, it wasn't a bunch of surveys that led Steve Jobs to develop the iPhone. I guess the question is does technology lead and shape human perspectives or do human perspectives shape technology? >> Well, it's a dialectical relationship. It's like does a hamburger ... Does a bun shape the burger or does the bun shape the burger? You would never think of asking someone who loves a hamburger that question, because they both shape each other. >> Okay. (laughing) >> So, it's symbiote here, totally symbiotic. >> Surprise answer. You weren't expecting that. >> No, but it is kind of ... Okay, so you're saying it's not a chicken and egg, it's both. >> Absolutely. And the best companies are attuned to both. The best companies know that. The most powerful companies of the 21st century are obsessed with their customers and they're going to do a great job at leveraging human models to be scaled into data models, and that gap is going to be very, very narrow. You get big data. We're going to see more AI or ML disasters when their data models are really far from their actual human models. That's how we get disasters like Tesco or Target, or even when Google misidentified black people as gorillas. It's because their model of their data was so far from the understanding of humans. And the best companies of the future are going to know how to close that gap, and that means they will have the thick data and big data closely integrated. >> Who's doing that today? It seems like there are no ethics in AI. People are aggressively AI for profit and not really thinking about the human impacts and the societal impacts. >> Let's look at IBM. They're doing it. I would say that some of the most innovative projects that are happening at IBM with Watson, where people are using AI to solve meaningful social problems. I don't think that has to be-- >> Like IBM For Social Good. >> Exactly, but it's also, it's not just experimental. I think IBM is doing really great stuff using Watson to understand, identify skin cancer, or looking at the ways that people are using AI to understand eye diseases, things that you can do at scale. But also businesses are also figuring out how to use AI for actually doing better things. I think some of the most interesting ... We're going to see more examples of people using AI for solving meaningful social problems and making a profit at the same time. I think one really great example is WorkIt is they're using AI. They're actually working with Watson. Watson is who they hired to create their engine where union workers can ask questions of Watson that they may not want to ask or may be too costly to ask. So you can be like, "If I want to take one day off, "will this affect my contract or my job?" That's a very meaningful social problem that unions are now working with, and I think that's a really great example of how Watson is really pushing the edge to solve meaningful social problems at the same time. >> I worry sometimes that that's like the little device that you put in your car for the insurance company to see how you drive. >> How do you brake? How do you drive? >> Do people trust feeding that data to Watson because they're afraid Big Brother is watching? >> That's why we always have to have human intelligence working with machine intelligence. This idea of AI versus humans is a false binary, and I don't even know why we're engaging in those kinds of questions. We're not clearly, but there are people who are talking about it as if it's one or the other, and I find it to be a total waste of time. It's like clearly the best AI systems will be integrated with human intelligence, and we need the human training the data with machine learning systems. >> Alright, I'll play the yeah but. >> You're going to play the what? >> Yeah but! >> Yeah but! (crosstalk) >> That machines are replacing humans in cognitive functions. You walk into an airport and there are kiosks. People are losing jobs. >> Right, no that's real. >> So okay, so that's real. >> That is real. >> You agree with that. >> Job loss is real and job replacement is real. >> And I presume you agree that education is at least a part the answer, and training people differently than-- >> Tricia: Absolutely. >> Just straight reading, writing, and arithmetic, but thoughts on that. >> Well what I mean is that, yes, AI is replacing jobs, but the fact that we're treating AI as some kind of rogue machine that is operating on its own without human guidance, that's not happening, and that's not happening right now, and that's not happening in application. And what is more meaningful to talk about is how do we make sure that humans are more involved with the machines, that we always have a human in the loop, and that they're always making sure that they're training in a way where it's bringing up these ethical questions that are very important that you just raised. >> Right, well, and of course a lot of AI people would say is about prediction and then automation. So think about some of the brands that you serve, consult with, don't they want the machines to make certain decisions for them so that they can affect an outcome? >> I think that people want machines to surface things that is very difficult for humans to do. So if a machine can efficiently surface here is a pattern that's going on then that is very helpful. I think we have companies that are saying, "We can automate your decisions," but when you actually look at what they can automate it's in very contained, quantifiable systems. It's around systems around their supply chain or logistics. But, you really do not want your machine automating any decision when it really affects people, in particular your customers. >> Okay, so maybe changing the air pressure somewhere on a widget that's fine, but not-- >> Right, but you still need someone checking that, because will that air pressure create some unintended consequences later on? There's always some kind of human oversight. >> So I was looking at your website, and I always look for, I'm intrigued by interesting, curious thoughts. >> Tricia: Okay, I have a crazy website. >> No, it's very good, but back in your favorite quotes, "Rather have a question I can't answer "than an answer I can't question." So, how do you bring that kind of there's no fear of failure to the boardroom, to people who have to make big leaps and big decisions and enter this digital transformative world? >> I think that a lot of companies are so fearful of what's going to happen next, and that fear can oftentimes corner them into asking small questions and acting small where they're just asking how do we optimize something? That's really essentially what they're asking. "How do we optimize X? "How do we optimize this business?" What they're not really asking are the hard questions, the right questions, the discovery level questions that are very difficult to answer that no big data set can answer. And those are questions ... The questions about the unknown are the most difficult, but that's where you're going to get growth, because when something is unknown that means you have not either quantified it yet or you haven't found the relationship yet in your data set, and that's your competitive advantage. And that's where the boardroom really needs to set the mandate to say, "Look, I don't want you guys only answering "downstream, company-centric questions like, "'How do we optimize XYZ?"'" which is still important to answer. We're saying you absolutely need to pay attention to that, but you also need to ask upstream very customer-centric questions. And that's very difficult, because all day you're operating inside a company . You have to then step outside of your shoes and leave the building and see the world from a customer's perspective or from even a non existing customer's perspective, which is even more difficult. >> The whole know your customer meme has taken off in a big way right now, but I do feel like the pendulum is swinging. Well, I'm sanguined toward AI. It seems to me that ... It used to be that brands had all the power. They had all the knowledge, they knew the pricing, and the consumers knew nothing. The Internet changed all that. I feel like digital transformation and all this AI is an attempt to create that asymmetry again back in favor of the brand. I see people getting very aggressive toward, certainly you see this with Amazon, Amazon I think knows more about me than I know about myself. Should we be concerned about that and who protects the consumer, or is just maybe the benefits outweigh the risks there? >> I think that's such an important question you're asking and it's totally important. A really great TED talk just went up by Zeynep Tufekci where she talks about the most brilliant data scientists, the most brilliant minds of our day, are working on ad tech platforms that are now being created to essentially do what Kenyatta Jeez calls advertising terrorism, which is that all of this data is being collected so that advertisers have this information about us that could be used to create the future forms of surveillance. And that's why we need organizations to ask the kind of questions that you did. So two organizations that I think are doing a really great job to look at are Data & Society. Founder is Danah Boyd. Based in New York City. This is where I'm an affiliate. And they have all these programs that really look at digital privacy, identity, ramifications of all these things we're looking at with AI systems. Really great set of researchers. And then Vint Cerf (mumbles) co-founded People-Centered Internet. And I think this is another organization that we really should be looking at, it's based on the West Coast, where they're also asking similar questions of like instead of just looking at the Internet as a one-to-one model, what is the Internet doing for communities, and how do we make sure we leverage the role of communities to protect what the original founders of the Internet created? >> Right, Danah Boyd, CUBE alum. Shout out to Jeff Hammerbacher, founder of Cloudera, the originator of the greatest minds of my generation are trying to get people to click on ads. Quit Cloudera and now is working at Mount Sinai as an MD, amazing, trying to solve cancer. >> John: A lot of CUBE alums out there. >> Yeah. >> And now we have another one. >> Woo-hoo! >> Tricia, thank you for being with us. >> You're welcome. >> Fascinating stuff. >> Thanks for being on. >> It really is. >> Great questions. >> Nice to really just change the lens a little bit, look through it a different way. Tricia, by the way, part of a panel tonight with Michael Li and Nir Kaldero who we had earlier on theCUBE, 6 o'clock to 7:15 live on ibmgo.com. Nate Silver also joining the conversation, so be sure to tune in for that live tonight 6 o'clock. Back with more of theCUBE though right after this. (techno music)

Published Date : Nov 1 2017

SUMMARY :

brought to you by IBM. I'm going to tell you about a little bit later on, Want to do it one more time? and putting ethnography into practice. the challenge stretch. and how that relates to tech. and the most difficult to quantify thing for any business. and different biases, and prejudices, and all that stuff, and it's something that needs to be corrected, So, I presume you work with a lot of big consumer brands. and they tend to move at the speed of the CIO, I was just at Constellation Research Annual Conference and assume that they will help you deliver Where should the educational focus be? and to be on a team together We love our candy bar phone, right? We love our surveys that tell us-- We've done all of the analysis. You can get that one to two to five percent. One of the things you talk about your passion that led Steve Jobs to develop the iPhone. or does the bun shape the burger? Okay. You weren't expecting that. but it is kind of ... and that gap is going to be very, very narrow. and the societal impacts. I don't think that has to be-- and making a profit at the same time. that you put in your car for the insurance company and I find it to be a total waste of time. You walk into an airport and there are kiosks. but thoughts on that. that are very important that you just raised. So think about some of the brands that you serve, But, you really do not want your machine Right, but you still need someone checking that, and I always look for, to the boardroom, and see the world from a customer's perspective and the consumers knew nothing. that I think are doing a really great job to look at Shout out to Jeff Hammerbacher, Nice to really just change the lens a little bit,

ENTITIES

Entity	Category	Confidence
Diane Greene	PERSON	0.99+
Eric Herzog	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
Diane	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Mark Albertson	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Rebecca Knight	PERSON	0.99+
Jennifer	PERSON	0.99+
Colin	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Rob Hof	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Tricia Wang	PERSON	0.99+
Facebook	ORGANIZATION	0.99+
Singapore	LOCATION	0.99+
James Scott	PERSON	0.99+
Scott	PERSON	0.99+
Ray Wang	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Brian Walden	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Verizon	ORGANIZATION	0.99+
Jeff Bezos	PERSON	0.99+
Rachel Tobik	PERSON	0.99+
Alphabet	ORGANIZATION	0.99+
Zeynep Tufekci	PERSON	0.99+
Tricia	PERSON	0.99+
Stu	PERSON	0.99+
Tom Barton	PERSON	0.99+
Google	ORGANIZATION	0.99+
Sandra Rivera	PERSON	0.99+
John	PERSON	0.99+
Qualcomm	ORGANIZATION	0.99+
Ginni Rometty	PERSON	0.99+
France	LOCATION	0.99+
Jennifer Lin	PERSON	0.99+
Steve Jobs	PERSON	0.99+
Seattle	LOCATION	0.99+
Brian	PERSON	0.99+
Nokia	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
Peter Burris	PERSON	0.99+
Scott Raynovich	PERSON	0.99+
Radisys	ORGANIZATION	0.99+
HP	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Eric	PERSON	0.99+
Amanda Silver	PERSON	0.99+

Nir Kaldero, Galvanize | IBM Data Science For All

>> Announcer: Live from New York City, it's The Cube, covering IBM data science for all. Brought to you by IBM. >> Welcome back to data science for all. This is IBM's event here on the west side of Manhattan, here on The Cube. We're live, we'll be here all day, along with Dave Vallente, I'm John Walls Poor Dave had to put up with all that howling music at this hotel last night, kept him up 'til, all hours. >> Lots of fun here in the city. >> Yeah, yeah. >> All the crazies out last night. >> Yeah, but the headphones, they worked for ya. Glad to hear that. >> People are already dressed for Halloween, you know what I mean? >> John: Yes. >> In New York, you know what I mean? >> John: All year. >> All the time. >> John: All year. >> 365. >> Yeah. We have with us now the head of data science, and the VP at Galvanize, Nir Kaldero, and Nir, good to see you, sir. Thanks for being with us. We appreciate the time. >> Well of course, my pleasure. >> Tell us about Galvanize. I know you're heavily involved in education in terms of the tech community, but you've got corporate clients, you've got academic clients. You cover the waterfront, and I know data science is your baby. >> Nir: Right. >> But tell us a little bit about Galvanize and your mission there. >> Sure, so Galvanize is the learning community for technology. We provide the training in data science, data engineering, and also modern software engineering. We recently built a very large, fast growing enterprise corporate training department, where we basically help companies become digital, become nimble, and also very data driven, so they can actually go through this digital transformation, and survive in this fourth industrial revolution. We do it across all layers of the business, from the executives, to managers, to data scientists, and data analysts, and kind of transform and upscale all current skills to be modern, to be digital, so companies can actually go through this transformation. >> Hit on one of those items you talked about, data driven. >> Nir: Right. >> It seems like a no-brainer, right? That the more information you give me, the more analysis I can apply to it, the more I can put it in my business practice, the more money I make, the more my customers are happy. It's a lay up, right? >> Nir: It is. >> What is a data driven organization, then? Do you have to convince people that this is where they need to be today? >> Sometimes I need to convince them, but (laughs) anyway, so let's back up a little bit. We are in the midst of the fourth industrial revolution, and in order to survive in this fourth industrial revolution, companies need to become nimble, as I said, become agile, but most importantly become data driven, so the organization can actually best respond to all the predictions that are coming from this very sophisticated machine intelligence models. If the organization immediately can best respond to all of that, companies will be able to enhance the user experience, get insight about their customers, enhance performances, and et cetera, and we know that the winners in this revolution, in this era, will be companies who are very digital, that master the skills of becoming a data driven organization, and you know, we can talk more about the transformation, and what it consisted of. Do you want me to? >> John: Sure. >> Can I just ask you a question? This fourth wave, this is what, the cognitive machine wave? Or how would you describe it? >> Some people call it artificial intelligence. I think artificial intelligence is like big data, kind of like a buzz word. I think more appropriately, we should call it machine intelligence industrial revolution. >> Okay. I've got a lot of questions, but carry on. >> So hitting on that, so you see that as being a major era. >> Nir: It's a game changer. >> If you will, not just a chapter, but a major game changer. >> Nir: Yup. >> Why so? >> So, okay, I'll jump in again. Machines have always replaced man, people. >> John: The automation, right. >> Nir: To some extent. >> But certain machines have replaced certain human tasks, let's say that. >> Nir: Correct. >> But for the first time in history, this fourth era, machine's are replacing humans with cognitive tasks, and that scares a lot of people, because you look at the United States, the median income of the U.S. worker has dropped since 1999, from $55,000 to $52,000, and a lot of people believe it's sort of the hollowing out of that factor that we just mentioned. Education many believe is the answer. You know, Galvanize is an organization that plays a critical role in helping deal with that problem, does it not? >> So, as Mark Zuckerberg says, there is a lot of hate love relationship with A.I. People love it on one side, because they're excited about all the opportunities that can come from this utilization of machine intelligence, but many people actually are afraid from it. I read a survey a few weeks ago that says that 36% of the population thinks that A.I. will destroy humanity, and will conquer the world. That's a fact that's what people think. If I think it's going to happen? I don't think so. I highly believe that education is one of the pillars that can address this fear for machine intelligence, and you spoke a lot about jobs I talk about it forever, but just my belief is that machines can actually replace some of our responsibilities, right? Not necessarily take and replace the entire job. Let's talk about lawyers, right? Lawyers currently spend between 40% to 60% of the time writing contracts, or looking at previous cases. The machine can write a contract in two minutes, or look up millions of data points of previous cases in zero time. Why a lawyer today needs to spend 40% to 60% of the time on that? >> Billable hours, that's why. >> It is, so I don't think the machine will replace the job of the lawyer. I think in the future, the machine replaces some of the responsibilities, like auditing, or writing contracts, or looking at previous cases. >> Menial labor, if you will. >> Yes, but you know, for example, the machine is not that great right now with negotiations skills. So maybe in the future, the job of the lawyer will be mostly around negotiation skills, rather than writing contracts, et cetera, but yeah, you're absolutely right. There is a big fear in the market right now among executives, among people in the public. I think we should educate people about what is the true implications of machine intelligence in this fourth industrial revolution and era, and education is definitely one of those. >> Well, one of my favorite stories, when people bring up this topic, is when Gary Kasparov lost to the IBM super computer, Blue Jean, or whatever it's called. >> Nir: Yup. >> Instead of giving up, what he said is he started a competition, where he proved that humans and machines could beat the IBM super computer. So to this day has a competition where the best chess player in the world is a combination between humans and machines, and so it's that creativity. >> Nir: Imagination. >> Imagination, right, combinatorial effects of different technologies that education, hopefully, can help keep those either way. >> Look, I'm a big fan of neuroscience. I wish I did my PhD in neuroscience, but we are very, very far away from understanding how our brain works. Now to try to imitate the brain when we don't know how the brain works? We are very far away from being in a place where a machine can actually replicate, and really best respond like a human. We don't know how our brain works yet. So we need to do a lot of research on that before we actually really write a very strong, powerful machine intelligence model that can actually replace us as humans, and outbid us. We can speak about Jeopardy, and what's on, and we can speak about AlphaGo, it's a Google company that kind of outperformed the world champion. These are very specific tasks, right? Again, like the lawyer, the machines can write beautiful contracts with NLP, machines can look at millions and trillions of data and figure out what's the conclusion there, right? Or summarize text very fast, but not necessarily good in negotiation yet. >> So when you think about a digital business, to us a digital business is a business that uses data to differentiate, and serve customers, and maintain customers. So when you talk about data driven, it strikes me that when everybody's saying digital business, digital transformation, it's about a data transformation, how well they utilize data, and if you look at the bell curve of organizations, most are not. Everybody wants to be data driven, many say they are data driven. >> Right. >> Dave: Would you agree most are not? >> I will agree that most companies say that they are data driven, but actually they're not. I work with a lot of Fortune 500 companies on a daily basis. I meet their executives and functional leaders, and actually see their data, and business problems that they have. Most of them do tend to say that they are data driven, but truly just ask them if they put data and decisions in the same place, every time they have to make a decision, they don't do it. It's a habit that they don't yet have. Companies need to start investing in building what we say healthy data culture in order to enable and become data driven. Part of it is democratization of data, right? Currently what I see if lots of organizations actually open the data just for the analyst, or the marketers, people who kind of make decisions, that need to make decisions with data, but not throughout the entire organization. I know I always say that everyone in the organization makes decisions on a daily basis, from the barista, to the CEO, right? And the entirety of becoming data driven is that data can actually help us make better decisions on a daily basis, so how about democratizing the data to everyone? So everyone, from the barista, to the CEO, can actually make better decisions on a daily basis, and companies don't excel yet in doing it. Not every company is as digital as Amazon. Amazon, I think, is actually one of the most digital companies in the world, if you look at the digital index. Not everyone is Google or Facebook. Most companies want to be there, most companies understand that they will not be able to survive in this era if they will not become data driven, so it's a big problem. We try at Galvanize to address this problem from executive type of education, where we actually meet with the C-level executives in companies, and actually guide them through how to write their data strategy, how to think about prioritizing data investment, to actual implementation of that, and so far we are highly successful. We were able to make a big transformation in very large, important organizations. So I'm actually very proud of it. >> How long are these eras? Is it a century, or more? >> This fourth industrial? >> Yeah. >> Well it's hard to predict that, and I'm not a machine, or what's on it. (laughs) >> But certainly more than 50 years, would you say? Or maybe not, I don't know. >> I actually don't think so. I think it's going to be fast, and we're going to move to the next one pretty soon that will be even more, with more intelligence, with more data. >> So the reason I ask, is there was an article I saw and linked, and I haven't had time to read it, but it talked about the Four Horsemen, Amazon, Google, Facebook, and Apple, and it said they will all be out of business in 50 years. Now, I don't know, I think Apple probably has 50 years of cash flow in the bank, but then they said, the one, the author said, if I had to predict one that would survive, it would be Amazon, to your point, because they are so data driven. The premise, again I didn't read the whole thing, was that some new data driven, digital upstart will disrupt them. >> Yeah, and you know, companies like Amazon, and Alibaba lately, that try kind of like in a competition with Amazon about who is becoming more data driven, utilizing more machine intelligence, are the ones that invested in these capabilities many, many years ago. It's no that they started investing in it last year, or five years ago. We speak about 15 and 20 years ago. So companies who were really a pioneer, and invested very early on, will predict actually to survive in the future, and you know, very much align. >> Yeah, I'm going to touch on something. It might be a bridge too far, I don't know, but you talk about, Dave brought it up, about replacing human capital, right? Because of artificial intelligence. >> Nir: Yup. >> Is there a reluctance, perhaps, on behalf of executives to embrace that, because they are concerned about their own price? >> Nir: You should be in the room with me. (laughing) >> You provide data, but you also provide that capability to analyze, and make the best informed decision, and therefore, eliminate the human element of a C-suite executive that maybe they're not as necessary today, or tomorrow, as they were two years ago. >> So it is absolutely true, and there is a lot of fear in the room, especially when I show them robots, they freak out typically, (John and Dave laugh) but the fact is well known. Leaders who will not embrace these skills, and understanding, and will help the organization to become agile, nimble, and data driven, will not survive. They will be replaced. So on the one hand, they're afraid from it. On the other side, they see that if they will not actually do something, and take an action today, they might be replaced in the future. >> Where should organizations start? Hey, I want to be data driven. Where do I start? >> That's a good question. So data science, machine learning, is a top down initiative. It requires a lot of funding. It requires a change in culture and habits. So it has to start from the top. The journey has to start from executive, from educating and executive about what is data science, what is machine learning, how to prioritize investments in this field, how to build data driven culture, right? When we spoke about data driven, we mainly speaks about the culture aspect here, not specifically about the technical side of it. So it has to come from the top, leaders have to incorporate it in the organization, the have to give authority and power for people, they have to put the funding at first, and then, this is how it's beautiful, that you actually see it trickles down to the organization when they have a very powerful CEO that makes a decision, and moves the organization quickly to become data driven, make executives look at data every time they make a decision, get them into the habit. When people look up to executives, they try to do the same, and if my boss is an example for me, someone who is looking at data every time he is making a decision, ask the right questions, know how to prioritize, set the right goals for me, this helps me, and helps the organization better perform. >> Follow the leader, right? >> Yup. >> Follow the leader. >> Yup, follow the leader. >> Thanks for being with us. >> Nir: Of course, it's my pleasure. >> Pinned this interesting love hate thing that we have going on. >> We should address that. >> Right, right. That's the next segment, how about that? >> Nir Kaldero from Galvanize joining us here live on The Cube. Back with more from New York in just a bit.

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IBM. the west side of Manhattan, Yeah, but the headphones, and the VP at Galvanize, Nir Kaldero, in terms of the tech community, and your mission there. from the executives, to managers, you talked about, data driven. the more analysis I can apply to it, We are in the midst of the I think artificial but carry on. so you see that as being a major era. If you will, not just a chapter, Machines have always replaced man, people. But certain machines have But for the first time of the pillars that can address of the responsibilities, the job of the lawyer will to the IBM super computer, and so it's that creativity. that education, hopefully, kind of outperformed the world champion. and if you look at the bell from the barista, to the CEO, right? and I'm not a machine, or what's on it. 50 years, would you say? I think it's going to be fast, the author said, if I had to are the ones that invested in Yeah, I'm going to touch on something. Nir: You should be in the room with me. and make the best informed decision, So on the one hand, Hey, I want to be data driven. the have to give authority that we have going on. That's the next segment, how about that? New York in just a bit.

ENTITIES

Entity	Category	Confidence
Dave Vallente	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
John	PERSON	0.99+
Google	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
40%	QUANTITY	0.99+
Apple	ORGANIZATION	0.99+
Gary Kasparov	PERSON	0.99+
New York	LOCATION	0.99+
$55,000	QUANTITY	0.99+
50 years	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
Galvanize	ORGANIZATION	0.99+
Nir	PERSON	0.99+
New York City	LOCATION	0.99+
Mark Zuckerberg	PERSON	0.99+
Nir Kaldero	PERSON	0.99+
two minutes	QUANTITY	0.99+
tomorrow	DATE	0.99+
36%	QUANTITY	0.99+
1999	DATE	0.99+
Four Horsemen	ORGANIZATION	0.99+
United States	LOCATION	0.99+
60%	QUANTITY	0.99+
last year	DATE	0.99+
more than 50 years	QUANTITY	0.99+
$52,000	QUANTITY	0.99+
five years ago	DATE	0.99+
one	QUANTITY	0.98+
two years ago	DATE	0.98+
today	DATE	0.98+
first time	QUANTITY	0.98+
Manhattan	LOCATION	0.98+
Halloween	EVENT	0.97+
NLP	ORGANIZATION	0.97+
zero time	QUANTITY	0.97+
fourth wave	EVENT	0.97+
last night	DATE	0.96+
20 years ago	DATE	0.95+
AlphaGo	ORGANIZATION	0.95+
IBM Data Science	ORGANIZATION	0.93+
U.S.	LOCATION	0.93+
fourth industrial revolution	EVENT	0.93+
one side	QUANTITY	0.92+
millions and trillions	QUANTITY	0.9+
John Walls	PERSON	0.85+
years ago	DATE	0.83+
Edu	PERSON	0.82+
few weeks ago	DATE	0.82+
millions of data	QUANTITY	0.77+
fourth industrial revolution	EVENT	0.75+
Fortune 500	ORGANIZATION	0.73+
machine wave	EVENT	0.72+
cognitive	EVENT	0.72+
a century	QUANTITY	0.69+

Vikram Murali, IBM | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're Data Science For All, IBM's two day event, and we'll be here all day long wrapping up again with that panel discussion from four to five here Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM, and Vikram thank for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities, The Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements. Updates, if you will, here over the past month or so, so share some information about that too if you would. >> Sure, so my team, we build The Data Science Experience, and our goal is for us to enable data scientist, in their path, to gain insights into data using data science techniques, mission learning, the latest and greatest open source especially, and be able to do collaboration with fellow data scientist, with data engineers, business analyst, and it's all about freedom. Giving freedom to data scientist to pick the tool of their choice, and program and code in the language of their choice. So that's the mission of Data Science Experience, when we started this. The two releases, that you mentioned, that we had in the last 45 days. There was one in September and then there was one on October 30th. Both of these releases are very significant in the mission learning space especially. We now support Scikit-Learn, XGBoost, TensorFlow libraries in Data Science Experience. We have deep integration with Horton Data Platform, which is keymark of our partnership with Hortonworks. Something that we announced back in the summer, and this last release of Data Science Experience, two days back, specifically can do authentication with Technotes with Hadoop. So now our Hadoop customers, our Horton Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together? You know? So exulting older products, and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off as a design first methodology. And what I mean by that is we are using IBM design to lead the charge here along with the product and development. And we are actually talking to customers, to data scientist, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them, but Data Science Experience, for example, it started of as a brand new product: completely new slate with completely new code. Now, IBM has done data science and mission learning for a very long time. We have a lot of assets like SPSS Modeler and Stats, and digital optimization. And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientist leverage our existing products with open source, and how we can do collaboration. So it's not just re-skinning, but it's building ground up. >> So this is really important because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money, you know, smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. If you architected your system to be able to accommodate that. Future proof is the term everybody uses, so have you done? How have you done that? I'm sure API's are involved, but maybe you could add some color. >> Sure. So we are and our Data Science Experience and mission learning... It is a microservices based architecture, so we are completely dockerized, and we use Kubernetes under the covers for container dockerstration. And all these are tools that are used in The Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are dockerizing them, and the microservice architecture actually helps us address issues that we have today as well as be open to development and taking newer methodologies and frameworks into consideration that may not exist today. So the microservices architecture, for example, TensorFlow is something that you brought in. So we can just pin up a docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. Same thing with other frameworks like XGBoost, and Kross, and Scikit-Learn, all these are frameworks and libraries that are coming up in open source within the last, I would say, a year, two years, three years timeframe. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came, but now with the microservice architecture it is very easy for us to continue with those. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at high level. One of the things that I've... I mean, I've been following Hortonworks since day one when Yahoo kind of spun them out. And know those guys pretty well. And they always make a big deal out of when they do partnerships, it's deep engineering integration. And so they're very proud of that, so I want to come on to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes, so Data Science Experience today can work side by side with Horton Data Platform, HDP. And we could have actually made that work about two, three months back, but, as part of our partnership that was announced back in June, we set up drawing engineering teams. We have multiple touch points every day. We call it co-development, and they have put resources in. We have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure notes. That I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three is going to be deeper integration, so we are planning on making Data Science Experience and a body management pact. And so a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up. And the third phase is going to be... We're going to be using YARN for resource management. YARN is very good a resource management. And for infrastructure as a service for data scientist, we can actually delegate that work to YARN. So, Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged with Hortonworks. >> YARN stands for yet another resource negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks are big on committers. And obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true as well as the notes piece that I mentioned. We have a notes commenter. We have multiple notes commenters on our side, and that helps us as well. So all the notes is part of the HDP package. We need knowledge on our side to work with Hortonworks developers to make sure that we are contributing and making end roads into Data Science Experience. That way the integration becomes a lot more easier. And from an IBM IP perspective... So Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust is something that our developers love. These are visualization libraries that were actually cooked up by IBM research and the open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved and there are a lot of algorithms, mission learning algorithms, that we put in there. So that comes right out of the package. >> And you guys, the development teams, are really both in The Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America between The Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in The Valley, so there's a lot of synergy. We work very closely with them, and that's what we see in the product. >> I mean, what impact does that have? Is it... You know, you hear today, "Oh, yeah. We're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is that? To have people so physically proximate? >> I think it has major impact. I mean IBM is a global organization, so we do have teams around the world, and we work very well. With the invent of IP telephoning, and screen-shares, and so on, yes we work. But it really helps being in the same timezone, especially working with a partner just eight miles or ten miles a way. We have a lot of interaction with them and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean what's the barrier right now? >> The barrier, I think for data scientist... The number one barrier continues to be data. There's a lot of data out there. Lot of data being generated, and the data is dirty. It's not clean. So number one problem that data scientist have is how do I get to clean data, and how do I access data. There are so many data repositories, data lakes, and data swamps out there. Data scientist, they don't want to be in the business of finding out how do I access data. They want to have instant access to data, and-- >> Well if you would let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientist-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data or data in a particular format. And data scientist don't want to spend too much time in cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big. You want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientist is all the open source libraries. Every day there's a new one. It's a boon and a bane as well, and the problem with that is the open source community is very vibrant, and there a lot of data science competitions, mission learning competitions that are helping move this community forward. And it's a good thing. The bad thing is data scientist like to work in silos on their laptop. How do you, from an enterprise perspective... How do you take that, and how do you move it? Scale it to an enterprise level? And that's where Data Science Experience comes in because now we provide all the tools. The tools of your choice: open source or proprietary. You have it in here, and you can easily collaborate. You can do all the work that you need with open source packages, and libraries, bring your own, and as well as collaborate with other data scientist in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and no schema on, right? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes. So, if you look at the mission learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools, that will be coming up and down our roadmap, which will help data scientist do that themselves. I mean, they don't have to be really in depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientist to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams. We were talking about Hortonworks and you guys collaborating. What are they like? I mean people picture IBM, you know like this 100 plus year old company. What's the persona of the developers in your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... So we've had six releases this year in Data Science Experience. Just for the on premises side of the product, and the cloud side of the product it's got huge delivery. We have releases coming out faster than we can code. And it's not just re-architecting it every time, but it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year. So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had immediate release in April, and since then we've had about five revisions of the release where we add lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now don't you? I mean, you know? (laughing) >> I know, we are setting expectation. >> You still have two months left in 2017. >> We do. >> We do not make frame release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when you upgrade, a customer upgrades, right? They don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well some of those microservices aren't so micro. >> Vikram: Yeah. Not. Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep. Making sure that you componentize it in such a way that when you're doing an upgrade, it effects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you. To say the least, and I'm sure 2017-2018 is not going to slow down. So continue success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All. Here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IBM. Good to see you. Good to see you too. about that too if you would. and be able to do collaboration How can you help us understand that? and we are investing in such a way, You know that down the and attach it to our existing One of the things that I've... And the third phase is going to be... There you go for... and you guys primarily are So that comes right out of the package. The Valley and Toronto. We have people all over the We have a lot of interaction with them Is it figuring out what to do with it? and the data is dirty. You say it's dirty. You can do all the work that you need with can you architect things in to help? I mean, they don't have to and you guys collaborating. And that is, in part, because of the team. and since then we've had about and that's the advantage of microservices aren't so micro. Yeah, so it's a balance. and you don't have to is not going to slow down. Wish you well with that. Thanks for having me. Back with Data Science For All. That was great.

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Vikram	PERSON	0.99+
John	PERSON	0.99+
three months	QUANTITY	0.99+
six months	QUANTITY	0.99+
John Walls	PERSON	0.99+
October 30th	DATE	0.99+
2017	DATE	0.99+
April	DATE	0.99+
June	DATE	0.99+
one year	QUANTITY	0.99+
Daniel Hernandez	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
September	DATE	0.99+
one	QUANTITY	0.99+
ten miles	QUANTITY	0.99+
YARN	ORGANIZATION	0.99+
eight miles	QUANTITY	0.99+
Vikram Murali	PERSON	0.99+
New York City	LOCATION	0.99+
North America	LOCATION	0.99+
two day	QUANTITY	0.99+
Python	TITLE	0.99+
two releases	QUANTITY	0.99+
New York	LOCATION	0.99+
two years	QUANTITY	0.99+
three years	QUANTITY	0.99+
six releases	QUANTITY	0.99+
Toronto	LOCATION	0.99+
today	DATE	0.99+
Both	QUANTITY	0.99+
two months	QUANTITY	0.99+
a year	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
third phase	QUANTITY	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
first methodology	QUANTITY	0.98+
First	QUANTITY	0.97+
second thing	QUANTITY	0.97+
one small piece	QUANTITY	0.96+
One	QUANTITY	0.96+
XGBoost	TITLE	0.96+
Cameraman	PERSON	0.96+
about eight miles	QUANTITY	0.95+
Horton Data Platform	ORGANIZATION	0.95+
2017-2018	DATE	0.94+
first	QUANTITY	0.94+
The Valley	LOCATION	0.94+
TensorFlow	TITLE	0.94+

Daniel Hernandez, Analytics Offering Management | IBM Data Science For All

>> Announcer: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome to the big apple, John Walls and Dave Vellante here on theCUBE we are live at IBM's Data Science For All. Going to be here throughout the day with a big panel discussion wrapping up our day. So be sure to stick around all day long on theCUBe for that. Dave always good to be here in New York is it not? >> Well you know it's been kind of the data science weeks, months, last week we're in Boston at an event with the chief data officer conference. All the Boston Datarati were there, bring it all down to New York City getting hardcore really with data science so it's from chief data officer to the hardcore data scientists. >> The CDO, hot term right now. Daniel Hernandez now joins as our first guest here at Data Science For All. Who's a VP of IBM Analytics, good to see you. David thanks for being with us. >> Pleasure. >> Alright well give us first off your take, let's just step back high level here. Data science it's certainly been evolving for decades if you will. First off how do you define it today? And then just from the IBM side of the fence, how do you see it in terms of how businesses should be integrating this into their mindset. >> So the way I describe data science simply to my clients is it's using the scientific method to answer questions or deliver insights. It's kind of that simple. Or answering questions quantitatively. So it's a methodology, it's a discipline, it's not necessarily tools. So that's kind of the way I approach describing what it is. >> Okay and then from the IBM side of the fence, in terms of how wide of a net are you casting these days I assume it's as big as you can get your arms out. >> So when you think about any particular problem that's a data science problem, you need certain capabilities. We happen to deliver those capabilities. You need the ability to collect, store, manage, any and all data. You need the ability to organize that data so you can discover it and protect it. You got to be able to analyze it. Automate the mundane, explain the past, predict the future. Those are the capabilities you need to do data science. We deliver a portfolio of it. Including on the analyze part of our portfolio, our data science tools that we would declare as such. >> So data science for all is very aspirational, and when you guys made the announcement of the Watson data platform last fall, one of the things that you focused on was collaboration between data scientists, data engineers, quality engineers, application development, the whole sort of chain. And you made the point that most of the time that data scientists spend is on wrangling data. You're trying to attack that problem, and you're trying to break down the stovepipes between those roles that I just mentioned. All that has to happen before you can actually have data science for all. I mean that's just data science for all hardcore data people. Where are we in terms of sort of the progress that your clients have made in that regard? >> So you know, I would say there's two majors vectors of progress we've made. So if you want data science for all you need to be able to address people that know how to code and people that don't know how to code. So if you consider kind the history of IBM in the data science space especially in SPSS, which has been around for decades. We're mastering and solving data science problems for non-coders. The data science experience really started with embracing coders. Developers that grew up in open source, that lived and learned Jupiter or Python and were more comfortable there. And integration of these is kind of our focus. So that's one aspect. Serving the needs of people that know how to code and don't in the kind of data science role. And then for all means supporting an entire analytics life cycle from collecting the data you need in order to answer the question that you're trying to answer to organizing that information once you've collected so you can discover it inside of tools like our own data science experience and SPSS, and then of course the set of tools that around exploratory analytics. All integrated so that you can do that end to end life cycle. So where clients are, I think they're getting certainly much more sophisticated in understanding that. You know most people have approached data science as a tool problem, as a data prep problem. It's a life cycle problem. And that's kind of how we're thinking about it. We're thinking about it in terms of, alright if our job is answer questions, delivering insights through scientific methods, how do we decompose that problem to a set of things that people need to get the job done, serving the individuals that have to work together. >> And when you think about, go back to the days where it's sort of the data warehouse was king. Something we talked about in Boston last week, it used to be the data warehouse was king, now it's the process is much more important. But it was very few people had access to that data, you had the elapsed time of getting answers, and the inflexibility of the systems. Has that changed and to what degree has it changed? >> I think if you were to go ask anybody in business whether or not they have all the data they need to do their job, they would say no. Why? So we've invested in EDW's, we've invested in Hadoop. In part sometimes, the problem might be, I just don't have the data. Most of the time it is I have the data I just don't know where it is. So there's a pretty significant issue on data discoverability, and it's important that I might have data in my operational systems, I might have data inside my EDW, I don't have everything inside my EDW, I've standed up one or more data lakes, and to solve my problem like customer segmentation I have data everywhere, how do I find and bring it in? >> That seems like that should be a fundamental consideration, right? If you're going to gather this much more information, make it accessible to people. And if you don't, it's a big flaw, it's a big gap is it not? >> So yes, and I think part of the reason why is because governance professionals which I am, you know I spent quite a bit of time trying to solve governance related problems. We've been focusing pretty maniacally on kind of the compliance, and the regulatory and security related issues. Like how do we keep people from going to jail, how do we ensure regulatory compliance with things like e-discovery, and records for instance. And it just so happens the same discipline that you use, even though in some cases lighter weight implementations, are what you need in order to solve this data discovery problem. So the discourse around governance has been historically about compliance, about regulations, about cost takeout, not analytics. And so a lot of our time certainly in R&D is trying to solve that data discovery problem which is how do I discover data using semantics that I have, which as a regular user is not physical understandings of my data, and once I find it how am I assured that what I get is what I should get so that it's, I'm not subject to compliance related issues, but also making the company more vulnerable to data breach. >> Well so presumably part of that anyway involves automating classification at the point of creation or use, which is actually was a technical challenge for a number of years. Has that challenge been solved in your view? >> I think machine learning is, and in fact later on today I will be doing some demonstrations of technology which will show how we're making the application of machine learning easy, inside of everything we do we're applying machine learning techniques including to classification problems that help us solve the problem. So it could be we're automatically harvesting technical metadata. Are there business terms that could be automatically extracted that don't require some data steward to have to know and assert, right? Or can we automatically suggest and still have the steward for a case where I need a canonical data model, and so I just don't want the machine to tell me everything, but I want the machine to assist the data curation process. We are not just exploring the application of machine learning to solve that data classification problem, which historically was a manual one. We're embedding that into most of the stuff that we're doing. Often you won't even know that we're doing it behind the scenes. >> So that means that often times well the machine ideally are making the decisions as to who gets access to what, and is helping at least automate that governance, but there's a natural friction that occurs. And I wonder if you can talk about the balance sheet if you will between information as an asset, information as a liability. You know the more restrictions you put on that information the more it constricts you know a business user's ability. So how do you see that shaping up? >> I think it's often a people process problem, not necessarily a technology problem. I don't think as an industry we've figured it out. Certainly a lot of our clients haven't figured out that balance. I mean there are plenty of conversation I'll go into where I'll talk to a data science team in a same line of business as a governance team and what the data science team will tell us is I'm building my own data catalog because the stuff that the governance guys are doing doesn't help me. And the reason why it doesn't help me is because it's they're going through this top down data curation methodology and I've got a question, I need to go find the data that's relevant. I might not know what that is straight away. So the CDO function in a lot of organizations is helping bridge that. So you'll see governance responsibilities line up with the CDO with analytics. And I think that's gone a long way to bridge that gaps. But that conversation that I was just mentioning is not unique to one or two customers. Still a lot of customers are doing it. Often customers that either haven't started a CDO practice or are early days on it still. >> So about that, because this is being introduced to the workplace, a new concept right, fairly new CDOs. As opposed to CIO or CTO, you know you have these other. I mean how do you talk to your clients about trying to broaden their perspective on that and I guess emphasizing the need for them to consider putting somebody of a sole responsibility, or primary responsibility for their data. Instead of just putting it lumping it in somewhere else. >> So we happen to have one of the best CDO's inside of our group which is like a handy tool for me. So if I go into a client and it's purporting to be a data science problem and it turns out they have a data management issue around data discovery, and they haven't yet figured out how to install the process and people design to solve that particular issue one of the key things I'll do is I'll bring in our CDO and his delegates to have a conversation around them on what we're doing inside of IBM, what we're seeing in other customers to help institute that practice inside of, inside of their own organization. We have forums like the CDO event in Boston last week, which are designed to, you know it's not designed to be here's what IBM can do in technology, it's designed to say here's how the discipline impacts your business and here's some best practices you should apply. So if ultimately I enter into those conversations where I find that there's a need, I typically am like alright, I'm not going to, tools are part of the problem but not the only issue, let me bring someone in that can describe the people process related issues which you got to get right. In order for, in some cases to the tools that I deliver to matter. >> We had Seth Dobrin on last weekend in Boston, and Inderpal Bhandari as well, and he put forth this enterprise, sort of data blueprint if you will. CDO's are sort of-- >> Daniel: We're using that in IBM by the way. >> Well this is the thing, it's a really well thought out sort of structure that seems to be trickling down to the divisions. And so it's interesting to hear how you're applying Seth's expertise. I want to ask you about the Hortonworks relationship. You guys have made a big deal about that this summer. To me it was a no brainer. Really what was the point of IBM having a Hadoop distro, and Hortonworks gets this awesome distribution channel. IBM has always had an affinity for open source so that made sense there. What's behind that relationship and how's it going? >> It's going awesome. Perhaps what we didn't say and we probably should have focused on is the why customers care aspect. There are three main by an occasion use cases that customers are implementing where they are ready even before the relationship. They're asking IBM and Hortonworks to work together. And so we were coming to the table working together as partners before the deeper collaboration we started in June. The first one was bringing data science to Hadoop. So running data science models, doing data exploration where the data is. And if you were to actually rewind the clock on the IBM side and consider what we did with Hortonworks in full consideration of what we did prior, we brought the data science experience and machine learning to Z in February. The highest value transactional data was there. The next step was bring data science to where the, often for a lot of clients the second most valuable set of data which is Hadoop. So that was kind of part one. And then we've kind of continued that by bringing data science experience to the private cloud. So that's one use case. I got a lot data, I need to do data science, I want to do it in resident, I want to take advantage of the compute grid I've already laid down, and I want to take advantage of the performance benefits and the integrated security and governance benefits by having these things co-located. That's kind of play one. So we're bringing in data science experience and HDP and HDF, which are the Hortonworks distributions way closer together and optimized for each other. Another component of that is not all data is going to be in Hadoop as we were describing. Some of it's in an EDW and that data science job is going to require data outside of Hadoop, and so we brought big SQL. It was already supporting Hortonworks, we just optimized the stack, and so the combination of data science experience and big SQL allows you to data science against a broader surface area of data. That's kind of play one. Play two is I've got a EDW either for cost or agility reasons I want to augment it or some cases I might want to offload some data from it to Hadoop. And so the combination of Hortonworks plus big SQL and our data integration technologies are a perfect combination there and we have plenty of clients using that for kind of analytics offloading from EDW. And then the third piece that we're doing quite a bit of engineering, go-to-market work around is govern data lakes. So I want to enable self service analytics throughout my enterprise. I want self service analytics tools to everyone that has access to it. I want to make data available to them, but I want that data to be governed so that they can discover what's in it in the lake, and whatever I give them is what they should have access to. So those are the kind of the three tracks that we're working with Hortonworks on, and all of them are making stunning results inside of clients. >> And so that involves actually some serious engineering as well-- >> Big time. It's not just sort of a Barney deal or just a pure go to market-- >> It's certainly more the market texture and just works. >> Big picture down the road then. Whatever challenges that you see on your side of the business for the next 12 months. What are you going to tackle, what's that monster out there that you think okay this is our next hurdle to get by. >> I forgot if Rob said this before, but you'll hear him say often and it's statistically proven, the majority of the data that's available is not available to be Googled, so it's behind a firewall. And so we started last year with the Watson data platform creating an integrating data analytics system. What if customers have data that's on-prem that they want to take advantage of, what if they're not ready for the public cloud. How do we deliver public benefits to them when they want to run that workload behind a firewall. So we're doing a significant amount of engineering, really starting with the work that we did on a data science experience. Bringing it behind the firewall, but still delivering similar benefits you would expect if you're delivering it in the public cloud. A major advancement that IBM made is run IBM cloud private. I don't know if you guys are familiar with that announcement. We made, I think it's already two weeks ago. So it's a (mumbles) foundation on top of which we have micro services on top of which our stack is going to be made available. So when I think of kind of where the future is, you know our customers ultimately we believe want to run data and analytic workloads in the public cloud. How do we get them there considering they're not there now in a stepwise fashion that is sensible economically project management-wise culturally. Without having them having to wait. That's kind of big picture, kind of a big problem space we're spending considerable time thinking through. >> We've been talking a lot about this on theCUBE in the last several months or even years is people realize they can't just reform their business and stuff into the cloud. They have to bring the cloud model to their data. Wherever that data exists. If it's in the cloud, great. And the key there is you got to have a capability and a solution that substantially mimics that public cloud experience. That's kind of what you guys are focused on. >> What I tell clients is, if you're ready for certain workloads, especially green field workloads, and the capability exists in a public cloud, you should go there now. Because you're going to want to go there eventually anyway. And if not, then a vendor like IBM helps you take advantage of that behind a firewall, often in form facts that are ready to go. The integrated analytics system, I don't know if you're familiar with that. That includes our super advanced data warehouse, the data science experience, our query federation technology powered by big SQL, all in a form factor that's ready to go. You get started there for data and data science workloads and that's a major step in the direction to the public cloud. >> Alright well Daniel thank you for the time, we appreciate that. We didn't get to touch at all on baseball, but next time right? >> Daniel: Go Cubbies. (laughing) >> Sore spot with me but it's alright, go Cubbies. Alright Daniel Hernandez from IBM, back with more here from Data Science For All. IBM's event here in Manhattan. Back with more in theCUBE in just a bit. (electronic music)

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IBM. So be sure to stick around all day long on theCUBe for that. to the hardcore data scientists. Who's a VP of IBM Analytics, good to see you. how do you see it in terms of how businesses should be So that's kind of the way I approach describing what it is. in terms of how wide of a net are you casting You need the ability to organize that data All that has to happen before you can actually and people that don't know how to code. Has that changed and to what degree has it changed? and to solve my problem like customer segmentation And if you don't, it's a big flaw, it's a big gap is it not? And it just so happens the same discipline that you use, Well so presumably part of that anyway We're embedding that into most of the stuff You know the more restrictions you put on that information So the CDO function in a lot of organizations As opposed to CIO or CTO, you know you have these other. the process and people design to solve that particular issue data blueprint if you will. that seems to be trickling down to the divisions. is going to be in Hadoop as we were describing. just a pure go to market-- that you think okay this is our next hurdle to get by. I don't know if you guys are familiar And the key there is you got to have a capability often in form facts that are ready to go. We didn't get to touch at all on baseball, Daniel: Go Cubbies. IBM's event here in Manhattan.

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Daniel Hernandez	PERSON	0.99+
Daniel	PERSON	0.99+
February	DATE	0.99+
Boston	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
one	QUANTITY	0.99+
David	PERSON	0.99+
Manhattan	LOCATION	0.99+
Inderpal Bhandari	PERSON	0.99+
June	DATE	0.99+
Rob	PERSON	0.99+
Dave	PERSON	0.99+
New York	LOCATION	0.99+
New York City	LOCATION	0.99+
last year	DATE	0.99+
Seth	PERSON	0.99+
Python	TITLE	0.99+
third piece	QUANTITY	0.99+
EDW	ORGANIZATION	0.99+
second	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
last week	DATE	0.99+
today	DATE	0.99+
First	QUANTITY	0.99+
SQL	TITLE	0.99+
two customers	QUANTITY	0.99+
Hadoop	TITLE	0.99+
first	QUANTITY	0.99+
SPSS	TITLE	0.98+
Seth Dobrin	PERSON	0.98+
three tracks	QUANTITY	0.98+
John Walls	PERSON	0.98+
IBM Analytics	ORGANIZATION	0.98+
first guest	QUANTITY	0.97+
two weeks ago	DATE	0.97+
one aspect	QUANTITY	0.96+
first one	QUANTITY	0.96+
Barney	ORGANIZATION	0.96+
two majors	QUANTITY	0.96+
last weekend	DATE	0.94+
this summer	DATE	0.94+
Hadoop	ORGANIZATION	0.93+
decades	QUANTITY	0.92+
last fall	DATE	0.9+
two	QUANTITY	0.85+
IBM Data Science For All	ORGANIZATION	0.79+
three main	QUANTITY	0.78+
next 12 months	DATE	0.78+
CDO	TITLE	0.77+
D	ORGANIZATION	0.72+

Geo Thomas, Benefit Science | PentahoWorld 2017

>> Announcer: Live from Orlando Florida. It's the Cube. Covering Pentaho World 2017. Brought to you by Hitachi Vantara. >> Welcome back to the Cube's live coverage of Pentaho World brought to you by Hitachi Vantara. I'm your host, Rebecca Knight along with my co-host, Jim Kobielus. We are joined by Geo Thomas. He is the director of It at Benefits Science a healthcare insurance analytics company. Thanks so much for coming on the Cube, Geo. >> Thank-you, thanks for having me. >> So Benefits Science is a company launched out of MIT, tell our viewers a little bit more about the company. >> Okay, so Benefits Science is a healthcare data analytic company which co-founded by MIT (mumbles). Doctor (mumbles) and Doctor Stephen so far and we have one more partner. We do data analytics on the healthcare side and we work with employers and the brokers to analyze the data and give them dashboards and workbooks, and so that's what we mainly do. And we, yeah. >> So, as you said, you work with employers to save them healthcare dollars. Can you get into the nitty-gritty a little bit more. >> That's exactly right, so what we do is we empower employers to manage their employee benefits. Providing them the data analytic tools and other optimization tools, and we give them a very fine clear picture of how these plans are performing, and how they can optimize their plans in the near future by giving plan optimization tools and (mumbles) algorithms and things like that. >> You refer this as a manage service for your clients or do you provide specifically licensed software that helps them do this for themselves? From their own premises. >> We are a Cloud platform, and we provide our platform as a sub-lease for our clients. So, we get the data from them and we provide data analytic tool by mashing of this data and they use our platform to see those reports and insights and things like that. >> So, healthcare data is a really special kind of complicated when it comes to data because there's so many security and privacy issues related to it, how do you go about it managing this kind of data? >> Healthcare data is a very complex, very huge and we can't expect what comes next and there a lot of regulations and there are a lot of security issues, so we take all these with upmost priority. So, our company is a SOC1, SOC2, certified company. Which covers a lot of regulations by itself. Our employee's, Benefits Science employees, are really very much aware of these heap of rules. And they are all certified. We have lots of internal an external audits and regulations throughout the place so that would cover all this compliance issues, mainly. >> From an operational standpoint, how are you managing the day-to-day, day-in and day-out, do you provide a data warehouse within which you load it and then from which you do the analysis? What's the sense for how you architected your environment and then where how Pentaho plays into the overall picture? >> We take the data. Once we get the data, we measure the data. So, how we do those, we use Pentahos, and then two and two. Because it gives us a very standardized methodology to process this data, so we identify the PHP data. We sample it, scramble it, and then we do the (mumbles). And once the data element is done, and nobody touches any of those PA jobs or the jobs which we created with Pentaho, and we run this in a very secure environment in which we put all this transformed data into a data analytical platform. >> When you say scramble, you're referring to masking and anonmyzing the data? >> Correct, yes. >> That's what I assumed, you tell me, that's required by HIPA, that you do it that way? >> Yes, that's correct, yeah, yeah. So, we don't take all the data for the development. We take only the sample data, and then we scramble it and we (mumbles) all this information. >> So, what kind of results have you seen in your company since using Pentaho? >> So, I started in almost one year back and when we started, we had 20 tenants. Now, we have 200 tenants, so that's the summary of recently of what I'm seeing because Pentaho gives us lot of flexibility to standardize and make proper checks and balances throughout the data pipeline and we had created very huge test framework which can run automatically. So, all these things would benefit us to board a client because right now, onboarding a client would take less than a week. >> When you say test run automatically what sort of test are you referring to? >> So, we create test scripts, and we created a test suit framework by using Pentaho Jobs. And we schedule that. That test suit what we do is every, whenever any tenant comes in, developers can create N number of test cases and plug that in. So, it is growing and that will run automatically. Along with the PA jobs. So, that gives us a number of outputs and checks and balances and depending on the results we board the client. >> Saving healthcare dollars, spending healthcare dollars. This is really part of the national conversation. How much does Benefits Science really feel a responsibility to weigh-in on these issues. We heard a lot from the CEO this morning about how Pentaho really views its guiding principles as doing good in the world and bettering society. >> The double bottom line. >> Very true, very true, because as Benefits Science company our vision, our motto is not to just built some software and give to customers and get some money. Our vision is to help people or employers reduce the healthcare cost, so. Our data scientists built this great plan optimization tool or (mumbles) to provide employers to look at, "Okay, these "are the large claimant details, which means we might have "to go and find out the reasons and work with them "to reduce the cost." So, we are giving all the tools for them and another thing is the data (mumbles) analyzer our users love it, because we provided a simplified cube for them to drag and drop and create the reports and they can easily drag a couple of data elements and come up with, "Okay, these are the paid amounts "which we paid last month, and this has to go down." So, they can come up with their own strategies to make it down, at least, for the next year and on. >> In terms of user's being able to, in a self-service basis define their views and their reports. Do you take that intelligence that you gained from users and then bring that back into the basic service in terms of adjusting the data model? The set of canned reports or dashboards you provide? What do you do in that regard? >> Yeah, so we have a custom insight reports. Which will give pretty good idea about what this data meant to be for the customers. Like drag dashboards or large claimants or quality measures so things like that. We also have another data science group works on this AI tools or machine-learning algorithms to provide more predictive analysis. So, that would give users a different perspective of, "Okay, if we do this, we can reduce the cost." >> Is that WECA or? >> No, we are using. That's another thing I want to go back and tell them. There is a WECA here, we probably have to start using it. So, right now, we are not, right now we are using RN Python. There's something called (mumbles). So, that's what we use. >> What are some challenges that you are facing right now? What is keeping you up at night? What do you want the next versions of Pentaho to solve for you? >> I'm Director of IT, so I care about IT more than the business. So, my challenge is always how I can board more clients within a short span of time. The scalability, the security, how we can make it compliant. So, I was listening to that ATO, what are the new things coming in ATO? One of the main thing I was looking at is the scalability that is there is something called Worker Naught, that's got announced in ATO. Which you can scale as a docker, and you can spin off as many dockers as you want, and it will work by itself. That's fantastic, I'm really looking forward to get that scalability into our system. >> So, you're saying your IT environment. Your focused now more and more on a Cloud data environment that takes the application functionality and wraps it as containers? So, that's where you're going? And then you're saying that, I don't want to put words in your mouth, what you're doing is consistent with where Pentaho's going with their overall product platform? >> We are hosting an (mumbles) Cloud with Pentaho. So, Pentaho is also going into that direction. Makes me very happy because we are really looking forward to get that working in the Cloud. The thing is the. The Worker Naught, what they're talking about? Is what we were thinking of implementing on our own. So, now they have their own Worker Naught which we can just take and put it there. So, that's very good news. >> I wanted to ask you about the talent shortage in technology because that is something that the CEO talked about, Karen Perlich talked about, too. Is this real dearth of talent in data science. There was a piece in the New York Times just the other day that talked about how data scientists just a PHD can come out and make a half a million dollars in Silicon Valley. What do you think will be the real change and will get more and more graduates into this field. It seems as though the money should be enticement enough. >> That's a million dollar question though. We are in the same boat. >> You're a Massachusetts' based company, it should be. >> Even with that, we are finding a lot of difficulties to get some good data scientists. Because the moment you pass out as data scientist they're asking half a million, so. >> Literally I saw an article the other day. A good data scientist in Silicon Valley can fetch upwards of a half a million per year, so. Imagine in other regions, and now Massachusetts has no shortage of educated, smart people, but still. >> They have that level, then yes. These tools would help, and. Building that artificial intelligence on top of these tools would help, definitely, to have some sort of, not depending on data scientists so much. That even others can do those kind of things. >> So, you might not need the talent in a way. >> I'm looking forward to that because I was listening to your session in the morning. Very impressed with that because that's where I'm also trying to see where the world is heading to. >> So, you make recommendations to your clients about how they should start structure their healthcare insurance plans or employees. Do you have a capability right now within Benefits Science to basically embed a recommendation engine of that sort to help advisors on your staff to work with clients to recommend the right set of options or approaches pulling from the data, that's already there? >> Yes, that's already there. So, we provide recommendations for clients by using these algorithms. So, we have this plan optimization tool. Which will give you, if you do such and such things this is going to go down in the next year. Or there is a plan designed data. So, whenever an enrollment happens the main thing that they look at is what plan they have to sell at for their set of employees. So, every case is unique. So, we put a lot of historical data information and we put those machine-learning algorithms in there and then we come up with. We clean that model with all this data and we predict for each tenant. So, we have that right now. >> Geo, thanks so much for coming on the Cube. It's been really fun talking to you. >> Thanks for having me. >> I'm Rebecca Knight for Jim Kobielus. We will have more from the Cube's live coverage of Pentaho World, just after this. (calm electronica music)

Published Date : Oct 26 2017

SUMMARY :

Brought to you by Hitachi Vantara. to you by Hitachi Vantara. about the company. and we work with employers and the brokers So, as you said, in the near future by giving or do you provide and we provide our platform and we can't expect what comes next and then we do the (mumbles). So, we don't take all the and we had created very and balances and depending on the results We heard a lot from the CEO this morning and this has to go down." in terms of adjusting the data model? Yeah, so we have a So, right now, we are not, right One of the main thing I was looking at is that takes the application functionality So, that's very good news. that the CEO talked about, We are in the same boat. You're a Massachusetts' Because the moment you article the other day. help, definitely, to have So, you might not to your session in the morning. of that sort to help and then we come up with. for coming on the Cube. the Cube's live coverage

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Karen Perlich	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Benefits Science	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
20 tenants	QUANTITY	0.99+
MIT	ORGANIZATION	0.99+
200 tenants	QUANTITY	0.99+
Geo Thomas	PERSON	0.99+
Pentaho	ORGANIZATION	0.99+
Massachusetts'	LOCATION	0.99+
Orlando Florida	LOCATION	0.99+
half a million	QUANTITY	0.99+
Stephen	PERSON	0.99+
next year	DATE	0.99+
less than a week	QUANTITY	0.98+
Hitachi Vantara	ORGANIZATION	0.98+
Worker Naught	ORGANIZATION	0.97+
HIPA	ORGANIZATION	0.97+
last month	DATE	0.96+
Pentaho World	TITLE	0.96+
two	QUANTITY	0.95+
One	QUANTITY	0.95+
Massachusetts	LOCATION	0.95+
Geo	PERSON	0.93+
half a million dollars	QUANTITY	0.93+
million dollar	QUANTITY	0.91+
RN Python	TITLE	0.91+
one year back	DATE	0.9+
each tenant	QUANTITY	0.9+
New York Times	ORGANIZATION	0.9+
Pentahos	ORGANIZATION	0.89+
WECA	ORGANIZATION	0.88+
this morning	DATE	0.84+
Cube	ORGANIZATION	0.82+
Pentaho World	EVENT	0.82+
half a million per	QUANTITY	0.81+
mumbles	ORGANIZATION	0.81+
PHP	TITLE	0.8+
2017	DATE	0.79+
SOC1	ORGANIZATION	0.79+
one more partner	QUANTITY	0.78+
PentahoWorld	ORGANIZATION	0.77+
double	QUANTITY	0.73+
ATO	TITLE	0.71+
SOC2	ORGANIZATION	0.62+
World	TITLE	0.59+
Doctor	PERSON	0.58+
couple	QUANTITY	0.55+
Cube	COMMERCIAL_ITEM	0.43+
Naught	TITLE	0.3+

Yael Garten, LinkedIn | Women in Data Science 2017

>> Announcer: Live, from Stanford University, it's the Cube, covering The Women in Data Science Conference, 2017. >> Welcome back to The Cube, we are live at Stanford University, at the 2nd annual Women in Data Science Conference, this great, fantastic one day technical conference. And we are so excited to be joined by Yael Garten, who was one of the career panelists. Yael, you are the Director of Data Science at LinkedIn, welcome to the cube. >> Yeah, thank you, thanks for having me. So excited to have you here, everybody knows LinkedIn. My parents even have probably multiple LinkedIn accounts, but they do. You've served, what 400 and plus million accounts, I'd love to understand, what is the role, what's the data scientist's role in the business overall? >> Yeah, so I guess when people ask me about data science, what I love to kind of start with is there are a couple different types of data science. And so I would basically say that there are two main categories by which we use data science at LinkedIn. If you think about it, there is really data science where a product of your work is for a human to consume. So using data to help inform business or product strategy, to make better products, make more informed decisions about how you're investing your resources. So that's one side, which is often called decision sciences, or advanced analytics. Another type of data science is where the consumer of the output is a machine. Alright so rather than a human, a machine. So basically they these are things like machine learning models and recommendation systems. So we have really both of those. The second category is what we call data products. And so we use those in virtually everything we do. So on the data products, much of LinkedIn is a data product, it's really based on date. Right, our profiles, our connection graph, the way that people are engaging with LinkedIn helps us improve the product for our members and clients. And then we use that data internally, to really make better decisions, to understand, you know how can we better serve the world's professionals, and make them more productive and successful? >> Right, fantastic, so tell us a little bit about your team. It sounds like it's sort of broken into those two domains. You must have quite a, a large team, or a lean team? >> So yeah, we have, the way we have our team is that we work really closely within all of our product verticals, and we embed closely with the business, to really understand kind of what are the needs. And then we work very cross-functionally. So we will typically have in any group, sort of a product manager, and engineer, a designer, a data scientist, often it's from both kinds of data scientists. So sort of one on the analytic side, one on the machine learning side. Right, marketing, business operation, so really very cross-functional teams working together, using this data. >> Very smart, it sounds very integrated from the beginning, where they kind of by design-- >> Yes. >> So that collaboration is really sort of natural within LinkedIn? >> Yes. >> That's fantastic, very progressive. And certainly it's something that everybody benefits from. >> Yes. >> Right because as whether you're on the advanced analytic side, or on the machine learning side, you're getting exposure to the business side, vice versa, which, that's really a great environment for success. >> Yes, yeah and part of, I think, what I love about LinkedIn is actually our data culture, and how kind of data is infused in the culture of how we do things. >> Right, which is really-- >> Right, not always the case. >> It's not, and it's, cultural shifts have, we were talking about that with a number of guests today, and especially the size of the organization, that's tough. >> Yael: Yes. >> So to have that built in and that integration as part of, this is how we do business is, really you can imagine all the potential and possibilities there. So would love to understand, how is LinkedIn using data to recommend ways to evolve products and services to best serve all of it's members? >> Yeah, so maybe two different examples of how we do this, one is, what we do is every launch that we have, so every feature that we generate, we really do it at an online experimentation setting. So we have a certain feature that we're about to roll out to our members. And we want to make sure that it's a better experience for our members. And better, as measured by kind of the metrics that we've defined in terms of measures of success. And so, which is really aligned to what value we believe we're delivering our members and customers. And so when we roll out features, we'll roll it out to a certain percentage of our users, test the downstream impacts of that, and then decide, based on that, whether we actually roll that feature out to 100% of members. And so that's one of the things that my team is heavily involved in, is really helping to use that data to make sure that we are structuring things in a way that's statistically sound, so that we can measure the impacts correctly, of rolling out certain features. So that's kind of one category of work. And the other category is really to, to do sort of opportunity identification, and kind of deep-dive insights into understanding into a certain product area. Where are there opportunities to improve the product? So one, let me give you a high-level example. One of the ways we might use data is to say okay, Are certain members in certain countries accessing via iOS or Android? And if so, should we be developing more in differentiating between iOS and Android apps? It's one simple example right, where we'll actually decide our R&D investments, based on the data that we're seeing in terms of how people are using our products and do we think that that's important enough of an investment to improve the products and invest in that area? >> Wow very, very smart. What are some of the basic ways that data scientists can deliver more value for their stakeholders, whether they're internal stakeholders, across different functions within the organization, or the members, the external stakeholders? >> Yeah, I think one of the most important things is to really embed closely into these kind of functional or domain areas, and understand qualitatively and quantitatively, what's important. Right, so understanding what the business context is and what problem you're trying to solve. And I think one of the most important that data scientists play a role is actually helping to ensure are we even answering the right question? So as an example, a product manager might ask a data scientist to pull certain data, or to do a certain analysis, and a part of the conversation and the culture has to be what are you trying to get at? What are you trying to understand? And really thinking through is that even the right question to be asking? Or could we ask it in a different way? Because that's going to inform what analysis you do, right what, really what, how you're delivering the results of this analysis to make better decisions. So I think that's a big part of it is, having this iterative process of doing data science. >> Really, it sounds like such and innovative culture, and you're right, looking at the data to determine is this the right next step? Is it not? How do we maybe adapt and change based on really what this data is telling us. If we kind of look at collaboration for a second. You talked about the integrated teams, but I'm wondering how do you scale collaboration within LinkedIn across so many businesses and engineering stakeholders? >> Yeah, so the way I kind of like to think about it is, there's really, you have to invest in culture, process, and tools. So let me start from the bottom up. So on the tools or technology, one of the ways to do it, is actually to create self-served tools, to really democratize the data. So first of all investing in foundations of really good data quality, right, whether you're creating that data yourself, or you're collecting that from externally, from different organizations. Once you have really good data quality, making sure that you have foundations that enable self-serve data basically. So for example, some of the things that data scientists are used today in various companies, really doesn't need a data scientist if you've invested in ways where business partners, let's say, can quarry that data themselves. So they don't need a data scientist to be doing this role. So that's an important investment on the technology side. In addition, making data scientists really productive, by using and investing in tools that will enable them to access the data is really important. So once you have that sort of technology, it enables your data scientist to be productive. The process is really important. So just as an example we have a sort of playbook in terms of how do we launch features? And part of that is kind of bring in data insights, in terms of which features we should be building. And then once you've determined how using the data on those insights, it's okay how are we going to launch this in terms of experimental design and setting? And then what are the success metrics? How are we going to know that this actually a good-- (speaker drowned out by crashing sound) And then once we've launched the experiment, analyzing that, where all of the stakeholders are part of this right? The project manager, the executive, the engineer, the data scientist, and then kind of iterating on the results and deciding what the decision is. So having actually a process that the whole team or the company abides by, really helps at having this collaboration where it's clear what everyone is doing and kind of what's the process by which we use data to develop and to innovate? And then finally culture, I think that's such an important part, and that really needs to be sort of bottoms up, top down, everywhere. It really needs to be a community and a culture where data is discussed and where data is expected, and where decision making really is grounded on, on data. I fundamentally believe that any product being developed, or any decision being made really should be data informed if not data driven. >> Right absolutely. One of the things that I'm hearing in what you're doing is enabling some of business users to be self-sufficient. So you're taking that feedback and that input from the business side to be able to determine what tools they need to have and how you need to enable them so that you've got your resources aligned on certain products. >> Yeah, just as an example, one of the things that we do for example, is we realized over time that, this isn't actually productive, and how do we make ourselves scale, so we started doing data boot camps, for example. >> Interviewer: Okay. >> Where we'll actually train new people coming into the company, on data, and on self-serve tools, and on how to run experiments. And so a variety of different kind of aspects, and even how to work with data scientists productively. So we have actually train that >> fantastic. >> So this data boot camp really helps us to instill a data culture, and it rally empowers the team. >> So this is, anybody coming in, whether they're coming in for a marketing role, or a sales ops role, they get this data boot camp? >> Yeah. >> Wow. >> And it's open to anyone and you know, it yeah, typically is going to be a certain subset of those people, but it really is open to anyone, and we're talking about more ways of how do we scale that and maybe how we put that on LinkedIn learning and make that more broadly accessible. >> Yeah. >> Yeah. >> So you have quite a big team, how do you keep all of the data scientists that you've got happy, what are the challenges that they face, how do you evaluate those challenges and move forward so that they have an opportunity to make an impact at LinkedIn? >> Yeah, so part of the things are actually the things that I mentioned right? So a culture of data so a, it's really important when we see that this is not happening, actually addressing that. So data scientists are going to thrive in a community where data is valued, and where data scientists are valued, so that's actually a really important aspect. And you know luckily people come to use because they know that we do value data. But I think that that's very important for any company and so, I advise startups as well, and this is one of the things that I tell people that are founding companies, is you have to have a culture which values data to attract data scientists, because otherwise they have other options. The other thing is having these, these foundations that enable them to be productive. Right, so these tools and these systems that enable them to really do high-value work, and invest in the right areas. So start graduating from doing things that are more, maybe repetitive or low-level and figure out how do you scale that so that you can have data scientists really, efficiently using their time for things that only they can do? >> Right, I love that this culture is sort of grooming them. One of the things that, a couple things I read recently. One, was that, I think it was Forbes that said, 2017, the best job to apply for is data scientist. But, from an trends perspective, it's looking that by 2018, there's going to be a demand so high, there's not going to be enough talent. How are, what's your perspective on LinkedIn? Are you, have you, it sounds like from a foundational perspective, it is a data driven company that really values data, is that something that you see as a potential issue or you really have built a culture of such, not just collaboration and innovation, but education that LinkedIn is in a very good position? >> Yeah, well so one thing is that, I didn't mention in terms of the happiness factor right? Is that it is actually a place where data scientists look for a place where they can also grow and learn and be with other like-minded data scientists. So I think that's something that we strongly support, again for companies that, people that may be viewing this and are not in such environments, there are a lot of ways to do this. So keeping data scientists happy also can be facilitating meetups, right with data scientists from your local region, and so those are ways that people share information and share techniques and share challenges even right? >> Interviewer: Yeah. >> Because this a growing and evolving field. And so that's, having that community and one of the things that's amazing about this conference is that it's creating this community of data scientists that are all sharing successes and failures as data science is evolving. The other thing is that data science draws from so many different backgrounds right? >> Yeah. >> It's a broad field, right, and there's so many different kinds of data science, and even that is getting both more specialized and more broad. So I think that part of it is also looking at different backgrounds, different educational backgrounds and figuring out how can you expand the pool of people that you're looking at, you know that are data scientists? >> Interviewer: Right. >> And how do you augment what skills they may not have yet, you know, on the job or through training or through online education, and so we're looking at all of these ways so. >> That's fantastic, we've heard a lot of that today. The fact that, the core data science skills are still absolutely vital, but there's some other sort of softer skills, you talked about sharing. Communication has come up a number of times today. It's really a key, not only to be able to understand and interpret the data from a creative perspective and communicate what the data say. But to your point, to grow and learn and keep the data scientists happy, that social skill element is quite important. >> Yael: Yes. >> So that was, that was an interesting learning that I heard today, and I'm sure you've heard many interesting things today that have inspired you as well. >> Yeah, and that's something that you know, creating this culture is something that even data science leaders around the world, where we're discussing this and talking about this, you know what are the challenges? And how do we evolve this field? And how do we help define and help kind of groom the next generation of data scientists? >> Interviewer: Right. >> And to be in a more stable and be in a better place than where we were and to help to continue to evolve it, and so it is yeah. >> Evolution, it's a great word. I think that that's another theme that we've heard today and as much as I'm sure you've inspired and educated these women that are here. Not just in person today, but all the what 70, 70 cities and 25 countries it's being live streamed. >> Yael: Yeah, it was 80 cities and six continets. >> It's growing it's amazing. >> And yeah. >> And I'm sure that they'd vote a 10 from you, but it's probably just in the little bit that we've had a time to chat, I'm sure that you're probably gleaning a lot from them as well. >> Yeah, definitely, absolutely. >> And it's the, we're scratching the surface. >> Yes, absolutely and so there are many more years to come. >> Interviewer: Exactly, Yeal thank you so much for joining us on The Cube. >> Thank you, it's pleasure. >> It's a pleasure talking to you, we wish you continued success at LinkedIn. >> Thank you, it's a pleasure. >> And we want to thank you for watching The Cube. We've had a great day at the 2nd annual Women in Data Science conference at Stanford University. Join the conversation #wids2017. Thanks so much for watching, we'll see ya next time. (rhythmic music) >> Voiceover: Yeah.

Published Date : Feb 4 2017

SUMMARY :

University, it's the Cube, Welcome back to The Cube, we are live So excited to have you here, So on the data products, much Right, fantastic, so tell us the business, to really that everybody benefits from. the business side, vice versa, kind of data is infused in the culture and especially the size of the So to have that built in and One of the ways we might What are some of the basic and the culture has to be at the data to determine that really needs to be the business side to be one of the things that we do So we have actually train that rally empowers the team. And it's open to anyone and that enable them to be productive. the best job to apply something that we strongly community and one of the and even that is getting And how do you augment what and interpret the data So that was, that was And to be in a more stable all the what 70, 70 cities Yael: Yeah, it was 80 And I'm sure that they'd scratching the surface. Yes, absolutely and so there Yeal thank you so much to you, we wish you continued And we want to thank

ENTITIES

Entity	Category	Confidence
Yael	PERSON	0.99+
Yael Garten	PERSON	0.99+
400	QUANTITY	0.99+
LinkedIn	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
70	QUANTITY	0.99+
2018	DATE	0.99+
Yeal	PERSON	0.99+
second category	QUANTITY	0.99+
25 countries	QUANTITY	0.99+
2017	DATE	0.99+
Android	TITLE	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
80 cities	QUANTITY	0.99+
one	QUANTITY	0.99+
iOS	TITLE	0.99+
Stanford University	ORGANIZATION	0.99+
two domains	QUANTITY	0.99+
70 cities	QUANTITY	0.98+
two main categories	QUANTITY	0.98+
one day	QUANTITY	0.97+
today	DATE	0.97+
The Cube	TITLE	0.97+
one side	QUANTITY	0.97+
10	QUANTITY	0.97+
Forbes	ORGANIZATION	0.94+
one thing	QUANTITY	0.94+
Women in Data Science Conference	EVENT	0.92+
one simple example	QUANTITY	0.92+
#wids2017	EVENT	0.9+
one category	QUANTITY	0.9+
Women in Data Science Conference	EVENT	0.89+
six continets	QUANTITY	0.88+
Stanford University	ORGANIZATION	0.86+
first	QUANTITY	0.85+
Women in Data Science conference	EVENT	0.85+
plus million accounts	QUANTITY	0.82+
2nd annual	EVENT	0.82+
Stanford University	LOCATION	0.8+
2nd	EVENT	0.79+
Women in Data Science	EVENT	0.74+
two different examples	QUANTITY	0.69+
second	QUANTITY	0.68+
career panelists	QUANTITY	0.64+
Science	ORGANIZATION	0.61+
things	QUANTITY	0.54+
Cube	ORGANIZATION	0.48+
annual	QUANTITY	0.47+

Stephanie Gottlib, Agyleo Sport - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Narrator: Live from Stanford University, it's theCUBE. Covering the Women in Data Science Conference 2017. >> Welcome back to theCUBE, we are live at Stanford at the second annual Women in Data Science Conference. I am Lisa Martin, joined by one of today's speakers from the event, Stephanie Gottlib. Stephanie, welcome to theCUBE. >> Thank you. >> You had a very interesting talk, which we'll get to in a minute, but you are currently the president of Agyleo Sport. We want to talk about that as well. You've been in the software and technology industry with oil and gas for a very long time, you've got a Bachelors, Masters, just a few years. >> Okay, thank you. >> Just you're, you've got expertise. That many people would desire. So we'd love to understand what your talk was about today, with respect to oil and gas. Data, digital transformation in oil and gas. You said "Data is the new oil." Which I just love that. Talk to us about that, what does that mean with respect to digital business transformation, and that industry? >> Yeah, so first of all, I say Data Science is definitely an area in which a woman, which I think is one of the main topic of today, will have a huge opportunity to move the needle. It's, I mean when you look at the, some numbers, I start in my talk with this example. In France, what is the proportion of women entrepreneurs involved in technology startups? And the answer is in the range of 8 to 12 percent. >> Lisa: Wow. >> I mean, in France right, I mean, economic-wise it's not perfect. But we have a long history, I think, human rights are there and so on, we are open. And to still be at this level, it's not dramatic, but to honest a lot remains to be done. And Data Science, it's a fantastic opportunity for women to change that drastically in the future. So that was cool to be invited to this presentation and see the huge potential that all those womans present for the future. So, having said that, now regarding my talk. What I wanted to bring on the table was about to put all the main foundational story to move into this new digital world. I mean, for industries which have been very conservative for a long time with old legacy aspect in it, moving to this digital world is not trivial. And you have three main components to handle with, which they have to address a bit differently. Which are about the goals, they have to adapt the way to think about, what are the new goals now? Which is mainly about asset utilization and maximizing the efficiency, the cost efficiency, the effectiveness, the safety and reliability and so on. How to integrate all of those technical new stuff, I mean, we are talking about Internet of Things, with plenty of new sensors everywhere in the field. HPC, High Performance Computing, for heavy computation, et cetera, et cetera. So that's some big topic, right? To digest for those industrial guys, and the last pillar which is, for me, the most crucial one is about the control change. Because beyond everything, you know, technical stuff. It's a matter of time, it's easy. But the control aspect is really essential. If you don't get the control right to instill some change management, you will likely fail. And a successful and valuable transformation comes with organization that have learned how to involve all of the entities, not just technical but legal, HR, accounting, sales marketing, all together to be aligned and to go to it. >> That's such a great point. Cultural evolution is critical, it's so hard. >> Stephanie: Absolutely. >> Right? You talk about whether it's a big oil company, or a big tech company, or another company that's large in another industry. Are you saying, though, I completely agree with you that cultural transit is the essential component. In oil and gas industry, how have you seen Data Science drive or influence cultural transformation? >> For sure, I mean the data now is in the center of everything. When I said, and you repeated, "Data is the new oil." Until recent past, we were driven by product centric approach. Today it's all about services and it's all about data. And that is a different paradigm that we need to integrate in the industry and in the oil and gas that I know better. To get the best benefit from it. It's a challenge but it's a fantastic and very passionate challenge to handle in the future. So that's why we have opened a center actually here, for example, in the Bay Area, to be close to the heart of what is happening in Data Science. >> Oh, fantastic, one of the things that you also said in your talk was that transformation through data analytics is equally as relevant on the operational side of a business as it is on the financial side. Expand upon that a little bit. >> Yeah, actually on the financial side, so the operational exploration prediction aspect I think it's more or less understandable. On the financial side it's a bit more hidden. But for too long our industry, I mean the oil and gas industry, have been substantially blind by not understanding how to best choose their commercial data in a holistic way. And now new startups, actually, have instilled some new way to think about that. Instill and develop new products based on machine learning combining machine learning, financial analysis. Et cetera, et cetera. Together to gain in accuracy, to gain in predictability, and a key factor is to... Get access to this information in a much faster time. And you know in our, in any industry, but in oil and gas industry time and precision cost a lot of money. >> Absolutely. What are some of the things that you would recommend to some of the young girls that are here, young women that are here, in terms of being able to influence an industry and elicit cultural change from an education perspective, is it just Data Science or what are some of the other skills and backgrounds do you think they need to be able to drive such change? >> Yeah, I think the conference was touching this point since this morning, and there is no clear answer obviously. There is no recipe, but for sure, I think many industrial today are still mirrored in the old ways. And they really need some fresh input, some fresh... Insight to really drive the culture right, the strategy right, that is necessary to move on the valuable and the successful transformation. And this fresh input, this fresh insight, I think can be completely an opportunity for woman to jump into this... This jobs or this, this aspect of the story. And with either the technical angle or the managerial angle I think it can be both right? And it's not exactly the same sort of skills that are behind. So skill wise, you know, let's be passionate. If you love the data, if you enjoy playing with the data, I think you will be perfect, doesn't matter if you are a man, a woman, I mean you are just a data scientist at the end. With skills and it's all about what you can bring and value to the company that you will work for. >> Lisa: Right. >> So go for it, I mean the Data Science world is an oyster, right? >> Absolutely. >> So go for it! >> Yes. >> I mean, really. It's a fantastic opportunity. >> It is, and some of the things that we heard today from the skills perspective is kind of opening it up or maybe broadening it a bit, absolutely the core Data Science skills are essential. The blend of hacker, statistician, mathematician, scientist, but also looking at some of the softer skills, creativity. Communication. >> Stephanie: Correct. >> And being able to understand enough of the business. >> Stephanie: Correct. >> To bring and really marry those two together. Have you seen that trend in kind of this ideal background coming up in the oil and gas industry? >> Yeah, of course, at the end of the day you've perfectly summarized all the skill set that a good data scientist needs to have. And this curiosity for the domain of application because Data Science either you can work for university then you can approach Data Science from an academic and fundamental thinking, but to be honest most of the time and most of the jobs are using Data Science for a purpose and for an application, so then you need to adapt yourself and be sure that you will have this curiosity, you need to adapt yourself to the knowledge world. And not the opposite, so this ability of adaptation, of curiosity, of passion for the type of problems or challenges, issues, that you will have to address through the Data Science world will be key, and it's really up to everybody to analyze if they want to go for it or not. >> I think that's a great point that you brought up, that adaptation. We have actually heard that a number of times today, that person needs to have the skills but also the adaptation, the flexibility. >> Stephanie: Correct. >> Along those lines, adaptation maybe, talk to us about what your current role is at Agyleo Sport. >> Yeah, with not real transition. (laughter) I moved, I quit Schlumberger a few months ago. My job, I loved my job, but I still live in France. It was difficult to be abroad so often. Anyway, I decided to change life but still I tried to stop working and I almost died. (laughter) So I decided to move forward to another challenge, really. And the new challenge is to combine and reconciliate my two passions, which are digital and sports. >> I love that, tell me more about that. >> So the idea is to raise a fund which would be the first independent fund in France, venture capital fund I mean. Addressing the sport and technology vertical. So domain, market, industry. You know sport, to make the link with what I express today, in fact sport is almost an industry like any other one. And the transformation of sport with integration of all this new tech have to be addressed and everything has to be done. So when you think how to revolutionize the way sport is handling either on the professional side or amateur side. You know, and the more I am digging into this new market for me, it's amazing. The opportunities are tremendous. And so we are pretty close to close our fund and to be, to get ready to invest in some passionating startups. Dynamic statups on this topic. I've just closed some partnership as well with, in LA, where sport tech is already booming. So it's going on and it's quite an exciting new, different, but, challenge that I am taking right now. >> It sounds so interesting. And wrapping things up, you bring up a great point that you've adapted but you've also been able to recognize the linkage between your favorite passion, sports, and technology and digital. And these days especially, we're a bit biased living in Silicon Valley where every company is a tech company, car companies et cetera. It's a really great message for the younger generation to understand, follow your passion. And there's technology there, and were going to need those diverse perspectives to help bring it to life and evolve it. >> Absolutely, so I think I realize that it's a luxury. At the point to have a choice to decide what you like to do in life, but it's also true that you have to address one in your early stage, early years, and giving you the maximum opportunities for the future is important. And then you can have this luxury, effectively to decide for your passion and to be driven by your passion. >> There's the Nirvana exactly. Well Stephanie thank you for those wise words of wisdom. Thanks so much for, >> Thank you very much. >> Stopping by theCUBE today, it's been a pleasure having you on. >> Me too, thank you. >> And we are going to be right back. We are live at the Women in Data Science Conference. Stick around, coming right back. (gentle electronic music)

Published Date : Feb 4 2017

SUMMARY :

Covering the Women in Data at the second annual Women You've been in the software You said "Data is the new oil." And the answer is in the and maximizing the efficiency, critical, it's so hard. the essential component. for example, in the Bay Area, is equally as relevant on the I mean the oil and gas industry, What are some of the things are still mirrored in the old ways. I mean, really. It is, and some of the enough of the business. Have you seen that trend in and most of the jobs are using that person needs to have the skills talk to us about what your And the new challenge is So the idea is to raise the younger generation to At the point to have a choice to decide There's the Nirvana exactly. it's been a pleasure having you on. We are live at the Women

ENTITIES

Entity	Category	Confidence
Stephanie Gottlib	PERSON	0.99+
Stephanie	PERSON	0.99+
8	QUANTITY	0.99+
Lisa Martin	PERSON	0.99+
France	LOCATION	0.99+
Lisa	PERSON	0.99+
LA	LOCATION	0.99+
Agyleo Sport	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
today	DATE	0.99+
two passions	QUANTITY	0.99+
Today	DATE	0.99+
12 percent	QUANTITY	0.98+
Bay Area	LOCATION	0.98+
both	QUANTITY	0.98+
two	QUANTITY	0.98+
this morning	DATE	0.97+
#WiDS2017	EVENT	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
one	QUANTITY	0.95+
Women in Data Science Conference	EVENT	0.95+
Stanford University	ORGANIZATION	0.94+
Stanford	LOCATION	0.93+
theCUBE	ORGANIZATION	0.88+
Schlumberger	PERSON	0.85+
Women in Data Science 2017	EVENT	0.85+
first	QUANTITY	0.74+
second annual	EVENT	0.72+
few months ago	DATE	0.71+
first independent	QUANTITY	0.7+
three main components	QUANTITY	0.65+
#theCUBE	EVENT	0.47+
Nirvana	ORGANIZATION	0.39+

Sinead Kaiya, SAP | Women in Data Science 2017

>> Announcer: Live from Stanford University. It's theCUBE. Covering the Women in Data Science conference, 2017. >> Hi, welcome back to theCUBE, live from Stanford University at the second annual Women in Data Science tech conference. We are here with the COO of Products & Innovation at SAP, Sinead Kaiya. Sinead, welcome to theCUBE! >> Thanks very much! It's great to be here. >> It's great to have you. You were one of the keynote speakers today. >> Sinead: I was. >> Talk to us about your role at SAP and some of the topics that you discussed to the large audience here today. >> Yeah, absolutely. So one of the things I was happy to open my keynote with was letting them know that I'm actually not a data scientist. Because while I think it's important that that community gets together and shares their knowledge, I'm actually coming from the industry business angle. And for the young women who are here starting out in data science, I thought it's also very interesting and important for them to also hear the business perspective on data science. So that was my main contribution to the talk today. And I got a lot of great feedback, that they really appreciated getting that perspective. >> I can't imagine that you wouldn't, because data science is a boardroom conversation now. You report to the CEO. Talk to us about the connection that you help the CEO understand about the value that data science can bring to organizations like SAP. >> Right. It's actually funny. We have recently re-equipped some of our major boardrooms in SAP with huge digital touchscreens. They're absolutely phenomenal, and the reason is because the CEO truly understands, as do the board members, that the power of many of their decisions are lying today in the data. And what they don't want is a static printout on some slides or some chart that somebody hands to them. They want to be able to touch the data and explore the data, and really try to dig into it themselves. So when it comes to the question of the data, I think for CEO's this is a no-brainer. Right, they're drowning in data. They have a lot of data. They understand that. But the point of my talk today was more about the science. So I think where CEO's need to go next, is understanding that just having reams of data and being able to slice and dice it is not going to cut it anymore. You need the young women in these professions that bring the scientific discipline to that data, which is incredibly technical, around machine learning algorithms, to actually start to make sense of that data. So this is a switch for CEO's. The data is a no-brainer, but the science is a new thing that's starting to creep into the boardroom. And they're starting to learn that machine learning and these technologies are going to be very important in how they drive their businesses. >> What's the perception of that at SAP, and what are some of the things that are going on on the technology side to bring that data science in, to make sense of this data and extract value for SAP? >> So obviously SAP has a very strong portfolio of analytics products as well as our SAP HANA in-memory data platform, but where the power of it, is when we start co-innovating with our customers, because it all comes to life once it reaches the customer. So I gave a couple of examples in my keynote today, on how we're co-innovating with, for example, our customer Trenitalia. So Trenitalia is the largest provider of train service in Italy. They move about two million passengers a day. >> Wow. >> And about 80 million tons of freight a year. And they're collaborating with SAP to not only, how do you say, equip all their trains with sensors and be able to be getting that real-time data, how do they connect that with the IT data in their maintenance systems, so that when a train, let's say we know before it's going to break, before it does, and the machine already has triggered the maintenance technician, has already scheduled it, and everything happens in a very smooth and automated way. So it's once we go to the real problems that our customers are having, and we can apply our in-memory technology to their problems, that we get the real value. >> Right. That's such an interesting example. Like, intelligent train, digital train, how do those come together to enable them to meet their customers' objectives. >> Absolutely. Another interesting topic that I talked about was business without bias. So this is a new feature set that we're building into our HR systems. So SAP SuccessFactors has systems that people use for recruiting, and then taking you through the whole HR life cycle from promotions to talent management to compensation. But obviously, anybody who's been through these processes know that there's a certain element of human bias along the way. So, one of the things I talked about is how we're using machine learning to enhance our HR product, so we can try to at least identify some of the bias, if not start to remove it from the system. So... >> This is, sorry. We actually were speaking with someone on the show earlier today, who was looking at how to remove bias from the recruiting process, and creating technology for college campuses and students to be able to use. It's game-based technology, and I thought it was really interesting, because oftentimes recruiting, looking at GPA's, test scores, maybe some of those other hard factors, but now with data science and the ability to understand and add some of the behavioral insights in, really interesting applicability and how that can influence the next generation of people working for lots of different industries and companies, including SAP. >> And it's not just because it's technically interesting, or because it's the right thing to do. To take it from the CEO angle, CEO's today recognize that if they want to solve the big challenges that are on their plate, they not only need the best talent, they need the most diverse talent. But I can see from my experience, just because the CEO decides that diversity should be a corporate priority, and just because people say "yeah, we think that's a good idea," how do you actually codify that in the systems that your employees are using in the business? So the question of, do we need diversity in business, is no longer on the table. But it's rather, how do we actually start to implement that in a more systematic way, so that it's not just wishful thinking. It's actually something that's built in. >> Right. Talk to us about who your collaborators are within SAP, on things like that. Who do you work with, departmentally, function-group-wise, to help make that "yes, we understand, we need to do this" into actually real-world applicability? >> Well, one of the things I talk to, and some advice I gave the young women today, which is true for software in general, is they have to collaborate with the end user. So if you want to build in these bias checks into the HR system, do not sit alone in your laboratory. Do not sit in front of your computer and try to guess what you think is needed. Go out and shadow a recruiter for a week. Go and sit with the end user. Go and understand and truly see what their problems are, and then really involve them in the solution. So, I think that will also help when we talk about how do the young women here take all the academics and all of the, how do you say, theory that they're creating, and start to apply that in a real business context. If you haven't involved the end user, that's going to be quite hard to do. So one of the things I told them is, go to the user. >> That's great advice. I'm curious though, your perspective, coming from the business side, you know we look at data science, Forbes said it's going to be the best job to apply for in 2017. We're also seeing statistics that show, by 2018 there's going to be a shortage. The demand will be so high for data scientists that there will be a shortage. If we kind of look at the evolution of data science and where we are now, you look at the traditional skills. Stats, math, sciences, computing, maybe former hackers. Some of the things that we've heard today that I'd love to get your opinion on, being a businesswoman, is people are now saying, you know, it's the ability to be creative, to analyze and interpret, but also to communicate the information. Another thing that came up that I thought was really interesting was the factor of empathy when you're evaluating different types of data. I thought that was really interesting. I'd love to get your advice for a young woman who might be thinking about majoring in computer science, but maybe her interests really lie in sports or something that you think, is there a technology there? Well yeah. What advice would you give, and what are some of the additional core skills that you see a successful data scientist of the future needs to have? >> Right. So I love that you brought up the topic of communication, because I see in the business world, this is so important. So when you talk about competitive advantage, all of the companies can go out and hire people with, let's say, equivalent technical skills. So we can all get to the same level of technical prowess, let's say, in an industry. But do you have the people who, like you said, can apply the creativity and then find a way to communicate the results back in a superior way? So I think they are going to find that just having the technical skills in business is never enough to really break that ceiling. You have to have absolutely phenomenal communication skills. >> Definitely. >> I also gave them the advice to take a couple of business courses. It really helps to understand how the decision-makers, who you're trying to influence, what are the strategies that they use? What are the challenges that they face? And how do you actually look at some of the problems of data science more from a business perspective? I told them, what I thought is, absolutely the most hireable data scientist would be someone with some domain expertise, someone with the technical background, but somebody who also knows about business. So we need the full package. >> Absolutely! Well and that's an important point, because technology evolves. It's also the catalyst for our evolution, and naturally, any role will change and evolve. I think communication is a core, a very horizontal skill. But I definitely also would agree with your recommendations that having some business acumen in some form or fashion is really going to be key. Tell us a little bit about, what are some of the things, when somebody's coming on to SAP as a data scientist, if they maybe don't have that business background, are they able to get that within, because the culture at SAP kind of supports sort of, cross-collaboration, cross-pollination, so that they might be able to just start to learn different perspectives, to become that package that we talked about. >> Right. So in SAP, of course we have multiple opportunities for employees to either move between departments and see different areas of the company, but as a data scientist at SAP, the best experience you're going to have is working with our customers. It's one of our greatest assets and our greatest pride, is the wonderful relationship we have with hundreds of thousands of leading businesses around the world. So by joining SAP, you get to collaborate with some of the really top companies and industries. And that is when it doesn't become business theory in books. You actually get to go to the customer and see how it touches their business, and where it becomes real. And I think this is what attracts so many people to SAP, and gets them to really engage and stay at SAP, is that phenomenal customer base that we have. >> That's fantastic. Well, that real-world applicability, there isn't anything better than that. You can learn a lot of theory in textbooks, and maybe obviously be able to apply some of it, but having that expertise when something doesn't go the way that it's printed, is really really key to helping shape someone. Speaking of shaping, I'm interested in how you've been at SAP for quite some time, you've had posts in Germany and France, which is amazing. Now you're based in New York. Tell us how you've seen, because you really clearly understand the business side and you understand the importance of the business side and the data science side, the needs there and how they need to work together to drive more value, innovation, drive products, drive revenue. How have you seen SAP's culture evolve to become open to, for example, business and data science merging and being core collaborators? >> Yeah, so I mean, SAP's industry has changed a lot over the recent years. And we've done that along with our customers. So our customers are obviously in a much more tight competitive situation in the whole digitization side of things. So we've been evolving along together with them. But to go back to my other point, one of the major changes or cultural shifts that I've seen in SAP is this tight collaboration with the end user. It used to be that we were only given access to the IT departments of our customers. So we literally had to work through the filter of the IT department to find out what it is we should build. Suddenly, the IT departments are realizing that the end user in companies have quite a bit of power these days, you know. >> Lisa: Yes they do. >> And they're now opening the doors and asking us to collaborate with them, and that shift has allowed our engineers to get even closer to the end users in our customers. >> Fantastic, and I'm sure that's really a key for driving innovation. Last question for you. We're at the second annual WiDS conference. I mean, what an amazing event. Live streamed, reaching so many people. You yourself were a keynote this afternoon. Diane Greene was a keynote this morning. As you look around this very energetic atmosphere that we're in, what has inspired you? What are you going to take away from WiDS 2017 that you're like, wow, that was really fantastic? >> Well, one of the things is the diversity of the speakers. I mean, the breadth of this topic is amazing. Being a woman in tech, of course it's wonderful to see so many highly intelligent and engaged women in one room, which is something we don't usually get to see. So that's one of the other key takeaways for me. >> Fantastic. Well Sinead, we so appreciate you stopping by theCUBE. We wish you continued success as COO of Products & Innovation, and we look forward to seeing you next time on the program. >> Thanks so much! >> And we want to thank you for watching theCUBE. We are live at the second annual Women in Data Science conference, #WiDS2017, but stick around. We'll be right back.

Published Date : Feb 4 2017

SUMMARY :

Covering the Women in Data at the second annual Women in It's great to be here. It's great to have you. and some of the topics that you discussed So one of the things I was I can't imagine that you wouldn't, or some chart that somebody hands to them. So Trenitalia is the largest and be able to be getting to meet their customers' objectives. So, one of the things I talked about and the ability to understand or because it's the right thing to do. to help make that "yes, we So one of the things I told it's the ability to be creative, that just having the What are the challenges that they face? is really going to be key. and see different areas of the company, and the data science side, that the end user in companies and that shift has allowed our engineers We're at the second So that's one of the other and we look forward to seeing at the second annual Women

ENTITIES

Entity	Category	Confidence
Diane Greene	PERSON	0.99+
Germany	LOCATION	0.99+
Trenitalia	ORGANIZATION	0.99+
Italy	LOCATION	0.99+
Lisa	PERSON	0.99+
New York	LOCATION	0.99+
2017	DATE	0.99+
Sinead Kaiya	PERSON	0.99+
France	LOCATION	0.99+
Sinead	PERSON	0.99+
hundreds	QUANTITY	0.99+
2018	DATE	0.99+
SAP	ORGANIZATION	0.99+
today	DATE	0.99+
Forbes	ORGANIZATION	0.99+
WiDS 2017	EVENT	0.98+
Stanford University	ORGANIZATION	0.98+
one	QUANTITY	0.98+
about 80 million tons	QUANTITY	0.98+
one room	QUANTITY	0.98+
#WiDS2017	EVENT	0.97+
about two million passengers a day	QUANTITY	0.97+
SAP HANA	TITLE	0.97+
a week	QUANTITY	0.96+
Women in Data Science conference	EVENT	0.94+
Women in Data Science	EVENT	0.93+
Women in Data Science tech conference	EVENT	0.92+
Women in Data Science 2017	EVENT	0.92+
a year	QUANTITY	0.88+
this afternoon	DATE	0.86+
this morning	DATE	0.86+
WiDS	EVENT	0.84+
theCUBE	ORGANIZATION	0.76+
earlier today	DATE	0.76+
second annual	EVENT	0.76+
second annual	QUANTITY	0.72+
COO	PERSON	0.68+
thousands	QUANTITY	0.68+
couple	QUANTITY	0.6+
SAP	TITLE	0.58+
SAP SuccessFactors	ORGANIZATION	0.55+
keynote speakers	QUANTITY	0.49+

Ann Rosenberg, SAP | Women in Data Science 2017

>> Commentator: Live from Stanford University it's theCUBE covering the Women in Data Science Conference 2017. (jazzy music) >> Hi, welcome back to theCUBE. I'm Lisa Martin live at Stanford University at the second annual Women in Data Science WiDS tech conference. We are here with Ann Rosenberg from SAP. She's the VP head of Global SAP Alliances and SAP Next-Gen. Ann, welcome to the program. >> Thank you so much. >> So SAP is a sponsor of WiDS. Talk to us a little bit about that, and why is it so important for SAP to be involved in this great womens organization. >> So first of all, in my role as working with SAP's relationship to academia and also building up innovation network we see that data science is a very, very key skill set, and we also would like to see many more women get involved into this. Actually (mumbling) right now as we speak we are at the same time in 20 different countries around the world, 24 events we have. So we are both in Berlin, we are in New York, we are all over the world. So it's very important. I call it kind of a movement what we are doing here. It's important that all over the world that we inspire women to go into data science and into tech in general. So it is important thing for SAP. First of all, we need a lot of data science interested people. You also need our entire SAP ecosystem to go out to universities and be able to recruit a data science student both from a diversity perspective, whatever you are a female or a man of course. >> Absolutely, you're right. This is a very inspiring event. It's something that you can really actually feel. You're hearing a lot of applause from the speakers. When you're looking enabling even SAP people to go out and educate and recruit data scientists, what are some of the key skills that you're looking for as the next generation of data scientists? >> This is an interesting thing because you can say that you need like a very strong technical skill set, but we see more and more, and I saw that after I moved to Silicon Valley for two years that also the whole thing about design thinking, the combination of design thinking and data science is becoming something which is extremely important, but also the whole topic about empathy and also, so when you build solution you need to have this whole purpose driven in mindset. So I think what we're seeing more and more is that it's great to be a great data science, but it takes more than that. And that's what I see Stanford and Berkeley are doing a lot, that they're kind of mixing up kind of like the classes. And so you can be a strong data science, but at the same time you also have the whole design thinking background. That's some of the things that we look for at SAP. >> And that's great. We're hearing more and more of that, other skills, critical thinking, being able to not only analyze and interpret the information, but apply it and explain it in a way that really reflects the value. So I know that you have a career, you've been in industry, but you've also been a lecturer. Is this career that you're doing now, this job in alliances and next-gen for SAP sort of a match made in heaven in terms of your background? >> I actually love that question, probably the best question I ever got because it is definitely my dream job. When I was teaching in Copenhagen for some years ago I saw the mind of young people. I saw the thesis, the best of master thesis. I saw what they were able to do, and I'm an old management consultant, and I kept on thinking that the quality of work, the quality of ideas and ideations that the students come with were something that the industry could benefit so much from. So I always wanted to do this matchmaking between the industries and the mind of young people. And it's actually right now I see that it's started kind of, what I at least saw for the last two years that the industries that go to academia, go to universities to educate or to students to work on new ideas. And of course in Silicon Valley this has been going on for some time now, but we see all over the world. And the network that I'm responsible for at SAP, we work in more than 106 countries around the world, with 3,100 universities. And what I really want to do now, I call it the Silicon Valleys of the world where you are mapping the industries with academia with the accelerators and start ups. It's just an incredible innovation network, and this is what I see is just so much growing right now. So it's a great opportunity for academia, but equally also for the industry. >> I love that. Something that caught my eye, I was doing some research, and April 2016 SAP announced a collaboration with the White House's Computer Science for All Initiative. Tell us about that. >> I mean the whole DNA of SAP is in education. And therefore we do support a number of entity around the world. Whatever we talk about building up a skill set within data science, building skill set in design thinking, or in any kind of development skills is really, really important for us. So we do a lot of work together with the governments around the world. Whatever you talk about the host communication, for example, we have programs called Young Thinkers, Beatick, where you go out to high schools or you go into academia, to universities. So when this institute came up, we of course went in and said we want to support this. So if I look at United States, so we have a huge amount of universities part of the network that I'm driving with my team. So we have data curriculums, education material, we have train to train our faculties, boot camps. We do hackathons, coach games. We do around 1,200 to 1,600 hackathon coach games per year around the world. We engage with the industries out to the universities. So therefore it was a perfect match for us to kind of support this institute. >> Fantastic. Are there any things that SAP does as we look at the conference where we are, this Women in Data Science, are there things that you're doing specifically to help SAP, maybe even universities bring in more females into the programs, whether it's a university program or into SAP? >> Yeah, so for SAP in our whole recruiting process we definitely are looking into that. There is a great mix between female and male people who get hired into the company, but we also, it all start with that you actually inspire young women to go into a data science education or into a development education. So my team, we actually go in before SAP recruiting get involved where we, that's why we build up the strong relationships with universities where we inspire young women, like we do at this event here to why should they go in and have a career like this. So therefore you can see there's a lot of pre=work we need to be done for us to be able to go in and go into the recruiting process afterwards. So SAP do a lot of course in the United States, but all over the world to inspire young women to go into tech. And SAP does what we see today all over the world we have huge amount of female from SAP, female speakers at all our events who stand as role models to show that they are women, they are working for SAP, and are very, very strong female speakers and are female role models for all young women to get involved. So we do a lot of stuff to show that to the next generation of data science of whatever it is in tech. >> Yeah, and I can imagine that that's quite symbiotic. It's probably a really nice thing for that female speaker to be able to have the opportunity to share what she's doing, what she's working on, but also probably nice for her to have the opportunity to be a mentor and to help influence someone else's career. So you mentioned accelerators a minute ago, and I wanted to understand a little bit more about SAP Next-Gen Consulting, this collaboration of SAP with accelerators or start ups. How are you partnering to help accelerate innovation, and who is geared towards? Is it geared more towards student? Or is SAP also helping current business leaders to evolve and really drive digital transformation within their companies? >> So the big (mumbling) I'm working on right now too is as mentioned you said SAP Next-Gen is called SAP Next-Gen Innovation With Purpose. So it's linked to the 17 U.N. global goals. We've seen from now in Silicon Valley when you innovate you actually make innovation web purposes included. And that's why we kind of agreed on in SAP why don't we make an innovation network where the main focus is that all the innovation we get out of this is purpose driven linked to the 17 global goals. Like the event here is the goal number five, gender equality. In that network we actually do the matchmaking between academia. We look at all the disrupted new technologies, experience the technologies like machine learning like what's being discussed a lot here, block chain IOT. And then we look at the industry out there because the industries, they need all the new ideas and how to work with all the new opportunities that technology can provide, but then we also look into accelerator start ups. The huge amount, and often when you're in Silicon Valley you kind of think this is the world of the start ups of the world. So when you travel around the world, that's we we looked into a lot the last two years. We call the Silicon Valleys of the world, any big city around the world, or even smaller cities, they have tech hub. So you have Ferline Valley, you have Silicon Roundabout in London, you have Silicon Alley in New York, and that is where there is a huge amount of gravity of start ups and accelerators. And when you begin to link them together with the university network of the world and together with the industry network of the world, you suddenly realize that there is an incredible activity of creativity and ideations and start ups, and you can begin to group that into industries. And that give industries the opportunity not only to develop solution inside the company, but kind of like go in and tap into that incredible innovation network. So we work a lot with seeding in start up, early start ups into corporates, and also crowd source out to academia and the mind of young people all Next-Gen Consulting project where you similar work with students at universities on projects. It could be big data science project. It could be new applications. So I see like as the next generation type of consultancy and research what is happening in that whole network. But that is really what SAP Next-Gen is, but it is linked to the 17 U.N. global goals. It is innovation with purpose, which I'm really happy to see because I think when you build innovation, you really think about in the bigger, the whole (mumbling) thing that we know from singularity. You should think about a bigger purpose of what you're doing. >> Right, right. It sounds like though that this Next-Gen Consulting is built on a foundation of collaboration and sharing. >> It is, it is, and we have three Next-Gen lab types we set up. In this year we built, last year, we are a new year now, we built 20 Next-Gen labs at university campuses and at SAP locations. And here in the new year more labs is being set up. We are opening up a big lab in New York. We just recently opened up one in Valdov at SAP's headquarter. We have one here in Silicon Valley, and then we have a number of universities around the world where SAP's customers go in and work with academia, with educators and students because what do you do today if you're in industry? You need to find students who are strong in machine learning and all the new technologies, right? So there's a huge need for in industry now to engage with academia, an incredible opportunity for both sides. >> Right, and one last question. Who are you, in the spirit of collaboration, who do you collaborate back with at SAP corporate? Who are all the beneficiaries or the influencers of Next-Gen Consulting? >> So I collaborate, inside SAP I collaborate, SAP have a number of, we have ICN, Innovation Center Network. We have our start up focus program. We have a number of innovation, the labs, a number of basically do all our software developments, so they're heavily involved. We have our whole go to market organization with all our SAP customers and industry, I call them clubs. And then externally is of course academia, universities, and then it is the start up communities, accelerators and of course, the industry. So it is really like a matchmaking. That's like, when people ask me what do you do, and I'm a matchmaker. That's really what I am. (Lisa laughs) >> I like that, a matchmaker of technology and people all over. So you're on the planning committee for WiDS. Wrapping things up here, what does this event mean to you in terms of what you've heard today? And what are you excited about for next year's event? >> So for me, one year ago when I heard about this year I kind of said this is important, this is very important. And it's not just an event, it's a movement. And so that was where I went in and said you know, we want to be part of this, but it must be more than just an event here. It's staying for the need to be much more than that. And this is where we all teamed up, all the sponsors together with ISMIE, and we said okay, let us crowd source it out, let us live stream it out much more than ever. And this is also what the assignment is now, that we to so many locations. This is just the beginning. Next year is going to be even bigger, and it's not like that we will wait to next year. We this week announced the SAP Next-Gen global challenges linked to the 17 U.N. global goals. So we are inspiring everybody to go in and work on those global challenges, and one of them is goal number five, which is linked to this event here. So for us and for me this is just the beginning, and next year is going to be even bigger. But we are going to do so many event and activity up to next year. My team in APJ, because of the Chinese New Year, have already been planned coming up here. >> Lisa: Fantastic. >> And we have been doing pre-event, (mumbling) events. So again, it is a movement, and it's going to be big. That's for sure. >> I completely can feel that within you. And you're going to be driving this momentum to make the movement even louder, ever more visible next year. >> Ann: Yeah. >> Well Ann, thank you so much for joining us on The Cube. We're happy to have you. >> Thank you so much for the opportunity. >> And we thank you for watching The Cube. I am Lisa Martin. We are live at Stanford University at the second annual Women in Data Science Conference. Stick around, we'll be right back. (jazzy music)

Published Date : Feb 4 2017

SUMMARY :

covering the Women in Data Stanford University at the important for SAP to be around the world, 24 events we have. as the next generation of data scientists? that also the whole thing So I know that you have a the industries that go to the White House's Computer I mean the whole DNA the conference where we are, in the United States, and to help influence all the innovation we get this Next-Gen Consulting And here in the new year Who are all the beneficiaries and of course, the industry. does this event mean to you of the Chinese New Year, and it's going to be big. the movement even louder, We're happy to have you. And we thank you for watching The Cube.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Ann	PERSON	0.99+
Ann Rosenberg	PERSON	0.99+
SAP	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Lisa	PERSON	0.99+
White House	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Berlin	LOCATION	0.99+
April 2016	DATE	0.99+
last year	DATE	0.99+
24 events	QUANTITY	0.99+
Copenhagen	LOCATION	0.99+
Stanford	ORGANIZATION	0.99+
3,100 universities	QUANTITY	0.99+
two years	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
ICN	ORGANIZATION	0.99+
one year ago	DATE	0.99+
United States	LOCATION	0.99+
one	QUANTITY	0.99+
20 different countries	QUANTITY	0.99+
three	QUANTITY	0.99+
Silicon Alley	LOCATION	0.99+
Next year	DATE	0.99+
London	LOCATION	0.99+
next year	DATE	0.99+
Innovation Center Network	ORGANIZATION	0.99+
more than 106 countries	QUANTITY	0.99+
Silicon Valleys	LOCATION	0.99+
both	QUANTITY	0.99+
both sides	QUANTITY	0.99+
ISMIE	ORGANIZATION	0.99+
this week	DATE	0.98+
this year	DATE	0.98+
Berkeley	ORGANIZATION	0.98+
around 1,200	QUANTITY	0.98+
Next-Gen	ORGANIZATION	0.98+
today	DATE	0.97+
Women in Data Science Conference	EVENT	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
Chinese New Year	EVENT	0.97+
Global SAP Alliances	ORGANIZATION	0.97+
First	QUANTITY	0.97+
labs	QUANTITY	0.96+
WiDS	EVENT	0.96+
Gen	ORGANIZATION	0.95+
The Cube	TITLE	0.95+
Stanford University	ORGANIZATION	0.94+
Next-Gen Consulting	ORGANIZATION	0.94+
one last question	QUANTITY	0.93+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Science: