Yael Garten, LinkedIn | Women in Data Science 2017

>> Announcer: Live, from Stanford University, it's the Cube, covering The Women in Data Science Conference, 2017. >> Welcome back to The Cube, we are live at Stanford University, at the 2nd annual Women in Data Science Conference, this great, fantastic one day technical conference. And we are so excited to be joined by Yael Garten, who was one of the career panelists. Yael, you are the Director of Data Science at LinkedIn, welcome to The Cube. >> Yeah, thank you, thanks for having me. >> So excited to have you here, everybody knows LinkedIn. My parents even have probably multiple LinkedIn accounts, but they do. You've served, what, 400-plus million accounts, I'd love to understand, what is the role, what's the data scientist's role in the business overall? >> Yeah, so I guess when people ask me about data science, what I love to kind of start with is there are a couple different types of data science. And so I would basically say that there are two main categories by which we use data science at LinkedIn. If you think about it, there is really data science where a product of your work is for a human to consume. So using data to help inform business or product strategy, to make better products, make more informed decisions about how you're investing your resources. So that's one side, which is often called decision sciences, or advanced analytics. Another type of data science is where the consumer of the output is a machine. Alright so rather than a human, a machine. So basically these are things like machine learning models and recommendation systems. So we have really both of those. The second category is what we call data products. And so we use those in virtually everything we do. So on the data products, much of LinkedIn is a data product, it's really based on data. Right, our profiles, our connection graph, the way that people are engaging with LinkedIn helps us improve the product for our members and clients. 
And then we use that data internally, to really make better decisions, to understand, you know how can we better serve the world's professionals, and make them more productive and successful? >> Right, fantastic, so tell us a little bit about your team. It sounds like it's sort of broken into those two domains. You must have quite a, a large team, or a lean team? >> So yeah, we have, the way we have our team is that we work really closely within all of our product verticals, and we embed closely with the business, to really understand kind of what are the needs. And then we work very cross-functionally. So we will typically have in any group, sort of a product manager, an engineer, a designer, a data scientist, often it's from both kinds of data scientists. So sort of one on the analytic side, one on the machine learning side. Right, marketing, business operation, so really very cross-functional teams working together, using this data. >> Very smart, it sounds very integrated from the beginning, where they kind of by design-- >> Yes. >> So that collaboration is really sort of natural within LinkedIn? >> Yes. >> That's fantastic, very progressive. And certainly it's something that everybody benefits from. >> Yes. >> Right, because whether you're on the advanced analytic side, or on the machine learning side, you're getting exposure to the business side, vice versa, which, that's really a great environment for success. >> Yes, yeah and part of, I think, what I love about LinkedIn is actually our data culture, and how kind of data is infused in the culture of how we do things. >> Right, which is really-- >> Right, not always the case. >> It's not, and it's, cultural shifts have, we were talking about that with a number of guests today, and especially the size of the organization, that's tough. >> Yael: Yes. >> So to have that built in and that integration as part of, this is how we do business is, really you can imagine all the potential and possibilities there. 
So would love to understand, how is LinkedIn using data to recommend ways to evolve products and services to best serve all of its members? >> Yeah, so maybe two different examples of how we do this, one is, what we do is every launch that we have, so every feature that we generate, we really do it in an online experimentation setting. So we have a certain feature that we're about to roll out to our members. And we want to make sure that it's a better experience for our members. And better, as measured by kind of the metrics that we've defined in terms of measures of success. And so, which is really aligned to what value we believe we're delivering our members and customers. And so when we roll out features, we'll roll it out to a certain percentage of our users, test the downstream impacts of that, and then decide, based on that, whether we actually roll that feature out to 100% of members. And so that's one of the things that my team is heavily involved in, is really helping to use that data to make sure that we are structuring things in a way that's statistically sound, so that we can measure the impacts correctly, of rolling out certain features. So that's kind of one category of work. And the other category is really to, to do sort of opportunity identification, and kind of deep-dive insights into understanding a certain product area. Where are there opportunities to improve the product? So one, let me give you a high-level example. One of the ways we might use data is to say okay, are certain members in certain countries accessing via iOS or Android? And if so, should we be developing more in differentiating between iOS and Android apps? It's one simple example right, where we'll actually decide our R&D investments, based on the data that we're seeing in terms of how people are using our products and do we think that that's important enough of an investment to improve the products and invest in that area? >> Wow very, very smart. 
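The staged rollout described above (expose a feature to a percentage of members, measure the downstream impact, then decide whether to go to 100%) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LinkedIn's actual system: the function names `in_treatment` and `two_proportion_z_test`, the hash-based bucketing, and the choice of a two-proportion z-test as the "statistically sound" check are all hypothetical.

```python
import hashlib
import math


def in_treatment(member_id: str, feature: str, rollout_pct: float) -> bool:
    """Deterministically assign a member to the treatment group.

    Hashing (feature, member_id) gives a stable bucket in [0, 100), so the
    same member always sees the same variant for a given feature.
    (Hypothetical bucketing scheme, chosen only for illustration.)
    """
    digest = hashlib.sha256(f"{feature}:{member_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000 / 100.0  # 0.00 .. 99.99
    return bucket < rollout_pct


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    Group A is control, group B is treatment; a positive z means the
    treatment rate is higher than the control rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

In use, a team would call `in_treatment(member_id, "new_feature", 5.0)` to expose about 5% of members, collect the success metric for both groups, and widen the rollout only if the test (together with the predefined success metrics) supports it.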
What are some of the basic ways that data scientists can deliver more value for their stakeholders, whether they're internal stakeholders, across different functions within the organization, or the members, the external stakeholders? >> Yeah, I think one of the most important things is to really embed closely into these kind of functional or domain areas, and understand qualitatively and quantitatively, what's important. Right, so understanding what the business context is and what problem you're trying to solve. And I think one of the most important ways that data scientists play a role is actually helping to ensure are we even answering the right question? So as an example, a product manager might ask a data scientist to pull certain data, or to do a certain analysis, and a part of the conversation and the culture has to be what are you trying to get at? What are you trying to understand? And really thinking through is that even the right question to be asking? Or could we ask it in a different way? Because that's going to inform what analysis you do, right what, really what, how you're delivering the results of this analysis to make better decisions. So I think that's a big part of it is, having this iterative process of doing data science. >> Really, it sounds like such an innovative culture, and you're right, looking at the data to determine is this the right next step? Is it not? How do we maybe adapt and change based on really what this data is telling us. If we kind of look at collaboration for a second. You talked about the integrated teams, but I'm wondering how do you scale collaboration within LinkedIn across so many businesses and engineering stakeholders? >> Yeah, so the way I kind of like to think about it is, there's really, you have to invest in culture, process, and tools. So let me start from the bottom up. So on the tools or technology, one of the ways to do it, is actually to create self-serve tools, to really democratize the data. 
So first of all investing in foundations of really good data quality, right, whether you're creating that data yourself, or you're collecting that from externally, from different organizations. Once you have really good data quality, making sure that you have foundations that enable self-serve data basically. So for example, some of the things that data scientists are used for today in various companies, really don't need a data scientist if you've invested in ways where business partners, let's say, can query that data themselves. So they don't need a data scientist to be doing this role. So that's an important investment on the technology side. In addition, making data scientists really productive, by using and investing in tools that will enable them to access the data is really important. So once you have that sort of technology, it enables your data scientist to be productive. The process is really important. So just as an example we have a sort of playbook in terms of how do we launch features? And part of that is kind of bring in data insights, in terms of which features we should be building. And then once you've determined how using the data on those insights, it's okay how are we going to launch this in terms of experimental design and setting? And then what are the success metrics? How are we going to know that this is actually a good-- (speaker drowned out by crashing sound) And then once we've launched the experiment, analyzing that, where all of the stakeholders are part of this right? The project manager, the executive, the engineer, the data scientist, and then kind of iterating on the results and deciding what the decision is. So having actually a process that the whole team or the company abides by, really helps at having this collaboration where it's clear what everyone is doing and kind of what's the process by which we use data to develop and to innovate? 
And then finally culture, I think that's such an important part, and that really needs to be sort of bottoms up, top down, everywhere. It really needs to be a community and a culture where data is discussed and where data is expected, and where decision making really is grounded on, on data. I fundamentally believe that any product being developed, or any decision being made really should be data informed if not data driven. >> Right absolutely. One of the things that I'm hearing in what you're doing is enabling some of the business users to be self-sufficient. So you're taking that feedback and that input from the business side to be able to determine what tools they need to have and how you need to enable them so that you've got your resources aligned on certain products. >> Yeah, just as an example, one of the things that we do for example, is we realized over time that, this isn't actually productive, and how do we make ourselves scale, so we started doing data boot camps, for example. >> Interviewer: Okay. >> Where we'll actually train new people coming into the company, on data, and on self-serve tools, and on how to run experiments. And so a variety of different kind of aspects, and even how to work with data scientists productively. So we actually train that. >> Fantastic. >> So this data boot camp really helps us to instill a data culture, and it really empowers the team. >> So this is, anybody coming in, whether they're coming in for a marketing role, or a sales ops role, they get this data boot camp? >> Yeah. >> Wow. >> And it's open to anyone and you know, it yeah, typically is going to be a certain subset of those people, but it really is open to anyone, and we're talking about more ways of how do we scale that and maybe how we put that on LinkedIn Learning and make that more broadly accessible. >> Yeah. >> Yeah. 
>> So you have quite a big team, how do you keep all of the data scientists that you've got happy, what are the challenges that they face, how do you evaluate those challenges and move forward so that they have an opportunity to make an impact at LinkedIn? >> Yeah, so part of the things are actually the things that I mentioned right? So a culture of data so a, it's really important when we see that this is not happening, actually addressing that. So data scientists are going to thrive in a community where data is valued, and where data scientists are valued, so that's actually a really important aspect. And you know luckily people come to us because they know that we do value data. But I think that that's very important for any company and so, I advise startups as well, and this is one of the things that I tell people that are founding companies, is you have to have a culture which values data to attract data scientists, because otherwise they have other options. The other thing is having these, these foundations that enable them to be productive. Right, so these tools and these systems that enable them to really do high-value work, and invest in the right areas. So start graduating from doing things that are more, maybe repetitive or low-level and figure out how do you scale that so that you can have data scientists really, efficiently using their time for things that only they can do? >> Right, I love that this culture is sort of grooming them. One of the things that, a couple things I read recently. One, was that, I think it was Forbes that said, 2017, the best job to apply for is data scientist. But, from a trends perspective, it's looking that by 2018, there's going to be a demand so high, there's not going to be enough talent. How are, what's your perspective on LinkedIn? 
Are you, have you, it sounds like from a foundational perspective, it is a data driven company that really values data, is that something that you see as a potential issue or you really have built a culture of such, not just collaboration and innovation, but education that LinkedIn is in a very good position? >> Yeah, well so one thing is that, I didn't mention in terms of the happiness factor right? Is that it is actually a place where data scientists look for a place where they can also grow and learn and be with other like-minded data scientists. So I think that's something that we strongly support, again for companies that, people that may be viewing this and are not in such environments, there are a lot of ways to do this. So keeping data scientists happy also can be facilitating meetups, right with data scientists from your local region, and so those are ways that people share information and share techniques and share challenges even right? >> Interviewer: Yeah. >> Because this is a growing and evolving field. And so that's, having that community and one of the things that's amazing about this conference is that it's creating this community of data scientists that are all sharing successes and failures as data science is evolving. The other thing is that data science draws from so many different backgrounds right? >> Yeah. >> It's a broad field, right, and there's so many different kinds of data science, and even that is getting both more specialized and more broad. So I think that part of it is also looking at different backgrounds, different educational backgrounds and figuring out how can you expand the pool of people that you're looking at, you know that are data scientists? >> Interviewer: Right. >> And how do you augment what skills they may not have yet, you know, on the job or through training or through online education, and so we're looking at all of these ways so. >> That's fantastic, we've heard a lot of that today. 
The fact that, the core data science skills are still absolutely vital, but there's some other sort of softer skills, you talked about sharing. Communication has come up a number of times today. It's really a key, not only to be able to understand and interpret the data from a creative perspective and communicate what the data say. But to your point, to grow and learn and keep the data scientists happy, that social skill element is quite important. >> Yael: Yes. >> So that was, that was an interesting learning that I heard today, and I'm sure you've heard many interesting things today that have inspired you as well. >> Yeah, and that's something that you know, creating this culture is something that even data science leaders around the world, where we're discussing this and talking about this, you know what are the challenges? And how do we evolve this field? And how do we help define and help kind of groom the next generation of data scientists? >> Interviewer: Right. >> And to be in a more stable and be in a better place than where we were and to help to continue to evolve it, and so it is yeah. >> Evolution, it's a great word. I think that that's another theme that we've heard today and as much as I'm sure you've inspired and educated these women that are here. Not just in person today, but all the what 70, 70 cities and 25 countries it's being live streamed. >> Yael: Yeah, it was 80 cities and six continents. >> It's growing it's amazing. >> And yeah. >> And I'm sure that they'd vote a 10 from you, but it's probably just in the little bit that we've had a time to chat, I'm sure that you're probably gleaning a lot from them as well. >> Yeah, definitely, absolutely. >> And it's the, we're scratching the surface. >> Yes, absolutely and so there are many more years to come. >> Interviewer: Exactly, Yael thank you so much for joining us on The Cube. >> Thank you, it's a pleasure. >> It's a pleasure talking to you, we wish you continued success at LinkedIn. 
>> Thank you, it's a pleasure. >> And we want to thank you for watching The Cube. We've had a great day at the 2nd annual Women in Data Science conference at Stanford University. Join the conversation #wids2017. Thanks so much for watching, we'll see ya next time. (rhythmic music) >> Voiceover: Yeah.

Published Date : Feb 4 2017


Stephanie Gottlib, Agyleo Sport - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Narrator: Live from Stanford University, it's theCUBE. Covering the Women in Data Science Conference 2017. >> Welcome back to theCUBE, we are live at Stanford at the second annual Women in Data Science Conference. I am Lisa Martin, joined by one of today's speakers from the event, Stephanie Gottlib. Stephanie, welcome to theCUBE. >> Thank you. >> You had a very interesting talk, which we'll get to in a minute, but you are currently the president of Agyleo Sport. We want to talk about that as well. You've been in the software and technology industry with oil and gas for a very long time, you've got a Bachelors, Masters, just a few years. >> Okay, thank you. >> Just, you've got expertise that many people would desire. So we'd love to understand what your talk was about today, with respect to oil and gas. Data, digital transformation in oil and gas. You said "Data is the new oil." Which I just love that. Talk to us about that, what does that mean with respect to digital business transformation, and that industry? >> Yeah, so first of all, I say Data Science is definitely an area in which a woman, which I think is one of the main topics of today, will have a huge opportunity to move the needle. It's, I mean when you look at the, some numbers, I start in my talk with this example. In France, what is the proportion of women entrepreneurs involved in technology startups? And the answer is in the range of 8 to 12 percent. >> Lisa: Wow. >> I mean, in France right, I mean, economic-wise it's not perfect. But we have a long history, I think, human rights are there and so on, we are open. And to still be at this level, it's not dramatic, but to be honest a lot remains to be done. And Data Science, it's a fantastic opportunity for women to change that drastically in the future. So that was cool to be invited to this presentation and see the huge potential that all those women present for the future. So, having said that, now regarding my talk. 
What I wanted to bring on the table was about to put all the main foundational story to move into this new digital world. I mean, for industries which have been very conservative for a long time with old legacy aspect in it, moving to this digital world is not trivial. And you have three main components to handle with, which they have to address a bit differently. Which are about the goals, they have to adapt the way to think about, what are the new goals now? Which is mainly about asset utilization and maximizing the efficiency, the cost efficiency, the effectiveness, the safety and reliability and so on. How to integrate all of those technical new stuff, I mean, we are talking about Internet of Things, with plenty of new sensors everywhere in the field. HPC, High Performance Computing, for heavy computation, et cetera, et cetera. So that's some big topic, right? To digest for those industrial guys, and the last pillar which is, for me, the most crucial one is about the cultural change. Because beyond everything, you know, technical stuff. It's a matter of time, it's easy. But the cultural aspect is really essential. If you don't get the culture right to instill some change management, you will likely fail. And a successful and valuable transformation comes with organization that have learned how to involve all of the entities, not just technical but legal, HR, accounting, sales, marketing, all together to be aligned and to go to it. >> That's such a great point. Cultural evolution is critical, it's so hard. >> Stephanie: Absolutely. >> Right? You talk about whether it's a big oil company, or a big tech company, or another company that's large in another industry. Are you saying, though, I completely agree with you that cultural transition is the essential component. In oil and gas industry, how have you seen Data Science drive or influence cultural transformation? >> For sure, I mean the data now is in the center of everything. 
When I said, and you repeated, "Data is the new oil." Until recent past, we were driven by product centric approach. Today it's all about services and it's all about data. And that is a different paradigm that we need to integrate in the industry and in the oil and gas that I know better. To get the best benefit from it. It's a challenge but it's a fantastic and very passionate challenge to handle in the future. So that's why we have opened a center actually here, for example, in the Bay Area, to be close to the heart of what is happening in Data Science. >> Oh, fantastic, one of the things that you also said in your talk was that transformation through data analytics is equally as relevant on the operational side of a business as it is on the financial side. Expand upon that a little bit. >> Yeah, actually on the financial side, so the operational exploration prediction aspect I think it's more or less understandable. On the financial side it's a bit more hidden. But for too long our industry, I mean the oil and gas industry, have been substantially blind by not understanding how to best choose their commercial data in a holistic way. And now new startups, actually, have instilled some new way to think about that. Instill and develop new products based on machine learning combining machine learning, financial analysis. Et cetera, et cetera. Together to gain in accuracy, to gain in predictability, and a key factor is to... Get access to this information in a much faster time. And you know in our, in any industry, but in oil and gas industry time and precision cost a lot of money. >> Absolutely. What are some of the things that you would recommend to some of the young girls that are here, young women that are here, in terms of being able to influence an industry and elicit cultural change from an education perspective, is it just Data Science or what are some of the other skills and backgrounds do you think they need to be able to drive such change? 
>> Yeah, I think the conference was touching this point since this morning, and there is no clear answer obviously. There is no recipe, but for sure, I think many industrials today are still mired in the old ways. And they really need some fresh input, some fresh... Insight to really drive the culture right, the strategy right, that is necessary to move on the valuable and the successful transformation. And this fresh input, this fresh insight, I think can be completely an opportunity for women to jump into this... These jobs or this, this aspect of the story. And with either the technical angle or the managerial angle I think it can be both right? And it's not exactly the same sort of skills that are behind. So skill wise, you know, let's be passionate. If you love the data, if you enjoy playing with the data, I think you will be perfect, doesn't matter if you are a man, a woman, I mean you are just a data scientist at the end. With skills and it's all about what you can bring and value to the company that you will work for. >> Lisa: Right. >> So go for it, I mean the Data Science world is an oyster, right? >> Absolutely. >> So go for it! >> Yes. >> I mean, really. It's a fantastic opportunity. >> It is, and some of the things that we heard today from the skills perspective is kind of opening it up or maybe broadening it a bit, absolutely the core Data Science skills are essential. The blend of hacker, statistician, mathematician, scientist, but also looking at some of the softer skills, creativity. Communication. >> Stephanie: Correct. >> And being able to understand enough of the business. >> Stephanie: Correct. >> To bring and really marry those two together. Have you seen that trend in kind of this ideal background coming up in the oil and gas industry? >> Yeah, of course, at the end of the day you've perfectly summarized all the skill set that a good data scientist needs to have. 
And this curiosity for the domain of application because Data Science either you can work for university then you can approach Data Science from an academic and fundamental thinking, but to be honest most of the time and most of the jobs are using Data Science for a purpose and for an application, so then you need to adapt yourself and be sure that you will have this curiosity, you need to adapt yourself to the knowledge world. And not the opposite, so this ability of adaptation, of curiosity, of passion for the type of problems or challenges, issues, that you will have to address through the Data Science world will be key, and it's really up to everybody to analyze if they want to go for it or not. >> I think that's a great point that you brought up, that adaptation. We have actually heard that a number of times today, that person needs to have the skills but also the adaptation, the flexibility. >> Stephanie: Correct. >> Along those lines, adaptation maybe, talk to us about what your current role is at Agyleo Sport. >> Yeah, with not real transition. (laughter) I moved, I quit Schlumberger a few months ago. My job, I loved my job, but I still live in France. It was difficult to be abroad so often. Anyway, I decided to change life but still I tried to stop working and I almost died. (laughter) So I decided to move forward to another challenge, really. And the new challenge is to combine and reconciliate my two passions, which are digital and sports. >> I love that, tell me more about that. >> So the idea is to raise a fund which would be the first independent fund in France, venture capital fund I mean. Addressing the sport and technology vertical. So domain, market, industry. You know sport, to make the link with what I express today, in fact sport is almost an industry like any other one. And the transformation of sport with integration of all this new tech have to be addressed and everything has to be done. 
So when you think how to revolutionize the way sport is handling either on the professional side or amateur side. You know, and the more I am digging into this new market for me, it's amazing. The opportunities are tremendous. And so we are pretty close to close our fund and to be, to get ready to invest in some passionating startups. Dynamic startups on this topic. I've just closed some partnership as well with, in LA, where sport tech is already booming. So it's going on and it's quite an exciting new, different, but, challenge that I am taking right now. >> It sounds so interesting. And wrapping things up, you bring up a great point that you've adapted but you've also been able to recognize the linkage between your favorite passion, sports, and technology and digital. And these days especially, we're a bit biased living in Silicon Valley where every company is a tech company, car companies et cetera. It's a really great message for the younger generation to understand, follow your passion. And there's technology there, and we're going to need those diverse perspectives to help bring it to life and evolve it. >> Absolutely, so I think I realize that it's a luxury. At the point to have a choice to decide what you like to do in life, but it's also true that you have to address one in your early stage, early years, and giving you the maximum opportunities for the future is important. And then you can have this luxury, effectively to decide for your passion and to be driven by your passion. >> There's the Nirvana exactly. Well Stephanie thank you for those wise words of wisdom. Thanks so much for, >> Thank you very much. >> Stopping by theCUBE today, it's been a pleasure having you on. >> Me too, thank you. >> And we are going to be right back. We are live at the Women in Data Science Conference. Stick around, coming right back. (gentle electronic music)

Published Date : Feb 4 2017


Sinead Kaiya, SAP | Women in Data Science 2017

>> Announcer: Live from Stanford University. It's theCUBE. Covering the Women in Data Science conference, 2017. >> Hi, welcome back to theCUBE, live from Stanford University at the second annual Women in Data Science tech conference. We are here with the COO of Products & Innovation at SAP, Sinead Kaiya. Sinead, welcome to theCUBE! >> Thanks very much! It's great to be here. >> It's great to have you. You were one of the keynote speakers today. >> Sinead: I was. >> Talk to us about your role at SAP and some of the topics that you discussed to the large audience here today. >> Yeah, absolutely. So one of the things I was happy to open my keynote with was letting them know that I'm actually not a data scientist. Because while I think it's important that that community gets together and shares their knowledge, I'm actually coming from the industry business angle. And for the young women who are here starting out in data science, I thought it's also very interesting and important for them to also hear the business perspective on data science. So that was my main contribution to the talk today. And I got a lot of great feedback, that they really appreciated getting that perspective. >> I can't imagine that you wouldn't, because data science is a boardroom conversation now. You report to the CEO. Talk to us about the connection that you help the CEO understand about the value that data science can bring to organizations like SAP. >> Right. It's actually funny. We have recently re-equipped some of our major boardrooms in SAP with huge digital touchscreens. They're absolutely phenomenal, and the reason is because the CEO truly understands, as do the board members, that the power of many of their decisions are lying today in the data. And what they don't want is a static printout on some slides or some chart that somebody hands to them. They want to be able to touch the data and explore the data, and really try to dig into it themselves. 
So when it comes to the question of the data, I think for CEOs this is a no-brainer. Right, they're drowning in data. They have a lot of data. They understand that. But the point of my talk today was more about the science. So I think where CEOs need to go next, is understanding that just having reams of data and being able to slice and dice it is not going to cut it anymore. You need the young women in these professions that bring the scientific discipline to that data, which is incredibly technical, around machine learning algorithms, to actually start to make sense of that data. So this is a switch for CEOs. The data is a no-brainer, but the science is a new thing that's starting to creep into the boardroom. And they're starting to learn that machine learning and these technologies are going to be very important in how they drive their businesses. >> What's the perception of that at SAP, and what are some of the things that are going on on the technology side to bring that data science in, to make sense of this data and extract value for SAP? >> So obviously SAP has a very strong portfolio of analytics products as well as our SAP HANA in-memory data platform, but where the power of it, is when we start co-innovating with our customers, because it all comes to life once it reaches the customer. So I gave a couple of examples in my keynote today, on how we're co-innovating with, for example, our customer Trenitalia. So Trenitalia is the largest provider of train service in Italy. They move about two million passengers a day. >> Wow. >> And about 80 million tons of freight a year. 
And they're collaborating with SAP to not only, how do you say, equip all their trains with sensors and be able to be getting that real-time data, how do they connect that with the IT data in their maintenance systems, so that when a train, let's say we know before it's going to break, before it does, and the machine already has triggered the maintenance technician, has already scheduled it, and everything happens in a very smooth and automated way. So it's once we go to the real problems that our customers are having, and we can apply our in-memory technology to their problems, that we get the real value. >> Right. That's such an interesting example. Like, intelligent train, digital train, how do those come together to enable them to meet their customers' objectives. >> Absolutely. Another interesting topic that I talked about was business without bias. So this is a new feature set that we're building into our HR systems. So SAP SuccessFactors has systems that people use for recruiting, and then taking you through the whole HR life cycle from promotions to talent management to compensation. But obviously, anybody who's been through these processes know that there's a certain element of human bias along the way. So, one of the things I talked about is how we're using machine learning to enhance our HR product, so we can try to at least identify some of the bias, if not start to remove it from the system. So... >> This is, sorry. We actually were speaking with someone on the show earlier today, who was looking at how to remove bias from the recruiting process, and creating technology for college campuses and students to be able to use. 
It's game-based technology, and I thought it was really interesting, because oftentimes recruiting, looking at GPAs, test scores, maybe some of those other hard factors, but now with data science and the ability to understand and add some of the behavioral insights in, really interesting applicability and how that can influence the next generation of people working for lots of different industries and companies, including SAP. >> And it's not just because it's technically interesting, or because it's the right thing to do. To take it from the CEO angle, CEOs today recognize that if they want to solve the big challenges that are on their plate, they not only need the best talent, they need the most diverse talent. But I can see from my experience, just because the CEO decides that diversity should be a corporate priority, and just because people say "yeah, we think that's a good idea," how do you actually codify that in the systems that your employees are using in the business? So the question of, do we need diversity in business, is no longer on the table. But it's rather, how do we actually start to implement that in a more systematic way, so that it's not just wishful thinking. It's actually something that's built in. >> Right. Talk to us about who your collaborators are within SAP, on things like that. Who do you work with, departmentally, function-group-wise, to help make that "yes, we understand, we need to do this" into actually real-world applicability? >> Well, one of the things I talk to, and some advice I gave the young women today, which is true for software in general, is they have to collaborate with the end user. So if you want to build in these bias checks into the HR system, do not sit alone in your laboratory. Do not sit in front of your computer and try to guess what you think is needed. Go out and shadow a recruiter for a week. Go and sit with the end user. 
Go and understand and truly see what their problems are, and then really involve them in the solution. So, I think that will also help when we talk about how do the young women here take all the academics and all of the, how do you say, theory that they're creating, and start to apply that in a real business context. If you haven't involved the end user, that's going to be quite hard to do. So one of the things I told them is, go to the user. >> That's great advice. I'm curious though, your perspective, coming from the business side, you know we look at data science, Forbes said it's going to be the best job to apply for in 2017. We're also seeing statistics that show, by 2018 there's going to be a shortage. The demand will be so high for data scientists that there will be a shortage. If we kind of look at the evolution of data science and where we are now, you look at the traditional skills. Stats, math, sciences, computing, maybe former hackers. Some of the things that we've heard today that I'd love to get your opinion on, being a businesswoman, is people are now saying, you know, it's the ability to be creative, to analyze and interpret, but also to communicate the information. Another thing that came up that I thought was really interesting was the factor of empathy when you're evaluating different types of data. I thought that was really interesting. I'd love to get your advice for a young woman who might be thinking about majoring in computer science, but maybe her interests really lie in sports or something that you think, is there a technology there? Well yeah. What advice would you give, and what are some of the additional core skills that you see a successful data scientist of the future needs to have? >> Right. So I love that you brought up the topic of communication, because I see in the business world, this is so important. 
So when you talk about competitive advantage, all of the companies can go out and hire people with, let's say, equivalent technical skills. So we can all get to the same level of technical prowess, let's say, in an industry. But do you have the people who, like you said, can apply the creativity and then find a way to communicate the results back in a superior way? So I think they are going to find that just having the technical skills in business is never enough to really break that ceiling. You have to have absolutely phenomenal communication skills. >> Definitely. >> I also gave them the advice to take a couple of business courses. It really helps to understand how the decision-makers, who you're trying to influence, what are the strategies that they use? What are the challenges that they face? And how do you actually look at some of the problems of data science more from a business perspective? I told them, what I thought is, absolutely the most hireable data scientist would be someone with some domain expertise, someone with the technical background, but somebody who also knows about business. So we need the full package. >> Absolutely! Well and that's an important point, because technology evolves. It's also the catalyst for our evolution, and naturally, any role will change and evolve. I think communication is a core, a very horizontal skill. But I definitely also would agree with your recommendations that having some business acumen in some form or fashion is really going to be key. Tell us a little bit about, what are some of the things, when somebody's coming on to SAP as a data scientist, if they maybe don't have that business background, are they able to get that within, because the culture at SAP kind of supports sort of, cross-collaboration, cross-pollination, so that they might be able to just start to learn different perspectives, to become that package that we talked about. >> Right. 
So in SAP, of course we have multiple opportunities for employees to either move between departments and see different areas of the company, but as a data scientist at SAP, the best experience you're going to have is working with our customers. It's one of our greatest assets and our greatest pride, is the wonderful relationship we have with hundreds of thousands of leading businesses around the world. So by joining SAP, you get to collaborate with some of the really top companies and industries. And that is when it doesn't become business theory in books. You actually get to go to the customer and see how it touches their business, and where it becomes real. And I think this is what attracts so many people to SAP, and gets them to really engage and stay at SAP, is that phenomenal customer base that we have. >> That's fantastic. Well, that real-world applicability, there isn't anything better than that. You can learn a lot of theory in textbooks, and maybe obviously be able to apply some of it, but having that expertise when something doesn't go the way that it's printed, is really really key to helping shape someone. Speaking of shaping, I'm interested in how you've been at SAP for quite some time, you've had posts in Germany and France, which is amazing. Now you're based in New York. Tell us how you've seen, because you really clearly understand the business side and you understand the importance of the business side and the data science side, the needs there and how they need to work together to drive more value, innovation, drive products, drive revenue. How have you seen SAP's culture evolve to become open to, for example, business and data science merging and being core collaborators? >> Yeah, so I mean, SAP's industry has changed a lot over the recent years. And we've done that along with our customers. So our customers are obviously in a much more tight competitive situation in the whole digitization side of things. 
So we've been evolving along together with them. But to go back to my other point, one of the major changes or cultural shifts that I've seen in SAP is this tight collaboration with the end user. It used to be that we were only given access to the IT departments of our customers. So we literally had to work through the filter of the IT department to find out what it is we should build. Suddenly, the IT departments are realizing that the end user in companies have quite a bit of power these days, you know. >> Lisa: Yes they do. >> And they're now opening the doors and asking us to collaborate with them, and that shift has allowed our engineers to get even closer to the end users in our customers. >> Fantastic, and I'm sure that's really a key for driving innovation. Last question for you. We're at the second annual WiDS conference. I mean, what an amazing event. Live streamed, reaching so many people. You yourself were a keynote this afternoon. Diane Greene was a keynote this morning. As you look around this very energetic atmosphere that we're in, what has inspired you? What are you going to take away from WiDS 2017 that you're like, wow, that was really fantastic? >> Well, one of the things is the diversity of the speakers. I mean, the breadth of this topic is amazing. Being a woman in tech, of course it's wonderful to see so many highly intelligent and engaged women in one room, which is something we don't usually get to see. So that's one of the other key takeaways for me. >> Fantastic. Well Sinead, we so appreciate you stopping by theCUBE. We wish you continued success as COO of Products & Innovation, and we look forward to seeing you next time on the program. >> Thanks so much! >> And we want to thank you for watching theCUBE. We are live at the second annual Women in Data Science conference, #WiDS2017, but stick around. We'll be right back.

Published Date : Feb 4 2017

ENTITIES

Entity	Category	Confidence
Diane Greene	PERSON	0.99+
Germany	LOCATION	0.99+
Trenitalia	ORGANIZATION	0.99+
Italy	LOCATION	0.99+
Lisa	PERSON	0.99+
New York	LOCATION	0.99+
2017	DATE	0.99+
Sinead Kaiya	PERSON	0.99+
France	LOCATION	0.99+
Sinead	PERSON	0.99+
hundreds	QUANTITY	0.99+
2018	DATE	0.99+
SAP	ORGANIZATION	0.99+
today	DATE	0.99+
Forbes	ORGANIZATION	0.99+
WiDS 2017	EVENT	0.98+
Stanford University	ORGANIZATION	0.98+
one	QUANTITY	0.98+
about 80 million tons	QUANTITY	0.98+
one room	QUANTITY	0.98+
#WiDS2017	EVENT	0.97+
about two million passengers a day	QUANTITY	0.97+
SAP HANA	TITLE	0.97+
a week	QUANTITY	0.96+
Women in Data Science conference	EVENT	0.94+
Women in Data Science	EVENT	0.93+
Women in Data Science tech conference	EVENT	0.92+
Women in Data Science 2017	EVENT	0.92+
a year	QUANTITY	0.88+
this afternoon	DATE	0.86+
this morning	DATE	0.86+
WiDS	EVENT	0.84+
theCUBE	ORGANIZATION	0.76+
earlier today	DATE	0.76+
second annual	EVENT	0.76+
second annual	QUANTITY	0.72+
COO	PERSON	0.68+
thousands	QUANTITY	0.68+
couple	QUANTITY	0.6+
SAP	TITLE	0.58+
SAP SuccessFactors	ORGANIZATION	0.55+
keynote speakers	QUANTITY	0.49+

Ann Rosenberg, SAP | Women in Data Science 2017

>> Commentator: Live from Stanford University it's theCUBE covering the Women in Data Science Conference 2017. (jazzy music) >> Hi, welcome back to theCUBE. I'm Lisa Martin live at Stanford University at the second annual Women in Data Science WiDS tech conference. We are here with Ann Rosenberg from SAP. She's the VP head of Global SAP Alliances and SAP Next-Gen. Ann, welcome to the program. >> Thank you so much. >> So SAP is a sponsor of WiDS. Talk to us a little bit about that, and why is it so important for SAP to be involved in this great women's organization. >> So first of all, in my role as working with SAP's relationship to academia and also building up innovation network we see that data science is a very, very key skill set, and we also would like to see many more women get involved into this. Actually (mumbling) right now as we speak we are at the same time in 20 different countries around the world, 24 events we have. So we are both in Berlin, we are in New York, we are all over the world. So it's very important. I call it kind of a movement what we are doing here. It's important that all over the world that we inspire women to go into data science and into tech in general. So it is important thing for SAP. First of all, we need a lot of data science interested people. You also need our entire SAP ecosystem to go out to universities and be able to recruit a data science student both from a diversity perspective, whatever you are a female or a man of course. >> Absolutely, you're right. This is a very inspiring event. It's something that you can really actually feel. You're hearing a lot of applause from the speakers. When you're looking enabling even SAP people to go out and educate and recruit data scientists, what are some of the key skills that you're looking for as the next generation of data scientists? 
>> This is an interesting thing because you can say that you need like a very strong technical skill set, but we see more and more, and I saw that after I moved to Silicon Valley for two years that also the whole thing about design thinking, the combination of design thinking and data science is becoming something which is extremely important, but also the whole topic about empathy and also, so when you build solution you need to have this whole purpose driven in mindset. So I think what we're seeing more and more is that it's great to be a great data scientist, but it takes more than that. And that's what I see Stanford and Berkeley are doing a lot, that they're kind of mixing up kind of like the classes. And so you can be a strong data scientist, but at the same time you also have the whole design thinking background. That's some of the things that we look for at SAP. >> And that's great. We're hearing more and more of that, other skills, critical thinking, being able to not only analyze and interpret the information, but apply it and explain it in a way that really reflects the value. So I know that you have a career, you've been in industry, but you've also been a lecturer. Is this career that you're doing now, this job in alliances and next-gen for SAP sort of a match made in heaven in terms of your background? >> I actually love that question, probably the best question I ever got because it is definitely my dream job. When I was teaching in Copenhagen for some years ago I saw the mind of young people. I saw the thesis, the best of master thesis. I saw what they were able to do, and I'm an old management consultant, and I kept on thinking that the quality of work, the quality of ideas and ideations that the students come with were something that the industry could benefit so much from. So I always wanted to do this matchmaking between the industries and the mind of young people. 
And it's actually right now I see that it's started kind of, what I at least saw for the last two years that the industries that go to academia, go to universities to educate or to students to work on new ideas. And of course in Silicon Valley this has been going on for some time now, but we see all over the world. And the network that I'm responsible for at SAP, we work in more than 106 countries around the world, with 3,100 universities. And what I really want to do now, I call it the Silicon Valleys of the world where you are mapping the industries with academia with the accelerators and start ups. It's just an incredible innovation network, and this is what I see is just so much growing right now. So it's a great opportunity for academia, but equally also for the industry. >> I love that. Something that caught my eye, I was doing some research, and April 2016 SAP announced a collaboration with the White House's Computer Science for All Initiative. Tell us about that. >> I mean the whole DNA of SAP is in education. And therefore we do support a number of entity around the world. Whatever we talk about building up a skill set within data science, building skill set in design thinking, or in any kind of development skills is really, really important for us. So we do a lot of work together with the governments around the world. Whatever you talk about the host communication, for example, we have programs called Young Thinkers, Beatick, where you go out to high schools or you go into academia, to universities. So when this initiative came up, we of course went in and said we want to support this. So if I look at United States, so we have a huge amount of universities part of the network that I'm driving with my team. So we have data curriculums, education material, we have train to train our faculties, boot camps. We do hackathons, CodeJams. We do around 1,200 to 1,600 hackathon CodeJams per year around the world. 
We engage with the industries out to the universities. So therefore it was a perfect match for us to kind of support this initiative. >> Fantastic. Are there any things that SAP does as we look at the conference where we are, this Women in Data Science, are there things that you're doing specifically to help SAP, maybe even universities bring in more females into the programs, whether it's a university program or into SAP? >> Yeah, so for SAP in our whole recruiting process we definitely are looking into that. There is a great mix between female and male people who get hired into the company, but we also, it all start with that you actually inspire young women to go into a data science education or into a development education. So my team, we actually go in before SAP recruiting get involved where we, that's why we build up the strong relationships with universities where we inspire young women, like we do at this event here to why should they go in and have a career like this. So therefore you can see there's a lot of pre-work we need to be done for us to be able to go in and go into the recruiting process afterwards. So SAP do a lot of course in the United States, but all over the world to inspire young women to go into tech. And SAP does what we see today all over the world we have huge amount of female from SAP, female speakers at all our events who stand as role models to show that they are women, they are working for SAP, and are very, very strong female speakers and are female role models for all young women to get involved. So we do a lot of stuff to show that to the next generation of data science of whatever it is in tech. >> Yeah, and I can imagine that that's quite symbiotic. It's probably a really nice thing for that female speaker to be able to have the opportunity to share what she's doing, what she's working on, but also probably nice for her to have the opportunity to be a mentor and to help influence someone else's career. 
So you mentioned accelerators a minute ago, and I wanted to understand a little bit more about SAP Next-Gen Consulting, this collaboration of SAP with accelerators or start ups. How are you partnering to help accelerate innovation, and who is geared towards? Is it geared more towards student? Or is SAP also helping current business leaders to evolve and really drive digital transformation within their companies? >> So the big (mumbling) I'm working on right now too is as mentioned you said SAP Next-Gen is called SAP Next-Gen Innovation With Purpose. So it's linked to the 17 U.N. global goals. We've seen from now in Silicon Valley when you innovate you actually make innovation with purpose included. And that's why we kind of agreed on in SAP why don't we make an innovation network where the main focus is that all the innovation we get out of this is purpose driven linked to the 17 global goals. Like the event here is the goal number five, gender equality. In that network we actually do the matchmaking between academia. We look at all the disrupted new technologies, experience the technologies like machine learning like what's being discussed a lot here, blockchain, IoT. And then we look at the industry out there because the industries, they need all the new ideas and how to work with all the new opportunities that technology can provide, but then we also look into accelerator start ups. The huge amount, and often when you're in Silicon Valley you kind of think this is the world of the start ups of the world. So when you travel around the world, that's what we looked into a lot the last two years. We call the Silicon Valleys of the world, any big city around the world, or even smaller cities, they have tech hub. So you have Ferline Valley, you have Silicon Roundabout in London, you have Silicon Alley in New York, and that is where there is a huge amount of gravity of start ups and accelerators. 
And when you begin to link them together with the university network of the world and together with the industry network of the world, you suddenly realize that there is an incredible activity of creativity and ideations and start ups, and you can begin to group that into industries. And that give industries the opportunity not only to develop solution inside the company, but kind of like go in and tap into that incredible innovation network. So we work a lot with seeding in start up, early start ups into corporates, and also crowd source out to academia and the mind of young people all Next-Gen Consulting project where you similar work with students at universities on projects. It could be big data science project. It could be new applications. So I see like as the next generation type of consultancy and research what is happening in that whole network. But that is really what SAP Next-Gen is, but it is linked to the 17 U.N. global goals. It is innovation with purpose, which I'm really happy to see because I think when you build innovation, you really think about in the bigger, the whole (mumbling) thing that we know from singularity. You should think about a bigger purpose of what you're doing. >> Right, right. It sounds like though that this Next-Gen Consulting is built on a foundation of collaboration and sharing. >> It is, it is, and we have three Next-Gen lab types we set up. In this year we built, last year, we are a new year now, we built 20 Next-Gen labs at university campuses and at SAP locations. And here in the new year more labs is being set up. We are opening up a big lab in New York. We just recently opened up one in Walldorf at SAP's headquarters. We have one here in Silicon Valley, and then we have a number of universities around the world where SAP's customers go in and work with academia, with educators and students because what do you do today if you're in industry? 
You need to find students who are strong in machine learning and all the new technologies, right? So there's a huge need for in industry now to engage with academia, an incredible opportunity for both sides. >> Right, and one last question. Who are you, in the spirit of collaboration, who do you collaborate back with at SAP corporate? Who are all the beneficiaries or the influencers of Next-Gen Consulting? >> So I collaborate, inside SAP I collaborate, SAP have a number of, we have ICN, Innovation Center Network. We have our start up focus program. We have a number of innovation, the labs, a number of basically do all our software developments, so they're heavily involved. We have our whole go to market organization with all our SAP customers and industry, I call them clubs. And then externally is of course academia, universities, and then it is the start up communities, accelerators and of course, the industry. So it is really like a matchmaking. That's like, when people ask me what do you do, and I'm a matchmaker. That's really what I am. (Lisa laughs) >> I like that, a matchmaker of technology and people all over. So you're on the planning committee for WiDS. Wrapping things up here, what does this event mean to you in terms of what you've heard today? And what are you excited about for next year's event? >> So for me, one year ago when I heard about this year I kind of said this is important, this is very important. And it's not just an event, it's a movement. And so that was where I went in and said you know, we want to be part of this, but it must be more than just an event here. It's staying for the need to be much more than that. And this is where we all teamed up, all the sponsors together with ICME, and we said okay, let us crowd source it out, let us live stream it out much more than ever. And this is also what the assignment is now, that we to so many locations. This is just the beginning. 
Next year is going to be even bigger, and it's not like that we will wait to next year. We this week announced the SAP Next-Gen global challenges linked to the 17 U.N. global goals. So we are inspiring everybody to go in and work on those global challenges, and one of them is goal number five, which is linked to this event here. So for us and for me this is just the beginning, and next year is going to be even bigger. But we are going to do so many event and activity up to next year. My team in APJ, because of the Chinese New Year, have already been planned coming up here. >> Lisa: Fantastic. >> And we have been doing pre-event, (mumbling) events. So again, it is a movement, and it's going to be big. That's for sure. >> I completely can feel that within you. And you're going to be driving this momentum to make the movement even louder, ever more visible next year. >> Ann: Yeah. >> Well, Ann, thank you so much for joining us on theCUBE. We're happy to have you. >> Thank you so much for the opportunity. >> And we thank you for watching theCUBE. I am Lisa Martin. We are live at Stanford University at the second annual Women in Data Science Conference. Stick around, we'll be right back. (jazzy music)

Published Date : Feb 4 2017

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Ann	PERSON	0.99+
Ann Rosenberg	PERSON	0.99+
SAP	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Lisa	PERSON	0.99+
White House	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Berlin	LOCATION	0.99+
April 2016	DATE	0.99+
last year	DATE	0.99+
24 events	QUANTITY	0.99+
Copenhagen	LOCATION	0.99+
Stanford	ORGANIZATION	0.99+
3,100 universities	QUANTITY	0.99+
two years	QUANTITY	0.99+
ICN	ORGANIZATION	0.99+
one year ago	DATE	0.99+
United States	LOCATION	0.99+
one	QUANTITY	0.99+
20 different countries	QUANTITY	0.99+
three	QUANTITY	0.99+
Silicon Alley	LOCATION	0.99+
Next year	DATE	0.99+
London	LOCATION	0.99+
next year	DATE	0.99+
Innovation Center Network	ORGANIZATION	0.99+
more than 106 countries	QUANTITY	0.99+
Silicon Valleys	LOCATION	0.99+
both	QUANTITY	0.99+
both sides	QUANTITY	0.99+
ICME	ORGANIZATION	0.99+
this week	DATE	0.98+
this year	DATE	0.98+
Berkeley	ORGANIZATION	0.98+
around 1,200	QUANTITY	0.98+
Next-Gen	ORGANIZATION	0.98+
today	DATE	0.97+
Women in Data Science Conference	EVENT	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
Chinese New Year	EVENT	0.97+
Global SAP Alliances	ORGANIZATION	0.97+
First	QUANTITY	0.97+
labs	QUANTITY	0.96+
WiDS	EVENT	0.96+
Gen	ORGANIZATION	0.95+
The Cube	TITLE	0.95+
Stanford University	ORGANIZATION	0.94+
Next-Gen Consulting	ORGANIZATION	0.94+
one last question	QUANTITY	0.93+

Miriah Meyer, University of Utah - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Announcer: Live from Stanford University, it's the Cube, covering the Women in Data Science Conference 2017. (electronic music) >> Hi, and welcome back to the Cube. I'm Lisa Martin live at the Women in Data Science Conference, second annual, here at Stanford University, #WiDS2017. Fortunate to be joined next by Miriah Meyer, who is an Assistant Professor at the University of Utah in the School of Computing. Miriah, welcome to the Cube. >> Thank you. >> It's great to have you here. You're a speaker at this event this year. >> Yes. >> Tell us a little bit about how you got involved in WiDS and what excites you about being able to speak to this very passionate, invigorating audience? >> Yeah, so I got an invitation from one of the organizers, seems like quite some time ago, and when I looked into the conference, it just looked fantastic. I was so impressed with the speakers they had last year and the speakers for this year. It's a really amazing powerhouse of a community here. The fact that it's a great technical conference that, oh, just happens to be all women, it was pretty awesome, I was pretty flattered to get invited. Then the sort of, the energy in there is really awesome. It is different, it feels different than other technical conferences I go to. >> I completely agree. I love that you talked about just the community, because that's really what it is, and I think some of the, just the vibe that you can feel sitting here is one of excitement, it's one of passion of women who have been in this industry for a very long time in computer science, and then those young girls who are looking for inspiration. I think it's very symbiotic, right? They're learning from you, but I think you're probably also learning from them. >> Definitely. I find that every time I present my work to another group of people, a different community, I always have to come up against what my own assumptions are about how easy or not it is to understand the kind of work I do. 
I personally find it just so important to communicate clearly, it's probably partly why I do the work that I do. But I learn a lot every time I give a talk at a place like this. >> Wow, outstanding. Well, speaking of your talk, your research is in visualization systems. Share with us what you shared with the audience today, goals, outcomes, current outcomes of your visualization research. >> Mm-hmm. My research passion is around helping people make sense of complex data. I've particularly done a lot of work with scientists, particularly that in biology, where there's just been this amazing explosion of data and people are just trying to wrap their heads around what they have and what kinds of amazing discoveries they're sitting on. But it's really interesting, we've gotten so good at creating data, but then, that's wonderful, but if you can't make sense of it, who cares? >> Lisa: Right. >> I have this incredibly privileged position where I get to go and work with people who are at the cutting edge of their field and learn about this amazing work that they've been spending their lifetime on. Then I help them, I design tools with them that sometimes changes even the way that they're thinking about the problem. It's incredibly satisfying and it's very much in the spirit of team science and it's a lot of fun. I was talking about just some of the basics behind how do you create effective visualizations, which, for me, it also draws heavily on the notion of how do we collaborate effectively, how do you get at people's deep needs when it comes to making sense of data, when they often times can't articulate it themselves. I refer to it as data counseling, because it feels very much like, I talk with people who have problems but they can't articulate it, so I ask them lots of questions to help them uncover the root of their problems. >> Lisa: Right. >> That's basically what I do. >> That data counseling. That's fantastic. 
>> Yeah, and then you use what you discover in order to design tools. >> Share with us a little bit about the courses that you teach in Computer Science at the University of Utah. >> Yeah, so I teach a graduate level visualization course. It is just about the basic foundational principles we have behind perception and cognition and what that means for how we encode information, and then also, the process of how do you evaluate visualizations effectively. It's a really wonderful course where we have people from, actually, all across campus, so a lot of people are bringing problems that they have in other fields and trying to learn how to be more effective in their own exploration with visualizations. Then at the undergrad level, I actually teach our second semester programming course, so these things are worlds apart. This is one of our large 200 person introduction to data structures and algorithms. >> OK. What are some of the things that are inspiring? We'll talk about your graduate students for a moment. What are some of the things that you find are inspiring them to want to understand data in this way? Is it because they were kids that grew up in STEM programs, or they just had a computer since the time they were little, or are there other factors that you're finding that are really drivers of them wanting this type of education? >> So the students that I work with directly, I think, kind of fall into two camps. One camp is, they're a sort of non-traditional computer scientist, where they enjoy the engineering, they enjoy the programming, but they also really enjoy people and are passionate about making a difference. They also really enjoy the interaction that we have to go through in trying to understand what someone needs. There's also a design component, it's really fun to get to create things that feel good and look good. That's definitely one class, so it's the sort of non-traditional computer scientist. 
The other class, I have a couple of students who come from a science background, who love science, but find that they like building things more than they like doing the science itself, and visualization is kind of a wonderful place in the middle where you can be part of science but doing the making and building that we do in computer science, as opposed to doing the sort of experimentation and studying that you do as a scientist. That was definitely, for myself, I have a background in science and that's what really drew me, when I discovered computer science and visualization itself. >> What are some of the traditional skills that a well-educated computer scientist needed maybe five years ago, and how are you seeing that change? Are there new behavioral traits or skills that really are going to be essential for these people going forward? >> Yeah, I think especially in the space of data science and remembering that at the end of the pipeline there's a person sitting there, either bringing their knowledge to bear or someone that you're trying to tell a story to from data. I think one trait is the idea of having empathy and being able to connect with people, and to just understand that as technologists, we're, not all of us, but largely creating technology for people. That's something that I think has traditionally been undervalued and perhaps a little bit filtered out by perceptions of what a computer scientist is. But as technology is becoming more ubiquitous and people are understanding the impact that they could have, I think it is bringing in a different group of people that have different motivations for coming to the field. >> What are some of the, as your graduate students finish their education and go on to different industries, what are some of the industries that you're seeing that they're using their skills in? >> Yeah, so a lot of it is getting hired in companies that, their core product that they develop isn't necessarily a piece of technology. 
But they're using data now to really understand their business needs and things like that. I have a student right now who's actually at a government organization in DC, working with some amazing global health specialists. But these are midwives and social workers and they don't have the deep skills in data analysis. So there's opportunities for people in visualization and data science to go and really make an impact in a whole variety of interesting fields. That's actually one of the things that I always love to tell undergrads who come to talk to me about, "Oh, should I do computer science?" The thing I love most about it is that, whatever your passion is in life, whether it's medicine or whether it's music, or whether it's skiing, there is a technology problem there. If you have those skill sets, you can go and apply it to anything that you care deeply about. >> I couldn't agree more. That's such an important message to get out. I mean, every company, we're sitting here in Silicon Valley, where car companies are technology companies, every company these days, Walmart is a technology company. I think that's an important message for those kids to understand, following their passion. I don't think that that can be repeated enough, because you're right, whatever it is, there's a technology component to that. With that tip, let me ask you, what were some of your passions when you were younger in school? You mentioned your science degrees. But what were some of the things that really helped or maybe people shape your career and where you are today? >> Yeah, growing up, I was, my dad's a scientist, my mother's an artist, so there's definitely, both of those. >> Lisa: Art and science, so yes. >> Yeah, influences of both, and I really wanted to be an astronaut, but it turns out I get really motion sick. >> Oh, that's a bummer. >> So I had to give up that dream. I studied science, but at the same time, my mom always had me creating and doing things with her in her studio. 
I think I found this love of just being able to make something and how satisfying that is. I think that was influential. Then also, when I was in college, I was an astronomy major, and I had the opportunity to take lots of electives, which, in hindsight, I think was really important, because it let me explore many things. I found myself taking a lot of women's studies classes. What was interesting about that is just the way that you think and problem solve in a discipline like that where it's all critical analysis. That, sort of coupled with the deep analytics that I was, skills I was learning in physics, made for this just really interesting, I think, multiple, gave me perspectives to look at problems in multiple different ways. I think that that's been really important for being able to bring that suite of perspectives to how we solve problems. It's not all just quantitative, and it's not just all qualitative. But it's really a nice mixture of both, if it gets us to good places. >> Absolutely. I think that zigzag career path that you're sounding like you're talking about, I know I had one as well, gives you perspectives that you wouldn't have even thought to seek, had you not been on these trails. >> Mm-hmm. >> I think that's great advice that people that are, whether they're in your classes or they're being able to listen to you here, should be able to know that it's OK to try things. >> Yes. Yes, exactly. I think back to the person I was when I was, say, 18. I didn't know. I think the one sort of constant in my career trajectory has been just, wow, this thing looks really interesting, I don't know where it's going to go, but I'm going to follow that path. Inevitably, if it's something that catches your attention, there's going to be something interesting that can come out of it. 
I think sort of letting go of this need to have everything defined from day one and instead following your passions is, that's the theme I've heard over and over again from the speakers in here, too. >> Absolutely. Don't be afraid to fail is one of the themes that has come out from this morning. Diane Greene, SVP of Google Cloud, who was in the morning keynote, had even said, "Don't be afraid to get fired." I mean, could you imagine your parents saying that to you? >> Yeah. >> I couldn't, but it's also something that just shows you that there is tremendous opportunity in many different disciplines and domains for this type. >> By the way, if you have a technical computer science background, you can always find another job. (laughter) >> That is true. What is next on your plate in terms of research, what are you looking forward to the rest of 2017? >> Wow. >> Lisa: Sorry, was that too big of a question? >> Yeah. We have a couple of really interesting problems around color, around some new tools for helping designers and journalists work with data. I think also, I'm starting to think about trying to focus more on K through 12 education and trying to understand what some of the roadblocks are to getting computer science to a younger community of people. In Utah, we have a lot of rural populations. We also have Native American reservations. I think there's some really interesting challenges with getting computer science into those communities. I'm sort of thinking about working with some folks to try to understand more about that. >> That's fantastic. I mean, you bring up a good point, that kind of depending, then, where you are, here we are sitting at Stanford University, one of the pre-eminent universities in the world, and there's a tremendous amount of technology and resources available. But then you look at, really, the needs of communities in Utah, and they need people like you to help, go, "You know what, we have challenges here, and we need to solve that." 
Because that's part of the next generation of the people that are here speaking at these types of events. >> Miriah: Right. >> Absolutely critical problem. Well, Miriah, thank you so much for being on the Cube. >> Thank you for the opportunity. >> It's been a pleasure, we wish you the best of luck with your big plans for 2017. >> Thanks. >> Lisa: Hopefully, we'll see you next time. >> Great. >> We thank you for watching the Cube again, Lisa Martin, live at Stanford University at the Women in Data Science Second Annual Conference. Stick around, we've got more, we'll be right back. (electronic music)

Published Date : Feb 4 2017

ENTITIES

Entity	Category	Confidence
Miriah	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Diane Greene	PERSON	0.99+
Lisa	PERSON	0.99+
Utah	LOCATION	0.99+
Miriah Meyer	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
One camp	QUANTITY	0.99+
two camps	QUANTITY	0.99+
2017	DATE	0.99+
last year	DATE	0.99+
both	QUANTITY	0.99+
200 person	QUANTITY	0.98+
WiDS	EVENT	0.98+
12	QUANTITY	0.98+
five years ago	DATE	0.98+
University of Utah	ORGANIZATION	0.98+
one	QUANTITY	0.98+
this year	DATE	0.97+
one class	QUANTITY	0.97+
second semester	QUANTITY	0.97+
one trait	QUANTITY	0.97+
Stanford University	ORGANIZATION	0.97+
Women in Data Science Conference	EVENT	0.97+
Women in Data Science Conference 2017	EVENT	0.96+
#WiDS2017	EVENT	0.96+
18	QUANTITY	0.96+
second annual	QUANTITY	0.96+
Women in Data Science Second Annual Conference	EVENT	0.94+
today	DATE	0.93+
Stanford University	LOCATION	0.91+
Google Cloud	ORGANIZATION	0.89+
this morning	DATE	0.78+
Women in Data Science 2017	EVENT	0.74+
Native American	OTHER	0.73+
day one	QUANTITY	0.71+
couple	QUANTITY	0.65+
Cube	COMMERCIAL_ITEM	0.65+
School of Computing	ORGANIZATION	0.63+
Stanford	LOCATION	0.63+
students	QUANTITY	0.6+
organizers	QUANTITY	0.58+
University	ORGANIZATION	0.57+
Cube	ORGANIZATION	0.48+

Megan Price, Human Rights Data Analysis Group - Women in Data Science 2017 - #WiDS2017 - #theCUBE

(upbeat music) >> Voiceover: Live from Stanford University. It's the Cube covering the Women in Data Science Conference, 2017. >> Hi, welcome back to the Cube. I'm Lisa Martin and we are at the second annual Women in Data Science Conference at Stanford University. Such an inspiring day that we've had so far and right now we're joined by Megan Price, the executive director of the Human Rights Data Analysis Group. Megan, welcome to the Cube. >> Thank you. >> It's so exciting to have you here. Megan, your background is statistics. You have a PhD as a statistician. The Human Rights Data Analysis Group, HRDAG, is focused on statistical analysis of mass violence. Talk to us about sort of the merger of your biostatistician or your statistician background with human rights. Was that something that you were always interested in? >> Sure. It was and I have to say I was really lucky. I got my Bachelor's and my Master's in statistics from a very technical engineering school in Ohio, where honestly, a lot of people would sort of, pat me on the head and say, "That's nice, that you're interested in human rights. You'll outgrow that." And fortunately I had one very thoughtful mentor, who said to me, "You know, I really think Public Health school is the direction you should go in", and so I got my PhD in biostatistics from Public Health school and it was really there that I was exposed to people who kind of said, "Yeah, social justice, human rights, do that as a day job. Get on it.", and so that was really great that I was exposed to that as something I can move into as a career. >> Exposed to them, but also you had the confidence. You obviously had a mentor that was very influential, but that takes some courage and some guts to go, you know what, yeah, this is needed. >> It's true, yeah. (laughs) >> So talk to us about some of the ... The HRDAG, we talked about it a little bit before we went live. The evolution. Share with our viewers how it's evolved to what it is today. 
>> Sure. So the organization, the name and work started with work that my colleague, Dr. Patrick Ball started doing in El Salvador and in Guatemala in the 90s. And at the time, he was working ... He'd formed a team to do the work at the American Association for the Advancement of Science. And so that was about 25 years ago. And then the work evolved and the team just kept kind of moving to where the right home was to get that work done and so in the early 2000s, they moved out here to Palo Alto just up the street to Benetech, another technical non-profit. And they provided us a really nice home for our work for nine years. And then in 2013, the time had really come to be the right time for Patrick and I to spin out HRDAG as its own non-profit organization. We're fiscally sponsored right now, but we're our own institution, which we're really excited about. >> So you mentioned some of the projects that Patrick was working on. What are some of the things that were really compelling to you, specifically within human rights, that really are catalysts for the work that you're doing today? >> Sure. I think that there are a lot of quantitative questions that get raised in looking at these questions about widespread patterns of violence, and asking questions about accountability and responsibility for violence. And to answer those questions, you have to look at statistical patterns, and so you need to bring a deep understanding of the data that are available and the appropriate way to analyze and answer those questions. >> How do you from an accuracy perspective, I understand that that's incredibly vital, especially where these important issues are concerned, how does HRDAG eliminate, mitigate inaccuracy issues with respect to data? 
>> Yeah, well we're always thinking about each of our projects as taking place in an adversarial environment, because we ultimately assume that at the end of the day our results are going to be either subjected to the kind of deep scrutiny that comes along with any kind of socially and politically sensitive topic, or with the kind of scrutiny that happens in a courtroom. And so that's really what motivates the level of rigor that we require in our work. And we maintain that by maintaining our relationship with mostly academicians, who are really pushing these methods forward and staying on top of what is the most cutting edge approach to this problem and how can we really know that we're being as transparent as possible in the way these data were collected, the way they were analyzed, the way they were processed and the limitations of those analyses. You know, the uncertainty present in any estimates that we put out. >> Give us an example of some of the types of data sources that you're evaluating, say for the conflict in Syria. >> So in the case of Syria, we have relationships with four organizations that are all collecting information about victims who've been killed in the ongoing conflict in Syria. Those groups are the Syrian Center for Statistics and Research, Syrian Network for Human Rights, the Damascus Center for Human Rights Studies, and the Violations Documentation Center. And those are all citizen-led groups that are maintaining networks collecting that information to the best of their ability. And they share with us largely Excel spreadsheets that contain names of victims and any other information they were able to collect about those victims. >> You mentioned University collaboration a minute ago. From a methodology standpoint. Give me an insight into ... You're getting data from these various sources, largely Excel, where we know with Excel comes humans, comes sometimes, "Oops". 
How are you working with universities to help evaluate the data or what are some of the methodologies that they're recommending, given the data sources and the tools that you have? >> So there's really two stages that the data go through and the first one is within the groups themselves, who do that first layer of verification, and that is the human verification prior to, kind of all the risks of data entry problems. And so they're doing the on the ground, making sure that they've collected and confirmed that information, but then you're absolutely right, we get this data that's been hand-entered and with all of the risks and potential downsides of hand-entered data and so primarily what we do is fairly conventional data processing and data cleaning to just check for things like outliers, contradictory information. We'll do that using Python and using R. And then our friends and colleagues in academia, where they're really helping us out is, because there are these multiple sources collecting names of individual victims, what we have is a record linkage problem. And so we have multiple records that refer to the same individual. >> Okay. >> And so we work a lot with our academic partners to stay on top of the latest ways to de-duplicate databases that might have multiple entries that refer to the same person. And so that's been really great lately. >> Okay. What are some of the methods that you've used in Syria to quantify mass violence and what have some of the outcomes been to date? >> So we rely primarily on methods from record linkage and that gets us to what we know and can observe. And then from there we need to build an estimate of what we don't know and what we can't observe, because inevitably in conflict violence, some of that violence is hidden. Some of those victims have not been identified or their stories have not been told yet. And it's our job as data scientists to use the tools at our disposal to estimate how much we don't know. 
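As a rough illustration of the record linkage problem described above, the sketch below groups hand-entered records that appear to refer to the same person. It is not HRDAG's pipeline, which the interview does not detail: the field names, the sample records, the greedy grouping rule, and the 0.85 similarity threshold are all hypothetical, and real record linkage typically uses far more careful matching than a single string-similarity score.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def similar(a, b, threshold=0.85):
    """Return True when two normalized names are at least `threshold` similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def link_records(records, threshold=0.85):
    """Greedily group records that share a date and location and have similar names."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            head = cluster[0]  # compare against the first record of each group
            if (rec["date"] == head["date"]
                    and rec["location"] == head["location"]
                    and similar(rec["name"], head["name"], threshold)):
                cluster.append(rec)
                break
        else:
            # No existing group matched, so this record starts a new one.
            clusters.append([rec])
    return clusters


# Entirely made-up records standing in for rows from several hand-entered spreadsheets.
records = [
    {"source": "A", "name": "Ahmad Khalil", "date": "2013-05-02", "location": "Homs"},
    {"source": "B", "name": "Ahmed Khalil", "date": "2013-05-02", "location": "Homs"},
    {"source": "C", "name": "Sara Haddad", "date": "2013-05-02", "location": "Homs"},
    {"source": "B", "name": "ahmad  khalil", "date": "2013-06-10", "location": "Aleppo"},
]

clusters = link_records(records)
for cluster in clusters:
    print([f'{r["source"]}:{r["name"]}' for r in cluster])
```

In this toy data the first two rows are grouped as one person despite the spelling difference, while the last row stays separate because its date and location differ. Deciding which of those judgments are right is exactly the hard part of the problem.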
And so for that step we use a class of statistical tools, called multiple systems estimation. And essentially what that does, is it builds on the patterns of data as they're collected by these multiple sources to model what the underlying population must have been to generate what we were able to see. >> Okay. >> And so that's been the primary analysis we've done in Syria. And what we found from that analysis, is that as valuable and important as the documented data are, they often are overwhelmed, for example when violence peaks. It may be too dangerous and it may be impossible to accurately record how many people have been killed. >> Okay. >> And so we need a statistical model that can help us identify when data we observe seem to plateau, but perhaps our estimates tell us no, in fact that was a very violent period. And then we can dig in with field experts and interpret, was that a time when we know that territorial control was in contention. Or was that a time when we know, that there were clashes between certain groups. And so then we can infer further from that about responsibility for violence. >> So applying some additional attributers. Things that are attributing to this. What are some of the differences that you think that this has made so far? >> What I hope this has done so far, is simply to raise awareness about the scale of the violence that's happening in Syria. And what I hope ultimately, is that it helps to attribute accountability to those who are responsible for this violence. >> You've also got some projects going on in Guatemala. Can you share a little bit about that? >> We do. Yeah, we have a couple of projects in Guatemala. The one that I've worked on most closely, is looking at the historic archive of the national police in Guatemala. And that's actually the project that I started working on when I joined HRDAG. And Guatemala suffered an armed internal conflict from 1960 to 1996. 
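The simplest instance of the idea behind multiple systems estimation is the two-list capture-recapture (Lincoln-Petersen) estimator, sketched below. This is only an illustration under strong assumptions: the two lists are treated as independent, every person is assumed equally likely to appear on each list, and the records are assumed to be already de-duplicated. The identifiers and list sizes are invented, and analyses of the kind described in the interview use more lists and models that account for dependence between them.

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from the overlap between two lists."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists; the estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(list_a, list_b):
    """Chapman's variant, which is less biased when the overlap is small."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Invented identifiers for already-linked victims documented by two sources.
list_a = {"v01", "v02", "v03", "v04", "v05", "v06"}
list_b = {"v04", "v05", "v06", "v07", "v08", "v09", "v10", "v11"}

documented = len(list_a | list_b)            # people seen by at least one source
estimate = lincoln_petersen(list_a, list_b)  # estimated total, seen or unseen
undocumented = estimate - documented         # estimated people on neither list

print(documented, estimate, undocumented, chapman(list_a, list_b))
```

The intuition is that a small overlap between sources suggests each source is seeing only a small share of the whole, so the estimated total rises well above the documented count.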
And during that time period, many witnesses came forward and said that the national police force participated in the violence, but at the time that the UN, the United Nations brokered the peace treaties, they weren't able to find any documentary evidence of the role the police played. And then in 2005, quite by accident, this archive, that's this cache of the police force's bureaucratic documents was discovered. And so we've been studying it since then. And it's been this really fascinating problem, if you have this building full of millions and millions and millions of pieces of paper, that are not really organized in any way. And how do you go about studying that? And so we partnered with other experts from the American Statistical Association, to design a random sample of the archive, so that we could learn about it as quickly as possible. >> What are some of the learnings that you've discovered so far? >> What we've discovered so far is just the sheer magnitude of the archive and in particular the amount of documents that were generated during the conflict. And then the other thing that we have discovered is the communication flow. The pattern of documents being sent to and from leadership of the National Police Force. And specifically, Patrick Ball testified about that communication flow, to help establish command responsibility for the former chief of police, for a kidnapping that occurred in 1984. >> Wow, incredibly impactful work. But you've got some things on the domestic frontier. Share with us a little bit about what you're working on stateside. >> We do, yeah. In the past year, we've started our first US-based project, which we're really excited about. And it's looking at the algorithms that are being used both in predictive policing and in criminal justice risk assessment. So decisions like whether or not someone should get bail or pretrial hearings, things like that. 
And we've been working with partners, primarily lawyers, to help assess, sort of, how are those algorithms working and what's the underlying data that's being fed into those algorithms. And what are the ways in which those data are biased. And so the algorithms are replicating the bias that exists in the data. >> Tell me, how does that conversation go, as a statistician with a lawyer, who is, you know, a business person. What sort of educating do you need to do to them about the impact that this data can make and how imperative it is that it be accurate. >> Yeah, well those conversations are really interesting, because there's so much education going in both directions. Where both we are helping them to turn their substantive question into an analytical question and sort of develop it in a way that we can do an analysis to get at that question, but then they're also helping us to understand, what's the way in which this information needs to be conveyed, so that it holds up in court, and so that it establishes some sort of precedent, so that they can make policy change. >> It makes me think of, sort of the topic or the skill of communication. A number of our guests this morning on the program and those that we've heard speaking today, talk about the traditional data scientist skills. You know hybrid, hacker, someone that has statistics, mathematical skills, but now really looking at somebody who also has to have other behavioral skills. Be able to be creative, interpretive, but also to communicate it. I'd love to get your perspective as you've seen data science evolve in your own career. How have you maybe trained your team on the importance of communicating this information, so that it has a value and it has impact? >> Absolutely. 
I think creativity and communication are probably the two most important skills for a data scientist to have these days and that's definitely something that on our team, you know, it's always a painful process, but every time we give a talk, if we're fortunate enough that it's been videoed, we always have to go back and watch that. And I recommend to my teammates to do it quietly at home alone, maybe with their preferred beverage of choice, but that's the way that you learn and you discover, oh I could have said that differently or I could have said that another way, or I could have thought about a different way to present that, because I do think that that's absolutely vital. >> I'm just curious what your perspective is from a curriculum standpoint, we've got a lot of students here, we've got some professors here. Is there something that you would recommend as part of ... Look back to your education. Would you think, you know what, being able to understand statistics is one thing, I need to be able to communicate it. Was that something that was part of your curriculum or something that you think, you know what, that's a vital component of this? >> It's absolutely a vital component. It was not part of my formal curriculum, but it was something that I got out of graduate school, because I was very lucky that I got to teach, essentially statistics 101 to introductory Public Health students. So they were graduate students, but there were a lot of students who maybe hadn't had a math class in a decade and were fairly math phobic. >> Lisa: Sounds like me. (both laughing) >> We could, you know, hold hands and get through it together. >> Okay, oh good. Beverage of my choice, awesome. (laughs) >> Exactly. 
And I really feel like that was what improved my communication skills, was experience with those students and thinking about how to convey the information to that class and going in day after day and designing that curriculum and really thinking about how to teach that class, is really the way that I have learned my communication skills. >> Oh that's fab. That real world experience, there's nothing that beats that. What are some of the things that have excited you about participating in (mumbles) this year? >> Oh my gosh, it is so much fun to be in an audience and to speak to an audience, that is so predominantly female. I mean of course, that's not something that we get to do very often. And so young, I mean this audience is really full of very energetic, ready to go tackle the world's problems women and it's very invigorating for me. It helps me to kind of go back and think, alright how can we do more and do bigger and create more opportunities for these folks to fill? >> It's a very symbiotic relationship, I think. They learn so much from you and you're learning so much from them. It's really nice. You can feel it. Right, you can feel it here in this environment. >> Absolutely. >> Well, Megan, thank you so much for joining us on the program today. We wish you the best of luck with HRDAG and your impending new little girl. >> Thank you. (laughs) I appreciate that. >> Absolutely. Well we thank you for watching the Cube. Again, we're live at the Women in Data Science Conference at Stanford University, second annual event. Stick around, we'll be right back. (upbeat music)

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Megan	PERSON	0.99+
Patrick Ball	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Patrick	PERSON	0.99+
2005	DATE	0.99+
Ohio	LOCATION	0.99+
Guatemala	LOCATION	0.99+
American Statistical Association	ORGANIZATION	0.99+
Lisa	PERSON	0.99+
El Salvador	LOCATION	0.99+
Patrick Ball	PERSON	0.99+
1984	DATE	0.99+
Megan Price	PERSON	0.99+
National Police Force	ORGANIZATION	0.99+
Syria	LOCATION	0.99+
American Association for the Advancement of Science	ORGANIZATION	0.99+
Syrian Network for Human Rights	ORGANIZATION	0.99+
2013	DATE	0.99+
Violations Documentation Center	ORGANIZATION	0.99+
United Nations	ORGANIZATION	0.99+
Damascus Center for Human Rights Studies	ORGANIZATION	0.99+
Excel	TITLE	0.99+
Syrian Center for Statistics and Research	ORGANIZATION	0.99+
1960	DATE	0.99+
HRDAG	ORGANIZATION	0.99+
first	QUANTITY	0.99+
nine years	QUANTITY	0.99+
two	QUANTITY	0.99+
1996	DATE	0.99+
Python	TITLE	0.99+
US	LOCATION	0.99+
each	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Human Rights Data and Analysis Group	ORGANIZATION	0.99+
millions	QUANTITY	0.98+
UN	ORGANIZATION	0.98+
one	QUANTITY	0.97+
Women in Data Science Conference	EVENT	0.97+
today	DATE	0.97+
past year	DATE	0.97+
Women and Data Science Conference	EVENT	0.97+
#WiDS2017	EVENT	0.97+
90s	DATE	0.96+
Women in Data Science Conference	EVENT	0.96+
two stages	QUANTITY	0.96+
first one	QUANTITY	0.95+
first layer	QUANTITY	0.94+
both directions	QUANTITY	0.94+
this morning	DATE	0.93+
Stanford University	LOCATION	0.93+
millions of pieces	QUANTITY	0.91+
Benetech	ORGANIZATION	0.91+
Public Health school	ORGANIZATION	0.9+
Women in Data Science 2017	EVENT	0.9+
this year	DATE	0.88+
2017	DATE	0.86+
about 25 years ago	DATE	0.85+
Human Rights Data Analysis Group	ORGANIZATION	0.81+
second annual	QUANTITY	0.81+
Public Health school	ORGANIZATION	0.81+
HRDAG	PERSON	0.8+
101	QUANTITY	0.78+
human rights	ORGANIZATION	0.77+
one thing	QUANTITY	0.76+
Cube	ORGANIZATION	0.74+
Paul Walter	LOCATION	0.73+
2000s	DATE	0.72+
couple	QUANTITY	0.68+
paper	QUANTITY	0.65+
a minute	DATE	0.64+
analysis	ORGANIZATION	0.55+
Dr.	PERSON	0.53+

Finale Doshi-Velez, Harvard University | Women in Data Science 2017

>> Announcer: Live, from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017. (upbeat music) >> Hi and welcome back to theCUBE, I'm Lisa Martin and we are at Stanford University for the second annual Women in Data Science Conference. Fantastic event with leaders from all different industries. Next we're joined by Finale Doshi-Velez. You are the Associate Professor of Computer Science at Harvard University. Welcome to the program. >> Excited to be here. >> You're a technical speaker so give us a little bit of insight as to what some of the attendees, those that are attending live and those that are watching the livestream across 75 locations. What are some of the key highlights from your talk that they're going to learn? >> So my main area is working on machine learning for healthcare applications and what I really want people to take away from my talk is all the needs and opportunities there are for data science to benefit patients in very very tangible ways. There's so much power that you can use with data science these days and I think we should be applying it to problems that really matter, like healthcare. >> Absolutely, absolutely. So talking about healthcare you kind of see the intersection, that's your big focus, is the intersection of machine learning and healthcare. What does that intersection look like from a real world applicability perspective? What are some of the big challenges? And can you talk about maybe specific diseases that you're maybe working on-- >> Sure, absolutely. So I'll tell you about two examples. One example that we're working on is with autism spectrum disorder. And as the name suggests, it's a really broad spectrum. And so things that might work well for one sort of child might not work for a different sort of child. 
And we're using big data and machine learning to figure out what are the natural categories here and once we can divide this disease into subgroups, we can maybe do better treatment, better prognosis for these children, rather than lumping them into this big bucket-- >> Lisa: And treating everybody the same? >> Exactly. >> Lisa: Right. >> And another area we're working on is personalizing treatment selection for patients with HIV and with depression. And again, in these cases, there's a lot of heterogeneity in how people respond to the diseases. >> Lisa: Right. >> And with the large data sets that we now have available, we actually have huge opportunities in getting the right treatments to the right people. >> That's fantastic, so exciting. And it's really leveraging data as a change agent to really improve the lives of patients. >> Finale: Absolutely. >> From a human interaction perspective, we hear that machine learning is going to replace jobs. It's really kind of a known fact. But human insight is still quite important. Can you share with us-- >> Finale: Absolutely. >> where the machines and the humans come into play to help some of these dis-- >> Yes, so a big area that we work on is actually in formalizing notions of interpretability because in the healthcare setting, the data that I use is really really poor quality. There's lots of it. It's collected in a standard of care every day but it's biased, it's messy. And you really need the clinician to be able to vet the suggestions that the agent is making. Because there might be some bias, some confounder, some reason why the suggestions actually don't make sense at all. And so a big area that we're looking at is how do you make these algorithms interpretable to domain experts such as clinicians, but not data experts. And so this is a really important area. And I don't see that clinician being replaced anytime soon in this process. 
But what we're allowing them to do is look at things that they couldn't look at before. They're not able to look at the entire patient's record. They certainly can't look at all the patient records for the entire hospital system when making recommendations. But they're still going to be necessary because you also need to talk to the patient and figure out what are their needs, do they care about a drug, that might cause weight gain for example, when treating depression. And all of these sorts of things. Those are not factors again that the machines are going to be able to take over. >> Lisa: Right. >> But it's really an ecosystem where you need both of these agents to get the best care possible. >> Got it, that's interesting. From an experimentation perspective, are you running these different experiments simultaneously, how do you focus your priorities, on the autism side, on the depression side? >> I see, well I have a lab, so that helps make things easy. >> Lisa: Yup, you got it. >> I have some students working on some projects-- >> Lisa: Excellent. >> And some students working on other projects. And we really, we follow the data. My collaborations are largely chosen based on areas where there are data available and we believe we can make an impact. >> Fantastic, speaking of your students, I'd love to understand a little bit more. You teach computer science to undergrads. >> Yes. >> As we look at how we're really at this inflection point with data science; there's so much that can be done in that, to your point, in tangible ways the differences that we can make. Kids that are undergrads at Harvard these days grew up with technology and the ability to get something like that; we didn't. So what are some of the things that have influenced them to want to become the next generation of computer or data scientists? >> I mean, I think most of them just realize that computers and data are essential in whatever field they are. 
They don't necessarily come to Harvard thinking that they're going to become data scientists. But in whatever field that they end up in, whether it's economics or government, they quickly realize, or business, they quickly realize that data is very important. So they end up in my undergraduate machine learning course. And for these students, my main focus is just to teach them, what the science, what the field can do, and also what the field can't do. And teach them that with great power comes great responsibility. So we're really focused on evaluation and just understanding on how to use these methods properly. >> So looking at kind of traditional computer, data science skills: data analytics, being able to interpret, mathematics, statistics, what are some of the new emerging skills that the future generation of data and computer scientists needs to have, especially related to the social skills and communication? >> So I think that communication is absolutely essential. At Harvard, I think we're fortunate because most of these people are already in a different field. They're also taking data science so they're already very good at communicating. >> Lisa: Okay. >> Because they're already thinking about some other area they want to apply in. >> So they've got, they're getting really a good breadth. >> They're getting a really great breadth, but in general, I think it is on us, the data scientists, to figure out how do we explain the assumptions in our algorithms to people who are not experts again in data science, because that could have really huge downstream effects. >> Absolutely. I like what you said that these kids understand that the computers and technology are important whatever they do. We've got a great cross section of speakers at this event that are people of, that are influencing this in retail, in healthcare, in education, and as well as in sports technology, on the venture capital side. 
And it really shows you that this day and age, everything is technology, every company we're in, we're sitting in Silicon Valley of course, where a car company is a technology company. But that's a great point that the next generation understands that it's prolific. I can't do anything without understanding this and knowing how to communicate it. So from your background perspective, were you a STEM kid from way back and you really just loved math and science? Is that what shaped your career? >> So I grew up in a family with like 15 generations back, accounting, finance, small business, and I was like, I'm never going to do any of this. (Lisa laughs) I am going to do something completely different. >> Lisa: You were determined, right. >> And so now I'm a data scientist. (laughing) >> At Harvard, that's pretty good, they must be proud. >> Working on healthcare applications. So I think numbers were definitely very much part of my upbringing, from the beginning. But one thing that I think did take a while for me to put together is that I came from a family where my great uncle was part of India's independence movement. My role models were people like Martin Luther King and Mother Teresa and I liked numbers. >> Lisa: Yeah. >> And, like how to put those together? And I think it definitely took me a while to figure out okay, how do you deliver those warm fuzzies with like cold hard facts. >> Lisa: Right. >> And I'm really glad that we're in a place today where the sort of skills that I have can be used to do enormous social good. >> What are some of the things that you're most excited about about this particular conference and being involved here? >> So I think conferences like these, like the Women in Data Science, I'm also involved in the Women in Machine Learning Conference, are a tremendous opportunity for people to find mentors and cohorts. 
So I went to my first Women in Machine Learning Conference over 10 years ago, and those are the people I still talk to whenever I need career advice, when I'm trying to figure out what I want to do with my research and what directions, or just general support. And when you're in a field where you maybe don't see that many women around you, it's great to have this connection so that you can draw on that wherever you end up. Your workplace may or may not have that many women but you know that they're out there and you can get support. >> Now that there's so much data available, a lot of the spirit of corporations that use data as a change agent have adopted cultures or tried, of try it, it might fail, but we're going to learn something from this. Do you see that mentality in your students about being free or being confident enough to try experiments and if they fail, take learnings from it and move forward as a positive? >> I mean, certainly that's what I try to teach my students. >> Lisa: Yeah, yeah. >> My graduate students I tell them, I expect you to make consistent progress. Progress includes failure if you can explain why it failed. And that's huge, that's how we learn and that's how we develop new algorithms, absolutely. >> Yeah, and I think that confidence is a key factor. You mention that Women in Machine Learning Conference, you've been involved in that for 10 years, how have you seen women's perspectives, maybe confidence evolve and change and grow as a result of this continued networking? Are you seeing people become more confident-- >> Finale: I think so. >> To be able to try things and experiments. >> I mean certainly, as people stay involved in the field, I've noticed that you develop that network, you develop that confidence, it's amazing. The first event had less than a hundred people. The last event that we had had over 500 people. 
The number of people at just the Women in Machine Learning event, was the same as the number of people at the entire conference 10 years ago. >> Right. >> Right, and so the field has grown but the number of women involved that you see through these events like WIDS and WIML I think is enormous. >> And the great thing that's happening here at WIDS 2017 is it's being live streamed. >> Finale: Right. >> Over 75 locations. >> So it's accessible to so many people. >> Exactly. >> Yes. >> We're expecting up to 6,000 people on the live stream. So the reach and the extension is truly global. >> Which is fantastic. >> It is fantastic and just the breadth of speakers that are here to influence. You mentioned a couple of your key influencers: Martin Luther King and Mother Teresa. From an education perspective, when you were trying to figure out your love of math and numbers and that, who were some of the people in your early career that were really inspiring and helped you gain that confidence that you would need to do what you're doing? >> So I think if I had to pick one person, it was probably a professor at MIT that I interacted with quite a bit in my undergrad and continued to mentor me, Leslie Kaelbling, who is just absolutely fearless in just telling people to follow their passions. Because we really are super privileged as was mentioned earlier: we lose our jobs, we can just get another one. >> Lisa: Right. >> Right? And our skills are so in need that we can and we should try to do amazing things that we care about. And I think that message really stayed with me. >> Absolutely. >> So you got research going on in autism. You mentioned depression. What's next for you? What are some of your next interests? Cancer research, other things like that? >> So I'm actually really interested in mental health because I think that that's, you know, talk about messy spaces, in terms of data. 
(laughing) It's very hard to quantify but it has a huge, huge burden both to the people who suffer from mental health disorders, which is like close to 15 percent, 20 percent, depending on how you count. But also it has a huge burden on everyone else too, on like lost work, on the people around them. And so we're working with depression and autism, as I mentioned. And we're hoping to branch out into other neurodevelopmental disorders, as well as adult psychiatric disorders. And I feel like in this phase, it's even harder to find the right treatments. And the treatments take so long to test, six to eight weeks. And it can be so hard to keep up the morale, to keep trying out a treatment when your disorder is one that makes it hard to keep up trying whatever you need to try. >> Lisa: Right. >> So that's an area that I'm really focusing on these days. >> Well then your passion is clearly there. That intersection of machine learning and healthcare. You're right, you're talking about something that maybe isn't talked about nearly as much as some of the other big diseases but it's one that is prolific. It affects so many. And it's exciting to know that there are people out there like you who really have a passion for that and are using data as a change agent to help current generations and future to come. So Finale, such a pleasure to have you on theCUBE. We wish you the best of luck in your technical talk and know that you're going to be mentoring a lot of people from far and wide. >> Thank you, my pleasure to be here. >> Absolutely, so I'm Lisa Martin. You've been watching theCUBE. We are live at the Women in Data Science Conference at Stanford University, but stick around, we'll be right back. (upbeat music)

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Leslie Kaelbling	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Lisa	PERSON	0.99+
20 percent	QUANTITY	0.99+
both	QUANTITY	0.99+
10 years	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
six	QUANTITY	0.99+
75 locations	QUANTITY	0.99+
One example	QUANTITY	0.99+
10 years ago	DATE	0.99+
MIT	ORGANIZATION	0.99+
Mother Teresa	PERSON	0.99+
Martin Luther King	PERSON	0.99+
over 500 people	QUANTITY	0.99+
first event	QUANTITY	0.98+
eight weeks	QUANTITY	0.98+
WIDS 2017	EVENT	0.98+
15 generations	QUANTITY	0.98+
Stanford University	ORGANIZATION	0.98+
Finale Doshi-Velez	PERSON	0.98+
Harvard	ORGANIZATION	0.98+
Harvard University	ORGANIZATION	0.98+
Women in Data Science Conference 2017	EVENT	0.97+
less than a hundred people	QUANTITY	0.97+
Women in Data Science Conference	EVENT	0.96+
Women in Machine Learning Conference	EVENT	0.96+
Women in Machine Learning	EVENT	0.95+
one sort	QUANTITY	0.95+
up to 6,000 people	QUANTITY	0.95+
WIDS	EVENT	0.95+
Women in Machine Learning Conference	EVENT	0.95+
Over 75 locations	QUANTITY	0.94+
one thing	QUANTITY	0.93+
today	DATE	0.92+
WIML	EVENT	0.92+
Women in Machine Learning	EVENT	0.91+
one person	QUANTITY	0.91+
HIV	OTHER	0.87+
Stanford University	ORGANIZATION	0.85+
first	QUANTITY	0.85+
Women in Data Science	EVENT	0.84+
two examples	QUANTITY	0.76+
Women in Data Science	EVENT	0.76+
autism spectrum disorder	OTHER	0.75+
India	ORGANIZATION	0.75+
15 percent	QUANTITY	0.72+
2017	DATE	0.69+
Stanford University	LOCATION	0.62+
theCUBE	ORGANIZATION	0.61+
over	DATE	0.59+
second annual	QUANTITY	0.58+
close	QUANTITY	0.57+
couple	QUANTITY	0.52+

Janet George, Western Digital | Women in Data Science 2017

>> Male Voiceover: Live from Stanford University, it's The Cube covering the Women in Data Science Conference 2017. >> Hi, welcome back to The Cube, I'm Lisa Martin and we are live at Stanford University at the second annual Women in Data Science Technical Conference. It's a one day event here, incredibly inspiring morning we've had. We're joined by Janet George, who is the chief data scientist at Western Digital. Janet, welcome to the show. >> Thank you very much. >> You're a speaker at-- >> Very happy to be here. >> We're very happy to have you. You're a speaker at this event and we want to talk about what you're going to be talking about. Industrialized data science. What is that? >> Industrialized data science is mostly about how data science is applied in the industry. It's less about more research work, but it's more about practical application of industry use cases in which we actually apply machine learning and artificial intelligence. >> What are some of the use cases at Western Digital for that application? >> One of the use cases that we use is, we are in the business of creating new technology nodes and for creating new technology nodes we actually create a lot of data. And with that data, we actually look at, can we understand pattern recognition at very large scale? We're talking millions of wafers. Can we understand memory holes? The shape, the type, the curvature, circularity, radius, can we detect these patterns at scale? And then how can we detect if the memory hole is warped or deformed and how can we have machine learning do that for us? We also look at things like correlations during the manufacturing process. Strong correlations, weak correlations, and we try to figure out interactions between different correlations. >> Fantastic. So if we look at big data, it's probably applicable across every industry. How has it helped to transform Western Digital, that's been an institution here in Silicon Valley for a while? 
>> We in Western Digital we move mountains of data. That's just part of our job, right? And so we are the leaders in storage technology, people store data in Western Digital products, and so data's inherently very familiar to us. We actually deal with data on a regular basis. And now we've started confronting our data with data science. And we started confronting our data with machine learning because we are very aware that artificial intelligence, machine learning can bring a different value to that data. We can look at the insights, we can develop intelligence about how we build our storage products. What we do with our storage. Failure analysis is a huge area for us. So we're really tapping into our data to figure out how can we make artificial intelligence and machine learning ingrained in the way we do work. >> So from a cultural perspective, you've really done a lot to evolve the culture of Western Digital to apply the learnings, to improve the values that you deliver to all of your customers. >> Yes, believe it or not, we've become a data-driven company. That's amazing, because we've invested in our own data, and we've said "Hey, if we are going to store the world's data, we need to lead, from a data perspective" and so we've sort of embraced machine learning and artificial intelligence. We've embraced new algorithms, technologies that are out there we can tap into to look at our data. >> So from a machine learning, human perspective, in storage manufacturing, is there still a dependence on human insight where storage manufacturing devices are concerned, or are you seeing the machine learning really, in this case, take more of a lead? >> No, I think humans play a huge role, right? Because these are domain experts. We're talking about Ph.D.'s in material science and device physics areas so what I see is the augmentation between machine learning and humans, and the domain experts. Domain experts will not be able to scale. 
When the scale of wafer production becomes very large. So let's talk about 3 million wafers. How is a human going to physically look at all the failure patterns on those wafers? We're not going to be able to scale just having domain expertise. But taking our core domain expertise and using that as training data to build intelligence models that can inform the domain expert and be smart and come up with all the ideas, that's where we want to be. >> Excellent. So you talked a little bit about the manufacturing process. Who are some of the other constituents that you collaborate with as chief data scientist at Western Digital that are demanding access to data, marketing, etcetera, what are some of those key collaborators for your group? >> Many of our marketing department, as well as our customer service department, we also have collaborations going on with universities, but one of the things we found out was when a drive fails, and it goes to our customer, it's much better for us to figure out the failure. So we've started modeling out all the customer returns that we've received, and look at that and see "How can we predict the life cycle of our storage?" And get to those return possibilities or potential issues before it lands in the hands of customers. >> That's excellent. >> So that's one area we've been focusing quite a bit on, to look at the whole life cycle of failures. >> You also talked about collaborating with universities. Share a little bit about that in terms of, is there a program for internships for example? How are you helping to shape the next generation of computer scientists? >> We are very strongly embedded in universities. We usually have a very good internship program. Six to eight weeks, to 12 weeks in the summer, the interns come in. Ours is a little different where we treat our interns as real value add. They come in, and they're given a hypothesis, or problem domain that they need to go after. 
And within six to eight weeks, and they have access to tremendous amounts of data, so they get to play with all this industry data that they would never get to play with. They can quickly bring their academic background, or their academic learning to that data. We also take really hard research-ended problems or further out problems and we collaborate with universities on that, especially Stanford University, we've been doing great collaborations with them. I'm super encouraged with Fei-Fei Li's work on computer vision, and we've been looking into things around deep neural networks. This is an area of great passion for me. I think the cognitive computing space has just started to open up and we have a lot to learn from neural networks and how they work and where the value can be added. >> Looking at, just want to explore the internship topic for a second. And we're at the second annual Women in Data Science Conference. There's a lot of young minds here, not just here in person, but in many cities across the globe. What are you seeing with some of the interns that come in? Are they confident enough to say "I'm getting access to real world data I wouldn't have access to in school", are they confident to play around with that, test out a hypothesis and fail? Or do they fear, "I need to get this right right away, this is my career at stake?" >> It's an interesting dichotomy because they have a really short time frame. That's an issue because of the time frame, and they have to quickly discover. Failing fast and learning fast is part of data science and I really think that we have to get to that point where we're really comfortable with failure, and the learning we get from the failure. Remember the light bulb was invented with 99% negative knowledge, so we have to get to that negative knowledge and treat that as learning. So we encourage a culture, we encourage a style of different learning cycles so we say, "What did we learn in the first learning cycle?" 
"What discoveries, what hypothesis did we figure out in the first learning cycle, which will then prepare our second learning cycle?" And we don't see it as a one-stop, rather more iterative form of work. Also with the internships, I think sometimes it's really essential to have critical thinking. And so the interns get that environment to learn critical thinking in the industry space. >> Tell us about, from a skills perspective, these are, you can share with us, presumably young people studying computer science, maybe engineering topics, what are some of the traditional data science skills that you think are still absolutely there? Maybe it's a hybrid of a hacker and someone who's got, great statistician background. What about the creative side and the ability to communicate? What's your ideal data scientist today? What are the embodiments of those? >> So this is a fantastic question, because I've been thinking about this a lot. I think the ideal data scientist is at the intersection of three circles. The first circle is really somebody who's very comfortable with data, mathematics, statistics, machine learning, that sort of thing. The second circle is in the intersection of implementation, engineering, computer science, electrical engineering, those backgrounds where they've had discipline. They understand that they can take complex math or complex algorithms and then actually implement them to get business value out of them. And the third circle is around business acumen, program management, critical thinking, really going deeper, asking the questions, explaining the results, very complex charts. The ability to visualize that data and understand the trends in that data. So it's the intersection of these very diverse disciplines, and somebody who has deep critical thinking and never gives up. (laughs) >> That's a great one, that never gives up. But looking at it, in that way, have you seen this, we're really here at a revolution, right? 
Have you seen that data science traditionalist role evolve into these three, the intersection of these three elements? >> Yeah, traditionally, if you did a lot of computer science, or you did a lot of math, you'd be considered a great data scientist. But if you don't have that business acumen, how do you look at the critical problems? How do you communicate what you found? How do you communicate that what you found actually matters in the scheme of things? Sometimes people talk about anomalies, and I always say "is the anomaly structured enough that I need to care about?" Is it systematic? Why should I care about this anomaly? Why is it different from an alert? If you have modeled all the behaviors, and you understand that this is a different anomaly than I've normally seen, and you must care about it. So you need to have business acumen to ask the right business questions and understand why that matters. >> So your background in computer science, your bachelor's Ph.D.? >> Bachelor's and master's in computer science, mathematics, and statistics, so I've got a combination of all of those and then my business experience comes from being in the field. >> Lisa: I was going to ask you that, how did you get that business acumen? Sounds like it was by in-field training, basically on-the-job? >> It was in the industry, it was on-the-job, I put myself in positions where I've had great opportunities and tackled great business problems that I had to go out and solve, very unique set of business problems that I had to dig deep into figuring out what the solutions were, and so then gained the experience from that. >> So going back to Western Digital, how you're leveraging data science to really evolve the company. You talked about the cultural evolution there, which we both were mentioning off-camera, is quite a feat because it's very challenging. Data from many angles, security, usage, is a board level, boardroom conversation. 
I'd love to understand, and you also talked about collaboration, so talk to us a little bit about how, and some of the ways, tangible ways, that data science and your team have helped evolve Western Digital. Improving products, improving services, improving revenue. >> I think of it as when an algorithm or a machine learning model is smart, it cannot be a threat. There's a difference between being smart and being a threat. It's smart when it actually provides value. It's a threat when it takes away or does something you would be wanting to do, and here I see that initially there's a lot of fear in the industry, and I think the fear is related to "oh, here's a new technology," and we've seen technologies come in and disrupt in a major way. And machine learning will make a lot of disruptions in the industry for sure. But I think that will cause a shift, or a change. Look at our phone industry, and how much the phone industry has gone through. We never complain that the smartphone is smarter than us. (laughs) We love the fact that the smartphone can show us maps and it can send us in the right direction; of course, it sends us in the wrong direction sometimes, but most of the time it's pretty good. We've grown to rely on our cell phones. We've grown to rely on the smartness. I look at when technology becomes your partner, when technology becomes your ally, and when it actually becomes useful to you, there is a shift in culture. We start by saying "how do we earn the value of the humans?" How can machine learning, how can the algorithms we built, actually show you the difference? How can it come up with things you didn't see? How can it discover new things for you that will create a wow factor for you? And when it does create a wow factor for you, you will want more of it, so to me, it's mostly an intent-based progress, in terms of a culture change. You can't push any new technology on people. People will be reluctant to adopt.
The only way that people adopt new technologies is when they see the value of the technology instantly and then they become believers. It's a very grassroots-level change, if you will. >> For the foreseeable future, from a fear perspective and maybe job security, at least in the storage and manufacturing industry, people aren't going to be replaced by machines. You think they're going to maybe live together for a very long, long time? >> I totally agree. I think that it's going to augment the humans for a long, long time. I think that we will get over our fear, we worry that the humans, I think humans are incredibly powerful. We give way too little credit to ourselves. I think we have huge creative capacity. Machines do have processing capacity, they have very large scale processing capacity, and humans and machines can augment each other. I do believe that, like the time when we had computers and we relied on our computers for data processing, we're going to rely on computers for machine learning. We're going to get smarter, so we don't have to do all the automation and the daily grind of stuff. If you can predict, and that prediction can help you, and you can feed that prediction model some learning mechanism by reinforcement learning or reading or ranking. Look at the spam industry. We just taught the Spam-a-Guccis to become so good at catching spam, and we don't worry about the fact that they do the cleansing of that level of data for us and so we'll get to that stage first, and then we'll get better and better and better. I think humans have a natural tendency to step up, they always do. We've always, through many generations, we have always stepped up higher than where we were before, so this is going to make us step up further. We're going to demand more, we're going to invent more, we're going to create more. But it's not going to be, I don't see it as a real threat.
The places where I see it as a threat is when the data has bias, or the data is manipulated, which exists even without machine learning. >> I love though, that the analogy that you're making is as technology is evolving, it's kind of a natural catalyst >> Janet: It is a natural catalyst. >> For us humans to evolve and learn and progress and that's a great cycle that you're-- >> Yeah, imagine how we did farming ten years ago, twenty years ago. Imagine how we drive our cars today than we did many years ago. Imagine the role of maps in our lives. Imagine the role of autonomous cars. This is a natural progression of the human race, that's how I see it, and you can see the younger, young people now are so natural for them, technology is so natural for them. They can tweet, and swipe, and that's the natural progression of the human race. I don't think we can stop that, I think we have to embrace that it's a gift. >> That's a great message, embracing it. It is a gift. Well, we wish you the best of luck this year at Western Digital, and thank you for inspiring us and probably many that are here and those that are watching the livestream. Janet George, thanks so much for being on The Cube. >> Thank you. >> Thank you for watching The Cube. We are again live from the second annual Women in Data Science conference at Stanford, I'm Lisa Martin, don't go away. We'll be right back. (upbeat electronic music)

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Janet	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Janet George	PERSON	0.99+
Western Digital	ORGANIZATION	0.99+
Six	QUANTITY	0.99+
Lisa	PERSON	0.99+
99%	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
12 weeks	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
third circle	QUANTITY	0.99+
first circle	QUANTITY	0.99+
twenty years ago	DATE	0.99+
second circle	QUANTITY	0.99+
eight weeks	QUANTITY	0.99+
six	QUANTITY	0.99+
The Cube	TITLE	0.99+
ten years ago	DATE	0.99+
One	QUANTITY	0.98+
three	QUANTITY	0.98+
eight weeks	QUANTITY	0.98+
three circles	QUANTITY	0.98+
Women in Data Science Technical Conference	EVENT	0.98+
this year	DATE	0.98+
Feliz	PERSON	0.97+
Stanford	LOCATION	0.97+
three elements	QUANTITY	0.97+
one	QUANTITY	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
both	QUANTITY	0.96+
Women in Data Science Conference	EVENT	0.96+
many years ago	DATE	0.96+
second learning cycle	QUANTITY	0.96+
Women in Data Science	EVENT	0.96+
one day event	QUANTITY	0.96+
first learning cycle	QUANTITY	0.94+
first learning cycle	QUANTITY	0.93+
today	DATE	0.91+
one area	QUANTITY	0.91+
Women in Data Science conference	EVENT	0.89+
second	QUANTITY	0.88+
millions of wafers	QUANTITY	0.87+
first	QUANTITY	0.86+
one-stop	QUANTITY	0.86+
about 3 million wafers	QUANTITY	0.84+
-a-Guccis	ORGANIZATION	0.81+
The Cube	ORGANIZATION	0.77+
University	ORGANIZATION	0.6+
second annual	QUANTITY	0.56+
2017	DATE	0.51+
Cube	PERSON	0.36+

Claudia Perlich, Dstillery - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Narrator: Live from Stanford University, it's theCUBE covering the Women in Data Science Conference 2017. >> Hi welcome back to theCUBE, I'm Lisa Martin and we are live at Stanford University at the second annual Women in Data Science one day tech conference. We are joined by one of the speakers for the event today, Claudia Perlich, the Chief Scientist at Dstillery, Claudia, welcome to theCUBE. >> Claudia: Thank you so much for having me. It's exciting. >> It is exciting! It's great to have you here. You are quite the prolific author, you've won data mining competitions and awards, you speak at conferences all around the world. Talk to us about what you're currently doing as the Chief Scientist for Dstillery. Who's Dstillery? What's the Chief Scientist's role and how are you really leveraging data and science to be a change agent for your clients? >> I joined Dstillery when it was still called Media6Degrees as a very small startup in the New York ad tech space. It was very exciting. I came out of the IBM Watson Research Lab and really found this a new challenging application area for my skills. What does a Chief Scientist do? It's a good question, I think it actually took the CEO about two years to finally give me a job description, (laughter) and the conclusion at that point was something like, okay there is technical contribution, so I sit down and actually code things and I build prototypes and I play around with data. There is also what is referred to as Intellectual Leadership, so I work a lot with the teams just kind of scoping problems, brainstorming what may work or doesn't, and finally, that's what I'm here for today, is what they consider an Ambassador for the company, so being the face to talk about the more scientific aspects of what's happening now in ad tech, which brings me to what we actually do, right.
One of the things that happened over the recent past in advertising is it became an incredible playground for data science because the available data is incomparable to many other fields that I have seen. And so Dstillery was a pioneer in that space, starting to look initially at social data, things that people shared, but over the years it has really grown into getting a sense of the digital footprint of what people do. And our primary business model was to bring this to marketers to help them, on a much more individualized basis, identify who their current as well as future customers are. Really get a very different understanding than these broad middle-aged soccer mom kind of categories, to honor the individual tastes and preferences and actions that really truly reflect the variety of what people do. I'm many things as you mentioned, I publish, I'm a mom, and I have a horse, so there are many different parts to me. I don't think any single description fully captures that and we felt that advertising is a great space to explore how you can translate that and help both sides, the people that are being interacted with, as well as the brands that want to make sure that they reach the right individuals. >> Lisa: Very interesting. Well, as the buyer's journey has changed to mostly online, >> Exactly. >> You're right, it's an incredibly rich opportunity for companies to harness more of that behavioral information and probably see things that they wouldn't have predicted. We were talking to Walmart Labs earlier and one of the interesting insights that they shared was that, especially in Silicon Valley where people spend too much time in the car commuting-- (laughter) You have a long commute as well by train. >> Yes.
>> And you'd think that people would want, I want my groceries to show up on my doorstep, I don't want to have to go into the store, and they actually found the opposite, that people in such a cosmopolitan area as Silicon Valley actually want to go into the store and pick up-- >> Claudia: Yep. >> Their groceries, so it's very interesting how the data actually can sometimes really change. It's really the scientific method on a very different scale >> Claudia: Much smaller. >> But really using the behavior insights to change the shopping experience, but also to change the experience of companies that are looking to sell their products. >> I think that the last part of the puzzle is, the question is no longer what is the right video for the Super Bowl, I mean we have the Super Bowl coming up, right? >> Lisa: Right. Right. >> They did a study like when do people pay attention to the Super Bowl. You can actually tell, cuz you know what people don't do when they pay attention to the Super Bowl? >> Lisa: Mm-hmm. >> They're not playing around with their phones. They're actually not playing-- >> Lisa: Of course. >> Candy Crush and all these things, so what we see in the ad tech environment, we actually see that the demand for the digital ads goes down when people really focus on what's going on on the big screen. But that was a diversion ... >> Lisa: It's very interesting (laughter) though cuz it's something that's very tangible and very ... It's a real-world application. Question for you about data science and your background. You mentioned that you worked with IBM Watson. Forbes has just said that Data Scientist is the best job to apply for in 2017. What is your vision? Talk to us about your team, how you've grown that up, how you're using big data and science to really optimize the products that you deliver to your customers. >> Data Science really comes in many, many different flavors and in some sense I became a Data Scientist long before the term really existed.
Back then I was just a particularly weird kind of geek. (laughter) You know all of a sudden it's-- >> Now it has a name. (laughter) >> Right and the reputation to be fun and so you see really many different application areas depending on very different skillsets. What was originally the focus of our company has always been around, can we predict what people are going to do? That was always the primary focus and now you see that it's very nicely reflected at the event too. All of a sudden communicating this becomes a much bigger part of the puzzle where people say, "Okay, I realize that you're really good at predicting, but can you tell me why, what is it, these nuggets of insight-- >> Interpretation, right. >> "That you mentioned. Can you visualize what's going on?" And so we grew a team initially from a small group of really focused machine learning and predictive skills over to the broader can you communicate it. Can you explain to the customer archieve brands what happened here. Can you visualize data. That's kind of the broader shift and I think the most challenging part that I can tell in the broader picture of where there is a bit of a shortcoming in skillset, we have a lot of people who are really good today at analyzing data and coding, so that part has caught up. There are so many Data Science programs. What I still am looking for is how do you bring management and corporate culture to the place where they can truly take advantage of it. >> Lisa: Right. >> This kind of disconnect that we still have-- >> Lisa: Absolutely. >> How do we educate the management level to be comfortable evaluating what their data science group actually did. Whether they're working on the right problems that really ultimately will have impact. I think that layer of education needs to receive a lot more emphasis compared to what we already see in terms of this increased skillset on just the sheer technical side of it. >> You mentioned that you teach-- >> Claudia: Mm-hmm.
>> Before we went live here, that you teach at NYU, but you're also teaching Data Science to the business folks. I would love for you to expand a little bit more upon that and how are you helping to educate these people to understand the impact. Cuz that's really, really a change agent within the company. That's a cultural change, which is really challenging-- >> Claudia: Very much so. >> Lisa: What's their perception? What's their interest in understanding how this can really drive value? >> What you see, I've been teaching this course for almost six years now, and originally it was really kind of the hardcore coders who also happened to get a PhD on the side, who came to the course. Now you increasingly have a very broad collection of business minded people. I typically teach in the part-time program, meaning they all have day jobs and they've realized in their day jobs, I need this. I need that. That skill. That knowledge. We're trying to get on the ground without having to teach them Python and R and whatever the new toys are there. How can you identify opportunities? How do you know which of the many different flavors of Data Science, from prediction towards visualization to just analyzing historical data to maybe even causality. Which of these tools is appropriate for the task at hand and then being able to evaluate whether the level of support that a machine can bring is even sufficient? Because often just because you can analyze data doesn't mean that the reliability of the model is truly sufficient to support then a downstream business project. Being able to really understand those trade-offs without necessarily being able to sit down and code it yourself. That knowledge has become a lot more valuable and I really enjoy the brainstorming when we're just trying to scope a project when they come with problems from their day job and say, "Hey, we're trying to do that." And saying, "Are you really trying to do that?"
"What are you actually able to execute? What kind of decisions can you make?" This is almost like the brainstorming in my own company now brought out to much broader people working in hospitals, people working in banking, so I get exposed to all of these kinds of problem sets and that makes it really exciting for me. >> Lisa: Interesting. When Dstillery is talking to customers or prospective customers, is this now something that you're finding is a board level conversation within businesses? >> Claudia: No, I never get bored of that, so there is a part of the business that is pretty well understood and executed. You come to us, you give us money, and we will execute a digital campaign, either on mobile phones, on video, and you tell me what it is that you want me to optimize for. Do you want people to click on your ad? Please don't say yes, that's the worst possible thing you may ask me to do-- (laughter) But let's talk about what you're going to measure, whether you want people to show up in your store, whether you really care about signing up for a test drive, and then the system automatically will build all the models that then do all the real-time bidding. Advertising, I'm not sure how many people are aware, as your New York Times page loads, every single ad slot on that site is sold in a real-time auction. About 50 billion times a day, we receive a request whether we want to bid on the opportunity to show somebody an ad. >> Lisa: Wow. >> So that piece, I can't make 50 billion decisions a day. >> Lisa: Right. >> It is entirely automated. There's this fully automated machine learning that just serves that purpose. What makes it interesting for me now that ... Now this is kind of standard fare if you want to move over to the more interesting parts. Well, can you for instance predict which of the 15 different creatives I have for Chobani I should show you? >> Lisa: Mm-hmm.
>> The one with the woman running, or the one with the kid opening, so there are nuances to it, and exploring these new challenges or going into totally new areas talking about, for instance, churn prediction. I know an awful lot about people, I can predict very many things and a lot of them go far beyond just how you interact with ads, that's almost the most boring part. We can see people researching diabetes. We can provide snapshots to pharma telling them here's really where we see a rise of activity on a certain topic and maybe this is something of interest to understand which population is driving those changes. These kinds of conversations really make it exciting for me to bring the knowledge of what I see back to many different constituents and see what kind of problems we can possibly support with that. >> Lisa: It's interesting too. It sounds like more, not just providing ad technology to customers-- >> Claudia: Yeah. >> You're really helping them understand where they should be looking to drive value for their businesses. >> Claudia: That's really been the focus increasingly and I enjoy that a lot. >> Lisa: I can imagine that, that's quite interesting. Want to ask you a little bit before we wrap up here about your talk today. I was looking at your, the title of your abstract is, "Beware what you ask for: The secret life of predictive models". (laughter) Talk to us about some of the lessons you learn when things have gone a little bit, huh, I didn't expect that. >> I'm a huge fan of predictive modeling. I love the capabilities and what this technology can do. This being said, it's a collection of aha moments where you're looking at this and this, this doesn't really smell right. To give you an example from ad tech, and I alluded to this, when people say, "Okay we want a high click through rate." Yes, that means I have to predict who will click on an ad.
And then you realize that no matter what the campaign, no matter what the product, the model always chooses to show the ad on the flashlight app. Yeah, because that's when people fumble in the dark. The model's really, really good at predicting when people are likely to click on an ad, except that's really not what you intended-- >> Right. >> When you asked me to do that. >> Right. >> So it's almost the best and powerful that they move off into a sidetracked direction you didn't even know existed. Something similar happened with one of these competitions that I won. For Siemens Medical where you had to identify an FMI images of breast, which of these regions are most likely benign or which one have cancer. In both models we did really, really well, all was good. Until we realized that the patient ID was by far the most predictive feature. Now this really shouldn't happen. Your social security number shouldn't be able to predict-- >> Lisa: Right. >> Anything really. It wasn't the social security number, but when we started looking a little bit deeper, we realized what had happened is the data set was a sample from different sources, and one was a treatment center, and one was a screening center and they had certain ranges of patient IDs, so the model had learned where the machine stood, not what the image actually contained about the probability of having cancer. Whoever assembled the data set possibly didn't think about the downstream effect this can have on modeling-- >> Right. >> Which brings us back to the data science skill as really comprehensive starting all the way from the beginning of where the data is collected, all the way down to be extremely skeptical about your own work and really make sure that it truly reflects what you want it to do. You asked earlier like what makes really good Data Scientists. 
The intuition to feel when something is wrong and to be able to pinpoint and trace it back with the curiosity of really needing to understand everything about the whole process. >> Lisa: And also not only being able to communicate it, but probably being willing to fail. >> Claudia: That is really the number one requirement. If you want to have a data-driven culture, you have to embrace failure, because otherwise you will fail. >> Lisa: How do you find the reception (laughter) to that fact by your business students? Is that something that they're used to hearing or does it sound like a foreign language to them? >> I think the majority of them are in junior enough positions that they-- >> Lisa: Okay. >> Truly embrace that and if at all, they have come across the fact that they weren't allowed to fail as often as they had wanted to. I think once you go into the higher levels of conversation and we see that a lot in the ad tech industry where you have incentive problems. We see a lot of fraud being targeted. At the end of the day, the ad agency doesn't want to confess to the client that yeah they just wasted five million dollars-- >> Lisa: Right. >> Of ad spend on bots, and even the CMO might not be feeling very comfortable confessing that to the CEO-- >> Right. >> Claudia: Being willing to truly face up to the truth that data sometimes forces into your face, that can be quite difficult for a company or even an industry. >> Lisa: Yes, it can. It's quite revolutionary. As is this event, so Claudia Perlich we thank you so much for joining us-- >> My pleasure. >> Lisa: On theCUBE today and we know that you're going to be mentoring a lot of people that are here. We thank you for watching theCUBE. We are live at Stanford University from the Women in Data Science Conference. I am Lisa Martin and we'll be right back (upbeat music)
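
Editor's sketch: the patient-ID story Claudia tells in this interview is an instance of what is often called data leakage, and a toy example may make it concrete. Everything in the block below is hypothetical: the ID ranges, the malignancy rates, and the function names `make_records` and `id_only_accuracy` are invented for illustration and are not the Siemens Medical data or her team's model. It only shows how a rule that looks at nothing but the patient ID can score well when IDs happen to encode which center a record came from.

```python
# Toy illustration of leakage through an identifier column.
# All numbers here are invented assumptions, not real clinical data.

def make_records():
    """Build a hypothetical data set pooled from two centers."""
    records = []
    # Assumed screening center: IDs 0-99, 5 of 100 cases malignant.
    for pid in range(100):
        records.append({"patient_id": pid, "malignant": pid % 20 == 0})
    # Assumed treatment center: IDs 1000-1099, 90 of 100 cases malignant.
    for pid in range(1000, 1100):
        records.append({"patient_id": pid, "malignant": pid % 10 != 0})
    return records


def id_only_accuracy(records, threshold=1000):
    """Accuracy of predicting 'malignant' from the ID range alone."""
    correct = sum(
        (r["patient_id"] >= threshold) == r["malignant"] for r in records
    )
    return correct / len(records)


if __name__ == "__main__":
    # The rule never looks at an image, yet is right 185 times out of 200.
    print(id_only_accuracy(make_records()))  # 0.925 on this toy data
```

A high score from an identifier alone, as in this sketch, is a signal to check how the data set was assembled before trusting any model trained on it.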

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Claudia Perlich	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Lisa	PERSON	0.99+
Claudia	PERSON	0.99+
2017	DATE	0.99+
Candy Crush	TITLE	0.99+
Silicon Valley	LOCATION	0.99+
Siemens Medical	ORGANIZATION	0.99+
Dstillery	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Super Bowl	EVENT	0.99+
Super Bowl	EVENT	0.99+
Walmart Labs	ORGANIZATION	0.99+
IBM Watson Research Lab	ORGANIZATION	0.99+
Jobani	PERSON	0.99+
five million dollars	QUANTITY	0.99+
both models	QUANTITY	0.99+
both sides	QUANTITY	0.99+
single	QUANTITY	0.99+
today	DATE	0.99+
15 different creatives	QUANTITY	0.98+
One	QUANTITY	0.97+
#WiDS2017	EVENT	0.97+
about two years	QUANTITY	0.97+
ARM	ORGANIZATION	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
Women in Data Science Conference	EVENT	0.97+
Women in Data Science	EVENT	0.96+
one	QUANTITY	0.96+
Media6Degrees	ORGANIZATION	0.96+
About 50 billion times a day	QUANTITY	0.95+
Forbes	ORGANIZATION	0.95+
Stanford University	ORGANIZATION	0.93+
50 billion decisions a day	QUANTITY	0.92+
Women in Data Science 2017	EVENT	0.92+
Beware what you ask for: The secret life of predictive models	TITLE	0.9+
IBM Watson	ORGANIZATION	0.89+
theCUBE	ORGANIZATION	0.89+
almost six years	QUANTITY	0.88+
one day	QUANTITY	0.86+
Stanford University	ORGANIZATION	0.84+
NYU	ORGANIZATION	0.82+
single ad	QUANTITY	0.72+
python	ORGANIZATION	0.66+
second annual	QUANTITY	0.62+
one of the speakers	QUANTITY	0.61+
New York Times	TITLE	0.6+
dozen	QUANTITY	0.56+

Julie Yoo, Pymetrics - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Announcer: Live, from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017. >> Hi, I'm Lisa Martin, welcome back to theCUBE. We are live at Stanford University at the second annual Women in Data Science Conference, the one-day tech conference and we are joined by Julie Yoo, who is the founder and chief data scientist of Pymetrics. Julie, you were on the customer panel today. So welcome to theCUBE. >> Thank you. >> It's great to have you, it's such an interesting background. >> Julie: Thank you. >> Neuroscience meets engineering, or engineering meets neuroscience. I'd love for us to understand a little bit more about those two, how they're combined, and also, about Pymetrics. But give us a little bit of a background, as a woman in the sciences, how you got to where you are now. >> As you mentioned, my background's in computer engineering and I went into PhD program in electrical and computer engineering 'cause I wanted to study artificial intelligence. I was fascinated by the notion of artificial intelligence. So my research topic started in automatic speech recognition systems, building computers to decode and decipher human speech. After a couple of years, I got frustrated with just the engineering approach or statistical methods-based approach to improving the existing speech recognition systems that are out there, 'cause I thought to myself, We're trying to make computers understand human speech and mimic human function when we don't really understand how our brain works and I don't really know exactly what happens when you listen to you speak, when I listen to you speak and when you listen to I speak, what is going on? We didn't really have a good sense, so I wanted to study neuroscience. 
So I quit engineering and I went into a PhD program in neuroscience and there, I started doing a lot of neuroimaging studies, just looking at human cognition and just figuring out what is going on when people perceive and process these signals that are out there. >> And was your idea to eventually marry the two? >> I didn't really think about it that way, but it just sort of happened, as in like, my background in engineering sort of homed me into doing some of the projects that I did when I was doing my PhD and my post-doc. And while I was doing all that, I just evolved to be a data scientist without, really, me realizing I was doing everything that a typical data scientist would do. And this was even before 2008. The job title of data scientist wasn't even around then, so it sort of happened because of where I came from and because of what I was interested in and as I was doing that, it just ended up being a good marriage. >> And there it was. Talk to us, tell people what Pymetrics is and what the genesis of this company was. >> Pymetrics is a platform that uses neuroscience-based games and data science to promote predictive and bias-free hiring. How we became a product was because I was going through post-doc and my co-founder was also going through business school and we were both going through the phase of, Okay, we don't want to stay in academia. What do we want to do with our lives? And at the time, we realized a lot of the career-advising tools that are out there were not scientific and they were not data-driven and we felt that there is a clear need for a tool that can actually use all these data that are out there to help people figure out what they should be doing with their lives. So we thought we were uniquely positioned to use our background in engineering and neuroscience and build a product that could actually solve these challenging problems and that's how we started Pymetrics. >> That's fantastic. You started about three years ago in 2013. 
So, really getting rid of some of the biases, share with us what some of the biases are. Is it test scores, SATs, MCATs, GPAs? >> There are many, many different kinds of biases in the hiring process right now, I think. There is a preconception of what an engineer should look like and I think that plays a lot. And when you go to an interview, how you look and how you dress, it adds to the bias. There is ethnic bias, there's gender bias, and there is bias based on test scores and what school you went to. So we want to remove ourselves from that and really get down to what kind of person you are and are you really... I guess, have the right set of skills to succeed in certain job functions. We do that by measuring, instead of taking your subjective answers from questionnaires, we do that by objectively measuring your behavior and these games are based on neuroscience research so we know that they actually measure things that we want them to measure, for instance, your ability to pay attention, your risk appetite, and all those things that we think matter as to what makes you good at certain things and not so good at some other things. So we use these objective data and data science and predictive modeling to come up with predictions as to how good you will be in a certain career versus some other career. >> Really, an incredible need for that. It's game-based, so it's an actual game that people will play that will help understand more of who they are as a person, their behaviors, those patterns. Tell us a little bit about the invention of the game, what was it like, who was it for? >> The games were actually sourced from the neuroscience research community. We did not create these games. What we did was we actually just took them from research and medical settings and applied them to hiring. We know that these are relevant to measuring your attributes and your personality, so why not use them for hiring and career advising, because it makes sense. 
We're trying to measure your qualities, your soft skills and what-not, why not just use it for something that could really benefit from this sort of data. What we did do is we actually made these games, they're not really called games in the research community, but we made them shorter and we made them more applicable to the things that we are trying to use them for. >> You took feedback from some of your earlier adopters who were saying maybe it's taking me too long, maybe some of the recruiters might say, they gave you some very viable feedback that has helped you optimize the products. >> Right, as a data scientist, I always think the more data, the better, but that also means that people would have to sit in front of their computers and play an hour-long battery of games and a lot of people were thinking that it might be just a tad too long and companies felt that spending 45 minutes to an hour could be a discouraging thing and people felt a fatigue effect and we could see that in the results, so we ended up making it shorter. We went from 20 games to 12 games and we cut it down to 25 minutes long and I think, now, we're in the sweet spot where we do get enough data but, at the same time, we're not making it an hour long. >> Right, so this is really targeted for people coming out of university programs, whether it's bachelor's, master's, doctorate, et cetera, and also, what type of companies who are looking to hire, what's kind of your target market for that? >> I think mostly Fortune 500 companies 'cause a lot of these companies do hire in large volume, so it helps to have us go to these companies and build their models based off of their employees. And if a smaller company comes along and they only have 10 employees in the job function, then it's extremely difficult for us to build the model based off of their 10 employees, whereas if it's a larger corporation, then we can have 200 employees play and we can build the model based on their data. 
So generally, large corporations are our target clients. >> I'm curious, in terms of some of the data that you are seeing, that you're analyzing, are you seeing, we look at data science as a great example of the event that we're at, in a report from Forbes recently that said it's the best job to apply for in 2017. We're looking at now what's going to be happening, predicted over the course of the next year, and that's a shortage in talent. Are you seeing, with some of the data that you're taking in, are you seeing things that are mapping to that, like people that are really geared towards that? Or are you seeing more companies that are looking for computer-industry, data-science type roles? Is that increasing, as well? >> I think companies are definitely looking for more data scientists and I think, also, people are figuring out that there are data science programs like graduate school programs and I think that supply of data scientists is definitely increasing, but at the same time, or more so, the demand for data scientists is increasing. And not to mention, the available data that's out there is increasing at a faster rate than anything else. Yeah, it is, I think, the best time to be a data scientist right now. >> Let me ask you one more question about looking at skills. We have such a great cross-section at this event of leaders in retail, in obviously, what you're doing in neuroscience-gaming-merging world. We've got professors here. Data science is such an interesting topic, it's obviously very horizontal. From a skill set perspective, kind of the traditional skills of being a statistician, mathematics, being a hacker, a lot of the things that we've been hearing around the show today, and really aligns with what you're doing is more on the behavioral insight side of, you have to be able to communicate what you're seeing and be able to apply it. I'd love to understand a profile of an ideal data scientist that you guys are seeing from your data. 
What are some of the other behavioral attributes that maybe are some of the non-teachable things that you're seeing that really come up that this would be a great career path for someone? >> Personally, I think intellectual curiosity is number one, and they would have to have strong self-motivation and discipline because you could love analyzing data and you could just be doing that for how many days, I don't know, and that's it. You could actually come up with a good story. You've got to be a good storyteller and if you have artistic flair to make the data beautiful, then even better. But it is important to go from the beginning of the project where you have a bunch of data sets and actually come up with actionable results that people can use. And you're not only always going to be communicating with a data scientist, so you need to be able to present your data in a more succinct and easily-digestible way. >> That sounds like, as the chief data scientist for Pymetrics, that's what you're looking for to hire on your team. Give us a little bit, last question here, just a little bit of an overview of what your data science team looks like at Pymetrics, as you're helping to leverage this data to give people opportunities with careers. What does your team look like? >> Our team has a very diverse background. We have a few PhDs in Physics and you know, well, I have a PhD in Neuroscience and there's other data scientists with PhDs in Physics. We actually have one guy who majored in Data Science and we have another guy who majored in Bio Engineering. So it's definitely a diverse background. But the general theme is that you do need a good, quantitative foundation. So, whether it's engineering or physics, it is still helpful to have that statistical or analytical mind and if you can actually apply that, and actually love solving problems then I think data scientist is the right goal. 
>> So you're on the career panel at WiDS2017, is that the advice that you would give to kind of, the next generation of kids that are interested in this but aren't quite sure what industry they would want to go into? >> What industry? I think, I mean if they're even remotely interested in going into data science, I would encourage them to pursue it. I think it is one of the most fascinating fields right now and there's never going to be a shortage of needs for data scientists. So if you like it, if you think you are going to be pretty good at it, I say go for it. >> Fantastic. And you've got a great audience here. This is being live streamed in 20 cities, I think across the globe, or 75 cities, I have to get those stats right. But, there's a big opportunity here to be an influencer and we thank you for spending some time with us. Best of luck on the panel. >> Thank you. >> Thank you for watching. I'm Lisa Martin, we are live with theCUBE, at Women and Data Science 2017, #WiDS2017. Stick around, we'll be right back. (upbeat mellow music)

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Julie	PERSON	0.99+
20 games	QUANTITY	0.99+
Pymetrics	ORGANIZATION	0.99+
25 minutes	QUANTITY	0.99+
45 minutes	QUANTITY	0.99+
10 employees	QUANTITY	0.99+
2017	DATE	0.99+
20 cities	QUANTITY	0.99+
Julie Yoo	PERSON	0.99+
12 games	QUANTITY	0.99+
10 employees	QUANTITY	0.99+
two	QUANTITY	0.99+
75 cities	QUANTITY	0.99+
2013	DATE	0.99+
200 employees	QUANTITY	0.99+
Stanford University	ORGANIZATION	0.99+
one-day	QUANTITY	0.99+
an hour	QUANTITY	0.98+
#WiDS2017	EVENT	0.98+
next year	DATE	0.98+
WiDS2017	EVENT	0.97+
theCUBE	ORGANIZATION	0.97+
Women in Data Science Conference 2017	EVENT	0.97+
Forbes	ORGANIZATION	0.97+
both	QUANTITY	0.97+
Women in Data Science Conference	EVENT	0.97+
today	DATE	0.96+
one guy	QUANTITY	0.94+
one	QUANTITY	0.93+
one more question	QUANTITY	0.93+
three years ago	DATE	0.9+
2008	DATE	0.88+
couple of years	DATE	0.84+
Stanford University	ORGANIZATION	0.81+
Women and Data Science	EVENT	0.73+
After	DATE	0.7+
in Data Science 2017	EVENT	0.64+
500	QUANTITY	0.63+
Pymetrics	TITLE	0.61+
#theCUBE	ORGANIZATION	0.59+
second annual	QUANTITY	0.59+
conference	QUANTITY	0.51+
Women	TITLE	0.44+

Esteban Arcaute, @WalmartLabs - Women in Data Science 2017 - #WiDS2017 - #theCUBE

>> Announcer: Live from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017. >> Hi, welcome to theCUBE. I'm Lisa Martin, and we are at the Women in Data Science second annual conference at Stanford University. Great event, very excited to be joined by one of the founders of the Women in Data Science, the Senior Director and Head of Data Science at Walmart Labs, Esteban Arcaute. Very nice to have you on the program. Thanks for joining us. >> Thank you for having me, Lisa. >> So talk to us about data science in retail. How is Walmart using data science to influence shoppers wherever they are, mobile, in store, dot com? >> So data science is a key component to how we create our experiences, especially now that our customers essentially don't really make a distinction between whether they're shopping in stores or they're actually using their mobile device, or they're at home with their desktop. So that means that for us it really is about creating a seamless experience that allows a customer to not feel that barrier of the medium that they're using to shop. So more practically, that means that the data that we're using to create the experience is essentially the same across all of these media. >> So big data brings, and data science brings big opportunities, but also some challenges. Talk to us about some of the challenges that you've had with the tremendous amount of data because you've got what? Sixty million shoppers, 260 million, excuse me, globally. How are you dealing with some of those challenges and really turning them into opportunities to create that seamless experience? >> So for us it means that a lot of ready-made solutions that are available for other companies, they just don't work for us. The same way that other companies with large amounts of data, they actually have to create their own in-house solutions or technology. It is the same for us. 
Now in terms of how that is a very specific challenge, that means that when you actually go and train, let's say a model, that is trying to predict whether a customer is going to be satisfied with a purchase or not, usually the amount of data that you have will make that model not that reliable unless you actually did it in-house. >> Okay, so from an accuracy perspective that really is what was driving being able to do that within Walmart Labs? >> Yes, and just sort of to give a plug to the department where I got my PhD, all of these numerical instabilities that in the past you would only see when doing computational fluid dynamics, they actually start appearing in places like retail just because of the volume of data that is available. And so for us it's a great opportunity to be an ICME student. >> Excellent, and that's right, you got your Master's and your PhD right here at Stanford. Talk to us about from a scale and a speed perspective. How are you seeing the ability to influence the consumer experience? How quickly are you able to identify trends and act on them so that customer experience is better, and also the bottom line financials are improved as well for Walmart? >> That is a great question, Lisa, because our customers' expectations are changing really, really rapidly. If you remember back in the late 90s when you would go to a search engine and it worked, it was like a miracle. Everybody was really excited. Fast forward to today, you go to any search box, not a search engine, you put in a query. If it doesn't work, you're disappointed. When it works, it's just table stakes. That means that for us we need to be able to iterate as quickly as the customer expectations change, which is really, really fast. >> Absolutely. How do you collaborate with the business side? So first, let's talk about your team. What's the size of your team? As the head of data science, what are the different functions within your team, first and foremost? 
>> I'm also in charge of the search experience within Walmart Global eCommerce. It's a fairly large team because it is composed of basically the full stack from the back end, data science, dev ops, product management, so I cannot give you an exact size, but it's a fairly large team. >> And so how do you collaborate with the business to influence merchandising, for example? What is that collaboration like between Walmart Labs and the dot com side? >> So last year, Kelly Thompson was one of the speakers at the Women in Data Science Conference, and she talked about the importance of bringing the art of merchandising with the science of data science together. And it really is true that there're certain things that algorithms cannot catch as soon as a human expert actually knows about. And so the way we develop our products and enhance experiences for our customers is really bringing these two together in a partnership to ensure that there's never one side that is working on something that the other one cannot just leverage. >> From a priority perspective, how are some of the trends that you find driving priorities for investment? >> It goes both ways. Sometimes we find the trend. Sometimes the business finds the trend. And so sometimes the business asks us to try to automate or to predict something that we hadn't thought about, and that is actually very difficult, and hence we invest a lot in that. And sometimes we find some customer patterns that indicate a different behavior in a locality or with certain characteristics that then the business can go and better serve themselves. So it really is driven by whoever has a good idea, and they can come from anywhere. >> You mentioned the need still for human insight. Talk to us about that dynamic, machine learning and human insight. How does that work together, and again kind of thinking in the context of speed and skill to meet those changing customer demands? 
>> That is one of the best kept secrets for machine learning, is that most machine learning systems, the moment they have a human in the loop, the learning rate gets accelerated exponentially because essentially when a machine learning method is not working properly, it tends to be for certain types of cases that if they get resolved, just a few insights from a human being can actually go and make the machine learn a lot faster than if it's trying to figure it out on its own. So for us really even there is a partnership. We think of it as a system with a human in the loop. That human, if it's an expert, it's even better, which is what we have. And so we create our systems to deeply integrate our merchandising capacity. >> So you actually see human intervention or interaction as a necessary component to speed to market leveraging data? >> That is the fastest way to get there. There might be other ways to do that. We don't always have a human in the loop, but when we can have a human in the loop, we have seen that acceleration is actually measurable. >> Fantastic. So one of the things I wanted to chat about with you is looking at your team a little bit, as well as your involvement here in the Women in Data Science. You were one of the founders. Talk to us about Walmart's interest in helping to not only educate women, and further their education in data science, but also maybe to combat the predicted shortage of data scientists that's predicted to start even in 2018. How is that collaboration going to help in that sense? >> So let me address the question in two parts. First, the question related to women and minorities into data science. So Walmart is a very inclusive company. We win awards every year because of all of our work in there. And I think that starting with Women in Data Science, it's a natural place to start because there's always 50% of women everywhere. 
And so that means that really thinking that there should be an equal representation, or maybe not equal representation, there should be a way to funnel all of this talent into data science just makes sense. There's not a question as to whether there's sufficiently many of them or things like that. >> So culturally it was kind of a natural extension for Walmart Labs it sounds like. >> Absolutely, yes. And the second question is the shortage. So for us we're very lucky in that we have two things that any company needs to have to attract great data scientists. So first one is that we actually have data. Believe it or not, it is an asset that a lot of companies don't realize is actually (mumbling). And the second one is that we empower all of our associates with the ability to have impact from the get go. We don't put them in some small project that might have an impact in maybe three years. No, we actually put them in participating projects that might have, for instance in my team, impact within the first three to four months of being on the floor. >> That's fantastic, and I'm sure that really inspires them. They see that they can make an impact right away. And I would imagine just after chatting with you that they have the freedom probably to test and fail, and from that failure it becomes more opportunities to get and tweak and get things right. >> Absolutely. So especially in a field like retail, there's no laws of retail. There's not someone that just put in some nice equations and we just study and do something. Actually you need to test over and iterate constantly, especially when your customers' expectations change so rapidly. >> So in terms of evolution of data science and skills, data presentation skills, analysis, stats, math, what are some of the other skills, maybe even social skills that you think are really key for the young next generation of data scientists to really get into this field regardless of industry and be successful? 
>> It's a question that I get very often, and especially because data science has not yet been formally properly defined in some sense. Data scientist is even less properly defined, so the term just started in 2010 or 11, so usually people think that they have to be hackers, have analytical skills and have some domain expertise. We actually flip that to say you have to have analytical skills, so that stays. You have to be a software engineer or have software engineering skills, and you have to have project management skills. And the reason is that unless you are able to properly communicate what your insights are, to understand how they get incorporated into a real software system, and of course to have the expertise to know what you are doing, you're not going to be successful as a data scientist. So for us really those three components are the ones that drive what we are looking for in data scientists. >> Excellent, so you mentioned hackers. Hackathons, you recently had a hackathon. How is Walmart Labs giving opportunities to maybe kids in grade school and high school, kids that are in university to start developing that talent. >> So we have also an internship program every year. We have interns across all of Walmart Labs, and there is always a great opportunity to seed fresh new ideas that come from our interns, so that happens every year. We organize hackathons in a very targeted way in places where we see that there is demand to have these kind of events organized. So I think one that we have on our website is one from 2015 with Tech Crunch Disrupt. It's a big one, but we do other things as well. >> But that actually has the ability, someone who's made a big difference or won at a hackathon that Walmart Lab sponsors has the ability to actually influence Walmart. >> Absolutely because as I said a couple of minutes ago, great ideas come from anywhere. 
And hackathons are great places where you see all of these ideas bubbling, and that you might not even realize that oh, that opportunity is right there. Someone can see it, and once it's seen, everybody can see it. So it's a great place. >> But that's a great, from a cultural perspective what you're saying sounds fantastic, that you're, there's a culture within Walmart Labs and Walmart that really is not only diverse from women in the sciences as well, but also one that really encourages test it, try it, you can make an impact here. And I think that's huge for attracting talent. What advice would you give to some of the young women that are here at the Women in Data Science Conference for the second annual to want to become successful data scientists? >> So I would give the advice that I have for myself, which is stay true to yourself, and anyone can be a great data scientist. >> What are some of the things that you're most looking forward to learning and hearing at this second annual event? >> The line up of speakers is amazing, and I think that the fact that they come from all places in industry, and all types of academic and professional journeys make it a very rich experience even for me to understand what are the possibilities. >> Absolutely, the cross section of speakers at the event is amazing. You've got obviously you know, data science into retail. We've got people that are using, that are going to be on the show later, data science to change the way college kids are recruited for jobs. Kind of getting away from those things that used to scare me, GPA, test scores, really leveraging science to open up those possibilities. And I think one of the things that that can enable from your comment earlier is the importance of being able to be a good communicator. It's not just about understanding the data. You've got to be able to explain it in a way that makes sense. Is this an impact? 
Also you mentioned we've got people that are here today on the academic side that are helping to educate the next generation of computer and data scientists. So I think it's a phenomenal opportunity for women of all ages to really understand it's not just technology. Every company this day and age is a technology company, and the opportunities are there to be influencers, and it sounds like at Walmart Labs, from the ground up. >> Yes, absolutely. >> Fantastic. Well, Esteban it's been such a pleasure having you on the program today. Thank you so much for joining. We look forward to having a great event and hopefully seeing you at the third annual next year. >> Definitely. Thank you very much for having me, Lisa. >> And you've been watching theCUBE. We are live at the Women in Data Science Conference at Stanford University. Stick around, be right back. (jazzy music)

Published Date : Feb 3 2017


ENTITIES

Entity	Category	Confidence
Walmart	ORGANIZATION	0.99+
Lisa Martin	PERSON	0.99+
Lisa	PERSON	0.99+
Kelly Thompson	PERSON	0.99+
2015	DATE	0.99+
2010	DATE	0.99+
2018	DATE	0.99+
Walmart Labs	ORGANIZATION	0.99+
Esteban	PERSON	0.99+
50%	QUANTITY	0.99+
second question	QUANTITY	0.99+
last year	DATE	0.99+
three years	QUANTITY	0.99+
two parts	QUANTITY	0.99+
11	DATE	0.99+
First	QUANTITY	0.99+
260 million	QUANTITY	0.99+
four months	QUANTITY	0.99+
second one	QUANTITY	0.99+
Esteban Arcaute	PERSON	0.99+
two things	QUANTITY	0.98+
one	QUANTITY	0.98+
late 90s	DATE	0.98+
#WiDS2017	EVENT	0.98+
Women in Data Science Conference 2017	EVENT	0.97+
Walmart Lab	ORGANIZATION	0.97+
first	QUANTITY	0.97+
two	QUANTITY	0.97+
today	DATE	0.97+
Women in Data Science Conference	EVENT	0.97+
first one	QUANTITY	0.97+
one side	QUANTITY	0.97+
Stanford University	ORGANIZATION	0.97+
both ways	QUANTITY	0.96+
Women in Data Science	ORGANIZATION	0.95+
Women in Data Science 2017	EVENT	0.95+
Sixty million shoppers	QUANTITY	0.92+
first three	QUANTITY	0.91+
second annual	QUANTITY	0.9+
next year	DATE	0.88+
@WalmartLabs	ORGANIZATION	0.88+
theCUBE	TITLE	0.87+
#theCUBE	EVENT	0.86+
couple of minutes ago	DATE	0.85+
Stanford University	LOCATION	0.84+
Walmart Global	ORGANIZATION	0.83+
three components	QUANTITY	0.83+
Women in Data Science	EVENT	0.78+
Stanford	ORGANIZATION	0.78+
Tech Crunch Disrupt	ORGANIZATION	0.76+
ICME	ORGANIZATION	0.69+
theCUBE	ORGANIZATION	0.68+
third annual	QUANTITY	0.65+
second annual conference	QUANTITY	0.6+

Gabriela de Queiroz, Microsoft | WiDS 2023

(upbeat music) >> Welcome back to theCUBE's coverage of Women in Data Science 2023 live from Stanford University. This is Lisa Martin. My co-host is Tracy Yuan. We're excited to be having great conversations all day but you know, 'cause you've been watching. We've been interviewing some very inspiring women and some men as well, talking about all of the amazing applications of data science. You're not going to want to miss this next conversation. Our guest is Gabriela de Queiroz, Principal Cloud Advocate Manager of Microsoft. Welcome, Gabriela. We're excited to have you. >> Thank you very much. I'm so excited to be talking to you. >> Yeah, you're on theCUBE. >> Yeah, finally. (Lisa laughing) Like a dream come true. (laughs) >> I know and we love that. We're so thrilled to have you. So you have a ton of experience in the data space. I was doing some research on you. You've worked in software, financial advertisement, health. Talk to us a little bit about you. What's your background in? >> So I was trained in statistics. So I'm a statistician and then I worked in epidemiology. I worked with air pollution and public health. So I was a researcher before moving into the industry. So as I was talking today, the weekly paths, it's exactly who I am. I went back and forth and back and forth and stopped and tried something else until I figured out that I want to do data science and that I want to do different things because with data science we can... The beauty of data science is that you can move across domains. So I worked in healthcare, financial, and then different technology companies. >> Well the nice thing, one of the exciting things that data science, that I geek out about and Tracy knows 'cause we've been talking about this all day, it's just all the different, to your point, diverse, pun intended, applications of data science. You know, this morning we were talking about, we had the VP of data science from Meta as a keynote. 
She came to theCUBE talking and really kind of explaining from a content perspective, from a monetization perspective, and of course so many people in the world are users of Facebook. It makes it tangible. But we also heard today conversations about the applications of data science in police violence, in climate change. We're in California, we're expecting a massive rainstorm and we don't know what to do when it rains or snows. But climate change is real. Everyone's talking about it, and there's data science at its foundation. That's one of the things that I love. But you also have a lot of experience building diverse teams. Talk a little bit about that. You've created some very sophisticated data science solutions. Talk about your recommendation to others to build diverse teams. What's in it for them? And maybe share some data science project or two that you really found inspirational. >> Yeah, absolutely. So I do love building teams. Every time I'm given the task of building teams, I feel the luckiest person in the world because you have the option to pick like different backgrounds and all the diverse set of like people that you can find. I don't think it's easy, like people say, yeah, it's very hard. You have to be intentional. You have to go from the very first part when you are writing the job description through the interview process. So you have to be very intentional in every step. And you have to think through when you are doing that. And I love, like my last team, we had like 10 people and we were so diverse. Like just talking about languages. We had like 15 languages inside a team. So how beautiful it is. Like all different backgrounds, like myself as a statistician, but we had people from engineering background, biology, languages, and so on. So it's, yeah, like every time thinking about building a team, if you wanted your team to be diverse, you need to be intentional. 
>> I'm so glad you brought up that intention point because that is the fundamental requirement really is to build it with intention. >> Exactly, and I love to hear like how there's different languages. So like I'm assuming, or like different backgrounds, I'm assuming everybody just zig zags their way into the team and now you're all women in data science and I think that's so precious. >> Exactly. And not only woman, right. >> Tracy: Not only woman, you're right. >> The team was diverse not only in terms of like gender, but like background, ethnicity, and spoken languages, and language that they use to program and backgrounds. Like as I mentioned, not everybody did the statistics in school or computer science. And it was like one of my best teams was when we had this combination also like things that I'm good at the other person is not as good and we have this knowledge sharing all the time. Every day I would feel like I'm learning something. In a small talk or if I was reviewing something, there was always something new because of like the richness of the diverse set of people that were in your team. >> Well what you've done is so impressive, because not only have you been intentional with it, but you sound like the hallmark of a great leader of someone who hires and builds teams to fill gaps. They don't have to know less than I do for me to be the leader. They have to have different skills, different areas of expertise. That is really, honestly Gabriela, that's the hallmark of a great leader. And that's not easy to come by. So tell me, who were some of your mentors and sponsors along the way that maybe influenced you in that direction? Or is that just who you are? >> That's a great question. And I joke that I want to be the role model that I never had, right. So growing up, I didn't have anyone that I could see other than my mom probably or my sister. But there was no one that I could see, I want to become that person one day. 
And once I was tracing my path, I started to see people looking at me and like, you inspire me so much, and I'm like, oh wow, this is amazing and I want to do this over and over and over again. So I want to be that person to inspire others. And no matter, like I'll be like a VP, CEO, whoever, you know, I want to be, I want to keep inspiring people because that's so valuable. >> Lisa: Oh, that's huge. >> And I feel like when we grow professionally and then go to the next level, we sometimes lose that, you know, thing that's essential. And I think also like, it's part of who I am as I was building and all my experiences as I was going through, I became, as I mentioned, this unique person, and I think we all are unique somehow. >> You're a rockstar. Isn't she a rockstar? >> You're dropping quotes out. >> I'm loving this. I'm like, I've inspired Gabriela. (Gabriela laughing) >> Oh my God. But yeah, 'cause we were asking our other guests about the same question, like, who are your role models? And then we're talking about how like it's very important for women to see that there is a representation, that there is someone they look up to and they want to be. And so that like, it motivates them to stay in this field and to start in this field to begin with. So yeah, I think like you are definitely filling a void and for all these women who dream to be in data science. And I think that's just amazing. >> And you're a founder too. In 2012, you founded R Ladies. Talk a little bit about that. This is present in more than 200 cities in 55 plus countries. Talk about R Ladies and maybe the catalyst to launch it. >> Yes, so you always start, so I'm from Brazil, I always talk about this because it's such, again, I grew up over there. So I was there my whole life and then I moved here, to Silicon Valley. And when I moved to San Francisco, like the doors opened. So many things happening in the city. That was back in 2012. Data science was exploding.
And I found out something about Meetup.com, it's a website that you can join and go to all these events. And I was going to these events and I joke that it was kind of like going to Disneyland, where you don't know if you should go that direction or the other direction. >> Yeah, yeah. >> And I was like, should I go and learn about data visualization? Should I go and learn about SQL or should I go and learn about Hadoop, right? So I would go every day to those meetups. And I was a student back then, so you know, the budget was very restricted as a student. So we don't have much to spend. And then they would serve dinner and you would learn for free. And then I got to a point where I was like, hey, they are doing all of this as volunteers. Like they are running these meetups and events for free. And I felt like it's a cycle. I need to do something, right. I'm taking all this in. I'm having this huge opportunity to be here. I want to give back. So that's how everything started. I was like, no, I have to think about something. I need to think about something that I can give back. And I was using R back then and I'm like how about I do something with R. I love R, I'm so passionate about R, what about if I create a community around R but not a regular community, because by going to these events, I felt that as a Latina and as a woman, I was always in the corner and I was not being able to participate and to, you know, be myself and to network and ask questions. I would be in the corner. So I said to myself, what about if I do something where everybody feels included, where everybody can participate, can share, can ask questions without judgment? So that's how R Ladies all came together. >> That's awesome. >> Talk about intentions, like you have to, you had that goal in mind, but yeah, I wanted to dive a little bit into R.
So could you please talk more about where did the passion for R come from, and like how was the special connection between you and R, the language, like born, how did that come about? >> It was not love at first sight. >> No. >> Not at all. Not at all. Because that was back in Brazil. So all the documentation was in English, all the tutorials, only two. We had like very few tutorials. It was not like nowadays that we have so many tutorials and courses. There were like two tutorials, other documentation in English. So it was hard for me, like as someone that didn't know much English, to go through the language, and then to learn to program was not an easy task. But then as I was going through the language and learning and reading books and finding the people behind the language, I don't know how, I fell in love. And then when I came to San Francisco, I saw some of like the main contributors who were speaking in person and I'm like, wow, they are like humans. I don't know, it was like, I have no idea why I had this love. But I think the people and then the community was the thing that kept me with the R language. >> Yeah, the community factor is so important. And it's so, at WIDS it's so palpable. I mean I literally walk in the door, every WIDS I've done, I think I've been doing them for theCUBE since 2017. theCUBE has been here since the beginning in 2015 with our co-founders. But you walk in, you get this sense of belonging. And this sense of I can do anything, why not? Why not me? Look at her up there, and now look at you speaking in the technical talk today on theCUBE. So inspiring. One of the things that I always think is you can't be what you can't see. We need to be able to see more people that look like you and sound like you and like me and like you as well. And WIDS gives us that opportunity, which is fantastic, but it's also helping to move the needle, really. And I was looking at some of the Anitab.org stats just yesterday about 2022.
And they're showing, you know, the percentage of females in technical roles has been hovering around 25% for a while. It's a little higher now. I think it's 27.6 according to Anitab. We're seeing more women hired in roles. But one of the challenges, and I would love to get your advice on this, for those that might be in this situation, is attrition, women who are leaving roles. What would your advice be to a woman who might be trying to navigate family and work and career ladder to stay in that role and keep pushing forward? >> I'll go back to the community. If you don't have a community around you, it's so hard to navigate. >> That's a great point. >> You are lonely. There is no one that you can bounce ideas off, that you can share what you are feeling or like that you can learn as well. So sometimes you feel like you are the only person that is going through that problem or like, you maybe have a family or you are planning to have a family and you have to make a decision. But you've never seen anyone going through this. So when you have a community, you see people like you, right. So that's where we were saying about having different people and people like you so they can share as well. And you feel like, oh yeah, so they went through this, they succeeded. I can also go through this and succeed. So I think the attrition problem is still a big problem. And I'm sure it will be worse now with everything that is happening in Tech with layoffs. >> Yes and the great resignation. >> Yeah. >> We are going back, you know, a few steps, like a lot of like advancements that we did. I feel like we are going back unfortunately, but I always tell this, make sure that you have a community. Make sure that you have a mentor. Make sure that you have someone or some people, not only one mentor, different mentors, that can support you through this trajectory. Because it's not easy. But there are a lot of us out there. >> There really are. And that's a great point.
I love everything about the community. It's all about that network effect and feeling like you belong- >> That's all WIDS is about. >> Yeah. >> Yes. Absolutely. >> Like coming over here, it's like seeing the old friends again. It's like I'm so glad that I'm coming because I'm seeing all my old friends that I only see like maybe once a year. >> Tracy: Reunion. >> Yeah, exactly. And I feel like that our tank gets, you know- >> Lisa: Replenished. >> Exactly. For the rest of the year. >> Yes. >> Oh, that's precious. >> I love that. >> I agree with that. I think one of the things that when I say, you know, you can't be what you can't see, I think, well, how many females in technology would I be able to recognize? And of course you can be a female in technology working in the healthcare sector or working in finance or manufacturing, but, you know, we need to be able to have more that we can see and identify. And one of the things that I recently found out, I was telling Tracy this earlier that I geeked out about was finding out that the CTO of OpenAI, ChatGPT, is a female. I'm like, (gasps) why aren't we talking about this more? She was profiled on Fast Company. I've seen a few pieces on her, Mira Murati. But we're hearing so much about ChatJTP being... ChatGPT, I always get that wrong, about being like, likening it to the launch of the iPhone, which revolutionized mobile and connectivity. And here we have a female in the technical role. Let's put her on a pedestal because that is hugely inspiring. >> Exactly, like let's bring everybody to the front. >> Yes. >> Right. >> And let's have them talk to us because like, you didn't know. I didn't know probably about this, right. You didn't know. Like, we don't know about this. It's kind of like we are hidden. We need to give them the spotlight. Give every woman the spotlight, so they can keep inspiring the new generation. >> Or Susan Wojcicki who ran, how long did she run YouTube?
All the YouTube influencers that probably have no idea who are influential for whatever they're doing on YouTube in different social platforms that don't realize, do you realize there was a female behind the helm that for a long time that turned it into what it is today? That's outstanding. Why aren't we talking about this more? >> How about Megan Smith, was the first CTO on the Obama administration. >> That's right. I knew it had to do with Obama. Couldn't remember. Yes. Let's let's find more pedestals. But organizations like WIDS, your involvement as a speaker, showing more people you can be this because you can see it, >> Yeah, exactly. is the right direction that will help hopefully bring us back to some of the pre-pandemic levels, and keep moving forward because there's so much potential with data science that can impact everyone's lives. I always think, you know, we have this expectation that we have our mobile phone and we can get whatever we want wherever we are in the world and whatever time of day it is. And that's all data driven. The regular average person that's not in tech thinks about data as a, well I'm paying for it. What's all these data charges? But it's powering the world. It's powering those experiences that we all want as consumers or in our business lives or we expect to be able to do a transaction, whether it's something in a CRM system or an Uber transaction like that, and have the app respond, maybe even know me a little bit better than I know myself. And that's all data. So I think we're just at the precipice of the massive impact that data science will make in our lives. And luckily we have leaders like you who can help navigate us along this path. >> Thank you. >> What advice for, last question for you is advice for those in the audience who might be nervous or maybe lack a little bit of confidence to go I really like data science, or I really like engineering, but I don't see a lot of me out there. What would you say to them? 
>> Especially for people who are from like a non-linear track where like going onto that track. >> Yeah, I would say keep going. Keep going. I don't think it's easy. It's not easy. But keep going because the more you go the more, again, you advance and there are opportunities out there. Sometimes it takes a little bit, but just keep going. Keep going and following your dreams, that you get there, right. So again, data science is such a broad field that doesn't require you to come from a specific background. And I think the beauty of data science exactly is this is like the combination, the most successful data science teams are the teams that have all these different backgrounds. So if you think that we as data scientists, we started programming when we were nine, that's not true, right. You can be 30, 40, shifting careers, starting to program right now. It doesn't matter. Like you get there no matter how old you are. And no matter what's your background. >> There's no limit. >> There are no limits. >> I love that, Gabriela, >> Thank you so much. for inspiring. I know you inspired me. I'm pretty sure you probably inspired Tracy with your story. And sometimes like what you just said, you have to be your own mentor and that's okay. Because eventually you're going to turn into a mentor for many, many others and sounds like you're already paving that path and we so appreciate it. You are now officially a CUBE alumni. >> Yes. Thank you. >> Yay. We've loved having you. Thank you so much for your time. >> Thank you. Thank you. >> For our guest and for Tracy Yuan, this is Lisa Martin. We are live at WIDS 23, the eighth annual Women in Data Science Conference at Stanford. Stick around. Our next guest joins us in just a few minutes. (upbeat music)

Published Date : Mar 8 2023

ENTITIES

Entity	Category	Confidence
Tracy Yuan	PERSON	0.99+
Megan Smith	PERSON	0.99+
Gabriela de Queiroz	PERSON	0.99+
Susan Wojcicki	PERSON	0.99+
Gabriela	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Brazil	LOCATION	0.99+
2015	DATE	0.99+
2012	DATE	0.99+
San Francisco	LOCATION	0.99+
Tracy	PERSON	0.99+
Obama	PERSON	0.99+
Lisa	PERSON	0.99+
Mira Murati	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
California	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Uber	ORGANIZATION	0.99+
27.6	QUANTITY	0.99+
two	QUANTITY	0.99+
30	QUANTITY	0.99+
40	QUANTITY	0.99+
15 languages	QUANTITY	0.99+
R Ladies	ORGANIZATION	0.99+
two tutorials	QUANTITY	0.99+
Anitab	ORGANIZATION	0.99+
10 people	QUANTITY	0.99+
one	QUANTITY	0.99+
YouTube	ORGANIZATION	0.99+
today	DATE	0.99+
55 plus countries	QUANTITY	0.99+
first part	QUANTITY	0.99+
more than 200 cities	QUANTITY	0.99+
first	QUANTITY	0.98+
nine	QUANTITY	0.98+
SQL	TITLE	0.98+
theCUBE	ORGANIZATION	0.98+
WIDS 23	EVENT	0.98+
Stanford University	ORGANIZATION	0.98+
2017	DATE	0.98+
CUBE	ORGANIZATION	0.97+
Stanford	LOCATION	0.97+
Women in Data Science	TITLE	0.97+
around 25%	QUANTITY	0.96+
Disneyland	LOCATION	0.96+
English	OTHER	0.96+
one mentor	QUANTITY	0.96+
Women in Data Science Conference	EVENT	0.96+
once a year	QUANTITY	0.95+
WIDS	ORGANIZATION	0.92+
this morning	DATE	0.91+
Meetup.com	ORGANIZATION	0.91+
Facebook	ORGANIZATION	0.9+
Hadoop	TITLE	0.89+
WiDS 2023	EVENT	0.88+
Anitab.org	ORGANIZATION	0.87+
ChatJTP	TITLE	0.86+
One	QUANTITY	0.86+
one day	QUANTITY	0.85+
ChatGPT	TITLE	0.84+
pandemic	EVENT	0.81+
Fast Company	ORGANIZATION	0.78+
CTO	PERSON	0.76+
Open	ORGANIZATION	0.76+

Myriam Fayad & Alexandre Lapene, TotalEnergies | WiDS 2023

(upbeat music) >> Hey, girls and guys. Welcome back to theCUBE. We are live at Stanford University, covering the 8th Annual Women in Data Science Conference. One of my favorite events. Lisa Martin here. Got a couple of guests from Total Energies. We're going to be talking all things data science, and I think you're going to find this pretty interesting and inspirational. Please welcome Alexandre Lapene, Tech Advisor Data Science at Total Energy. It's great to have you. >> Thank you. >> And Myriam Fayad is here as well, product and value manager at Total Energies. Great to have you guys on theCUBE today. Thank you for your time. >> Thank you for - >> Thank you for receiving us. >> Give the audience, Alexandre, we'll start with you, a little bit about Total Energies, so they understand the industry, and what it is that you guys are doing. >> Yeah, sure, sure. So Total Energies, is a former Total, so we changed name two years ago. So we are a multi-energy company now, working over 130 countries in the world, and more than 100,000 employees. >> Lisa: Oh, wow, big ... >> So we're a quite big company, and if you look at our new logo, you will see there are like seven colors. That's the seven energy that we basically that our business. So you will see the red for the oil, the blue for the gas, because we still have, I mean, a lot of oil and gas, but you will see other color, like blue for hydrogen. >> Lisa: Okay. >> Green for gas, for biogas. >> Lisa: Yeah. >> And a lot of other solar and wind. So we're definitely multi-energy company now. >> Excellent, and you're both from Paris? I'm jealous, I was supposed to go. I'm not going to be there next month. Myriam, talk a little bit about yourself. I'd love to know a little bit about your role. You're also a WiDS ambassador this year. >> Myriam: Yes. >> Lisa: Which is outstanding, but give us a little bit of your background. >> Yes, so today I'm a product manager at the Total Energies' Digital Factory. 
And at the Digital Factory, our role is to develop digital solutions for all of the businesses of Total Energies. And as a background, I did engineering school. So, and before that, I would say, I wasn't really aware of, I had never asked myself if being a woman could stop me from doing what I want to do in the professional career. But when I started my engineering school, I started seeing that women are becoming, I would say, increasingly rare in the environment >> Lisa: Yes. >> that, where I was evolving. >> Lisa: Yes. >> So that's why I started to think about such initiatives. And then when I started working in the tech field, that confirmed to me that women are really rare in the tech field and data science field. So, and at Total Energies, I met ambassadors of the WiDS initiatives. And that's how I decided to be a WiDS Ambassador, too. So our role is to organize events locally in the countries where we work to raise awareness about the importance of having women in the tech and data fields. And also to talk about the WiDS initiative more globally. >> One of my favorite things about WiDS is it's this global movement, it started back in 2015. theCUBE has been covering it since then. I think I've been covering it for theCUBE since 2017. It's always a great day full of really positive messages. One of the things that we talk a lot about when we're focusing on the Q1 Women in Tech, or women in technical roles is you can't be what you can't see. We need to be able to see these role models, but also, we're not just talking about women, we're talking about underrepresented minorities, we're talking about men like you, Alexandre. Talk to us a little bit about what your thoughts are about being at a Women in Data Science Conference and your sponsorship, I'm sure, of many women in Total, and other industries that appreciate having you as a guide. >> Yeah, yeah, sure. First I'm very happy because I'm back to Stanford.
So I did my PhD, postdoc, sorry, with Margot, I mean, back in 20, in 2010, so like last decade. >> Lisa: Yeah, yep. >> I'm a fluid mechanics person, so I didn't start as data scientist, but yeah, WiDS is always, I mean, this great event as you describe it, I mean, to see, I mean it's growing every year. I mean, it's fantastic. And it's very, I mean, it's always also good as a man, I mean, to be in the situation of most of the women in data science conferences. And when Margot, she asked at the beginning of the conference, "Okay, how many men do we have? Okay, can you stand up?" >> Lisa: Yes. I saw that >> It was very interesting because - >> Lisa: I could count on one hand. >> What, like 10 or ... >> Lisa: Yeah. >> Maximum. >> Lisa: Yeah. >> And, I mean, you feel that, I mean you could feel what it is to be a woman in the field and - >> Lisa: Absolutely. >> Alexandre: That's ... >> And you, sounds like you experienced it. I experienced the same thing. But one of the things that fascinates me about data science is all of the different real world problems it's helping to solve. Like, I keep saying this, we're in California, I'm a native Californian, and we've been in an extreme drought for years. Well, we're getting a ton of rain and snow this year. Climate change. >> Guests: Yeah. >> We're not used to driving in the rain. We are not very good at it either. But the, just thinking about data science as a facilitator of understanding climate change better; to be able to make better decisions, predictions, drive better outcomes, or things like, police violence or healthcare inequities. I think the power of data science to help unlock a lot of the unknown is so great. And we need that thought diversity. Myriam, you're talking about being in engineering. Talk to me a little bit about what projects interest you with respect to data science, and how you are involved in really creating more diversity of thought. >> Hmm.
In fact, at Total Energies in addition to being an energy company we're also a data company in the sense that we produce a lot of data in our activities. For example with the sensors on the fuel on the platforms. >> Lisa: Yes. >> Or on the wind turbines, solar panels and even data related to our clients. So what is really exciting about working in the data science field at Total Energies is that we really feel the impact of the project that we're working on. And we really work with the business to understand their problems. >> Lisa: Yeah. >> Or their issues and try to translate it to a technical problem and to solve it with the data that we have. So that's really exciting, to feel the impact of the projects we're working on. So, to take an example, maybe, we know that one of the challenges of the energy transition is the storage of energy coming from renewable power. >> Yes. >> So I'm working currently on a project to improve the process of creating larger batteries that will help store this energy, by collecting the data, and helping the business to improve the process of creating these batteries. To make it more reliable, and with a better quality. So this is a really interesting project we're working on. >> Amazing, amazing project. And, you know, it's fun I think to think of all of the different people, communities, countries, that are impacted by what you're doing. Everyone knows about data. Sometimes we think about it as we're always paying for a lot of data on our phone or "data rates may apply" but we may not be thinking about all of the real world impact that data science is making in our lives. We have this expectation in our personal lives that we're connected 24/7. >> Myriam: Yeah. >> I can get whatever I want from my phone wherever I am in the world. And that's all data driven.
And we expect that if I'm dealing with Total Energies, or a retailer, or a car dealer that they're going to have the data to have a personal conversation with me. We have this expectation. I don't think a lot of people that aren't in data science or technology really realize the impact of data all around their lives. Alexandre, talk about some of the interesting data science projects that you're working on. >> There's one that I'm working on right now, so as tech advisor, I mean, I'm not the one directly working on it. >> Lisa: Okay. >> But we have, you know, we are from the Digital Factory where we make digital products. >> Lisa: Okay. >> And we have different squads. I mean, it's a group of different people with different skills. And one of these squads, they're working on the project that is about safety. We have a lot of sites, work sites all over the world where we deploy solar panels on parkings, on buildings, everywhere. >> Lisa: Okay. Yeah. >> And there's, I mean, a huge, I mean, we have a lot of workers, and in terms of safety we want to make sure that they work safely and we want to prevent accidents. So what we do is we develop some computer vision approach to help them at improving, you know, the way they work. I mean, the basic thing is detecting some equipment like the, I mean, the vest and so on. But we're working to really extend that to more concrete recommendations. And that's a very exciting project. >> Lisa: Yeah. >> Because it's very concrete. >> Yeah. >> And also, I'm coming from the R&D of the company and that's one of those projects that started in R&D and is now into the Digital Factory. And it will become a real product deployed over the world on our assets. So that's very great.
>> The influence and the impact that data can have on every business always is something that, we could talk about that for a very long time. >> Yeah. >> But one of the things I want to address is there, I'm not sure if you're familiar with AnitaB.org, the Grace Hopper Institute? It's here in the States and they do this great event every year. It's very pro-women in technology and technical roles. They do a lot of surveys, of studies. So they have data demonstrating where are we with respect to women in technical roles. And we've been talking about it for years. It's been, for a while, hovering around 25% of technical roles are held by women. I noticed in the AnitaB.org research findings from 2022, it's up to 27.6% I believe. So we're seeing those numbers slowly go up. But one of the things that's a challenge is attrition; of women getting in the roles and then leaving. Myriam, as a woman in technology, what inspires you to continue doing what you're doing and to elevate your career in data science? >> What motivates me is that data science, we really have to look at it as a means to solve a problem and not an end, a goal in itself. So the fact that we can apply data science to so many fields and so many different projects. So here, for example, we took examples of more industrial, maybe, applications. But for example, recently I worked on a data science study to analyze Google reviews of our clients on the service stations and to see what are the topics that are really important to them. So we really have a large range of topics, and a diversity of topics that are really interesting, so. >> And that's so important, the diversity of topics alone. There's, I think we're just scratching the surface. We're just at the very beginning of what data science can empower for our daily lives. For businesses, small businesses, large businesses.
I'd love to get your perspective as our only male on the show today, Alexandre, you have that elite title. The theme of International Women's Day this year, which is today, March 8th, is "Embrace equity." >> Alexandre: Yes. >> Lisa: What is that, when you hear that theme as a male in technology, as a male in a role where you can actually elevate women and really bring in that thought diversity, what is embracing equity, what does it look like to you? >> To me, it's really, I mean, because we always talk about how we can, you know, I mean improve, but actually we are fixing a problem, an issue. I mean, it's such a reality. I mean, and the reality and, I mean, and force in the company. And that's, I think at Total Energies, we still have, I mean things, I mean, we haven't reached our objective but we're working hard and especially at the Digital Factory to improve on that. And for example, we have 40% of our women in tech. >> Lisa: 40? >> 40% of our tech people that are women. >> Lisa: Wow, that's fantastic! >> Yeah. That's ... >> You're way ahead of the global average. >> Alexandre: Yeah. Yeah. >> That's outstanding. >> We're quite proud of that. >> You should be. >> But we still know that we have at least 10% >> Lisa: Yes. >> because it's not 50. The target is 50 or more. And, but I want to insist on the fact that we are correcting an issue. We are fixing an issue. We're not trying to improve something. I mean, that's important to have that in mind. >> Lisa: It is. Absolutely. >> Yeah. >> Myriam, I'd love to get your advice to your younger self, before you studied engineering. Obviously you had an interest when you were younger. What advice would you give to young Myriam now, looking back at what you've accomplished and being one of our female, visible females, in a technical role? What would you say to your younger self?
>> Maybe I would say to continue as I started. So as I was saying at the beginning of the interview, when I was at high school, I never felt like being a woman could stop me from doing anything. >> Lisa: Yeah. Yeah. >> So maybe to continue thinking this way, and yeah. And to stay here, to continue this way. Yeah. >> Lisa: That's excellent. Sounds like you have the confidence. >> Mm. Yeah. >> And that's something that a lot of people ... I struggled with it when I was younger, having the confidence, "Can I do this?" >> Alexandre: Yeah. >> "Should I do this?" >> Myriam: Yeah. >> And you kind of went, "Why not?" >> Myriam: Yes. >> Which is, that is such a great message to get out to our audience and to everybody else's. Just, "I'm interested in this. I find it fascinating. Why not me?" >> Myriam: Yeah. >> Right? >> Alexandre: Yeah, true. >> And by bringing out, I think, role models as we do here at the conference, it's a way to help young girls to be inspired and yeah. >> Alexandre: Yeah. >> We need to have women in leadership positions that we can see, because there's a saying here that we say a lot in the States, which is: "You can't be what you can't see." >> Alexandre: Yeah, that's true. >> And so we need more women and men supporting women and underrepresented minorities. And the great thing about WiDS is it does just that. So we thank you so much for your involvement in WiDS, Ambassador, our only male on the program today, Alexandre, we thank you. >> I'm very proud of it. >> Awesome to hear that TotalEnergies has about 40% of females in technical roles and you're on that path to 50% or more. We look forward to watching that journey and we thank you so much for joining us on the show today. >> Alexandre: Thank you. >> Myriam: Thank you. >> Lisa: All right. For my guests, I'm Lisa Martin. You're watching theCUBE Live from Stanford University. This is our coverage of the eighth Annual Women in Data Science Conference. 
We'll be back after a short break, so stick around. (upbeat music)

Published Date : Mar 8 2023

ENTITIES

Entity	Category	Confidence
Miriam	PERSON	0.99+
Myriam Fayad	PERSON	0.99+
Alexander	PERSON	0.99+
Alexandre	PERSON	0.99+
Myriam	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Total Energies	ORGANIZATION	0.99+
Lisa	PERSON	0.99+
Miryam	PERSON	0.99+
Margo	PERSON	0.99+
Alexandre Lapene	PERSON	0.99+
2010	DATE	0.99+
Paris	LOCATION	0.99+
2022	DATE	0.99+
2015	DATE	0.99+
Grace Hopper Institute	ORGANIZATION	0.99+
Total Energy	ORGANIZATION	0.99+
40	QUANTITY	0.99+
50%	QUANTITY	0.99+
California	LOCATION	0.99+
50	QUANTITY	0.99+
40%	QUANTITY	0.99+
next month	DATE	0.99+
Margot	PERSON	0.99+
more than 100,000 employees	QUANTITY	0.99+
two years ago	DATE	0.99+
TotalEnergies	ORGANIZATION	0.99+
today	DATE	0.99+
AnitaB.org	ORGANIZATION	0.99+
both	QUANTITY	0.99+
10	QUANTITY	0.99+
First	QUANTITY	0.99+
8th Annual Women in Data Science Conference	EVENT	0.99+
International Women's Day	EVENT	0.99+
Stanford University	ORGANIZATION	0.98+
Total	ORGANIZATION	0.98+
2017	DATE	0.98+
over 130 countries	QUANTITY	0.98+
Google	ORGANIZATION	0.98+
One	QUANTITY	0.98+
seven colors	QUANTITY	0.98+

Jacqueline Kuo, Dataiku | WiDS 2023

(upbeat music) >> Morning guys and girls, welcome back to theCUBE's live coverage of Women in Data Science WIDS 2023 live at Stanford University. Lisa Martin here with my co-host for this segment, Tracy Zhang. We're really excited to be talking with a great female rockstar. You're going to learn a lot from her next, Jacqueline Kuo, solutions engineer at Dataiku. Welcome, Jacqueline. Great to have you. >> Thank you so much. >> Thanks for being here. >> I'm so excited to be here. >> So one of the things I have to start out with, 'cause my mom Kathy Dahlia is watching, she's a New Yorker. You are a born and raised New Yorker and I learned from my mom and others, if you're born in New York, no matter how long you've moved away, you are a New Yorker. You guys have like a secret club. (group laughs) >> I am definitely very proud of being born and raised in New York. My family immigrated to New York, New Jersey from Taiwan. So very proud Taiwanese American as well. But I absolutely love New York and I can't imagine living anywhere else. >> Yeah, yeah. >> I love it. >> So you studied, I was doing some research on you, you studied mechanical engineering at MIT. >> Yes. >> That's huge. And you discovered your passion for all things data-related. You worked at IBM as an analytics consultant. Talk to us a little bit about your career path. Were you always interested in engineering, STEM-related subjects from the time you were a child? >> I feel like my interests were ranging in many different things and I ended up landing in engineering, 'cause I felt like I wanted to gain a toolkit, like a toolset, to make some sort of change with, or use my career to make some sort of change in this world. 
And I landed on engineering and mechanical engineering specifically, because I felt like I got to, in my undergrad, do a lot of hands-on projects, learn every part of the engineering and design process to build products, which is super-transferable, and transferable skills sort of is like the trend in my career so far. Where after undergrad I wanted to move back to New York, and mechanical engineering jobs are kind of few and far between in the city. And I ended up landing at IBM doing analytics consulting, because I wanted to understand how to use data. I knew that data was really powerful and I knew that working with it could allow me to tell better stories to influence people across different industries. And that's also how I kind of landed at Dataiku in my current role, because it really does allow me to work across different industries and work on different problems that are just interesting. >> Yeah, I like how you mentioned building a toolkit when doing your studies at school. Do you think a lot of skills are still very relevant to your job at Dataiku right now? >> I think that at the core of it is just problem solving and asking questions and continuing to be curious or trying to challenge what is currently given to you. And I think in an engineering degree you get a lot of that. >> Yeah, I'm sure. >> But I think that we've actually seen that a lot in the panels today already, that you get that through all different types of work and research and that kind of thoughtfulness comes across in all different industries too. >> Talk a little bit about some of the challenges that data science is solving, because every company these days, whether it's an enterprise in manufacturing or a small business in retail, everybody has to be data-driven, because the end user, the end customer, whoever that is, whether it's a person, an individual, a company, a B2B, expects to have a personalized custom experience and that comes from data. 
But you have to be able to understand that data, treat it properly, responsibly. Talk about some of the interesting projects that you're doing at Dataiku or maybe some that you've done in the past that are really kind of transformative across things like climate change or police violence, some of the things that data science really is impacting these days. >> Yeah, absolutely. I think that what I love about coming to these conferences is that you hear about those really impactful social impact projects that I think everybody who's in data science wants to be working on. And I think at Dataiku what's great is that we do have this program called Ikig.AI where we work with nonprofits and we support them in their data and analytics projects. And so, a project I worked on was with the Clean Water, oh my goodness, the Ocean Cleanup project, Ocean Cleanup organization, which was amazing, because it was sort of outside of my day-to-day and it allowed me to work with them and help them understand better where plastic is being aggregated across the world and where it appears, whether that's on beaches or in lakes and rivers. So using data to help them better understand that. I feel like from a day-to-day though, in terms of our customers, they're really looking at very basic problems with data. And I say basic, not to diminish it, but really just to kind of say that it's high impact, but basic problems around how do they forecast sales better? That's a really kind of, sort of basic problem, but it's actually super-complex and really impactful for people, for companies when it comes to forecasting how much headcount they need to have in the next year or how much inventory to have if they're retail. And all of those are going to, especially for smaller companies, make a huge impact on whether they make profit or not. 
And so, what's great about working at Dataiku is you get to work on these high-impact projects and oftentimes I think from my perspective, I work as a solutions engineer on the commercial team. So we work generally with smaller customers and sometimes me talking to them is like their first introduction to what data science is and what they can do with that data. And sort of using our platform to show them what the possibilities are and help them build a strategy around how they can implement data in their day-to-day. >> What's the difference? You were a data scientist by title and function, now you're a solutions engineer. Talk about the ascendancy into that and also some of the things that you and Tracy will talk about as those transferable, those transportable skills that maybe you learned in engineering, you brought to data science, and now you're bringing to solutions engineering. >> Yeah, absolutely. So data science, I love working with data. I love getting in the weeds of things and oftentimes that means debugging things or looking line by line at your code and trying to make it better. I found that in the data science role, while those things I really loved, sometimes it also meant that I couldn't see or didn't have visibility into the broader picture of, well, why are we doing this project? And who is it impacting? Because oftentimes your day-to-day is very much in the weeds. And so, I moved into sales or solutions engineering at Dataiku to get that perspective, because what a sales engineer does is support the sale from a technical perspective. And so, you really truly understand, well, what is the customer looking for and what is going to influence them to make a purchase? And how do you tell the story of the impact of data? Because oftentimes they need to quantify, well, if I purchase a software like Dataiku then I'm able to build this project and make this X impact on the business. 
And that is really powerful. That's where the storytelling comes in, and that's, I feel like, a lot of what we've been hearing today about connecting data with people who can actually do something with that data. That's really the bridge that we as sales engineers are trying to connect in that sales process. >> It's all about connectivity, isn't it? >> Yeah, definitely. We were talking about this earlier that it's about making impact and it's about people who we are analyzing data is like influencing. And I saw that one of the keywords or one of the biggest things at Dataiku is everyday AI, so I wanted to just ask, could you please talk more about how that weaves into the problem solving and then the day-to-day making-an-impact process? >> Yes, so I started working on Dataiku around three years ago and I fell in love with the product itself. The product that we have is we allow for people with different backgrounds, if you're coming from a data analyst background, data science, data engineering, maybe you are more of like a business subject matter expert, to all work in one unified central platform, one user interface. And why that's powerful is that when you're working with data, it's not just that data scientist working on their own and their own computer coding. We've heard today that it's all about connecting the data scientists with those business people, with maybe the data engineers and IT people who are actually going to put that model into production or other folks. And so, they all use different languages. Data scientists might use Python and R, your business people are using PowerPoint and Excel, everyone's using different tools. How do we bring them all in one place so that you can have conversations faster? So the business people can understand exactly what you're building with the data and can get their hands on that data and that model prediction faster. So that's what Dataiku does. That's the product that we have. 
And I completely forgot your question, 'cause I got so invested in talking about this. Oh, everyday AI. Yeah, so the goal of Dataiku is really to allow for those maybe less technical people with less traditional data science backgrounds. Maybe they're data experts and they understand the data really well and they've been working in SQL for all their career. Maybe they're just subject matter experts and want to get more into working with data. We allow those people to do that through our no and low-code tools within our platform. Platform is very visual as well. And so, I've seen a lot of people learn data science, learn machine learning by working in the tool itself. And that's where everyday AI comes in, 'cause we truly believe that there's a lot of unutilized expertise out there that we can bring in. And if we did give them access to data, imagine what we could do in the kind of work that they can do and become empowered basically with that. >> Yeah, we're just scratching the surface. I find data science so fascinating, especially when you talk about some of the real world applications, police violence, health inequities, climate change. Here we are in California and I don't know if you know, we're experiencing an atmospheric river again tomorrow. Californians and the rain- >> Storm is coming. >> We are not good... And I'm a native Californian, but we all know about climate change. People probably don't associate all of the data that is helping us understand it, make decisions based on what's coming, what's happened in the past. I just find that so fascinating. But I really think we're truly at the beginning of really understanding the impact that being data-driven can actually mean, whether you are investigating climate change or police violence or health inequities or you're a grocery store that needs to become data-driven, because your consumer is expecting a personalized relevant experience. 
I want you to offer me up things that I know. I was doing online grocery shopping yesterday, I just got back from Europe, and I was so thankful that my grocer is data-driven, because they made the process so easy for me. But we have that expectation as consumers that it's going to be that easy, it's going to be that personalized. And what a lot of folks don't understand is the data, the democratization of data, the AI that's helping make that a possibility that makes our lives easier. >> Yeah, I love that point around data is everywhere and the more we have, actually the more access we are providing, 'cause now compute is cheaper, data is literally everywhere, you can get access to it very easily. And so, I feel like more people are just getting themselves involved and that's, I mean this whole conference around just bringing more women into this industry and more people with different backgrounds from minority groups so that we get their thoughts, their opinions into the work is so important and it's becoming a lot easier with all of the technology and tools just being open source, being easier to access, being cheaper. And that I feel really hopeful about in this field. >> That's good. Hope is good, isn't it? >> Yes, that's all we need. But yeah, I'm glad to see that we're working towards that direction. I'm excited to see what lies in the future. >> We've been talking about numbers of women, percentages of women in technical roles for years and we've seen it hover around 25%. I was looking at some AnitaB.org stats from 2022, was just looking at this yesterday, and the numbers are going up. I think the number was 26, 27.6% of women in technical roles. So we're seeing a growth there, especially over pre-pandemic levels. Definitely the biggest challenge that still seems to be one of the biggest that remains is attrition. 
I would love to get your advice on what would you tell your younger self or the prior generation in terms of having the confidence and the courage to pursue engineering, pursue data science, pursue a technical role, and also stay in that role so you can be one of those females on stage that we saw today? >> Yeah, that's the goal right there one day. I think it's really about finding other people to lift and mentor and support you. And I talked to a bunch of people today who just found this conference through Googling it, and the fact that organizations like this exist really does help, because those are the people who are going to understand the struggles you're going through as a woman in this industry, which can get tough, but it gets easier when you have a community to share that with and to support you. And I do want to definitely give a plug to the WIDS@Dataiku team. >> Talk to us about that. >> Yeah, I was so fortunate to be a WIDS ambassador last year and again this year with Dataiku and I was here last year as well with Dataiku, but we have grown the WIDS effort so much over the last few years. So the first year we had two events in New York and also in London. Dataiku's global. So this year we additionally have one on the West Coast out here in SF and another one in Singapore, which is incredible to involve that team. But what I love is that everyone is really passionate about just getting more women involved in this industry. But then also what I find fortunate too at Dataiku is that we have a strong female, just a lot of women. >> Good. >> Yeah. >> A lot of women working as data scientists, solutions engineers and sales and all across the company who, even if they aren't doing data work in a day-to-day, are super-involved and excited to get more women in the technical field. 
And so, that's like our Empower group internally that hosts events and I feel like it's a really nice safe space for all of us to speak about challenges that we encounter and feel like we're not alone in that we have a support system to make it better. So I think from an attrition standpoint every organization should have a female ERG to just support one another. >> Absolutely. There's so much value in a network, in the community. I was talking to somebody, who I'm blanking on, this may have been in Barcelona last week, talking about a stat that showed that a really high percentage, 78% of people couldn't identify a female role model in technology. Of course, Sheryl Sandberg's been one of our role models and I thought a lot of people know Sheryl, who's leaving or has left. And then a whole, YouTube influencers that have no idea that the CEO of YouTube for years has been a woman, who has- >> And she came last year to speak at WIDS. >> Did she? >> Yeah. >> Oh, I missed that. It must have been, we were probably filming. But we need more, we need to be, and it sounds like Dataiku is doing a great job of this. Tracy, we've talked about this earlier today. We need to see what we can be. And it sounds like Dataiku is pioneering that with that ERG program that you talked about. And I completely agree with you. That should be a standard program everywhere and women should feel empowered to raise their hand, ask a question, or really embrace, "I'm interested in engineering, I'm interested in data science." Then maybe there's not a lot of women in classes. That's okay. Be the pioneer, be that next Sheryl Sandberg or the CTO of ChatGPT, Mira Murati, who's a female. We need more people that we can see and lean into that and embrace it. I think you're going to be one of them. >> I think so too. Just so that young girls like me, like others who are still in school, can see, can look up to you and be like, "She's my role model and I want to be like her. 
And I know that there's someone to listen to me and to support me if I have any questions in this field." So yeah. >> Yeah, I mean that's how I feel about literally everyone that I'm surrounded by here. I find that you find role models and people to look up to in every conversation whenever I'm speaking with another woman in tech, because there's a journey that had to happen for you to get to that place. So it's incredible, this community. >> It is incredible. WIDS is a movement we're so proud of at theCUBE to have been a part of it since the very beginning, since 2015, I've been covering it since 2017. It's always one of my favorite events. It's so inspiring and it just goes to show the power that data can have, the influence, but also just that we're at the beginning of uncovering so much. Jacqueline, it's been such a pleasure having you on theCUBE. Thank you. >> Thank you. >> For sharing your story, sharing with us what Dataiku is doing and keep going. More power to you girl. We're going to see you up on that stage one of these years. >> Thank you so much. Thank you guys. >> Our pleasure. >> Our pleasure. >> For our guest and Tracy Zhang, this is Lisa Martin, you're watching theCUBE live at WIDS '23. #EmbraceEquity is this year's International Women's Day theme. Stick around, our next guest joins us in just a minute. (upbeat music)

Published Date : Mar 8 2023

ENTITIES

Entity	Category	Confidence
Sheryl	PERSON	0.99+
Mira Murati	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Tracy Zhang	PERSON	0.99+
Tracy	PERSON	0.99+
Jacqueline	PERSON	0.99+
Kathy Dahlia	PERSON	0.99+
Jacqueline Kuo	PERSON	0.99+
California	LOCATION	0.99+
Europe	LOCATION	0.99+
Dataiku	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Singapore	LOCATION	0.99+
London	LOCATION	0.99+
last year	DATE	0.99+
Sheryl Sandberg	PERSON	0.99+
YouTube	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Barcelona	LOCATION	0.99+
2022	DATE	0.99+
Taiwan	LOCATION	0.99+
2015	DATE	0.99+
last week	DATE	0.99+
two events	QUANTITY	0.99+
26, 27.6%	QUANTITY	0.99+
last year	DATE	0.99+
PowerPoint	TITLE	0.99+
Excel	TITLE	0.99+
this year	DATE	0.99+
yesterday	DATE	0.99+
Python	TITLE	0.99+
Dataiku	PERSON	0.99+
New York, New Jersey	LOCATION	0.99+
tomorrow	DATE	0.99+
2017	DATE	0.99+
SF	LOCATION	0.99+
MIT	ORGANIZATION	0.99+
today	DATE	0.98+
78%	QUANTITY	0.98+
ChatGPT	ORGANIZATION	0.98+
one	QUANTITY	0.98+
Ocean Cleanup	ORGANIZATION	0.98+
SQL	TITLE	0.98+
next year	DATE	0.98+
International Women's Day	EVENT	0.97+
R	TITLE	0.97+
around 25%	QUANTITY	0.96+
Californians	PERSON	0.95+
Women in Data Science	TITLE	0.94+
one day	QUANTITY	0.92+
theCUBE	ORGANIZATION	0.91+
WIDS	ORGANIZATION	0.89+
first introduction	QUANTITY	0.88+
Stanford University	LOCATION	0.87+
one place	QUANTITY	0.87+

Margot Gerritsen, Stanford University | WiDS 2018

>> Narrator: Alumni. (upbeat music) >> Announcer: Live from Stanford University in Palo Alto, California, it's theCUBE. Covering Women in Data Science Conference 2018. Brought to you by Stanford. >> Welcome back to theCUBE, we are live at Stanford University for the third annual Women in Data Science Conference, WiDS. I'm Lisa Martin, very honored to be joined by one of the co-founders of this incredible WiDS movement and phenomenon, Dr. Margot Gerritsen. Welcome to theCUBE! >> It's great to be here, thanks so much for being at our conference. >> Oh, likewise. You were the senior associate dean and director of the Institute for Computational Mathematics and Engineering at Stanford. >> Gerritsen: That's right, yep. >> Wow, that's a mouthful and I'm glad I could actually pronounce that. So you have been, well, I would love to give our audience a sense of the history of WiDS, which is very short. You've been on this incredible growth and scale trajectory. But you've been in this field of computational science for what, 30, over 30 years? >> Yeah, probably since I was 16, so that was 35 years ago. >> Yeah, and you were used to being one of few, if not the only woman >> That's right. >> In a meeting, in a room. You were okay with that but you realized, you know what? There are probably women who are not comfortable with this and it's probably going to be a barrier. Tell us about the conception of WiDS that you and your co-founders had. >> So, May, 2015, Esteban from Walmart Labs, now at Facebook, and Karen Matthys, who's still very active, you know, one of the organizers of the conference, and I were having coffee at a cafe in Stanford and we were lamenting the fact that another data science conference that we had been to had only had male speakers. And so we connected with the organizers and asked them why? Did you notice? Because very often people are not even aware, it's just such the norm to only have male speakers, >> Right, right. 
>> That people don't even notice. And so we asked why is that? And they said, "Well, you know we really tried to find speakers but we couldn't find any." And that really was, for me, the last straw. I've been in so many of these situations and I thought, you know, we're going to show them. So we joke sometimes, a little bit, we say it's sort of a revenge conference. (laughs) We said, let's show them we can get some really outstanding women, and in fact only women. And that's how it started. Now we were sitting at this coffee shop and I said, "Let's do a conference." And they said, "Well, that would be great, next year." And I said, "No, this year. Let's just do it. Let's do it in November." We had six months to put it together. It was just a local conference here. We got outstanding speakers, which were really great. Mostly from the area. And then we started live-streaming because we thought it would be fun to do. And to our big surprise, we had 6,000 people on the livestream just without really advertising. That made us realize, in November 2015, my goodness, we're onto something. And we had such amazing responses. We wanted to then scale up the conference and then you can hire a fantastic conference center in San Francisco and get 10,000 people in like they do, for example, at Grace Hopper. But we thought, why not use online technology and scale it up virtually and make this a global event using the livestream, that we will then provide to people, and asking for regional events, local events to be set up all around the world. And we created this ambassador program, that is now in its second year. The first year the responses were actually overwhelming to us already then. We got 75 ambassadors who set up 75 events around the world in about 40 countries. >> This was last year, 2017? >> Yeah, almost exactly 13 months ago, and then this year now we have over 200 ambassadors. We have 177 events in 155 cities in 53 countries. >> That's incredible. 
>> So we're on every continent apart from Antarctica but we're working on that one. >> Martin: I was going to say, that's probably next year. >> Yeah, that's right. >> The scale, though, that you've achieved in such a short time period, I think, not only speaks to the power, like you said, of using technology and using live-streaming, but also, there is a massive demand. >> Gerritsen: There is a great need, yeah. >> For not only supporting, like from the perspective of the conference, you want to support and inspire and educate data scientists worldwide and support females in the field, but it really, I think, underscores, there is still in 2018, a massive need to start raising more profiles and not just inspiring undergrad females, but also reinvigorating those of us that have been in the STEM field and technology for a while. >> Gerritsen: That's right. >> So, what are some of the things, so, this year, not only are you reaching, hopefully about 100,000 people, you mentioned some of the countries involved today, but you also have a new first this year with the WiDS Datathon. >> That's right. >> Tell us about the WiDS Datathon, what was the idea behind it? You announced some winners today? >> Yeah. Yeah, so with WiDS last year, we really felt that we hit a nerve. Now there is an incredible need for women to see other women perform so well in this field. And, you know, that's why we do it, to inspire. But it's a one-time event, it's once a year. And we started to think about, what are some of the ways that we can make this movement, because it's really become a movement, into something more than just an annual, once-a-year conference? And so, Datathon is a fantastic way to do that. You can engage people for several months before the conference, and you can announce the winner at the conference. It is something that can be done really easily worldwide if it is supported again by the ambassadors, so the local WiDS organizations. So we thought we'd just try. 
But again, it's one of those things we say, "Oh, let's do it." We, I think, thought about this about six months ago. Finding a good data set is always a challenge but we found a wonderful data set, and we had a great response with 1100, almost 1200 people in the world participating. >> That's incredible. >> Several hundred teams. Yeah, and what we said at the time was, well, let's have the teams be 50% female at least, so that was the requirement, we have a lot of mixed teams. And ultimately, of course, that's what we want. We want 50-50, men-women, have them both at the table, to participate in data science activities, to do data science research, and answer a lot of these data questions that are now driving so many decisions. Now we want everybody around the table. So with this Datathon, it was just a very small event in the sense, and I'm sure next year it will be bigger, but it was a great success now. >> Well, congratulations on that. One of the things I saw you on a Youtube video talking about over the weekend when I was doing some prep was that you wanted this Datathon to be fun, creative, and I think those are two incredibly important ways to describe careers, not just in STEM but in data science, that yes, this can be fun. >> Yep. >> Should be if you're spending so much time every day, right, doing something for a living. But I love the creativity descriptor. Tell us a little bit about the room for interpretation and creativity to start removing some of the bias that is clearly there in data interpretation? >> Oh. (laughs) You're hitting the biggest sore point in data science. And you could even turn it around, you say, because of creativity, we have a problem too. Because you can be very creative in how you interpret the data, and unfortunately, for most of us, whenever we look at news, whenever we look at data or other information given to us, we never see this through an objective lens. We always see this through our own filters. 
And that, of course, when you're doing data analysis is risky, and it's tricky, 'cause you're often not even aware that you're doing it. So that's one thing, you have this bias coming in just as a data scientist and engineer. Even though we always say we do objective work and we're building neutral software programs, we're not. We're not. Everything that we do in machine learning, data mining, we're looking for patterns that we think may be in the data because we have to program this data. And then even looking at some of the results, the way we visualize them, present them, can really introduce bias as well. And then we don't control the perception of people of this data. So we can present it the way we think is fair, but other people can interpret or use little bits of that data in other ways. So it's an incredibly difficult problem and the more we use data to address and answer critical challenges, the more data is influencing decisions made by politicians, made in industry, made by government, the more important it is that we are at least aware. One of the really interesting things at this conference is that many of the speakers are talking to that. We just had Latanya Sweeney give an outstanding keynote really about this, raising this awareness. We had Daniela Witten saying this, and various other speakers. And in the first year that we had this conference, you would not have heard this. >> Martin: Really? Only two years ago? >> Yeah. So even two years ago, some people were bringing it up, but now it is right at the forefront of almost everybody's thinking. Data ethics, the issue of reproducibility, confirmation bias, now at least people are aware. And I'm always a great optimist, thinking if people are aware, and they see the need to really work on this, something will happen. But it is incredibly important for the new data scientists that come into the field to really have this awareness, and to have the skill sets to actually work with that. 
So as a data scientist, one of the reasons why I think it's so fun, you're not just a mathematician or statistician or computer scientist, you are somebody who needs to look at things taking into account ethics, and fairness. You need to understand human behavior. You need to understand the social sciences. And we're seeing that awareness now grow. The new generation of data scientists is picking that up now much more. Educational programs like ours too have embedded these sort of aspects into the education and I think there is a lot of hope for the future. But we're just starting. >> Right. But you hit the nail on the head. You've got to start with that awareness. And it sounds like, another thing that you just described is we often hear, the top skills that a data scientist needs to have are statistical analysis, data mining. But there's also now some of these other skills you just mentioned, maybe more on the softer side, that seem to be, from what we hear on theCUBE, as important, >> Gerritsen: That's right. >> As really that technical training. To be more well-rounded and to also, as you mentioned earlier, to have the chance to influence every single sector, every single industry, in our world today. >> And it's a pity that they're called softer skills. (laughs) >> It is. >> Because they're very very hard skills to really master. >> A lot of them are probably you're born with it, right? It's innate, certain things that you can't necessarily teach? >> Well, I don't believe that you cannot do this without innate ability. Of course if you have this innate ability it helps a little, but there's a growth mindset of course, in this, and everybody can be taught. And that's what we try to do. Now, it may take a little bit of time, but you have to confront this and you have to give the people the skills and really integrate this in your education, integrate this at companies. Company culture plays a big role. >> Absolutely. 
>> This is one of the reasons why we want way more diversity in these companies, right. It's not just to have people in decision-making teams that are more diverse, but the whole culture of the company needs to change so that these sort of skills, communication, empathy, big one, communication skills, presentation skills, visualization skills, negotiation skills, that they really are developed everywhere, in the companies, at the universities. >> Absolutely. We speak with some companies, and some today, even, on theCUBE, where they really talk about how they're shifting, and SAP is one of them, their corporate culture to say we've got a goal by 2020 to have 30% of our workforce be female. You've got some great partners, you mentioned Walmart Labs, how challenging was it to go to some of these companies here in Silicon Valley and beyond and say, hey we have this idea for a conference, we want to do this in six months so strap on your seatbelts, what were those conversations like to get some of those partners onboard? >> We wouldn't have been able to do it in six months if the response had not been fantastic right from the get-go. I think we started the conference just at the right time. There was a lot of talk about diversity. Several of the companies were starting really big diversity initiatives. Intel is one of them, SAP is another one of them. We were connected with these companies. Walmart Labs, for example, one of the founders of the company was from Walmart Labs. And so when we said, look, we want to put this together, they said great. This is a fantastic venue for us also. You see this with some of these companies, they don't just come and give us money for this conference. They build their own WiDS events around the world. Like SAP built 30 WiDS events around the world. So they're very active everywhere. They see the need, of course, too. They do this because they really believe that a changed culture is for the best of everybody. 
But they also believe that because they need the women. There is a great shortage of really excellent data scientists right now, so why not look at 50% of your population? >> Martin: Exactly. >> You know, there's fantastic talent in that pool and they want to attract that also. So I think that within the companies, there is more awareness, there is an economic need to do so, a real need, if they want to grow, they need those people. There is an awareness that for their future, the long term benefit of the company, they need this diversity in opinions, they need the diversity in the questions that are being asked, and the way that the companies look at the data. And so, I think we're at a golden age for that now. Now am I a little bit frustrated that it's 2018 and we're doing this? Yes. When I was a student 30 some years ago, I was one of the very few women, and I thought, by the time I'm old, and now I'm old, you know, as far as my 18-year-old self, right, I mean in your 50s, you're old. I thought everything would be better. And we certainly would be at critical mass, which is 30% or higher, and it's actually gone down since the 80s, in computer science and in data science and statistics, so it is really very frustrating in that sense that we're really starting again from quite a low level. >> Right. Right. >> But I see much more enthusiasm and now the difference is the economic need. So this is going to be driven by business sense as well as any other sense. >> Well I think you definitely, with WiDS, you are beyond onto something with what you've achieved in such a short time period. So I can only imagine, WiDS 2018 reaching up to 100,000 people over these events, what do you do next year? Where do you go from here? (laughs) >> Well, it's becoming a little bit of a challenge actually to organize and help and support all of these international events, so we're going to be thinking about how to organize ourselves, maybe on every continent. 
>> Getting to Antarctica in 2019? >> Yeah, but have a little bit more of a local or regional organization, so that's one thing. The main thing that we'd like to do is have even more events during the year. There are some specific needs that we cannot address right now. One need, for example, is for high school students. We have two high school students here today, which is wonderful, and quite a few of them are looking at the live-stream of the conference. But if you want to really reach out to high school students and tell them about this and the sort of skill sets that they should be thinking about developing when they are at university, you have to really do a special event. The same with undergraduate students, graduate students. So there are some markets there, some subgroups of people that we would really like to tailor to. The other thing is a lot of people are very very eager to self-educate, and so what we are going to be putting together, at least that's the plan now, we'll see if we can make this, is educational tools, and really have a repository of educational tools that people can use to educate themselves and to learn more. We're going to start a podcast series of women, which will be very, very interesting. We'll start this next month, and so every week or every two weeks we'll have a new podcast out there. And then we'll keep the momentum going. But really the idea is to not provide just this one day of inspiration, but to provide throughout the year, >> Sustained inspiration. >> Sustained inspiration and resources. >> Wow, well, congratulations, Margot, to you and your co-founders. This is a movement, and we are very excited for the opportunity to have you on theCUBE as well as some of the speakers and the attendees from the event today. And we look forward to seeing all the great things that I think are going to come for sure, the rest of this year and beyond. So thank you for giving us some of your time. 
>> Thank you so much, we're a big fan of theCUBE. >> Oh, we're lucky, thank you, thank you. We want to thank you for watching theCUBE. I'm Lisa Martin, we are live at the third annual Women in Data Science Conference coming to you from Stanford University, #WiDS2018, join the conversation. I'll be back with my next guest after a short break. (upbeat music)

Published Date : Mar 5 2018

ENTITIES

Entity	Category	Confidence
Daniela Witten	PERSON	0.99+
Margot Gerritsen	PERSON	0.99+
Latanya Sweeney	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Esteban	PERSON	0.99+
Martin	PERSON	0.99+
Gerritsen	PERSON	0.99+
2018	DATE	0.99+
November 2015	DATE	0.99+
Walmart Labs	ORGANIZATION	0.99+
Karen Matthys	PERSON	0.99+
30%	QUANTITY	0.99+
May, 2015	DATE	0.99+
Institute for Computational Mathematics and Engineering	ORGANIZATION	0.99+
75 ambassadors	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
50%	QUANTITY	0.99+
75 events	QUANTITY	0.99+
San Francisco	LOCATION	0.99+
six months	QUANTITY	0.99+
Antarctica	LOCATION	0.99+
November	DATE	0.99+
155 cities	QUANTITY	0.99+
1100	QUANTITY	0.99+
18-year	QUANTITY	0.99+
SAP	ORGANIZATION	0.99+
Margot	PERSON	0.99+
last year	DATE	0.99+
53 countries	QUANTITY	0.99+
next year	DATE	0.99+
2019	DATE	0.99+
Stanford	LOCATION	0.99+
2020	DATE	0.99+
10,000 people	QUANTITY	0.99+
two	QUANTITY	0.99+
177 events	QUANTITY	0.99+
30	QUANTITY	0.99+
Intel	ORGANIZATION	0.99+
one	QUANTITY	0.99+
one-time	QUANTITY	0.99+
6,000 people	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
WiDS Datathon	EVENT	0.99+
this year	DATE	0.99+
over 200 ambassadors	QUANTITY	0.99+
WiDS	EVENT	0.99+
#WiDS2018	EVENT	0.99+
second year	QUANTITY	0.99+
Facebook	ORGANIZATION	0.98+
One	QUANTITY	0.98+
Stanford University	ORGANIZATION	0.98+
Stanford	ORGANIZATION	0.98+
one day	QUANTITY	0.98+
today	DATE	0.98+
Youtube	ORGANIZATION	0.98+
once a year	QUANTITY	0.97+
next month	DATE	0.97+
two years ago	DATE	0.97+
50-50	QUANTITY	0.97+
13 months ago	DATE	0.97+
50s	QUANTITY	0.97+
16	QUANTITY	0.97+
both	QUANTITY	0.97+
80s	DATE	0.97+
WiDS 2018	EVENT	0.96+

Mala Anand, SAP | WiDS 2018

>> Narrator: Live from Stanford University in Palo Alto, California. It's theCUBE covering Women in Data Science Conference 2018. Brought to you by Stanford. >> Welcome back to theCUBE. Our continuing coverage live at the Women in Data Science Conference 2018, #WiDS2018. I'm Lisa Martin and I'm very excited to not only be at the event, but to now be joined by one of the speakers who spoke this morning. Mala Anand, the executive vice president at SAP and the president of SAP Leonardo Data Analytics, Mala Anand, Mala, welcome to theCUBE. >> Thank you Lisa, I'm delighted to be here. >> So this is your first WiDS and we were talking off camera about this is the third WiDS and 100,000 people they're expecting to reach today. As a speaker, how does that feel knowing that this is being live streamed and on their Facebook Live page and you have the chance to reach that many people? >> It's really exciting, Lisa and you know, it's inspiring to see that we've been able to attract so many participants. It's such an important topic for us. More and more I think two elements of the topic, one is the impact that data science is going to have in our industry as well as the impact that we want more women to participate with the right passion and being able to be successful in this field. >> I love that you said passion. I think that's so key and that's certainly one of the things, I think as my second year hosting theCUBE at WiDS, you feel it when you walk in the door. You feel it when you're reading the #WiDS2018 Twitter feed. It's the passion is here, the excitement is here. 150 plus regional WiDS events going on today in over 50 countries so the reach can be massive. What were maybe the top three takeaways from your talk this morning that the participants got to learn? 
>> Absolutely, and what's really exciting to see is that we see from a business perspective that customers are seeing the potential to drive higher productivity and faster growth in this whole new notion of digital technologies and the ability now for these new forms of systems of intelligence where we embed machine learning, big data, analytics, IoT, into the core of the business processes and it allows us to reap unprecedented value from data. It allows us to create new business models and it also allows us to reimagine experiences. But all of this is only possible now with the ability to apply data science across industries in a very deep and domain expertise way, and so that's really exciting and, moreover, to see diversity in the participants. Diversity in the people that can impact this is very exciting. >> I agree. You talked about digital business. Digital transformation opens up so many new business model opportunities for companies but the application of advanced analytics, for example, alone opens up so many more career opportunities because every sector is affected by big data. Whether we know it or not, right? And so the opportunity for those careers is exploding. But another thing that I think is also ripe for conversation is bringing in diverse perspectives to analyze and interpret that data. >> Absolutely. >> To remove some of the bias so that more of those business models and opportunities can really bubble up. >> Absolutely. >> Lisa: Tell me about your team at SAP Leonardo and from a diversity perspective, what's going on there? >> Yeah, absolutely. So I think your point is really valid which is, the importance of bringing in diversity and also the importance of diversity both from a gender perspective and a diversity in skills. And I think the key element of data and decision science is now it opens up different types of skills, right? It opens up the skills of course, the technology skills are fundamental. 
The ability to read data modeling is fundamental, but then we add in the deep domain expertise. Then add in the business perspectives. The ability to story tell and that's where I see the ability to story tell with the right domain expertise opens up such a massive opportunity for different kinds of participants in this field and so within SAP itself, we are very driven by driving diversity. SAP had set a very aggressive goal by 2017 to be at 25% of women in leadership positions and we achieved that. We've got an aggressive goal to be at 30% of women in leadership positions by 2020 and we're really excited to achieve that as well and very important as well both within Leonardo and data analytics as well, diversity is fundamental to our growth and more importantly to the growth for the industry. I think that's going to be fundamental. >> I think that's a really important point, the growth of the industry. SAP does a lot with WiDS. We had Ann Rosenberg on last year. I saw her walking around. So from a cultural stand point, what you've described, there's really a dedicated focus there and I think it's a unique opportunity that SAP doesn't have. They're taking advantage of it to really show how a massive corporation, a huge enterprise, can really be very dedicated to bringing in this diversity. It helps the business, but it also, to your point, can make a big impact on industry. >> Absolutely, you know, culture is such a critical part of succeeding in the business, and I think culture is an important lever that can help differentiate companies in the market. So of course it's technology, it's value creation for our customers, and I think culture is such an important part of it, and when you unpeel the lever of culture, within there comes diversity, and within there comes bringing a different diversity of skills base as well that is going to be really critical in the next generation of businesses that will get created. >> I like that. 
Especially sitting in Silicon Valley where there's new businesses being created every, probably 30 seconds. I'd love to understand, if we kind of take a walk back through your career and how you got to where you are now. What were some of the things that inspired you along the way, mentors? What were some of the things that you found really impactful and crucial to you being as successful as you are and a speaker at an event like WiDS? >> Oh, absolutely. It's really exciting to see that from my own personal journey, I think that one of the things that was really important is passion. And ensuring that you find those areas that you're passionate about. I was always very passionate about software and being able to look at data and analyze data. From doing my undergraduate in Computer Science, as well as my graduate work in Computer Science from Brown, and from there on out, always looking at any of the opportunities whether it was an individual contributor that I did. It's important to be passionate and I felt that that was really my guiding post to really being able to move up from a career perspective, and also looking to be in an environment, in an ecosystem, of people and environments that you're always learning from, right? And always never being afraid to reach a little bit further than your capabilities. I think ensuring that you always have confidence in the ability that you can reach, and even though the goals might feel a little bit far away at the moment. So I think also being around a really solid team of mentors and being able to constantly learn. So I would say a constant, continuous learning, and passion is really the key to success. >> I couldn't agree more. I think it's that we often, the word expert is thrown around so often and in so many things, and there certainly are people that have garnered a lot of expertise in certain areas, but I always think, "Are you really ever an expert?" There's so much to learn everyday, there's so many opportunities. 
But another thing that you mentioned that reminded me of, we had Maria Klawe on a little bit earlier today and one of the things that she said in her welcome address was, in terms of inspiration, "Don't worry if there's something "that you think you're not good at." >> Mala: Absolutely. >> It's sort of getting out of your comfort zone and one of my mentors likes to say, "getting comfortably uncomfortable." That's not an easy thing to achieve. So I think having people around, people like yourself, you're now a mentor to potentially 100,000 people today, alone. What are some of the steps that you recommend of, how does someone go, "I really like this, "but I don't know if I can do it." How would you help someone get comfortably uncomfortable? >> Yeah, I think first of all, building a small group I would say, of stakeholders that are behind you and your success is going to be really important. I think also being confident about your abilities. Confidence comes in failing a few times. It's okay to miss a few goals, it's okay to fail, but then you leap forward even faster. >> Failure is not a bad F word, right? >> Mala: Absolutely. >> It really can be, and I think, a lot of leaders, like yourself will say that it's actually part of the process. >> It's very much part of the process. And so I think, number one thing is passion. First you've got to be really clear that this is exactly what you're passionate about. Second is building a team around you that you can count on, you can rely on, that are invested in your success. And then thirdly is also just to ensure that you are confident. Being confident about asking for more. Being confident about being able to reach close to the impossible is okay. >> It is okay, and it should be encouraged, every day. No matter what gender, what ethnicity, that should just sort of be one of those level playing fields, I think. 
Unfortunately, it probably won't be but events like WiDS, and the reach that it's making today alone, certainly, I think, offer a great foundation to start helping break some of the molds that even as we sit in Silicon Valley, are still there. There's still massive discrepancies in pay grades. There's still a big percentage of females with engineering degrees that are not working in the field. And I think the more people like yourself, and some of your other colleagues that are here participating at WiDS alone today, have the opportunity to reach a broader audience, share their stories. Their failures, the successes, and all the things that have shaped that path, the bigger the opportunity we have and it's, I think, almost, sort of a responsibility for those of us who've been in STEM for a while, to help the next generation understand nobody got here with a silver spoon. Eh, some. >> Absolutely. >> But on a straight path. It's always that zig zaggy sort of path, and embrace it! >> Yeah, I think that's key, right? And the one point here is very relevant that you mentioned as well is, that it's very important for us to recognize that, to allow for an environment where you can embrace the change, right? In order to embrace change, it's not just people that are going through it, but people that are supporting it and sponsoring it because it's a big change. It's a change from what was an environment a few years ago to what is going to be an environment of the future, which is an environment full of diversity. So I think being able to be ambassadors of the change is really important. As well as to allow for confidence building in this environment, right? I think that's going to be really critical as well. And for us to support those environments and build awareness. Build awareness of what is possible. I think many times people will go through their careers without being aware of what is possible. 
Things that were certain thresholds, certain limits, certain guidelines, two years ago are dramatically different today. >> Oh yes. >> So having those ambassadors of change that can help us build awareness, with our growing community, I think is going to be really important. >> I think, some of the things too, that you're speaking to, there are boundaries that are evaporating. We're seeing them become perforated and sort of disappear, as well as maybe some of these structured careers. There's a career as this, as that. They used to be pretty demarcated. Doctor, lawyer, architect, accountant, whatnot. And now it's almost infinite. Especially having a foundation in technology with data science and the real world social implications alone, that a career in this field can deliver just kind of shows the sky's the limit. >> Yeah, absolutely. The sky's truly the limit, and I think that's where you're absolutely right. The lines are blurring between certain areas, and at the same time, I think, this opens up huge opportunity for diversity in skill set and diversity in domain. I think equally important is to ensure that, to be successful, you start by driving focus, as well, right? So, how do you draw that balance? And for us to be able to mentor and guide the younger generation, to drive that focus. At the same time, leveraging the opportunities that are open is going to be critical. >> So getting back to SAP Leonardo. What's next in this year, we're in March of 2018. What are some of the things that are exciting you that your team is going to be working on and delivering for SAP and your customers this year? >> SAP Leonardo is really exciting because it essentially allows for our customers to drive faster innovation with less risk. And it allows our customers to create these digital businesses where you have to change a business process and a business model that no single technology can deliver. 
So as a result we bring together machine learning, big data analytics, IoT, all running on a solid cloud platform with in-memory databases like HANA, at scale. So this year is going to be all about how we bring these capabilities together very specifically by industry and reimagine processes across different industries. >> I like that, reimagine. I think that's one of the things that you're helping to do for females in data science and computer sciences. Reimagine the possibilities. Not just the younger generation, but also those who've been in the field for a while that I think will probably be quite inspired and reinvigorated by some of the things that you're sharing. So, Mala, thank you so much for taking the time to stop by theCUBE and share your insights with us. We wish you continued success in your career and we look forward to seeing you at WiDS next year. >> Thank you so much, Lisa. I'm delighted to be here. >> Excellent. >> Thank you. >> My pleasure. We want to thank you. You are watching theCUBE live from WiDS 2018, at Stanford University. I'm Lisa Martin. Stick around, my next guest will be joining me after this short break.

Published Date : Mar 5 2018

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Lisa	PERSON	0.99+
March of 2018	DATE	0.99+
Mala Anand	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Ann Rosenberg	PERSON	0.99+
2017	DATE	0.99+
Maria Klawe	PERSON	0.99+
SAP	ORGANIZATION	0.99+
30%	QUANTITY	0.99+
2020	DATE	0.99+
Second	QUANTITY	0.99+
30 seconds	QUANTITY	0.99+
100,000 people	QUANTITY	0.99+
last year	DATE	0.99+
Mala	PERSON	0.99+
next year	DATE	0.99+
25%	QUANTITY	0.99+
first	QUANTITY	0.99+
two elements	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
#WiDS2018	EVENT	0.99+
second year	QUANTITY	0.99+
First	QUANTITY	0.99+
SAP Leonardo	ORGANIZATION	0.99+
Women in Data Science Conference 2018	EVENT	0.98+
one	QUANTITY	0.98+
both	QUANTITY	0.98+
two years ago	DATE	0.98+
over 50 countries	QUANTITY	0.98+
third	QUANTITY	0.98+
this year	DATE	0.98+
one point	QUANTITY	0.98+
Stanford	ORGANIZATION	0.98+
SAP Leonardo Data Analytics	ORGANIZATION	0.97+
Brown	ORGANIZATION	0.97+
today	DATE	0.97+
WiDS	EVENT	0.97+
Women in Data Science Conference 2018	EVENT	0.97+
thirdly	QUANTITY	0.96+
Stanford University	ORGANIZATION	0.95+
single	QUANTITY	0.94+
WiDS 2018	EVENT	0.93+
few years ago	DATE	0.92+
WiDS	ORGANIZATION	0.92+
executive vice president	PERSON	0.9+
Twitter	ORGANIZATION	0.9+
this morning	DATE	0.89+
three takeaways	QUANTITY	0.86+
theCUBE	ORGANIZATION	0.84+
Leonardo	TITLE	0.83+
one of the speakers	QUANTITY	0.83+
Narrator	TITLE	0.8+
Facebook	TITLE	0.79+
president	PERSON	0.76+
earlier	DATE	0.73+

Data Science for All: It's a Whole New Game

>> There's a movement that's sweeping across businesses everywhere here in this country and around the world. And it's all about data. Today businesses are being inundated with data. To the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12. Received my networking certs by 18 and a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a sure passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm: how data science continues to become democratized and extends beyond the domain of the data scientist, and why there's also a mandate for all of us to become data literate, now that data science for all drives our AI culture. 
And we're going to be able to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data. And putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight. Who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join that conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI means to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host General Manager IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it. You've been talking about this for years. You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. 
It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in the world of unmanageable. Crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part but I think we're all starting to hear more about AI. And it's incredible that this buzz word is happening. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though in terms of how businesses can really adopt AI and get started? >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. 
Two is you need a modern data architecture. Three is you need pervasive automation. And four is you got to expand job roles in the organization. >> Data acquisition. First pillar in this you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens. And suddenly you've got structured and non-structured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with. And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue, if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzz word to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant to just what's right there. 
So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And I think about machine learning. It's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles is about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. 
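Rob's point above about automating the process of algorithm selection can be made concrete with a small sketch. This is not IBM's implementation, only a minimal illustration under stated assumptions: the two candidate models, the function names (`mean_model`, `linear_model`, `cv_error`, `select_model`) and the toy data are all hypothetical, and the selection rule (pick the candidate with the lowest cross-validated squared error) is one common choice among many.

```python
import statistics

# Hypothetical candidate "algorithms": each trainer takes training data
# and returns a prediction function.
def mean_model(xs, ys):
    m = statistics.fmean(ys)
    return lambda x: m  # always predicts the training mean

def linear_model(xs, ys):
    # Ordinary least squares for a single input variable.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def cv_error(trainer, xs, ys, k=4):
    """Mean squared error over k interleaved folds."""
    errors = []
    for fold in range(k):
        pairs = list(enumerate(zip(xs, ys)))
        train = [p for i, p in pairs if i % k != fold]
        test = [p for i, p in pairs if i % k == fold]
        predict = trainer([x for x, _ in train], [y for _, y in train])
        errors.extend((predict(x) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)

def select_model(candidates, xs, ys):
    """Score every candidate and return the name of the best one."""
    scores = {name: cv_error(t, xs, ys) for name, t in candidates.items()}
    return min(scores, key=scores.get), scores

# Made-up data: a straight line with a small alternating offset.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

best_name, cv_scores = select_model(
    {"mean": mean_model, "linear": linear_model}, xs, ys
)
print(best_name, cv_scores)
```

The data scientist still decides which candidates go into the pool and what "best" means; only the comparison itself is automated, which is the mundane part Rob describes.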
>> I think that's confusing though because we have several organizations who are asking, are these highly specialized roles, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Because everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But, our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal. But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone though across the board when it comes to data literacy. And I think people that are coming into the work force don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. 
And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an offering in MS in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the placement behind those jobs, people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business from marketing campaigns to promotions to scheduling to ads. 
And their goal was to transform data into business insights and really take the burden off of their IT team that was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM analytics system where it's a data warehouse, data science tools are built in. It has in memory data processing. And just like that, they were ready for AI. And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable that everybody, you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring analytics to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path, they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. 
Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of we talk data science for all. Where do all the data science, is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software super powers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But, we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. 
Organizations have to have a way to meet the needs of both of those types. And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientists role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kind of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody that, my background is in IT, the question is really is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item to the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud was. So don't be late. >> And I think it's critical because it could be so costly to wait. 
And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you even personally being affected by how important the analysis is in working in pediatric cancer for the last seven years. I personally implement virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But the phase two of this project is putting in little metrics in the hardware that gather the breathing, the heart rate to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun hands on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right. Thank you Rob and now we've been joined by Siva Anne who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. 
So he's going to play the role of a financial adviser. Who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And, I have upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. 
So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position for Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has got some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. 'Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda. And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools in the hands of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources. Db2 Warehouse in the cloud, IBM's Integrated Analytic System, and Hortonworks powered Hadoop platform for the news feeds. 
We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance. Closer to where the data resides to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. 'Cause this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. 
You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that? Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thing? >> Yeah, for example at Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. Data point by itself. >> Okay. 
>> Well what we're seeing what's going on is that data science like as a theme, is that it's actually for everyone now. But what's happening is that it's actually meeting non technical people. But what we're seeing is that when non technical people are implementing these tools or coming at these tools without a base line of data literacy, they're often times using it in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is that we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing. Where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level at a greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No absolutely. I mean, I think that when you look at the huge explosion in data, with it comes a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. 
So, we run a fellowship where we help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up those data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies. And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that. Or do not utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground in to a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off from how, it's going to not feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. 
And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is that getting non technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model? So that actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. 
They don't involve the designers, the consumer insight people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room. They're consulted, but they're not stakeholders. >> Can I actually-- >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative fluid conversation between non technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of like create a new, kind of like environment where technical people also talk seamlessly with non technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is sort of data engineers. People who are more on the software engineering side learning more about the stats and math. And then people who are sort of traditionally on the stat side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because oftentimes we talk about machine learning and AI, and data and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? 
>> So I think that there was this really excellent paper a while ago talking about all the neural net stuff and trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we sort of saw. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And like another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kind of rules that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But, it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, you're using, the classic example would be zip code, you're using zip code as a variable. But when you look at it, zip code's actually highly correlated with race. And you can't do that. So you may inadvertently by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. 
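[Editor's illustration] The zip code point above can be made concrete with a small check. The sketch below is not something the panel described; it is one simple way, under stated assumptions, to ask whether a seemingly neutral feature acts as a proxy for a protected attribute. It compares how often the protected attribute can be guessed with no information against how often it can be guessed from the feature alone. The function name `proxy_strength` and the toy records (areas "A", "B", "C" and groups "x", "y") are hypothetical and invented purely for illustration.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Return (baseline_accuracy, accuracy_using_feature).

    baseline_accuracy: share of records guessed correctly by always
    predicting the most common protected value.
    accuracy_using_feature: share guessed correctly by predicting the
    most common protected value within each feature value.
    A large gap suggests the feature carries information about the
    protected attribute and deserves human review.
    """
    total = len(protected_values)
    overall = Counter(protected_values)
    baseline = overall.most_common(1)[0][1] / total

    # Count protected values separately for each feature value.
    by_feature = defaultdict(Counter)
    for feature, protected in zip(feature_values, protected_values):
        by_feature[feature][protected] += 1

    # Majority vote inside each feature value.
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / total


# Toy, made-up records: an area code and a protected group label.
areas = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
groups = ["x", "x", "x", "y", "y", "y", "y", "x", "x", "y"]

baseline, with_area = proxy_strength(areas, groups)
print(f"baseline: {baseline:.2f}, using area: {with_area:.2f}")
```

A gap between the two numbers does not by itself prove a model is discriminatory. It is only a prompt for the kind of human review the speaker describes.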
>> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And the people here it is. And the people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have less of these kind of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look about how to enable machine intelligence in organization there are the pillars that I have in my head which is the culture, the talent and the technology infrastructure. And I believe and I saw in working very closely with the Fortune 100 and 200 companies that the talent piece is actually the most important crucial hard to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. 
It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is take the thousand actuaries and statisticians that we have and get all of them trained up to become a data scientist and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so that companies kind of like start to look how to upscale and look for talent within the organization. So they can actually move them to become more literate and navigate 'em from analyst to data scientist. And from data scientist to machine learner. So this is actually a trend that is happening already for a year or so. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills. We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non technical people. Especially from marketing, which is now, the CMO's are much more responsible for driving growth in their companies now. But often times it's so hard to change the old way of marketing, which is still like very segmentation. You know, demographic variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. 
>> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that comes up quite often in especially in large, Fortune 500 enterprises, is that they are very, they're not very comfortable with sort of example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. So security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kind of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work. Where I explain it was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non technical people really comfortable in the same room with data scientists. 
That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to, anything you want to add to this conversation? >> I think one thing to conclude is to say that companies that are not data driven is about time to hit refresh and figure how they transition the organization to become data driven. To become agile and nimble so they can actually see what opportunities from this important industrial revolution. Otherwise, unfortunately they will have hard time to survive. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great guys? And thank you for everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using data science experience to do data science faster. >> Thank you Katie. 
Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data no matter where it is and what it is. How you could use your favorite tools from open source. And finally how you could build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new GitHub integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects. And take advantage of GitHub's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macroeconomic data, which she was able to find from the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. 
We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in Parquet files stored in HDFS, which happens to be a distributed file system. To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This and the federation capabilities that Big SQL offers dramatically simplify data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data. To identify the features that have some predictive power. Ultimately, we're trying to assess what is driving customer stock preference. 
Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you already invested in. Like our very own SPSS as well as SAS. Through a new import feature, you can easily import those models created with those tools. This helps you avoid vendor lock-in, and simplify the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System. Used the Auto Data Preparation to cleanse our data. Chose the binary classification algorithms. Let the Data Science Experience evaluate between logistic regression and gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they built. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. 
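[Editor's illustration] The monitoring idea described above, watching live model performance and retraining when it slips, can be sketched in a few lines. This is not the Data Science Experience dashboard or any IBM API. It is a minimal, generic sketch assuming that true outcomes eventually arrive for each prediction. The class name `AccuracyMonitor`, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=100, threshold=0.8):
        # Only the last `window` outcomes are kept; older ones drop off.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag retraining once the window is full and accuracy is too low."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        acc = self.accuracy()
        return window_full and acc is not None and acc < self.threshold


# Usage with made-up predictions: four correct, then two misses.
monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
print(monitor.needs_retraining())  # window is full and fully correct

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```

A rolling window is used here so that recent behavior dominates. A production system would likely track more than one metric and account for delayed labels.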
So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models. Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else with just a few clicks. So here we're publishing our model to the Watson machine learning service. It happens to be in the IBM cloud. And also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that our stock affinity model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. 
So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you with no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You got to break it down for me cause I think we zoom out and see the big picture. What better data science can deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientist some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing about how we get to data science for all. Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developer, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. 
And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytic System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources and all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with a line of business to better understand and translate the objectives into the models that are being built. 
Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise was named by Amazon.com as the number one best non-fiction book of 2012. He's currently the Editor in Chief of the award winning website, FiveThirtyEight and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which it really interesting, is I started my career at ESPN. And I started as a production assistant, then later back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. 
And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There have been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts. 
But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. It's amazing to talk. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. Because there's so many teams in front offices putting such an, just a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where's this guy hit the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams have realized that actually having a guy, across all sports pretty much, realizing the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. 
So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really effects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in NFL, we're having the analysis and having wearables where it can now showcase if they wanted to on screen, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. 
So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. >> If you have a mid-fielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that actually some in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy, that kind of old school mentality, you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. 
>> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. 
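The point about correlated state outcomes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the FiveThirtyEight model: the three states, the 3-point leads, the 4-point polling error, and the function name `sweep_probability` are all hypothetical. Each state's polling error is split into a shared national component and a state-specific one, and the simulation counts how often the trailing candidate wins every state at once.

```python
import math
import random


def sweep_probability(leads, sigma, rho, n_sims=20000, seed=0):
    """Estimate how often the trailing candidate wins ALL states.

    leads  : the favorite's polling lead in each state, in points
    sigma  : standard deviation of the total polling error per state
    rho    : correlation between the errors of any two states (0 to 1)
    """
    rng = random.Random(seed)
    shared_weight = math.sqrt(rho)        # weight of the error common to all states
    local_weight = math.sqrt(1.0 - rho)   # weight of the state-specific error
    sweeps = 0
    for _ in range(n_sims):
        shared_error = rng.gauss(0.0, sigma)
        # The underdog sweeps if the true margin is negative in every state.
        if all(
            lead + shared_weight * shared_error
            + local_weight * rng.gauss(0.0, sigma) < 0
            for lead in leads
        ):
            sweeps += 1
    return sweeps / n_sims


# Hypothetical inputs: three states, each with a 3-point lead for the favorite.
leads = [3.0, 3.0, 3.0]
independent = sweep_probability(leads, sigma=4.0, rho=0.0)
correlated = sweep_probability(leads, sigma=4.0, rho=0.8)

print(f"underdog sweep, independent errors: {independent:.3f}")
print(f"underdog sweep, correlated errors:  {correlated:.3f}")
```

Treating the states as independent makes a sweep look far less likely than it is when the errors move together, which is one way two models reading the same polls can end up with very different win probabilities.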
But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peel back the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. 
If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of 10 predictions or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? >> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization never had any mistakes, never had any corrections, that's wrong, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model to say it's not just the final number, here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. 
But quite often we're saying we want you to trust us and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data or you share your thinking at least in lieu of the data, and you can definitely do both is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human being selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game where they lost to Troy or something and so -- >> That also speaks to the human element as well. >> It does. 
In general as a rule, if you're designing a kind of regression based model, it's different in machine learning where you have more, when you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know if you have so to speak, 1,000 lines of code and they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. In a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? 
>> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Serious and business, how organizations in the last three to five years have just evolved with this data boom. How are you seeing it as from a consultant point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. That so FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. 
And so the kind of shortage of talent that had I think frankly been a problem for a long time, I'm optimistic based on the young people in our office, it's a little anecdotal but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book or I guess people have a copy of the book. In some ways it's saying hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. But that does mean that getting invested in it now, getting invested both in technology and the human capital side is important. >> Final question for you as we run out of time. 2018 beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. 
That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify for example how fast a fast break is, for example. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now more oversee things than I once did myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Test one, two. One, two, three. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. 
And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits on the back of the room on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017


ENTITIES

Entity	Category	Confidence
Tricia Wang	PERSON	0.99+
Katie	PERSON	0.99+
Katie Linendoll	PERSON	0.99+
Rob	PERSON	0.99+
Google	ORGANIZATION	0.99+
Joane	PERSON	0.99+
Daniel	PERSON	0.99+
Michael Li	PERSON	0.99+
Nate Silver	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Trump	PERSON	0.99+
Nate	PERSON	0.99+
Honda	ORGANIZATION	0.99+
Siva	PERSON	0.99+
McKinsey	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Larry Bird	PERSON	0.99+
2017	DATE	0.99+
Rob Thomas	PERSON	0.99+
Michigan	LOCATION	0.99+
Yankees	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Clinton	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Michael	PERSON	0.99+
America	LOCATION	0.99+
Leo	PERSON	0.99+
four years	QUANTITY	0.99+
five	QUANTITY	0.99+
30%	QUANTITY	0.99+
Astros	ORGANIZATION	0.99+
Trish	PERSON	0.99+
Sudden Compass	ORGANIZATION	0.99+
Leo Messi	PERSON	0.99+
two teams	QUANTITY	0.99+
1,000 lines	QUANTITY	0.99+
one year	QUANTITY	0.99+
10 investments	QUANTITY	0.99+
NASDAQ	ORGANIZATION	0.99+
The Signal and the Noise	TITLE	0.99+
Tricia	PERSON	0.99+
Nir Kaldero	PERSON	0.99+
80%	QUANTITY	0.99+
BCG	ORGANIZATION	0.99+
Daniel Hernandez	PERSON	0.99+
ESPN	ORGANIZATION	0.99+
H2O	ORGANIZATION	0.99+
Ferrari	ORGANIZATION	0.99+
last year	DATE	0.99+
18	QUANTITY	0.99+
three	QUANTITY	0.99+
Data Incubator	ORGANIZATION	0.99+
Patriots	ORGANIZATION	0.99+

Vikram Murali, IBM | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science For All, IBM's two day event, and we'll be here all day long wrapping up again with that panel discussion from four to five here Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM, and Vikram thanks for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities, The Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements. Updates, if you will, here over the past month or so, so share some information about that too if you would. >> Sure, so my team, we build The Data Science Experience, and our goal is for us to enable data scientists, in their path, to gain insights into data using data science techniques, machine learning, the latest and greatest open source especially, and be able to do collaboration with fellow data scientists, with data engineers, business analysts, and it's all about freedom. Giving freedom to data scientists to pick the tool of their choice, and program and code in the language of their choice. So that's the mission of Data Science Experience, when we started this. The two releases, that you mentioned, that we had in the last 45 days. There was one in September and then there was one on October 30th. Both of these releases are very significant in the machine learning space especially. We now support Scikit-Learn, XGBoost, TensorFlow libraries in Data Science Experience. We have deep integration with Hortonworks Data Platform, which is a hallmark of our partnership with Hortonworks. 
Something that we announced back in the summer, and this last release of Data Science Experience, two days back, specifically can do authentication with Knox with Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together? You know? So taking older products, and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off as a design first methodology. And what I mean by that is we are using IBM design to lead the charge here along with the product and development. And we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them, but Data Science Experience, for example, it started off as a brand new product: completely new slate with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets like SPSS Modeler and Stats, and decision optimization. And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientists leverage our existing products with open source, and how we can do collaboration. So it's not just re-skinning, but it's building ground up. 
>> So this is really important because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money, you know, smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. Unless you architected your system to be able to accommodate that. Future proof is the term everybody uses, so have you done that? How have you done that? I'm sure APIs are involved, but maybe you could add some color. >> Sure. So our Data Science Experience and machine learning... It is a microservices based architecture, so we are completely dockerized, and we use Kubernetes under the covers for container orchestration. And all these are tools that are used in The Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are dockerizing them, and the microservice architecture actually helps us address issues that we have today as well as be open to development and taking newer methodologies and frameworks into consideration that may not exist today. So the microservices architecture, for example, TensorFlow is something that you brought in. So we can just spin up a docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. 
Same thing with other frameworks like XGBoost, and Keras, and scikit-learn, all these are frameworks and libraries that are coming up in open source within the last, I would say, a year, two years, three years timeframe. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came, but now with the microservice architecture it is very easy for us to continue with those. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. One of the things that I've... I mean, I've been following Hortonworks since day one when Yahoo kind of spun them out. And I know those guys pretty well. And they always make a big deal out of when they do partnerships, it's deep engineering integration. And so they're very proud of that, so I want to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes, so Data Science Experience today can work side by side with Hortonworks Data Platform, HDP. And we could have actually made that work about two, three months back, but, as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day. We call it co-development, and they have put resources in. We have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure Knox, which I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three are going to be deeper integration, so we are planning on making Data Science Experience an Ambari management pack. And so as a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up. 
And the third phase is going to be... We're going to be using YARN for resource management. YARN is very good at resource management. And for infrastructure as a service for data scientists, we can actually delegate that work to YARN. So, Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged in with Hortonworks. >> YARN stands for Yet Another Resource Negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks are big on committers. And obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true as well as the Knox piece that I mentioned. We have a Knox committer. We have multiple Knox committers on our side, and that helps us as well. So Knox is part of the HDP package. We need knowledge on our side to work with Hortonworks developers to make sure that we are contributing and making inroads into Data Science Experience. That way the integration becomes a lot easier. And from an IBM IP perspective... So Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM Research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust are something that our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved and there are a lot of algorithms, machine learning algorithms, that we put in there. So that comes right out of the package. 
>> And you guys, the development teams, are really both in The Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America between The Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in The Valley, so there's a lot of synergy. We work very closely with them, and that's what we see in the product. >> I mean, what impact does that have? Is it... You know, you hear today, "Oh, yeah. We're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is that? To have people so physically proximate? >> I think it has major impact. I mean IBM is a global organization, so we do have teams around the world, and we work very well. With the advent of IP telephony, and screen-shares, and so on, yes we work. But it really helps being in the same timezone, especially working with a partner just eight miles or ten miles away. We have a lot of interaction with them and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean what's the barrier right now? >> The barrier, I think for data scientists... The number one barrier continues to be data. There's a lot of data out there. Lot of data being generated, and the data is dirty. It's not clean. So the number one problem that data scientists have is how do I get to clean data, and how do I access data. There are so many data repositories, data lakes, and data swamps out there. Data scientists, they don't want to be in the business of finding out how do I access data. 
They want to have instant access to data, and-- >> Well if you would let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientists-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data or data in a particular format. And data scientists don't want to spend too much time in cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big. You want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientists is all the open source libraries. Every day there's a new one. It's a boon and a bane as well, and the problem with that is the open source community is very vibrant, and there are a lot of data science competitions, machine learning competitions that are helping move this community forward. And it's a good thing. The bad thing is data scientists like to work in silos on their laptops. How do you, from an enterprise perspective... How do you take that, and how do you move it? Scale it to an enterprise level? And that's where Data Science Experience comes in because now we provide all the tools. The tools of your choice: open source or proprietary. You have it in here, and you can easily collaborate. 
You can do all the work that you need with open source packages and libraries, bring your own, as well as collaborate with other data scientists in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and no schema on write, we kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes. So, if you look at the machine learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools that will be coming up down our roadmap, which will help data scientists do that themselves. I mean, they don't have to be really in-depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientists to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams. We were talking about Hortonworks and you guys collaborating. What are they like? I mean people picture IBM, you know like this 100 plus year old company. What's the persona of the developers in your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... So we've had six releases this year in Data Science Experience. Just for the on premises side of the product, and the cloud side of the product it's got huge delivery. We have releases coming out faster than we can code. And it's not just re-architecting it every time, but it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year. 
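Editor's sketch (not from the interview): the point above about doing data wrangling in Python, and about algorithms needing structured input rather than raw social media text, can be illustrated with the standard library alone. The sample posts, the field names, and the `clean_post` and `clean_all` helpers are all hypothetical, invented for this illustration; this is not how Data Science Experience itself cleans data.

```python
import re

# Hypothetical raw social-media posts, invented for illustration only.
RAW_POSTS = [
    "  Loving the new release!!! #DataScience @acme  ",
    "loving the new release!!! #datascience @ACME",
    "",
    "Install failed :( see http://example.com/log #help",
]

URL = re.compile(r"https?://\S+")
TAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def clean_post(raw):
    """Turn one free-text post into a structured record, or None if it is empty."""
    text = raw.strip()
    if not text:
        return None
    record = {
        "hashtags": sorted({t.lower() for t in TAG.findall(text)}),
        "mentions": sorted({m.lower() for m in MENTION.findall(text)}),
        "has_url": bool(URL.search(text)),
    }
    # Strip URLs, hashtags and mentions, then normalize case, punctuation and spacing.
    body = URL.sub(" ", text)
    body = TAG.sub(" ", body)
    body = MENTION.sub(" ", body)
    body = re.sub(r"[^a-z0-9\s]", " ", body.lower())
    record["text"] = " ".join(body.split())
    return record


def clean_all(posts):
    """Clean every post, dropping empty posts and duplicates after normalization."""
    seen, cleaned = set(), []
    for raw in posts:
        rec = clean_post(raw)
        if rec is None:
            continue
        key = (rec["text"], tuple(rec["hashtags"]), tuple(rec["mentions"]))
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned


records = clean_all(RAW_POSTS)
for rec in records:
    print(rec)
```

Each surviving record has the same four fields, which is the kind of fixed format a downstream model can consume; the second post is dropped because it is identical to the first once case and spacing are normalized.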
So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had an initial release in April, and since then we've had about five revisions of the release where we added a lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now don't you? I mean, you know? (laughing) >> I know, we are setting expectations. >> You still have two months left in 2017. >> We do. >> These are not mainframe release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when you upgrade, a customer upgrades, right? They don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well some of those microservices aren't so micro. >> Vikram: Yeah. Not. Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep. Making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you. To say the least, and I'm sure 2017-2018 is not going to slow down. So continued success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All. Here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.
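Editor's sketch (not from the interview): the upgrade argument above, that one microservice can be upgraded without bringing the whole system down, can be modeled in a few lines of Python. This is a toy model only. The `Service` and `Platform` classes and the service names are hypothetical and do not represent the Kubernetes API or the product's real components.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One independently deployable component (hypothetical)."""
    name: str
    version: str
    running: bool = True
    restarts: int = 0


@dataclass
class Platform:
    """A toy platform that tracks services by name."""
    services: dict = field(default_factory=dict)

    def deploy(self, name, version):
        self.services[name] = Service(name, version)

    def upgrade(self, name, new_version):
        """Restart only the named service; every other service keeps running."""
        svc = self.services[name]
        svc.running = False      # only this component goes down
        svc.version = new_version
        svc.restarts += 1
        svc.running = True       # and comes back on the new version


platform = Platform()
platform.deploy("notebook-server", "1.0")
platform.deploy("tensorflow-runtime", "1.0")
platform.deploy("model-scoring", "1.0")

platform.upgrade("tensorflow-runtime", "1.1")

for svc in platform.services.values():
    print(svc.name, svc.version, "restarts:", svc.restarts)
```

The balance mentioned in the interview shows up as the size of each `Service`: the larger a single component grows, the more functionality is interrupted by one call to `upgrade`.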

Published Date : Nov 1 2017

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Vikram	PERSON	0.99+
John	PERSON	0.99+
three months	QUANTITY	0.99+
six months	QUANTITY	0.99+
John Walls	PERSON	0.99+
October 30th	DATE	0.99+
2017	DATE	0.99+
April	DATE	0.99+
June	DATE	0.99+
one year	QUANTITY	0.99+
Daniel Hernandez	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
September	DATE	0.99+
one	QUANTITY	0.99+
ten miles	QUANTITY	0.99+
YARN	ORGANIZATION	0.99+
eight miles	QUANTITY	0.99+
Vikram Murali	PERSON	0.99+
New York City	LOCATION	0.99+
North America	LOCATION	0.99+
two day	QUANTITY	0.99+
Python	TITLE	0.99+
two releases	QUANTITY	0.99+
New York	LOCATION	0.99+
two years	QUANTITY	0.99+
three years	QUANTITY	0.99+
six releases	QUANTITY	0.99+
Toronto	LOCATION	0.99+
today	DATE	0.99+
Both	QUANTITY	0.99+
two months	QUANTITY	0.99+
a year	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
third phase	QUANTITY	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
first methodology	QUANTITY	0.98+
First	QUANTITY	0.97+
second thing	QUANTITY	0.97+
one small piece	QUANTITY	0.96+
One	QUANTITY	0.96+
XGBoost	TITLE	0.96+
Cameraman	PERSON	0.96+
about eight miles	QUANTITY	0.95+
Horton Data Platform	ORGANIZATION	0.95+
2017-2018	DATE	0.94+
first	QUANTITY	0.94+
The Valley	LOCATION	0.94+
TensorFlow	TITLE	0.94+

IBM CDO Social Influencers | IBM CDO Strategy Summit 2017

>> Live from Boston, Massachusetts, it's The Cube! Covering IBM Chief Data Officer Summit, brought to you by IBM. >> Welcome back to The Cube's live coverage of IBM's Chief Data Strategy Summit, I'm your host Rebecca Knight, along with my cohost Dave Vellante. We have a big panel today, these are our social influencers. Starting at the top, we have Christopher Penn, VP Marketing of Shift Communications, then Tripp Braden, Executive Coach and Growth Strategist at Strategic Performance Partners, Mike Tamir, Chief Data Science Officer at TACT, Bob Hayes, President of Business Over Broadway. Thanks so much for joining us. >> Thank you. >> So we're talking about data as a way to engage customers, a way to engage employees. What business functions would you say stand to benefit the most from using data? >> I'll take a whack at that. I don't know if it's the biggest function, but I think the customer experience and customer success. How do you use data to help predict what customers will do, and how do you then use that information to kind of personalize that experience for them and drive up recommendations, retention, upselling, things like that. >> So it's really the customer experience that you're focusing on? >> Yes, and I just released a study. I found that analytical-leading companies tend to use analytics to understand their customers more than say analytical laggards. So those kind of companies who can actually get value from data, they focus their efforts around improving customer loyalty by just gaining a deeper understanding about their customers. >> Chris, you want to jump in here with- >> I was just going to say, as many of us said, we have three things we really care about as business people, right? We want to save money, save time, or make money. So any function that meets those qualifications, is a functional benefit from data. 
>> I think there's also another interesting dimension to this, when you start to look at the leadership team in the company, now having the ability to anticipate the future. I mean now, we are no longer just looking at static data. We are now looking at anticipatory capability and seeing around corners, so that the person comes to the team, they're bringing something completely different than the team has had in the past. This whole competency of being able to anticipate the future and then take from that, where you take your organization in the future. >> So follow up on that, Tripp, does data now finally trump gut feel? Remember the HBR article of 10, 15 years ago, can't beat gut feel? Is that, we hit a new era now? >> Well, I think we're moving into an era where we have both. I think it's no longer an either or, we have intuition or we have data. Now we have both. The organizations who can leverage both at the same time and develop that capability and earn the trust of the other members by doing that. I see the Chief Data Officer really being a catalyst for organizational change. >> So Dr. Tamir I wonder if I could ask you a question? Maybe the whole panel, but so we've all followed the big data trend and the meme, AI, deep learning, machine learning, same wine, new bottle, or is there something substantive behind it? >> So certainly our capabilities are growing, our capabilities in machine learning, and I think that's part of why now there's this new branding of AI. AI is not what your mother might have thought AI is. It's not robots and cylons and that sort of thing that are going to be able to think intelligently. They just did intelligence tests on the different, like Siri and Alexa, quote AIs from different companies, and they scored horribly. They scored much worse than my, much worse than my very intelligent seven-year old. And that's not a comment on the deficiencies in Alexa or in Siri. It's a comment on these are not actually artificial intelligences. 
These are just tools that apply machine learning strategically. >> So you are all thinking about data and how it is going to change the future and one of the things you said, Tripp, is that we can now see the future. Talk to me about some of the most exciting things that you're seeing that companies do that are anticipating what customers want. >> Okay, so for example, in the customer success space, a lot of SaaS businesses have a monthly subscription, so they're very worried about customer churn. So companies are now leveraging all the user behavior to understand which customers are likely to leave next month, and if they know that, they can reach out to them with maybe some retention campaigns, or even use that data to find out who's most likely to buy more from you in the next month, and then market to those in effective ways. So don't just do a blast for everybody, focus on particular customers, their needs, and try to service them or market to them in a way that resonates with them that increases retention, upselling, and recommendations. >> So they've already seen certain behaviors that show a customer is maybe not going to re-up? >> Exactly, so you just, you throw this data into machine learning, right. You find the predictors of your outcome that interest you, and then using that information, you say oh, maybe predictors A, B, and C, are the ones that actually drive loyalty behaviors, then you can use that information to segment your customers and market to them appropriately. It's pretty cool stuff. >> February 18th, 2018. >> Okay. >> So we did a study recently just for fun of when people search for the term "Outlook, out of office." Yeah, and you really only search for that term for one reason, you're going on vacation, and you want to figure out how to turn the feature on. So we did a five-year data pull of people, of the search times for that and then inverted it, so when do people search least for that term. 
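Editor's sketch (not from the panel): the churn discussion above, finding the predictors of an outcome and then segmenting customers, can be illustrated with a small logistic regression written in plain Python. The customer data, the two features (monthly logins and support tickets), and the 0.5 threshold are all hypothetical assumptions made for this illustration; a real project would use far more data and an established library.

```python
import math

# Hypothetical data: (monthly logins, support tickets) -> 1 if churned, 0 if stayed.
CUSTOMERS = [
    ((2, 5), 1), ((1, 4), 1), ((3, 6), 1), ((2, 3), 1),
    ((20, 0), 0), ((18, 1), 0), ((25, 0), 0), ((15, 1), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, epochs=2000, lr=0.01):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, churned in data:
            score = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(score) - churned
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def churn_probability(features, weights, bias):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


weights, bias = train(CUSTOMERS)

# Segment two new (hypothetical) customers for a retention campaign.
at_risk = churn_probability((1, 5), weights, bias)
loyal = churn_probability((22, 0), weights, bias)
print("weights (logins, tickets):", weights)
print("at-risk customer:", round(at_risk, 3), "loyal customer:", round(loyal, 3))
```

The sign of each learned weight is the simplest reading of a "predictor": here more logins push the churn probability down, so low-login customers are the segment to reach out to first.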
That's when they're in the office, and it's the week of February 18th, 2018, will be that time when people are like, yep, I'm at the office, I got to work. And knowing that, prediction and data give us specificity, like yeah, we know the first quarter is busy, we know between Memorial Day and Labor Day is not as busy in the B to B world. But as a marketer, we need to put specificity, data and predictive analytics gives us specificity. We know what week to send our email campaigns, what week to turn our ad budgets all the way to full, and so on and so forth. If someone's looking for The Cube, when will they be doing that, you know, going forward? That's the power of this stuff, is that specificity. >> They know what we're going to search for before we search for it. (laughter) >> I'd like to know where I'm going to be next week. Why that date? >> That's the date that people least search for the term, "Outlook, out of office." >> Okay. >> So, they're not looking for that feature, which logically means they're in the office. >> Or they're on vacation. (laughter) Right, I'm just saying. >> That brings up a good point on not just, what you're predicting for interactions right now, but also anticipating the trends. So Bob brought up a good point about figuring out when people are churning. There's a flip side to that, which is how do you get your customers to be more engaged? And now we have really an explosion in reinforcement learning in particular, which is a tool for figuring out, not just how to interact with you right now as a one off, statically. But how do I interact with you over time, this week, next week, the week after that? And using reinforcement learning, you can actually do that. This is the sort of technique that they used to beat humans with AlphaGo. 
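Editor's sketch (not from the panel): the "inverted" search study described above reduces to finding the week with the lowest search interest. The weekly values below are invented purely so that the example has something to compute; they are not the study's data, and `in_office_score` and `best_week` are hypothetical helper names.

```python
from datetime import date

# Hypothetical weekly search-interest index for "Outlook out of office".
# The values are invented; only the method is being illustrated.
WEEKLY_INTEREST = {
    date(2018, 2, 4): 41,
    date(2018, 2, 11): 38,
    date(2018, 2, 18): 22,
    date(2018, 2, 25): 35,
    date(2018, 3, 4): 47,
}


def in_office_score(interest):
    """Invert the series: low search interest becomes a high in-office score."""
    peak = max(interest.values())
    return {week: peak - value for week, value in interest.items()}


def best_week(interest):
    """Return the week with the highest in-office score."""
    scores = in_office_score(interest)
    return max(scores, key=scores.get)


print("Best week to send the campaign:", best_week(WEEKLY_INTEREST))
```

As the panel points out, a low search count is only a proxy: people already on vacation also do not search for the feature, so the score is a hint rather than proof that the audience is at work.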
Machine-learning algorithms, supervised learning, works well when you get that immediate feedback, but if you're playing a game, you don't get that feedback that you're going to win 300 turns from now, right now. You have to create more advanced value functions and ways of anticipating where things are going, this move, so that you see things are on track for winning in 20, 30, 40 moves, down the road. And it's the same thing when you're dealing with customer engagement. You want to, you can make a decision, I'm going to give this customer a coupon that's going to make them spend 50 cents more today, or you can make decisions algorithmically that are going to give them a 50 cent discount this week, next week, and the week after that, that are going to make them become a coffee drinker for life, or customer for life. >> It's about finding those customers for life. >> IBM uses the term cognitive business. We go to these conferences, everybody talks about digital transformation. At the end of the day it's all about how you use data. So my question is, if you think about the bell curve of organizations that you work with, how do they, what's the shape of that curve, part one. And then part two is, where do you see IBM on that curve? >> Well I think a lot of my clients make a living predicting the future, they're insurance companies and financial services. That's where the CDO currently resides and they get a lot of benefit. But one of things we're all talking about, but talking around, is that human element. So now, how do we take the human element and incorporate this into the structure of how we make our decisions? And how do we take this information, and how do we learn to trust that? The one thing I hear from most of the executives I talk to, when they talk about how data is being used in their organizations is the lack of trust. 
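Editor's sketch (not from the panel): the coupon example above contrasts a decision that pays off today with one that pays off over many weeks. One standard way to compare such options is the discounted return, the quantity that value functions in reinforcement learning estimate. The reward sequences and the discount factor below are hypothetical, and this is only the scoring step, not a learning algorithm.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Hypothetical weekly profit, in dollars per customer, over twelve weeks.
ONE_OFF_COUPON = [0.50] + [0.0] * 11               # 50 cents more today, nothing after
HABIT_BUILDING = [-0.50, -0.50, -0.50] + [1.00] * 9  # three discounted weeks, then a regular

one_off_value = discounted_return(ONE_OFF_COUPON)
habit_value = discounted_return(HABIT_BUILDING)

print("one-off coupon:", round(one_off_value, 2))
print("habit building:", round(habit_value, 2))
```

A greedy rule that looks only at the first week prefers the one-off coupon, while scoring the whole sequence prefers the habit-building plan under these made-up numbers; learning which plan actually produces such a sequence is the part reinforcement learning addresses.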
Now, when you have that, and you start to look at the trends that we're dealing with, and we call them data points versus calling them people, now you have a problem, because people become very, almost analytically challenged, right? So how do we get people to start saying, okay, let's look at this from the point of view of, it's not an either or solution in the world we live in today. Cognitive organizations are not going to happen tomorrow morning, even the most progressive organizations are probably five years away from really deploying them completely. But the organizations who take a little bit of an edge, so five, ten percent edge out of there, they now have a really, a different advantage in their markets. And that's what we're talking about, hyper-critical thinking skills. I mean, when you start to say, how do I think like Warren Buffett, how do I start to look and make these kinds of decisions analytically? How do I create an artificial intelligence or machine-learning practice and program that's going to provide that solution for people? And that's where I think organizations that are forward-leaning now are looking and saying, how do I get my people to use these capabilities and ultimately trust the data that they're told. >> So I forget who said it, but it was early on in the big data movement, somebody said that we're further away from a single version of the truth than ever, and it's just going to get worse. So as a data scientist, what say you? >> I'm not familiar with the truth quote, but I think it's very relevant, well very relevant to where we are today. There's almost an arms race of, you hear all the time about automating, putting out fake news, putting out misinformation, and how that can be done using all the technology that we have at our disposal for dispersing that information. 
The only way that that's going to get solved is also with algorithmic solutions with creating algorithms that are going to be able to detect, is this news, is this something that is trying to attack my emotions and convince me just based on fear, or is this an article that's trying to present actual facts to me and you can do that with machine-learning algorithms. Now we have the technology to do that, algorithmically. >> Better algos than like and share. >> From a technological perspective, to your question about where IBM is, IBM has a ton of stuff that I call AI as a service, essentially where if you're a developer on Bluemix, for example, you can plug in to the different components of Watson at literally pennies per usage, to say I want to do sentiment analysis, I want to do tone analysis, I want personality insights, about this piece, who wrote this piece of content. And to Dr. Tamir's point, this is stuff that, we need these tools to do things like, fingerprint this piece of text. Did the supposed author actually write this? You can tell that, so of all the four magi, we call it, the Microsoft, Amazon, Google, IBM, getting on board, and adding that five or ten percent edge that Tripp was talking about, is easiest with IBM Bluemix. >> Great. >> Well, one of the other parts of this is you start to talk about what we're doing and you start to look at the players that are doing this. They are all organizations that I would not call classical technology organizations. They were 10 years ago, look at a Microsoft. But you look at the leadership of Microsoft today, and they're much more about figuring out what the formula is for success for business, and that's the other place I think we're seeing a transformation occurring, and the early adopters, is they have gone through the first generation, and the pain, you know, of having to have these kinds of things, and now they're moving to that second generation, where they're looking for the gain. 
And they're looking for people who can bring them capability and have the conversation, and discuss them in ways that they can see the landscape. I mean part of this is if you get caught in the bits and bytes, you miss the landscape that you should be seeing in the market, and that's why I think there's a tremendous opportunity for us to really look at multiple markets of the same data. I mean, imagine looking and here's what I see, everyone in this group would have a different opinion in what they're seeing, but now we have the ability to see it five different ways and share that with our executive team and what we're seeing, so we can make better decisions. >> I wonder if we could have a frank conversation, an honest conversation about the data and the data ownership. You heard IBM this morning, saying hey we're going to protect your data, but I'd love you guys, as independents to weigh in. You got this data, you guys are involved with your clients, building models, the data trains the model. I got to believe that that model gets used at a lot of different places, within an industry, like insurance or across retail, whatever it is. So I'm afraid that my data is, my IP is going to seep across the industry. Should I not be worried about that? I wonder if you guys could weigh in. >> Well if you work with a particular vendor, sometimes vendors have a stipulation that we will not share your models with other clients, so you just got to stick to that. But in terms of science, I mean you build a model, right? You want to generalize that to other businesses. >> Right! >> (drowned out by others talking) So maybe if you could work somehow with your existing clients, say here, this is what we want to do, we just want to elevate the waters for everybody, right? So everybody wins when all boats rise, right? 
So if you can kind of convince your clients that we just want to help the world be better, and function better, make employees happier, customers happier, let's take that approach and just use models in a way that may be generalized to other situations and use them. If you don't, then you just don't. >> Right, that's your choice. >> It's a choice, it's a choice you have to make. >> As long as you're transparent about it. >> I'm not super worried, I mean, you, Dave, Tripp, and I are all dressed similarly, right? We have the model of shirt and tie so, if I put on your clothes, we wouldn't, but if I were to put on your clothes, it would not be, even though it's the same model, it's just not going to be the same outcome. It's going to look really bad, right, so. Yes, companies can share the models and the general flows and stuff, but there's so much, if a company's doing machine learning well, there's so much feature engineering that's unique to that company that trying to apply that somewhere else, is just going to blow up. >> Yeah, but we could switch ties, like Tripp has got a really cool tie, I'd be using that tie on July 4th. >> This is turning into a different kind of panel (laughter) Chris, Tripp, Mike, and Bob, thanks so much for joining us. This has been a really fun and interesting panel. >> Thank you very much. Thank you. >> Thanks, you guys. >> We will have more from the IBM Summit in Boston just after this. (techno music)

Published Date : Oct 25 2017

ENTITIES

Entity	Category	Confidence
Rebecca Knight	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Chris	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Christopher Penn	PERSON	0.99+
Mike Tamir	PERSON	0.99+
Google	ORGANIZATION	0.99+
Bob Hayes	PERSON	0.99+
February 18th, 2018	DATE	0.99+
Bob	PERSON	0.99+
July 4th	DATE	0.99+
five	QUANTITY	0.99+
20	QUANTITY	0.99+
five-year	QUANTITY	0.99+
Mike	PERSON	0.99+
Tamir	PERSON	0.99+
50 cents	QUANTITY	0.99+
next week	DATE	0.99+
Dave	PERSON	0.99+
Tripp Braden	PERSON	0.99+
Tripp	PERSON	0.99+
Siri	TITLE	0.99+
next week	DATE	0.99+
Warren Buffet	PERSON	0.99+
30	QUANTITY	0.99+
tomorrow morning	DATE	0.99+
February 18th, 2018	DATE	0.99+
this week	DATE	0.99+
Boston, Massachusetts	LOCATION	0.99+
50 cent	QUANTITY	0.99+
both	QUANTITY	0.99+
next month	DATE	0.99+
first generation	QUANTITY	0.99+
five years	QUANTITY	0.99+
300 turns	QUANTITY	0.99+
Alexa	TITLE	0.99+
second generation	QUANTITY	0.99+
Boston	LOCATION	0.99+
10 years ago	DATE	0.99+
TACT	ORGANIZATION	0.98+
five different ways	QUANTITY	0.98+
seven-year old	QUANTITY	0.97+
one	QUANTITY	0.96+
40 moves	QUANTITY	0.96+
today	DATE	0.96+
HBR	ORGANIZATION	0.96+
IBM Summit	EVENT	0.96+
Strategic Performance Partners	ORGANIZATION	0.96+
10, 15 years ago	DATE	0.95+
Labor Day	EVENT	0.94+
President	PERSON	0.93+
one reason	QUANTITY	0.93+
ten percent	QUANTITY	0.93+
Shift Communications	ORGANIZATION	0.92+
Sass	TITLE	0.92+
Over Broadway	ORGANIZATION	0.91+
Alpha-Go	TITLE	0.91+
IBM	EVENT	0.89+
single version	QUANTITY	0.88+
first quarter	DATE	0.87+
this morning	DATE	0.87+
IBM Chief Data Officer Summit	EVENT	0.82+
memorial Day	EVENT	0.8+
CDO Strategy Summit 2017	EVENT	0.8+

Janet George, Western Digital | Western Digital the Next Decade of Big Data 2017

>> Announcer: Live from San Jose, California, it's theCUBE, covering Innovating to Fuel the Next Decade of Big Data, brought to you by Western Digital. >> Hey welcome back everybody, Jeff Frick here with theCUBE. We're at Western Digital at their global headquarters in San Jose, California, it's the Almaden campus. This campus has a long history of innovation, and we're excited to be here, and probably have the smartest person in the building, if not the county, area code and zip code. I love to embarrass her, Janet George, she is the Fellow and Chief Data Scientist for Western Digital. We saw you at Women in Data Science, you were just at Grace Hopper, you're everywhere, and we get a chance to sit down again. >> Thank you Jeff, I appreciate it very much. >> So as a data scientist, today's announcement about MAMR, how does that make you feel, why is this exciting, how is this going to make you more successful in your job and more importantly, the areas in which you study? >> So today's announcement is actually a breakthrough announcement, both in the field of machine learning and AI, because we've been on this data journey, and we have been very selectively storing data on our storage devices, and the selection is actually coming from the preconstructed queries that we do with business data, and now we no longer have to preconstruct these queries. We can store the data at scale in raw form. We don't even have to worry about the format or the schema of the data. We can look at the schema dynamically as the data grows within the storage and within the applications. >> Right, 'cause there's been two things, right. Before, data was bad 'cause it was expensive to store. >> Yes.
>> Now suddenly we want to store it 'cause we know data is good, but even then, it still can be expensive, but you know, we've got this concept of data lakes and data swamps and all kinds of data oceans, pick your favorite metaphor, but we want the data 'cause we're not really sure what we're going to do with it, and I think what's interesting that you said earlier today, is it was schema on write, then we evolved to schema on read, which was all the rage at Hadoop Summit a couple years ago, but you're talking about the whole next generation, which is an evolving dynamic schema >> Exactly. >> Based on whatever happens to drive that query at the time. >> Exactly, exactly. So as we go through this journey, we are now getting independent of schema, we are decoupled from schema, and what we are finding out is we can capture data at its raw form, and we can do the learning at the raw form without human interference, in terms of transformation of the data and assigning a schema to that data. We've got to understand the fidelity of the data, but we can train at scale from that data. So with massive amounts of training, the models already know how to train themselves from raw data. So now we are only talking about incremental learning, as the trained model goes out into the field in production, and actually performs, now we are talking about how does the model learn, and this is where fast data plays a very big role. >> So that's interesting, 'cause you talked about that also earlier in your part of the presentation, kind of the fast data versus big data, which kind of maps to flash versus hard drive, and the two are not, it's not either or, but it's really both, because within the storage of the big data, you build the base foundations of the models, and then you can adapt, learn and grow, change with the fast data, with the streaming data on the front end, >> Exactly. >> It's a whole new world.
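The "evolving dynamic schema" idea discussed above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not Western Digital's implementation: the `EvolvingSchema` class, the field names, and the sample records are all hypothetical. Raw JSON records are stored untouched, and the schema is discovered and widened as each record is read.

```python
import json
from collections import defaultdict


class EvolvingSchema:
    """Hypothetical helper: tracks which fields and value types have been
    seen so far in raw records, instead of declaring a schema up front."""

    def __init__(self):
        # Maps field name -> set of Python type names observed for it.
        self.fields = defaultdict(set)

    def observe(self, raw_record):
        """Parse one raw JSON record and widen the schema if needed."""
        doc = json.loads(raw_record)
        for key, value in doc.items():
            self.fields[key].add(type(value).__name__)
        return doc


# Hypothetical raw records kept in their original form.
raw_lines = [
    '{"device": "hdd-1", "temp_c": 41.5}',
    '{"device": "hdd-2", "temp_c": 39, "vibration": 0.02}',
]

schema = EvolvingSchema()
for line in raw_lines:
    schema.observe(line)

for name in sorted(schema.fields):
    print(name, sorted(schema.fields[name]))
```

The second record introduces a new field and a second type for an existing field; neither requires rewriting the stored data, which is the property the conversation attributes to schema on read.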
>> Exactly, so the fast data actually helps us after the training phase, right, and these are evolving architectures. This is part of your journey. As you come through the big data journey you experience this. But for fast data, what we are seeing is, these architectures like Lambda and Kappa are evolving, and especially the Lambda architecture is very interesting, because it allows for batch processing of historical data, and then it allows for what we call a low-latency layer or a speed layer, where this data can then be promoted up the stack for serving purposes. And then Kappa architecture's where the data is being streamed near real time, bounded and unbounded streams of data. So this is again very important when we build machine learning and AI applications, because evolution is happening on the fly, learning is happening on the fly. Also, if you think about the learning, we are mimicking more and more how humans learn. We don't really learn with very large chunks of data all at once, right? That's important for initial model training and model learning, but on a regular basis, we are learning with small chunks of data that are streamed to us near real time. >> Right, learning on the Delta. >> Learning on the Delta. >> So what is the bound versus the unbound? Unpack that a little bit. What does that mean? >> So what is bounded is basically saying, hey we are going to get certain amounts of data, so you're sizing the data for example. Unbounded is infinite streams of data coming to you. And so if your architecture can absorb infinite streams of data, like for example, the sensors constantly transmitting data to you, right? At that point you're not worried about whether you can store that data, you're simply worried about the fidelity of that data. But bounded would be saying, I'm going to send the data in chunks.
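The bounded versus unbounded distinction, and learning "on the delta", can be sketched in a few lines. This is a minimal sketch with hypothetical names (`sensor_stream`, `bounded_chunks`, `RunningMean`) and synthetic readings; a running mean stands in for a real model, since it can be updated one value at a time without revisiting history.

```python
import itertools


def sensor_stream():
    """Unbounded stream: yields synthetic readings forever."""
    for i in itertools.count():
        yield 20.0 + (i % 5)


def bounded_chunks(stream, chunk_size, n_chunks):
    """Bounded view: cut a fixed number of fixed-size chunks from a stream."""
    for _ in range(n_chunks):
        yield list(itertools.islice(stream, chunk_size))


class RunningMean:
    """Stand-in for a model that learns incrementally from small deltas."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental mean update: no need to keep earlier readings.
        self.n += 1
        self.mean += (x - self.mean) / self.n


stream = sensor_stream()
chunks = list(bounded_chunks(stream, chunk_size=5, n_chunks=2))

model = RunningMean()
for chunk in chunks:
    for reading in chunk:
        model.update(reading)

print(model.n, model.mean)
```

The same `update` call would work unchanged if the loop consumed `sensor_stream()` directly; the only difference is that the unbounded loop never finishes on its own.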
You could also do bounded where you basically say, I'm going to pre-process the data a little bit just to see if the data's healthy, or if there is signal in the data. You don't want to find that out later as you're training, right? You're trying to figure that out up front. >> But it's funny, everything is ultimately bounded, it just depends on how you define the unit of time, right, 'cause you take it down to infinite zero, everything is frozen. But I love the example of the autonomous cars. We were at the event with, just talking about navigation just for autonomous cars. Goldman Sachs says it's going to be a seven billion dollar industry, and the great example that you used of the two systems working well together, 'cause is it the car sensors or is it the map? >> Janet: That's right. >> And he says, well you know, you want to use the map, and the data from the map as much as you can to set the stage for the car driving down the road to give it some level of intelligence, but if today we happen to be paving lane number two on 101, and there's cones, now it's the real time data that's going to train the system. But the two have to work together, and the two are not autonomous and really can't work independent of each other. >> Yes. >> Pretty interesting. >> It makes perfect sense, right. And why it makes perfect sense is because first the autonomous cars have to learn to drive. Then the autonomous cars have to become an experienced driver. And the experience cannot be learned. It comes on the road. So one of the things I was watching was how insurance companies were doing testing on these cars, and they had a human, a human driving a car, and then an autonomous car. And the autonomous car, with the sensors, was predicting the behavior, every permutation and combination of how a bicycle would react to that car. It was almost predicting what the human on the bicycle would do, like jump in front of the car, and it got it right in 80% of the cases.
But a human driving a car, we're not sure how the bicycle is going to perform. We don't have peripheral vision, and we can't predict how the bicycle is going to perform, so we get it wrong. Now, we can't transmit that knowledge. If I'm a driver and I just encountered a bicycle, I can't transmit that knowledge to you. But a driverless car can learn, it can predict the behavior of the bicycle, and then it can transfer that information to a fleet of cars. So it's very powerful in where the learning can scale. >> Such a big part of the autonomous vehicle story that most people don't understand, that not only is the car driving down the road, but it's constantly measuring and modeling everything that's happening around it, including bikes, including pedestrians, including everything else, and whether it gets in a crash or not, it's still gathering that data and building the model and advancing the models, and I think that's, you know, people just don't talk about that enough. I want to follow up on another topic. So we were both at Grace Hopper last week, which is a phenomenal experience, if you haven't been, go. I'll just leave it at that. But Dr. Fei-Fei Li gave one of the keynotes, and she made a really deep statement at the end of her keynote, and we were both talking about it before we turned the cameras on, which is, there's no question that AI is going to change the world, and it's changing the world today. The real question is, who are the people that are going to build the algorithms that train the AI? So you sit in your position here, with the power, both in the data and the tools and the compute that are available today, and this brand new world of AI and ML. How do you think about that? How does that make you feel about the opportunity to define the systems that drive the cars, et cetera? >> I think not just the diversity in data, but the diversity in the representation of that data are equally powerful. We need both.
Because we cannot tackle diverse data, diverse experiences with only a single representation. We need multiple representations to be able to tackle that data. And this is how we will overcome bias of every sort. So it is a question of who is going to build the models, not a question of whether the AI models will be built, because the AI models are already being built, but some of the models have biases built into them from a lack of representation. Like who's building the model, right? So I think it's very important. I think we have a powerful moment in history to change that, to make real impact. >> Because the trick is we all have bias. You can't do anything about it. We grew up in the world in which we grew up, we saw what we saw, we went to our schools, we had our family relationships et cetera. So everyone is locked into who they are. That's not the problem. The problem is the acceptance of bringing in some other, (chuckles) and the combination will provide better outcomes, it's a proven scientific fact. >> I very much agree with that. I also think that having the freedom, having the choice to hear another person's conditioning, another person's experiences is very powerful, because that enriches our own experiences. Even if we are constrained, even if we are like that storage that has been structured and processed, we know that there's this other storage, and we can figure out how to get the freedom between the two points of view, right? And we have the freedom to choose. So that's very, very powerful, just having that freedom. >> So as we get ready to turn the calendar on 2017, which is hard to imagine, but it's true. You look to 2018, what are some of your personal and professional priorities, what are you looking forward to, what are you working on, what's top of mind for Janet George? >> So right now I'm thinking about genetic algorithms, genetic machine learning algorithms.
This has been around for a while, but I'll tell you where the power of genetic algorithms is, especially when you're creating a powerful new-technology memory cell. So when you start out trying to create a new-technology memory cell, you have materials, material deformations, you have process, you have a hundred permutations and combinations, and with the genetic algorithms, we can quickly assign a cost function, and, survival of the fittest, all that won't fit we can kill, arriving at the fastest, quickest new technology node, and then from there, we can scale that in mass production. So we can use these survival of the fittest mechanisms that evolution has used for a long period of time. So this is biology inspired. And using a cost function we can figure out how to get the best of every process, every technology, all the coupling effects, all the master effects of introducing a program voltage on a particular cell, reducing the program voltage on a particular cell, resetting and setting, and the neighboring effects, we can pull all that together, so 600, 700 permutations and combinations that we've been struggling with, trying to figure out how to quickly narrow down to that perfect cell, which is the new technology node that we can then scale out into tens of millions of vehicles, right? >> Right, you're going to have to >> Getting to that spot. >> You're going to have to get me on the whiteboard on that one, Janet. That is amazing. Smart lady. >> Thank you. >> Thanks for taking a few minutes out of your time. Always great to catch up, and it was terrific to see you at Grace Hopper as well. >> Thank you, I really appreciate it, I appreciate it very much. >> All right, Janet George, I'm Jeff Frick. You are watching theCUBE. We're at Western Digital headquarters at Innovating to Fuel the Next Generation of Big Data. Thanks for watching.
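For readers who want the "whiteboard" version of the genetic-algorithm idea Janet George describes, here is a toy sketch. Everything in it is an assumption made for illustration: the `TARGET` vector stands in for an unknown ideal set of cell settings, the squared-distance `cost` function stands in for a real measured cost, and the population size, mutation rate, and generation count are arbitrary. It shows only the mechanism she names: score candidates with a cost function, keep the fittest, and breed the next generation from them.

```python
import random

# Hypothetical "ideal" settings; in practice these are unknown and the
# cost would come from measurement or simulation.
TARGET = [3, 1, 4, 1, 5]


def cost(candidate):
    """Toy cost function: squared distance from the hypothetical target."""
    return sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def evolve(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 9) for _ in TARGET] for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation

    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))

        # Survival of the fittest: the worse half is discarded.
        survivors = population[: pop_size // 2]

        # Refill the population with children of surviving candidates.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.3:               # occasional mutation
                child[rng.randrange(len(child))] = rng.randint(0, 9)
            children.append(child)
        population = survivors + children

    population.sort(key=cost)
    history.append(cost(population[0]))
    return population[0], history


best, history = evolve()
print(best, history[0], history[-1])
```

Because survivors are carried over unchanged, the best cost never gets worse from one generation to the next; how quickly it improves depends on the arbitrary parameters chosen here.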

Published Date : Oct 11 2017

ENTITIES

Entity	Category	Confidence
Janet George	PERSON	0.99+
Jeff	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Janet	PERSON	0.99+
Western Digital	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
two things	QUANTITY	0.99+
2018	DATE	0.99+
last week	DATE	0.99+
2017	DATE	0.99+
Goldman Sachs	ORGANIZATION	0.99+
San Jose, California	LOCATION	0.99+
two systems	QUANTITY	0.99+
two	QUANTITY	0.99+
today	DATE	0.99+
both	QUANTITY	0.99+
seven billion dollar	QUANTITY	0.99+
Fei-Fei Li	PERSON	0.98+
Almaden	LOCATION	0.98+
two point	QUANTITY	0.97+
one	QUANTITY	0.97+
first	QUANTITY	0.95+
Grace Hopper	ORGANIZATION	0.95+
theCUBE	ORGANIZATION	0.95+
hundred permutation	QUANTITY	0.95+
MAMR	ORGANIZATION	0.94+
Women in Data Science	ORGANIZATION	0.91+
tens of millions of vehicles	QUANTITY	0.9+
one of	QUANTITY	0.89+
Kappa	ORGANIZATION	0.89+
Dr.	PERSON	0.88+
single representation	QUANTITY	0.83+
a couple years ago	DATE	0.83+
earlier today	DATE	0.82+
Next Decade	DATE	0.81+
Lambda	TITLE	0.8+
101	OTHER	0.8+
600, 700 permutation	QUANTITY	0.77+
Lambda	ORGANIZATION	0.7+
of data	QUANTITY	0.67+
keynotes	QUANTITY	0.64+
Hadoop Summit	EVENT	0.62+
zero	QUANTITY	0.6+
number	OTHER	0.55+
Delta	OTHER	0.54+
two	OTHER	0.35+

Rob Thomas, IBM Analytics | IBM Fast Track Your Data 2017

>> Announcer: Live from Munich, Germany, it's theCUBE. Covering IBM: Fast Track Your Data. Brought to you by IBM. >> Welcome, everybody, to Munich, Germany. This is Fast Track Your Data brought to you by IBM, and this is theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. Rob Thomas is here, he's the General Manager of IBM Analytics, and longtime CUBE guest, good to see you again, Rob. >> Hey, great to see you. Thanks for being here. >> Dave: You're welcome, thanks for having us. So we're talking about, we missed each other last week at the Hortonworks DataWorks Summit, but you came on theCUBE, you guys had the big announcement there. You're sort of getting out, doing a Hadoop distribution, right? TheCUBE gave up our Hadoop distributions several years ago so. It's good that you joined us. But, um, that's tongue-in-cheek. Talk about what's going on with Hortonworks. You guys are now going to be partnering with them essentially to replace BigInsights, you're going to continue to service those customers. But there's more than that. What's that announcement all about? >> We're really excited about that announcement, that relationship, just to kind of recap for those that didn't see it last week. We are making a huge partnership with Hortonworks, where we're bringing data science and machine learning to the Hadoop community. So IBM will be adopting HDP as our distribution, and that's what we will drive into the market from a Hadoop perspective. Hortonworks is adopting IBM Data Science Experience and IBM machine learning to be a core part of their Hadoop platform. And I'd say this is a recognition. One is, companies should do what they do best. We think we're great at data science and machine learning. Hortonworks is the best at Hadoop. Combine those two things, it'll be great for clients. 
And, we also talked about extending that to things like Big SQL, where they're partnering with us on Big SQL, around modernizing data environments. And then third, which relates a little bit to what we're here in Munich talking about, is governance, where we're partnering closely with them around unified governance, Apache Atlas, advancing Atlas in the enterprise. And so, it's a lot of dimensions to the relationship, but I can tell you since I was on theCUBE a week ago with Rob Bearden, client response has been amazing. Rob and I have done a number of client visits together, and clients see the value of unlocking insights in their Hadoop data, and they love this, which is great. >> Now, I mean, the Hadoop distro, I mean early on you got into that business, just, you had to do it. You had to be relevant, you want to be part of the community, and a number of folks did that. But it's really sort of best left to a few guys who want to do that, and Apache open source is really, I think, the way to go there. Let's talk about Munich. You guys chose this venue. There's a lot of talk about GDPR, you've got some announcements around unified governance, but why Munich? >> So, there's something interesting that I see happening in the market. So first of all, you look at the last five years. There's only 10 companies in the world that have outperformed the S&P 500, in each of those five years. And we started digging into who those companies are and what they do. They are all applying data science and machine learning at scale to drive their business. And so, something's happening in the market. That's what leaders are doing. And I look at what's happening in Europe, and I say, I don't see the European market being that aggressive yet around data science, machine learning, how you apply data for competitive advantage, so we wanted to come do this in Munich. And it's a bit of a wake-up call, almost, to say hey, this is what's happening.
We want to encourage clients across Europe to think about how do they start to do something now. >> Yeah, of course, GDPR is also a hook. The European Union and you guys have made some talk about that, you've got some keynotes today, and some breakout sessions that are discussing that, but talk about the two announcements that you guys made. There's one on DB2, there's another one around unified governance, what do those mean for clients? >> Yeah, sure, so first of all on GDPR, it's interesting to me, it's kind of the inverse of Y2K, which is there's very little hype, but there's huge ramifications. And Y2K was kind of the opposite. So look, it's coming, May 2018, clients have to be GDPR-compliant. And there's a misconception in the market that that only impacts companies in Europe. It actually impacts any company that does any type of business in Europe. So, it impacts everybody. So we are announcing a platform for unified governance that makes sure clients are GDPR-compliant. We've integrated software technology across analytics, IBM security, some of the assets from the Promontory acquisition that IBM did last year, and we are delivering the only platform for unified governance. And that's what clients need to be GDPR-compliant. The second piece is data has to become a lot simpler. As you think about my comment, who's leading the market today? Data's hard, and so we're trying to make data dramatically simpler. And so for example, with DB2, what we're announcing is you can download and get started using DB2 in 15 minutes or less, and anybody can do it. Even you can do it, Dave, which is amazing. >> Dave: (laughs) >> For the first time ever, you can-- >> We'll test that, Rob. >> Let's go test that. I would love to see you do it, because I guarantee you can. Even my son can do it. I had my son do it this weekend before I came here, because I wanted to see how simple it was. 
So that announcement is really about bringing, or introducing a new era of simplicity to data and analytics. We call it Download And Go. We started with SPSS, we did that back in March. Now we're bringing Download And Go to DB2, and to our governance catalog. So the idea is make data really simple for enterprises. >> You had a community edition previous to this, correct? There was-- >> Rob: We did, but it wasn't this easy. >> Wasn't this simple, okay. >> Not anybody could do it, and I want to make it so anybody can do it. >> Is simplicity, the rate of simplicity, the only differentiator of the latest edition, or I believe you have Kubernetes support now with this new edition, can you describe what that involves? >> Yeah, sure, so there's two main things that are new functionality-wise, Jim, to your point. So one is, look, we're big supporters of Kubernetes. And as we are helping clients build out private clouds, the best answer for that in our mind is Kubernetes, and so when we released Data Science Experience for Private Cloud earlier this quarter, that was on Kubernetes, extending that now to other parts of the portfolio. The other thing we're doing with DB2 is we're extending JSON support for DB2. So think of it as, you're working in a relational environment, now just through SQL you can integrate with non-relational environments, JSON, documents, any type of NoSQL environment. So we're finally bringing to fruition this idea of a data fabric, which is I can access all my data from a single interface, and that's pretty powerful for clients. >> Yeah, more cloud data development. Rob, I wonder if you can, we can go back to the machine learning, one of the core focuses of this particular event and the announcements you're making. Back in the fall, IBM made an announcement of Watson machine learning, for IBM Cloud, and World of Watson. In February, you made an announcement of IBM machine learning for the z platform.
What are the machine learning announcements at this particular event, and can you sort of connect the dots in terms of where you're going, in terms of what sort of innovations are you driving into your machine learning portfolio going forward? >> I have a fundamental belief that machine learning is best when it's brought to the data. So, we started with, like you said, Watson machine learning on IBM Cloud, and then we said well, what's the next big corpus of data in the world? That's an easy answer, it's the mainframe, that's where all the world's transactional data sits, so we did that. Last week with the Hortonworks announcement, we said we're bringing machine learning to Hadoop, so we've kind of covered all the landscape of where data is. Now, the next step is about how do we bring a community into this? And the way that you do that is we don't dictate a language, we don't dictate a framework. So if you want to work with IBM on machine learning, or in Data Science Experience, you choose your language. Python, great. Scala or Java, you pick whatever language you want. You pick whatever machine learning framework you want, we're not trying to dictate that because there's different preferences in the market, so what we're really talking about here this week in Munich is this idea of an open platform for data science and machine learning. And we think that is going to bring a lot of people to the table. >> And with open, one thing, with open platform in mind, one thing to me that is conspicuously missing from the announcement today, correct me if I'm wrong, is any indication that you're bringing support for the deep learning frameworks like TensorFlow into this overall machine learning environment. Am I wrong? I know you have Power AI. Is there a piece of Power AI in these announcements today? >> So, stay tuned on that. We are, it takes some time to do that right, and we are doing that. 
But we want to optimize so that you can do machine learning with GPU acceleration on Power AI, so stay tuned on that one. But we are supporting multiple frameworks, so if you want to use TensorFlow, that's great. If you want to use Caffe, that's great. If you want to use Theano, that's great. That is our approach here. We're going to allow you to decide what's the best framework for you. >> So as you look forward, maybe it's a question for you, Jim, but Rob I'd love you to chime in. What does that mean for businesses? I mean, is it just more automation, more capabilities as you evolve that timeline, without divulging any sort of secrets? What do you think, Jim? Or do you want me to ask-- >> What do I think, what do I think you're doing? >> No, you ask about deep learning, like, okay, that's, I don't see that, Rob says okay, stay tuned. What does it mean for a business, that, if like-- >> Yeah. >> If I'm planning my roadmap, what does that mean for me in terms of how I should think about the capabilities going forward? >> Yeah, well what it means for a business, first of all, is what they're using deep learning for, is doing things like video analytics, and speech analytics and more of the challenges involving convolutional neural networks to do pattern recognition on complex data objects for things like connected cars, and so forth. Those are the kind of things that can be done with deep learning. >> Okay. And so, Rob, you're talking about here in Europe how the uptick in some of the data orientation has been a little bit slower, so I presume from your standpoint you don't want to over-rotate, to some of these things. But what do you think, I mean, it sounds like there is difference between certainly Europe and those top 10 companies in the S&P, outperforming the S&P 500. What's the barrier, is it just an understanding of how to take advantage of data, is it cultural, what's your sense of this?
>> So, to some extent, data science is easy, data culture is really hard. And so I do think that culture's a big piece of it. And the reason we're kind of starting with a focus on machine learning, simplistic view, machine learning is a general-purpose framework. And so it invites a lot of experimentation, a lot of engagement, we're trying to make it easier for people to on-board. As you get to things like deep learning as Jim's describing, that's where the market's going, there's no question. Those tend to be very domain-specific, vertical-type use cases and to some extent, what I see clients struggle with, they say well, I don't know what my use case is. So we're saying, look, okay, start with the basics. A general purpose framework, do some tests, do some iteration, do some experiments, and once you find out what's hunting and what's working, then you can go to a deep learning type of approach. And so I think you'll see an evolution towards that over time, it's not either-or. It's more of a question of sequencing. >> One of the things we've talked to you about on theCUBE in the past, you and others, is that IBM obviously is a big services business. This big data is complicated, but great for services, but one of the challenges that IBM and other companies have had is how do you take that service expertise, codify it to software and scale it at large volumes and make it adoptable? I thought the Watson data platform announcement last fall, I think at the time you called it Data Works, and then so the name evolved, was really a strong attempt to do that, to package a lot of expertise that you guys had developed over the years, maybe even some different software modules, but bring them together in a scalable software package. So is that the right interpretation, how's that going, what's the uptake been like? >> So, it's going incredibly well. 
What's interesting to me is what everybody remembers from that announcement is the Watson Data Platform, which is a decomposable framework for doing these types of use cases on the IBM cloud. But there was another piece of that announcement that is just as critical, which is we introduced something called the Data First method. And that is the recipe book to say to a client, so given where you are, how do you get to this future on the cloud? And that's the part that people, clients, struggle with, is how do I get from step to step? So with Data First, we said, well look. There's different approaches to this. You can start with governance, you can start with data science, you can start with data management, you can start with visualization, there's different entry points. You figure out the right one for you, and then we help clients through that. And we've made Data First method available to all of our business partners so they can go do that. We work closely with our own consulting business on that, GBS. But that to me is actually the thing from that event that has had, I'd say, the biggest impact on the market, is just helping clients map out an approach, a methodology, to getting on this journey. >> So that was a catalyst, so this is not a sequential process, you can start, you can enter, like you said, wherever you want, and then pick up the other pieces from a maturity model standpoint? >> Exactly, because everybody is at a different place in their own life cycle, and so we want to make that flexible. >> I have a question about the clients, the customers' use of Watson Data Platform in a DevOps context. So, are more of your customers looking to use Watson Data Platform to automate more of the stages of the machine learning development and the training and deployment pipeline, and do you see, IBM, do you see yourself taking the platform and evolving it into a more full-fledged automated data science release pipelining tool? Or am I misunderstanding that?
>> Rob: No, I think that-- >> Your strategy. >> Rob: You got it right, I would just, I would expand a little bit. So, one is it's a very flexible way to manage data. When you look at the Watson Data Platform, we've got relational stores, we've got column stores, we've got in-memory stores, we've got the whole suite of open-source databases under the Compose.io umbrella, we've got Cloudant. So we've delivered a very flexible data layer. Now, in terms of how you apply data science, we say, again, choose your model, choose your language, choose your framework, that's up to you, and we allow clients, many clients start by building models on their private cloud, then we say you can deploy those into the Watson Data Platform, so therefore then they're running on the data that you have as part of that data fabric. So, we're continuing to deliver a very fluid data layer which then you can apply data science, apply machine learning there, and there's a lot of data moving into the Watson Data Platform because clients see that flexibility. >> All right, Rob, we're out of time, but I want to kind of set up the day. We're doing CUBE interviews all morning here, and then we cut over to the main tent. You can get all of this on IBMgo.com, you'll see the schedule. Rob, you've got, you're kicking off a session. We've got Hilary Mason, we've got a breakout session on GDPR, maybe set up the main tent for us. >> Yeah, main tent's going to be exciting. We're going to debunk a lot of misconceptions about data and about what's happening. Marc Altshuller has got a great segment on what he calls the death of correlations, so we've got some pretty engaging stuff. Hilary's got a great piece that she was talking to me about this morning. It's going to be interesting. We think it's going to provoke some thought and ultimately provoke action, and that's the intent of this week. >> Excellent, well Rob, thanks again for coming to theCUBE. It's always a pleasure to see you. 
>> Rob: Thanks, guys, great to see you. >> You're welcome; all right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Munich, Fast Track Your Data, right back. (upbeat electronic music)

Published Date : Jun 22 2017

SUMMARY :

Brought to you by IBM. This is Fast Track Your Data brought to you by IBM, Hey, great to see you. It's good that you joined us. and machine learning to the Hadoop community. You had to be relevant, you want to be part of the community, So first of all, you look at the last five years. but talk about the two announcements that you guys made. Even you can do it, Dave, which is amazing. I would love to see you do it, because I guarantee you can. but it wasn't this easy. and I want to make it so anybody can do it. extending that now to other parts of the portfolio. What are the machine learning announcements at this And the way that you do that is we don't dictate I know you have Power AI. We're going to allow you to decide So as you look forward, maybe it's a question No, you ask about deep learning, like, okay, that's, and speech analytics and more of the challenges But what do you think, I mean, it sounds like And the reason we're kind of starting with a focus One of the things we've talked to you about on theCUBE And that is the recipe book to say to a client, process, you can start, you can enter, and deployment pipeline, and do you see, IBM, models on their private cloud, then we say you can deploy and then we cut over to the main tent. and that's the intent of this week. It's always a pleasure to see you. This is theCUBE, we're live from Munich,

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jim	PERSON	0.99+
Europe	LOCATION	0.99+
Rob	PERSON	0.99+
Marc Altshuller	PERSON	0.99+
Hilary	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Rob Bearden	PERSON	0.99+
February	DATE	0.99+
Dave	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Rob Thomas	PERSON	0.99+
May 2018	DATE	0.99+
March	DATE	0.99+
Munich	LOCATION	0.99+
Scala	TITLE	0.99+
Apache	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
Last week	DATE	0.99+
Java	TITLE	0.99+
last year	DATE	0.99+
two announcements	QUANTITY	0.99+
10 companies	QUANTITY	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
DB2	TITLE	0.99+
15 minutes	QUANTITY	0.99+
last week	DATE	0.99+
IBM Analytics	ORGANIZATION	0.99+
European Union	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
JSON	TITLE	0.99+
Watson Data Platform	TITLE	0.99+
third	QUANTITY	0.99+
One	QUANTITY	0.99+
this week	DATE	0.98+
today	DATE	0.98+
a week ago	DATE	0.98+
two things	QUANTITY	0.98+
SQL	TITLE	0.98+
last fall	DATE	0.98+
2017	DATE	0.98+
Munich, Germany	LOCATION	0.98+
each	QUANTITY	0.98+
Y2K	ORGANIZATION	0.98+

Scott Gnau, Hortonworks - DataWorks Summit 2017

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of the keynotes today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture and one of the things that I thought was really interesting is that now where Hortonworks is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it to value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus, combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right? 
They've got SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, Skin and Math and Heavy Compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable. 
So bringing together the data, but the tool set so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data to the business analyst and to the data scientist and be confident that the company's mission, the regulation that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, and I think the numbers are astronomical, right? 
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90%, that's like an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analyst's desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run, but inferred with traditional business process, which means, turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, if the, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just next-gen UI-related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion, and I said it this morning, kind of opens where open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms. 
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so there's no, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also pointed out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease of which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep. 
But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push from model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IOT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery, could actually, from our human being perception, arrive before I thought about it. And isn't that really cool in a way. 
I'm thinking about, I need to go do X,Y,Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.
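
Editor's note: the time-allocation claim in the interview above (roughly 10% of a data scientist's time spent on the problem today versus 90% with better tooling) can be checked with simple arithmetic. The sketch below is only an illustration of that arithmetic; the function name and the use of whole-number percentages are assumptions made for this note, not anything described by the speaker.

```python
def modeling_gain(before_pct, after_pct):
    """Return how many times more modeling time the same team gets.

    before_pct / after_pct are the percentages of working time spent on
    the actual problem (not on finding data), before and after the change.
    Hypothetical helper written for this note only.
    """
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct


# Figures quoted in the interview: 10% on the problem today, 90% after the flip.
gain = modeling_gain(10, 90)
print(f"Same team, {gain:.0f}x the time spent on modeling")
```

With the quoted figures the ratio is 9, which is close to, though slightly under, a strict tenfold "order of magnitude".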

Published Date : Jun 13 2017

SUMMARY :

in the heart of Silicon Valley, it's The Cube, the CTO of Hortonworks, Scott Gnau. One of the things that you talked about So enabling the data scientist to be successful, And a couple things to follow on to that, and the tools that we're rolling out, for the data scientist or the data engineer as a framework for the data scientists to work as a team, is also the ability to operationalize the models not just making the data scientists be able to be You talked about real time offers, for example. And again, to the extent that it's relevant, So that, again, points to more and more and more data of the DataWorks Summit in the heart of Silicon Valley.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Scott	PERSON	0.99+
IBM	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
San Jose	LOCATION	0.99+
10%	QUANTITY	0.99+
90%	QUANTITY	0.99+
Scott Gnau	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
IBMs	ORGANIZATION	0.99+
Python	TITLE	0.99+
two aspects	QUANTITY	0.99+
five seconds	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
One	QUANTITY	0.99+
DataWorks Summit 2017	EVENT	0.98+
Horton Works	ORGANIZATION	0.98+
Hadoop	TITLE	0.98+
one	QUANTITY	0.98+
DataWorks Summit	EVENT	0.98+
today	DATE	0.98+
each	QUANTITY	0.98+
five years	QUANTITY	0.97+
third	QUANTITY	0.96+
second thing	QUANTITY	0.96+
Apache Caca	ORGANIZATION	0.95+
three personas	QUANTITY	0.95+
this morning	DATE	0.95+
Apache Nifi	ORGANIZATION	0.95+
this morning	DATE	0.94+
three categories	QUANTITY	0.94+
CTO	PERSON	0.93+
The Cube	TITLE	0.9+
Sequel	PERSON	0.89+
Apache Ranger	ORGANIZATION	0.88+
two things	QUANTITY	0.86+
hundred times	QUANTITY	0.85+
Portworks	ORGANIZATION	0.82+
earlier today	DATE	0.8+
Data Science Experience	TITLE	0.79+
The Cube	ORGANIZATION	0.78+
Apache Atlas	ORGANIZATION	0.75+
Storm	ORGANIZATION	0.74+
day one	QUANTITY	0.74+
wave	EVENT	0.69+
one of the keynotes	QUANTITY	0.66+
lots	QUANTITY	0.63+
years	QUANTITY	0.53+
Hortonworks	EVENT	0.5+
lots of data	QUANTITY	0.49+
Sequel	ORGANIZATION	0.46+
Flow	ORGANIZATION	0.39+

Melvin Greer, Intel | AWS Public Sector Summit 2017

>> Narrator: Live from Washington D.C. it's the CUBE covering the AWS Public Sector Summit 2017. Brought to you by Amazon Web Services and its partner ecosystem. >> Melvin Greer is with us now he's the director of Data Science and Analytics at Intel. Now Melvin, thank you for being here with us on the CUBE. Good to see you here this morning. >> Thank you John and John I appreciate getting a chance to talk with you it's great to be here at the AWS Public Sector Summit. >> Yeah we make it easy for you. >> I never forget the names. >> John and John. Let's talk just about data science in general and analytics I mean tell us about, give us the broad definition of that. You know the elevator speech about what's being done and then we'll drill down a little bit deeper about Intel and what you're doing with in terms of government work and healthcare work. >> Sure well data science and analytics covers a number of key areas and it's really important to consider the granularity of each of these key areas. Primarily because there's so much confusion about what people think of as artificial intelligence. It's certainly got a number of facets associated with it. So we have core analytics like descriptive, diagnostic, predictive and prescriptive. This describes what happened, what's going to happen next, why is it happening and what should I do about it. So those are core analytics. >> And (mumbles) oh go ahead. >> And a different tech we have machine learning cognitive computing. These things are different than core analytics in that they are recognizing patterns and relying on the concepts of training algorithms and then inference. The use of these trained algorithms to infer new knowledge. And then we have things like deep learning and convolutional neural networks which use convolutional layers to drive better and better granularity and understanding of data. 
They often typically don't rely on training and have a large focus area around deep learning and deep cognitive skills. And then all of those actually line up in this discussion around narrow artificial intelligence and you've seen a lot of that already haven't you, John? You've seen where we teach a machine how to play poker or we teach a machine how to play Jeopardy or Go. These are narrow AI applications. When we think about general AI however, this is much different. This is when we're actually outsourcing human cognition to a thinking machine at internet speed. >> This is amazing I love this conversation cause couple things, in that thread you just brought up is poker which is great cause it's not just Jeopardy it's poker is unknown conditions. You don't know the personality of the other guy. You don't know their cards they're dealing with so it's a lot like unstructured data and you have to think about that so but it really highlights the (mumbles) between super computing paradigm and data and that really kind of changes the game on data science cause the old data warehouse model storing information, pulling it back, latency, and so we're seeing machine learning in these new apps really disrupting old data analytics models. So, I want to get your thoughts on this because and what is Intel doing because you guys have restructured things a bit differently. The AI message is out there as this new revolution takes place with data, how are you guys handling that? >> So Intel formed in late 2016 its artificial intelligence product group and the formation of this group is extremely consistent with our pivot to becoming a data company. 
So we're certainly not going to be abandoning any of that great performance and strong capabilities that we have in silicon architectures but as a data company it means that now we're going to be using all of these assets in artificial intelligence, machine learning cognitive computing and Intel in fact by using this is really in a unique position to focus on what we have termed and what you'll hear our CEO talk about as the virtuous cycle of growth. This cycle of growth includes cloud computing, data center, and IOT. And our ability to harness the power of artificial intelligence in data science and analytics means that Intel is really capable of driving this discussion around cloud computing and powering the cloud and also driving the work that's required to make a smart and a connected world a reality. Our artificial intelligence product group expands our portfolio and it means that we're bringing all these capabilities that I talked to you that make up data science and analytics. Cognitive, machine learning, artificial intelligence, deep learning, convolutional neural networks, to bear to solve some of the nation's most significant and important problems and it means that Intel with its partners are really focused on the utilization of our core capabilities to drive government missions. >> Well give us an example then in terms of federal government and AI. How you're applying that to the operation of what's going on in this giant bureaucracy of a town that we have. >> So one of the things that I'm most excited about is that there's really no agency almost every federal agency in the U.S. is doing an investigation of artificial intelligence. It started off with this discussion around business intelligence and as you said data warehousing and other things but clearly the government has come to realize that turning data into a strategic asset is important, very very important. 
And so there are a number of key domain spaces in the federal government where Intel has made a significant impact. One is in health and life sciences so when you think about health and life sciences and biometrics, genomics, using advanced analytics for phenotype and genotype analysis this is where Intel's strengths are in performance in the ability to deliver. We created a collaborative cancer cloud that allows researchers to use Intel hardware and software to accelerate the learnings from all of these health and life sciences advances that they want. Sharing data without compromising that data. We're focused significantly on cyber intelligence where we're applying threat and vulnerability analytics to understanding how to identify real cyber problems and big cyber vulnerabilities. We are now able to use Intel products to encrypt from the BIOS all the way up through the application stack and what it means is, is that our government clients who typically are hyper sensitive around security, get a chance to have data follow their respective process and meet their mission in a safe and secure way. >> If I can drill down on that for a second cause this is kind of a really sweet area for innovation. Data is now the new development environment the new development >> You said Bacon is the Oil is the new bacon (laughing) >> Versus the gold nuggets so I was talking with >> You hear what he said? >> No. >> It's the new bacon. >> The new bacon (laughs) love that. >> Data's the new bacon. >> Everyone loves bacon, everyone loves data. There's a thirst for the data and this also applies is that I ask you the role of the CDO, the chief data officer is emerging in companies and so we're seeing that also at the federal level. 
I want to get your thoughts on that but to quote the professor from Carnegie Mellon who I interviewed last week said the problem with a lot of data problems it's like looking for a needle in the haystack with there's so much data now you have a haystack of needles so his premise is you can't find everything you got to use machine learning and AI to help with that so this is also going to be an issue for this chief data officer a new role. So is there a chief data officer role is there a need for that is there a CCO? Who handles the data? (laughing) >> Yeah so this is >> it's a tough one cause there's a lot of tech involved but also there's policies. >> Yeah so the federal government has actually mandated that each agency assign a federal chief data officer at the agency level and this person is working very closely with the chief information officer and the agency leaders to ensure that they have the ability to take advantage of this large set of data that they collect. Intel's been working with most of the folks in the federal data cabinet who are the CDOs who are working to solve this problem around data and analysis of data. We're excited about the fact that we have chief data officers as an entry point to help discuss this hyper convergence that you described in technology. Where we have large data sets, we have faster hardware, of course Intel's helping to provide much of that and then better mathematics and algorithms. When we converge these three things together it's the soup that makes it possible for us to continue to drive artificial intelligence but that notwithstanding federal data officers have a really hard job and we've been engaging them at many levels. We just had our artificial intelligence day in government where we had folks from many federal agencies that are on that cabinet and they shared with us directly how important it is to get Intel's on both hardware, hardware performance but also on software. 
When we think about artificial intelligence and the chief data officer or the data scientist this is likely a different individual than the person that is buying our silicon architectures. This is a person who is focused primarily on an agency mission and is looking for Intel to provide hardware and software capabilities that drive that mission. >> I got to ask you from an Intel perspective you guys are doing a lot of innovative things you have a great R&D group but also silicon you mentioned is important and you know software is eating the world but data's eating software so what's next what's eating data? We believe it's memory and silicon and so one of the trends in big data is real time analytics is moving closer and closer to memory and then and now silicon who have some of those security paradigms with data involved seeing silicon implementations, root security, malware, firmware, kind of innovations. This is an interesting trend cause if software gets on to the silicon to the level that is better security you have fingerprinting all kinds of technologies. How is that going to impact the analytics world? So if you believe that they want faster lower latency data it's going to end up in the silicon. >> John you described exactly why Intel is focused on the virtuous cycle of growth. Because as more cloud enabled data moves itself from the cloud through our 5G networks and out to the edge in IOT devices whether they be autonomous vehicles or drones this is exactly why we have this continuum that allows data to move seamlessly between these three areas and operationalizes the core missions of government as well as provides a unique experience that most people can't even imagine. You likely saw the NBA finals you talked about Kevin Durant and you saw there the Intel 360 demonstration >> Love that! >> Where you're able to see how through different camera angles the entire play is unfolding. 
That is a prime example of how we use back end cloud hyper connected hardware with networks and edge devices where we're pushing analytics closer and closer to the edge >> By the way that's a real life media example of an IOT situation where it's at the edge of the network AKA stadium. I mean we geek out on that as well as Amazon has the MLB thing Andy (mumbles) knows I love that because it's like we're both baseball fans. >> We're excited about it too we think that along with autonomous vehicles, we think that this whole concept of experiences rather than capabilities and technologies >> But most people don't know that that example of basketball takes massive amounts of compute I mean to make that work at that level. >> In real time. >> This is the CG environment we're seeing with gaming culture the people are expecting an interface that looks more like Call of Duty (laughing) or Minecraft than they are Windows desktop machines what we're used to. We think that's great. >> That's why we say we're building the future John. (men laughing) >> You touched on something you said a little bit ago. A data officer of the federal government has got a tough job, a big job. >> Yes. >> What's the difference between private and public sector somebody who is handling the same kinds of responsibilities but has different compliance pressures different enforcement pressures and those kinds of things so somebody in the public space, what are they facing that somebody on the other side of the fence is not? >> All data officers have a tough job whether it's about cleansing data, being able to ingest it. What we talk about, and you described this, a haystack of needles is the need and ability to create a hyper relevancy to data because hyper relevancy is what makes it possible for personalized medicine and precision medicine. That's what makes it possible for us to do hyper scale personalized retail. 
This is what makes it possible to drive new innovation, this hyper relevancy, and so whether you're working in a highly regulated environment like energy or financial services, or whether you're working in the federal government with the Department of Defense and intelligence agencies, or deep space exploration like at NASA, you're still solving many data problems that are in common. Of course there are some differences, right? When you work for the federal government you're a steward of citizens' data; that adds a different level of responsibility. There's a legal framework that guides how that data's handled as opposed to just a regulatory and legal one, but when it comes to artificial intelligence all of us as practitioners are really focusing on the legal, ethical, and societal implications associated with the implementation of these advanced technologies. >> Quick question to end this segment, I know we're running a little over time but I wanted to get this last point in, and this is something that we've talked about on the CUBE a lot, me and Dave have been debating, because data is very organic innovation. You don't know what you're going to do until you get into it, alchemy if you will, but trust and security and policy is a top down, slow down mentality, so often in the past it's been restricting growth. So the balance here that you're getting at is how do you provide the speed and agility of real time experiences while maintaining all the trust and security requirements that have slowed things down. >> You mention a topic there, John, and in my last book, 21st Century Leadership, I actually described this concept as ambidextrous leadership. This concept of being able to do operational excellence extremely well and focus on delivery of core mission, and at the same time be in a position to drive innovation and look forward enough to think about not where you are today but where you will be going in the future. 
This ambidexterity is really a critical factor when we talk about all leadership today, not just leaders in government or people who just work mostly on artificial intelligence. >> It's multidimensional, multi disciplined too, right, I mean. >> That's right, that's right. >> That's the DevOps ethos, that's the cloud. Move fast, I mean Mark Zuckerberg had the best quote with Facebook, "move fast and break stuff," up until that time he had about a billion users and then changed to move fast and be secure and reliable. (laughing) >> Yeah and don't break anything >> Well he understood you can't just break stuff, at some point you got to move fast and be reliable. >> One of five books I want to mention by the way. >> That's right, I'm working on my sixth and seventh now but yeah. >> And also the managing of the Greer Institute of Leadership and Management, so you've written now almost seven books, you're running this leadership, you're working with Intel, what do you do in your spare time Melvin? >> My wife is the chef and >> He eats a lot. (laughing) >> And so I get a chance to enjoy all of the great food she cooks and I have two young sons and they keep me very very busy believe me. >> I think you're busy enough (laughing). Thanks for being on the CUBE. >> I very much appreciate it. >> It's good to have you >> Thank you. >> With us here at the AWS Public Sector Summit, back with more coverage live here on the Cube, Washington D.C. right after this.

Published Date : Jun 13 2017


ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Melvin	PERSON	0.99+
Melvin Greer	PERSON	0.99+
Dave	PERSON	0.99+
Call of Duty	TITLE	0.99+
Mark Zuckerberg	PERSON	0.99+
NASA	ORGANIZATION	0.99+
Andy	PERSON	0.99+
Kevin Durant	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Minecraft	TITLE	0.99+
last week	DATE	0.99+
Washington D.C.	LOCATION	0.99+
sixth	QUANTITY	0.99+
Greer Institute of Leadership and Management	ORGANIZATION	0.99+
One	QUANTITY	0.99+
late 2016	DATE	0.99+
Intel	ORGANIZATION	0.99+
seventh	QUANTITY	0.99+
each agency	QUANTITY	0.98+
Ecosystem	ORGANIZATION	0.98+
both	QUANTITY	0.98+
each	QUANTITY	0.97+
AWS Public Sector Summit	EVENT	0.97+
five books	QUANTITY	0.97+
Windows	TITLE	0.97+
one	QUANTITY	0.97+
AWS Public Sector Summit 2017	EVENT	0.97+
three things	QUANTITY	0.97+
about a billion users	QUANTITY	0.96+
two young sons	QUANTITY	0.96+
three areas	QUANTITY	0.91+
second	QUANTITY	0.9+
U.S.	LOCATION	0.88+
seven books	QUANTITY	0.87+
today	DATE	0.87+
360	COMMERCIAL_ITEM	0.84+
MLB	EVENT	0.83+
21st Century	TITLE	0.77+
Carnegie Mellon	ORGANIZATION	0.77+
this morning	DATE	0.75+
mumbles	PERSON	0.73+
Cube, Washington D.C.	LOCATION	0.71+
john	PERSON	0.67+
NBA	EVENT	0.67+
government	ORGANIZATION	0.61+
Data Science	ORGANIZATION	0.58+
CUBE	ORGANIZATION	0.58+
agency	QUANTITY	0.57+
couple things	QUANTITY	0.56+
Narrator	TITLE	0.53+
Jeopardy	TITLE	0.53+
baseball	TITLE	0.51+
of	ORGANIZATION	0.5+

Mark Grover & Jennifer Wu | Spark Summit 2017

>> Announcer: Live from San Francisco, it's the Cube covering Spark Summit 2017, brought to you by Databricks. >> Hi, we're back here where the Cube is live, and I didn't even know it. Welcome, we're at Spark Summit 2017. Having so much fun talking to our guests I didn't know the camera was on. We are doing a talk with Cloudera, a couple of experts that we have here. First is Mark Grover, who's a software engineer and an author. He wrote the book, "Hadoop Application Architectures." Mark, welcome to the show. >> Mark: Thank you very much. Glad to be here. >> And just to his left we also have Jennifer Wu, and Jennifer's director of product management at Cloudera. Did I get that right? >> That's right. I'm happy to be here, too. >> Alright, great to have you. Why don't we get started talking a little bit more about what Cloudera is maybe introducing new at the show? I saw a booth over here. Mark, do you want to get started? >> Mark: Yeah, there are two exciting things that we've launched at least recently. There's Cloudera Altus, which is for transient workloads and being able to do ETL-like workloads, and Jennifer will be happy to talk more about that. And then there's Cloudera Data Science Workbench, which is this tool that allows folks to use data science at scale. So, get away from doing data science in silos on your personal laptops, and do it in a secure environment on cloud. >> Alright, well, let's jump into Data Science Workbench first. Tell me a little bit more about that, and you mentioned it's for exploratory data science. So give us a little more detail on what it does. >> Yeah, absolutely. So, there was a private beta for Cloudera Data Science Workbench earlier in the year and then it was GA a few months ago. And it's like you said, an exploratory data science tool that brings data science to the masses within an enterprise. Previously there was this dichotomy, right? As a data scientist, I want to have the latest and greatest tools. 
I want to use the latest version of Python, the latest notebook kernel, and I want to be able to use R and Python to be able to crunch this data and run my models in machine learning. However, on the other side of this dichotomy is the IT organization, which wants to make sure that all tools are compliant and that your clusters are secure, and your data is not going into places that are not secured by state of the art security solutions, like Kerberos for example, right? And of course if the data scientists are putting the data on their laptops and taking the laptop around to wherever they go, that's not really a solution. So, that was one problem. And the other one was if you were to bring them all together in the same solution, data scientists have different requirements. One may want to use Python 2.6. Another one may want to use 3.2, right? And so Cloudera Data Science Workbench is a new product that allows data scientists to visualize and do machine learning through this very nice notebook-like interface, share their work with the rest of their colleagues in the organization, but also allows you to keep your clusters secure. So it allows you to run against a Kerberized cluster, allows single sign on to your web interface to Data Science Workbench, and provides a really nice developer experience in the sense that my workflow and my tools and my version of Python do not conflict with Jennifer's version of Python. We all have our own Docker and Kubernetes-based infrastructure that makes sure that we use the packages that we need, and they don't interfere with each other. >> We're going to go to Jennifer on Altus in just a few minutes, but George, first give you a chance to maybe dig in on Data Science Workbench. 
>> Two questions on the data science side: some of the really toughest nuts to crack have been sort of a common environment for the collaborators, but also the ability to operationalize the models once you've sort of agreed on them, and manage the lifecycle across teams, you know? Like, challenger champion, promote something, or even before that doing the A/B testing, and then sort of what's in production is typically in a different language from what, you know, it was designed in, and sort of integrating it with the apps. Where is that on the road map? Because no one really has a good answer for that. >> Yeah, that's an excellent question. In general I think it's the problem to crack these days. How do you productionalize something that was written by a data scientist in a notebook-like system onto the production cluster, right? And I think for the part where the data scientist works in a different language than the language that's in production, the best I can say right now is to actually have someone rewrite that. Have someone rewrite that in the language you're going to use in production, right? I don't see that to be the more common part. I think the more widespread problem is, even when the language is the same as production, how do you go about moving the part that the data scientist wrote, the model or whatever that would be, onto a production cluster? And so, Data Science Workbench in particular runs on the same cluster that is being managed by Cloudera Manager, right? So this is a tool that you install, but that is available to you as a web server, as a web interface, and so that allows you to move your machine learning algorithms from development in your Data Science Workbench to production much more easily, because it's all running on the same hardware and same systems. There are no separate Cloudera Managers that you have to use to manage the workbench compared to your actual cluster. >> Okay. 
A tangential question, but one of the difficulties of doing machine learning is finding all the training data and sort of data science expertise to sit with the domain expert to, you know, figure out the proper model of features, things like that. One of the things we've seen so far from the cloud vendors is they take their huge datasets in terms of voice, you know, images. They do the natural language understanding, speech or rather text to speech, you know, facial recognition. Because they have such huge datasets they can train on. We're hearing noises that they're going to take that down to the more mundane statistical kind of machine learning algorithms, so that it wouldn't be, like, here's an algorithm to do churn, you know, go to town, but that they might have something that's already kind of pre-populated that you would just customize. Is that something that you guys would tackle, too? >> I can't speak for the road map in that sense, but I think some of that problem needs to be tackled by projects like Spark for example. So I think as the stack matures, it's going to raise the level of abstraction as time goes on. And I think whatever benefits the Spark ecosystem will have will come directly to distributions like Cloudera. >> George: That's interesting. >> Yeah >> Okay >> Alright, well let's go to Jennifer now and talk about Altus a little bit. Now you've been on the Cube show before, right? >> I have not. >> Okay, well, familiar with your work. Tell us again, you're the product manager for Altus. What does it do, and what was the motivation to build it? >> Yeah, we're really excited about Cloudera Altus. So, we released Cloudera Altus in its first GA form in April, and we launched Cloudera Altus in a public environment at Strata London about two weeks ago, so we're really excited about this and we are very excited to now open this up to all of the customer base. 
And what it is is a platform as a service offering designed to leverage, basically, the agility and the scale of cloud, and make a very easy to use type of experience to expose Cloudera capacity, in particular for data engineering type of workloads. So the end user will be able to very easily, in a very agile manner, get data engineering capacity on Cloudera in the cloud, and they'll be able to do things like ETL and large scale data processing, productionized machine learning workflows in the cloud with this new data engineering as a service experience. And we wanted to abstract away the cloud and cluster operations, and make the end user experience very easy. So, jobs and workloads as first class objects. You can do things like submit jobs, clone jobs, terminate jobs, troubleshoot jobs. We wanted to make this very, very easy for the data engineering end user. >> It does sound like you've sort of abstracted away a lot of the infrastructure that you would associate with on-prem, and sort of almost made it, like, programmable and invisible. But, um, I guess one of my questions is when you put it in a cloud environment, when you're on-prem you have a certain set of competitors which is kind of restrictive, because you are the standalone platform. But when you go on the cloud, someone might say, "I want to use Redshift on Amazon," or Snowflake, you know, as the MPP SQL database at the end of a pipeline. And I'm just using those as examples. There's, you know, dozens, hundreds, thousands of other services to choose from. >> Yes. >> What happens to the integrity of that platform if someone carves off one piece? >> Right. So, interoperability and a unified data pipeline is very important to us, so we want to make sure that we can still service the entire data pipeline all the way from ingest and data processing to analytics. 
So our team has 24 different open source components that we deliver in the CDH distribution, and we have committers across the entire stack. We know the application, and we want to make sure that everything's interoperable, no matter how you deploy the cluster. So if you deploy data engineering clusters through Cloudera Altus, but you deployed Impala clusters for data marts in the cloud through Cloudera Director or through any other format, we want all these clusters to be interoperable, and we've taken great pains in order to make everything work together well. >> George: Okay. So how do Altus and Data Science Workbench interoperate with Spark? Maybe start with >> You want to go first with Altus? >> Sure, so, in terms of interoperability we focus on things like making sure there are no data silos, so that the data that you use for your entire data lake can be consumed by the different components in our system, the different compute engines and different tools, and so if you're processing data you can also look at this data and visualize this data through Data Science Workbench. So after you do data ingestion and data processing, you can use any of the other analytic tools, and this includes Data Science Workbench. >> Right, and Data Science Workbench runs, for example, with the latest version of Spark you could pick. The currently latest released version of Spark is Spark 2.1; Spark 2.2 is being voted on, of course, and that will soon be integrated after its release. For example you could use Data Science Workbench with your flavor of Spark 2 and you can run PySpark or Scala jobs on this notebook-like interface, and be able to share your work. And because you're using Spark underneath the hood, it uses YARN for resource management; the Data Science Workbench itself uses Docker for configuration management, and Kubernetes for resource managing these Docker containers. 
>> What would be, if you had to describe sort of the edge conditions and the sweet spot of the application, I mean you talked about data engineering. One thing we were talking to Matei Zaharia and Reynold Xin about, and Ali Ghodsi as well, was if you put Spark on a database, or at least a, you know, sophisticated storage manager, like Kudu, all of a sudden there's a whole new class of jobs or applications that open up. Have you guys thought about what that might look like in the future, and what new applications you would tackle? >> I think a lot of that benefit, for example, could be coming from the underlying storage engine. So let's take Spark on Kudu, for example. The inherent characteristics of Kudu today allow you to do updates without having to either deal with the complexity of something like HBase, or the crappy performance of dealing with HDFS compactions, right? So the sweet spot comes from Kudu's capabilities. Of course it doesn't support transactions or anything like that today, but imagine putting something like Spark on it and being able to use the machine learning libraries. We have been limited so far in the machine learning algorithms that we have implemented in Spark by the storage system sometimes, and, for example, new machine learning algorithms or the existing ones could be rewritten to make use of the update features, for example, in Kudu. >> And so, it sounds like the machine learning pipeline might get richer, but I'm not hearing, and maybe this isn't sort of in the near term sort of roadmap, the idea that you would build sort of operational apps that have these sophisticated analytics built in, you know, where the analytics, um, you've done the training but at run time, you know, the inferencing influences a transaction, influences a decision. Is that something that you would foresee? >> I think that's totally possible. 
Again, at the core of it is the part that now you have one storage system that can do scans really well, and it can also do random reads and writes any place, right? And so that allows applications which were previously siloed, because one application ran off of HDFS and another application ran off of HBase, and so you had to correlate them, to just be one single application that can train and then also use their trained data to make decisions on the new transactions that come in. >> So that's very much within the sort of scope of imagination, or scope. That's part of sort of the ultimate plan? >> Mark: I think it's definitely conceivable now, yeah. >> Okay. >> We're up against a hard break coming up in just a minute, so you each get a 30-second answer here, so it's the same question. You've been here for a day and a half now. What's the most surprising thing you've learned that you think should be shared more broadly with the Spark community? Let's start with you. >> I think one of the great things that's happening in Spark today is people have been complaining about latency for a long time. So if you saw the keynote yesterday, you would see that Spark is making forays into reducing that latency. And if you are interested in Spark, using Spark, it's very exciting news. You should keep tabs on it. We hope to deliver lower latency as a community sooner. >> How long is one millisecond? (Mark laughs) >> Yeah, I'm largely focused on cloud infrastructure and I found here at the conference that, like, many many people are very much prepared to actually start taking more, you know, more POCs and more interest in cloud, and the response in terms of all of this in Altus has been very encouraging. >> Great. Well, Jennifer, Mark, thank you so much for spending some time here on the Cube with us today. We're going to come by your booth and chat a little bit more later. It's some interesting stuff. 
And thank you all for watching the Cube today here at Spark Summit 2017, and thanks to Cloudera for bringing us these two experts. And thank you for watching. We'll see you again in just a few minutes with our next interview.

Published Date : Jun 7 2017


ENTITIES

Entity	Category	Confidence
Jennifer	PERSON	0.99+
Mark Grover	PERSON	0.99+
Jennifer Wu	PERSON	0.99+
Ali Ghodsi	PERSON	0.99+
George	PERSON	0.99+
Mark	PERSON	0.99+
April	DATE	0.99+
Reynold Xin	PERSON	0.99+
San Francisco	LOCATION	0.99+
Matei Zaharia	PERSON	0.99+
30-second	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Hadoop Application Architectures	TITLE	0.99+
dozens	QUANTITY	0.99+
Python	TITLE	0.99+
yesterday	DATE	0.99+
Two questions	QUANTITY	0.99+
today	DATE	0.99+
Spark	TITLE	0.99+
Amazon	ORGANIZATION	0.99+
two experts	QUANTITY	0.99+
a day and a half	QUANTITY	0.99+
First	QUANTITY	0.99+
one problem	QUANTITY	0.99+
Python 2.6	TITLE	0.99+
Strata London	LOCATION	0.99+
one piece	QUANTITY	0.99+
first	QUANTITY	0.98+
Spark Summit 2017	EVENT	0.98+
Cloudera Altus	TITLE	0.98+
Scala	TITLE	0.98+
Docker	TITLE	0.98+
One	QUANTITY	0.97+
Kudu	ORGANIZATION	0.97+
one millisecond	QUANTITY	0.97+
PySpark	TITLE	0.96+
R	TITLE	0.95+
one	QUANTITY	0.95+
two weeks ago	DATE	0.93+
Data Science Workbench	TITLE	0.92+
Cloudera	TITLE	0.91+
hundreds	QUANTITY	0.89+
Hbase	TITLE	0.89+
each	QUANTITY	0.89+
24 different open source components	QUANTITY	0.89+
few months ago	DATE	0.89+
single	QUANTITY	0.88+
kernel	TITLE	0.88+
Altus	TITLE	0.88+

Dr. Jisheng Wang, Hewlett Packard Enterprise, Spark Summit 2017 - #SparkSummit - #theCUBE

>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017 brought to you by Databricks. >> You are watching theCUBE at Spark Summit 2017. We continue our coverage here talking with developers, partners, customers, all things Spark, and today we're honored now to have our next guest Dr. Jisheng Wang who's the Senior Director of Data Science at the CTO Office at Hewlett Packard Enterprise. Dr. Wang, welcome to the show. >> Yeah, thanks for having me here. >> All right and also to my right we have Mr. Jim Kobielus who's the Lead Analyst for Data Science at Wikibon. Welcome, Jim. >> Great to be here like always. >> Well let's jump into it. At first I want to ask about your background a little bit. We were talking about the organization, maybe you could do a better job (laughs) of telling me where you came from and you just recently joined HPE. >> Yes. I actually recently joined HPE earlier this year through the Niara acquisition, and now I'm the Senior Director of Data Science in the CTO Office of Aruba. Actually, Aruba, you probably know, like two years back HP acquired Aruba as a wireless networking company, and now Aruba takes charge of the whole enterprise networking business in HP, which is over three billion in annual revenue now. >> Host: That's not confusing at all. I can follow you (laughs). >> Yes, okay. >> Well all I know is you're doing some exciting stuff with Spark, so maybe tell us about this new solution that you're developing. >> Yes, actually most of my experience with Spark goes back to the Niara time, so Niara was a three and a half year old startup that reinvented enterprise security using big data and data science. So the problem we tried to solve in Niara is called UEBA, user and entity behavioral analytics. So I'll just try to be very brief here. 
Most of the traditional security solutions focus on detecting attackers from outside, but what if the origin of the attacker is inside the enterprise, say Snowden, what can you do? So you probably heard of many cases today of employees leaving the company by stealing lots of the company's IP and sensitive data. So UEBA is a new solution that tries to monitor the behavioral change of the enterprise users to detect both this kind of malicious insider and also the compromised user. >> Host: Behavioral analytics. >> Yes, so it sounds like it's a native analytics which we run like a product. >> Yeah and Jim you've done a lot of work in the industry on this, so any questions you might have for him around UEBA? >> Yeah, give us a sense for how you're incorporating streaming analytics and machine learning into that UEBA solution and then where Spark fits into the overall approach that you take? >> Right, okay. So actually when we started three and a half years back, when we developed the first version of the data pipeline, we used a mix of Hadoop, YARN, Spark, even Apache Storm for different kinds of stream and batch analytics work. But soon after, with the increased maturity and also the momentum from this open source Apache Spark community, we migrated all our stream and batch, you know the ETL and data analytics work, into Spark. And it's not just Spark. It's Spark, Spark Streaming, MLlib, the whole ecosystem of that. So there are at least a couple advantages we have experienced through this kind of a transition. The first thing which really helped us is the simplification of the infrastructure and also the reduction of the DevOps efforts there. >> So simplification around Spark, the whole stack of Spark that you mentioned. >> Yes. >> Okay. >> So for the Niara solution originally, and even here today, we supported both the on-premise and the cloud deployment. For the cloud we also supported the public cloud like AWS, Microsoft Azure, and also private cloud. 
So you can understand, if we have to maintain a stack of different open source tools over this many different deployments, the overhead of doing the DevOps work for monitoring, alarming, debugging this kind of infrastructure over different deployments is very high. So Spark provides us a unified platform. We can integrate the streaming, you know batch, real-time, near real-time, or even long-term batch jobs all together. So that heavily reduced both the expertise and also the effort required for the DevOps. This is one of the biggest advantages we experienced, and certainly we also experienced something like the scalability, performance, and also the convenience for developers to develop new applications, all of this, from Spark. >> So are you using the Spark structured streaming runtime inside of your application? Is that true? >> We actually use Spark in the streaming processing when the data comes in. So in the UEBA solution, the first thing is collecting a lot of the data from different kinds of data sources: network data, cloud application data. So when the data comes in, the first thing is a streaming job for the ETL, to process the data. Then after that, we actually also developed analytics jobs at different frequencies, like one minute, 10 minute, one hour, one day, on top of that. And even recently we have started some early adoption of deep learning into this, how to use deep learning to monitor the user behavior change over time, especially after a user gives notice: is the user going to access more servers or download some of the sensitive data? So all of this requires very complex analytics infrastructure. >> Now there were some announcements today here at Spark Summit by Databricks of adding deep learning support to their core Spark code base. What are your thoughts about the deep learning pipelines API that they announced this morning? 
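As an illustration of the multi-frequency analytics jobs Dr. Wang mentions above (one minute, 10 minute, one hour, one day), here is a minimal plain-Python sketch of counting events per user in tumbling windows of several sizes. It is not Niara's or HPE's implementation and does not use Spark; the event format, the `WINDOWS` table, and the function names are assumptions made for this example.

```python
from collections import Counter

# Window sizes in seconds, mirroring the frequencies mentioned in the interview.
# The names and the dict itself are hypothetical, chosen for this sketch.
WINDOWS = {"1min": 60, "10min": 600, "1hour": 3600, "1day": 86400}


def window_counts(events, window_seconds):
    """Count events per (user, tumbling-window start) for one window size.

    `events` is an iterable of (user, unix_timestamp_seconds) pairs.
    """
    counts = Counter()
    for user, ts in events:
        # Align the timestamp to the start of its tumbling window.
        window_start = ts - (ts % window_seconds)
        counts[(user, window_start)] += 1
    return counts


def multi_frequency_counts(events):
    """Run the same aggregation once per configured window size."""
    return {name: window_counts(events, size) for name, size in WINDOWS.items()}


# Toy input: (user, timestamp) pairs standing in for parsed log events.
events = [("alice", 5), ("alice", 50), ("alice", 65), ("bob", 700), ("alice", 3700)]
result = multi_frequency_counts(events)
```

In a real pipeline the same idea would typically be expressed with a streaming engine's windowing operators rather than an in-memory loop; the sketch only shows how one event stream feeds several aggregation granularities.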
It's new news, I'll understand if you haven't digested it totally, but you probably have some good thoughts on the topic. >> Yes, actually this is also news for me, so I can just speak from my current experience. How to integrate deep learning into Spark actually was a big challenge so far for us, because for the deep learning piece, we used TensorFlow. And certainly most of our other stream and data massaging or ETL work is done by Spark. So in this case, there are a couple ways to manage this, too. One is to set up two separate resource pools, one for Spark, the other one for TensorFlow, but in our deployments there are some very small on-premise deployments which have only like a four node or five node cluster. It's not efficient to split resources in that way. So we are actually also looking for some closer integration between deep learning and Spark. So one thing we looked at before is called TensorFlowOnSpark, which was open sourced a couple months ago by Yahoo. >> Right. >> So maybe this is certainly more exciting news, for the Spark team to develop this native integration. >> Jim: Very good. >> Okay and we talked about the UEBA solution, but let's go back to a little broader HPE perspective. You have this concept called the intelligent edge, what's that all about? >> So that's a very cool name. Actually, to come back a little bit, I come from the enterprise background, and enterprise applications actually lag behind consumer applications in terms of the adoption of new data science technology. So there are some native challenges for that. For example, collecting and storing large amounts of this enterprise sensitive data is a huge concern, especially in European countries. Also, for a similar reason, normally when we develop enterprise applications, you lack a good quantity and quality of training data. 
So these are some native challenges when you develop enterprise applications, but even despite this, HPE and Aruba recently made several acquisitions of analytics companies to accelerate the adoption of analytics into different product lines. Actually, the intelligent edge comes from IOT, the internet of things, which is expected to be the fastest growing market in the next few years. >> So are you going to be integrating the UEBA behavioral analytics and Spark capability into your IOT portfolio at HP? Is that a strategy or direction for you? >> Yes. Yes, for the big picture that certainly is. So I think a Gartner report expected the number of IOT devices to grow to over 20 billion by 2020. Since all of these IOT devices are connected to either the intranet or the internet, either wired or wireless, as a networking company we have the advantage of collecting data and even taking some actions in the first place. So the idea of this intelligent edge is that we want to turn each of these IOT devices, the small IOT devices like IP cameras and motion detectors, into both a distributed sensor for data collection and an inline actor to make real-time or close to real-time decisions. For example, behavior anomaly detection is a very good example here. If an IOT device is compromised, say an IP camera has been compromised and is then used to steal your internal data, we should detect and stop that in the first place. >> Can you tell me about the challenges of putting deep learning algorithms natively on resource-constrained endpoints in the IOT? That must be really challenging to get them to perform well, considering that there may be just a little bit of memory or flash capacity or whatever on the endpoints. Any thoughts about how that can be done effectively and efficiently? >> Very good question. >> And at low cost. >> Yes, very good question.
So there are two aspects to this. First is the global training of the intelligence, which is not going to be done on each device. In that case, each device is more like a sensor for data collection. So we are going to collect the data, send it to the cloud, and build this giant pool of computing resources to train the classifier, to train the model. But once we train the model, we are going to ship the model, so the inference and the detection of those behavioral anomalies really happen on the endpoint. >> Do the training centrally and then push the trained algorithms down to the edge devices. >> Yes. But the second aspect, as you said, is that on some devices, say people try to put small chips in a spoon in a hospital to make it more intelligent, you cannot put even just the detection piece there. So we are also looking into some new technology. I know Caffe recently released some lightweight deep learning models. Also, you probably know, there are some improvements from the chip industry. >> Jim: Yes. >> How to optimize the chip design for this kind of more analytics-driven task. So we are all looking into these different areas now. >> We have just a couple minutes left, and Jim, you get one last question after this, but I've got to ask you, what's on your wishlist? What do you wish you could learn, or maybe what did you come to Spark Summit hoping to take away? >> I've always treated myself as a technical developer. One thing I am very excited about these days is the emergence of new technology, like Spark, like TensorFlow, like Caffe, even BigDL, which was announced this morning. So this is the first goal: when I come to these big, advanced industry events, I want to learn the new technology.
And the second thing is mostly to share our experience in adopting this new technology, and also to learn from colleagues in different industries how people change lives and disrupt old industries by taking advantage of the new technologies. >> The community's growing fast. I'm sure you're going to receive what you're looking for. And Jim, final question? >> Yeah, I heard you mention DevOps and Spark in the same context, and that's a huge theme we're seeing: more DevOps is being wrapped around the lifecycle of development and training and deployment of machine learning models. If you could have your ideal DevOps tool for Spark developers, what would it look like? What would it do, in a nutshell? >> Actually, I can just share my personal experience. At Niara, we actually developed a lot of in-house DevOps tools. For example, when you run a lot of different Spark jobs, streaming, batch, a one-minute batch versus a one-day batch job, how do you monitor the status of those workflows? How do you know when the data stops coming? How do you know when the workflow has failed? So monitoring is a big thing, and then alarming: when you have a failure or something goes wrong, how do you alarm on it? And debugging is another big challenge. So I certainly see the growing effort from both Databricks and the community on different aspects of that. >> Jim: Very good. >> All right, so I'm going to ask you for kind of a soundbite summary. I'm going to put you on the spot here: you're in an elevator and I want you to answer this one question. Spark has enabled me to do blank better than ever before. >> Certainly, certainly. I think, as I explained before, it has helped a lot, both for developers and for start-ups trying to disrupt an industry. It helps a lot, and I'm really excited to see this deep learning integration and all the different road map reports, you know, down the road. I think they're on the right track. >> All right. Dr.
Wang, thank you so much for spending some time with us. We appreciate it and go enjoy the rest of your day. >> Yeah, thanks for being here. >> And thank you for watching the Cube. We're here at Spark Summit 2017. We'll be back after the break with another guest. (easygoing electronic music)

Published Date : Jun 6 2017
