Madeleine Udell, Cornell University | WiDS 2019
>> Live from Stanford University, it's theCUBE, covering the Global Women in Data Science Conference. Brought to you by SiliconANGLE Media.
>> Welcome back to theCUBE's live coverage of the Women in Data Science fourth annual global conference. I'm Lisa Martin here at the Arrillaga Alumni Center at Stanford, joined by WiDS speaker and Stanford alum Madeleine Udell. You are now an assistant professor at Cornell University. Madeleine, welcome to theCUBE.
>> Thank you, it's great to be here.
>> So this is your first WiDS.
>> This is my first WiDS.
>> But you were at Stanford a few years ago when the WiDS movement began. So tell us a little bit about what you do at Cornell: the research that you do, the classes that you teach, and the people, men and women, that you work with.
>> Sure. At Cornell I'm studying optimization and machine learning. I'm really interested in understanding low-dimensional structure in large, messy data sets, so we can figure out ways of looking at a data set that make it seem cleaner, smaller, and easier to work with. I teach a bunch of classes related to these topics: PhD classes on optimization and on optimization for machine learning. But one that I'm really excited about is an undergrad class that I teach called Learning with Big Messy Data. It introduces undergraduates to what messy data sets look like, which they often don't see in their undergraduate curriculum, and to ways to wrangle them into the kinds of forms they could use with the other tools they have learned about as undergraduates.
>> You say messy, big messy data...
>> Yes.
>> ...with a big smile on your face.
>> Yes.
>> So this is something that might be introduced to these students as they enter their PhD program. Define messy data and some applications of it.
>> Oftentimes people only learn about big messy data when they go to industry, and that's actually how I understood what these kinds of data sets looked like.
I took a break from my PhD while my advisor was on sabbatical and scampered off to the Obama 2012 campaign, and on the campaign they had these horrible data sets. They had, you know, hundreds of millions of rows, one for every voter in the United States, and maybe tens of thousands of columns about things that we knew about those voters. And they were weird kinds of things, right? They were things like gender, which in this data set was Boolean; state, which took one of fifty values; approximate education level; approximate income; whether or not they had voted in each of the last elections. And I looked at this and I was like, I don't know what to do, right? These are not numbers. They're Boolean, they're categorical, they're ordinal, and a bunch of the data was missing, so there were many people for whom we didn't know their level of education, or their approximate income, or whether or not they had voted in the last elections. So with this kind of horrible data set, how do you do basic things? How do you cluster? How do you even visualize this kind of data set? So I came back to my PhD thinking, I want to figure out how this works; I want to figure out the right way of approaching this data set. A lot of people would just sort of hack it, and I wanted to understand what's really going on here, what's the right model to think about this stuff.
>> So that really was quite influential in the rest of your PhD and in what you're doing now, because you found this interesting but also tangible in a way, right? Especially working with a political campaign.
>> That's right. I mean, I'm interested both in the application and in the math, so I like to be able to come back to Stanford at the time, and now Cornell, and really think about what the mathematical structure of these data sets is and what good models are for the underlying latent spaces. But then I also like to take it back to people in industry, take it back to political campaigns. And here at WiDS I'm excited to tell people about the kinds of mathematics that can help you deal with this kind of data set more easily.
>> You have a talk this afternoon called Filling in Missing...
>> Yup.
>> ...Data with Low Rank Models.
>> That's right.
>> Before we get into that, one of the things I'd love to unpack with you: taking the Obama 2012 campaign's messy data as an example, there's a lot of science and mathematics behind it, but there are also other skills I'd like to get your perspective on, and that's creativity, empathy, being able to clearly understand and communicate to your audience. Where do those other skills factor into what you do as a professor, and also into the curriculum you're teaching?
>> Sure. I think they are incredibly important. If you want your technical work to have an impact, you need to be able to communicate it to other people. Number one, you need to make sure you are working on the right problems, which means talking to people to figure out what the right problems are. This is one aspect that I consider really fundamental to my career: going around talking to people in industry about what problems they are facing that they don't know how to solve, right?
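The talk title mentioned above, filling in missing data with low rank models, names a family of techniques. As a rough illustration only, here is a minimal sketch of its simplest member: quadratic-loss matrix completion by alternating least squares, written with NumPy. It assumes purely numeric entries and a boolean mask marking which cells were observed; the function name `low_rank_impute` and its parameters are hypothetical, and this is not presented as the speaker's own method, which the interview does not describe. Boolean, categorical, and ordinal columns like the campaign data's would call for different loss functions than the squared error used here.

```python
import numpy as np


def low_rank_impute(A, mask, rank=2, reg=0.1, iters=100, seed=0):
    """Hypothetical helper: fill missing entries of A with a rank-`rank` model X @ Y.

    A    : (m, n) float array; entries where mask is False are ignored.
    mask : (m, n) boolean array, True where the entry was observed.
    Minimizes the squared error over observed entries plus
    reg * (||X||^2 + ||Y||^2), alternating between X and Y.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, rank))
    Y = rng.standard_normal((rank, n))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Row update: each x_i is a ridge regression on the columns observed in row i.
        for i in range(m):
            obs = mask[i]
            Yo = Y[:, obs]
            X[i] = np.linalg.solve(Yo @ Yo.T + ridge, Yo @ A[i, obs])
        # Column update: each y_j is a ridge regression on the rows observed in column j.
        for j in range(n):
            obs = mask[:, j]
            Xo = X[obs]
            Y[:, j] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ A[obs, j])
    # Keep observed values; only the missing cells come from the low-rank model.
    return np.where(mask, A, X @ Y)


# Toy rank-1 matrix: row i is (i + 1) * [1, 2, 3]. Hide the entry at (2, 1), true value 6.0.
A = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
mask = np.ones_like(A, dtype=bool)
mask[2, 1] = False
completed = low_rank_impute(np.where(mask, A, np.nan), mask, rank=1, reg=1e-3)
print(completed[2, 1])  # expected to be close to 6.0
```

The missing cell is recoverable here only because the toy matrix is exactly rank 1 and every row and column still has observed entries; the small `reg` keeps each solve well posed while barely shrinking the estimate.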
Then you go back to your university, you squirrel away, and you try to figure it out. Sometimes I can't figure it out on my own, so I need to put together a team. I need to pull in people from other disciplines who have the skills I don't have in order to figure out the full solution to the problem, right? Not just to solve the part of the problem that I know how to solve, but to solve the full problem I can see. That also requires a lot of empathy and communication, to make the team actually produce something more than what the individual members could. Then the third step is to communicate that result back to the people who could actually use it and put it into practice. That's part of the reason I'm here at WiDS: to try to show people the useful things I think I've come up with. But I'm also really excited to talk to people here and understand what gnarly problems they don't know how to solve yet.
>> There are a lot of gnarly problems out there. Love that you brought that word up.
>> (laughter)
>> But I'm just curious, before we go further: when you were studying mathematics, computational engineering, and data science, did you understand at that point the other important skills, collaboration and communication? Or did you discover them along the way? And is that something students are taught today: these are the other things we want to develop in you?
>> Yeah, I think we barely teach those skills.
>> Really?
>> I think at the earliest level there's a lot of focus on the technical skills, and it's hard to see the other skills that are going to enable you to get from 90 to 100%, but that 90 to 100% is the most important part, right?
If you can't communicate your results back, then it doesn't do much good to have produced the results in the first place.
>> Right.
>> But really, a lot of the education right now at most universities is focused on the technical core, and you can see that in the way we evaluate students, right? We evaluate them on their homework, which is supposed to be individual, on their test performance, and maybe on their projects. The projects, I think, are much better at helping them develop these skills of communication and teamwork, but they're not included in most courses because, frankly, they're hard to do. It's hard to teach students how to work on projects, it's hard to get them topics, it's hard to evaluate the results of their projects, and it's hard to give them time to present to a group. But I think these are critical skills, right? Project work is much more like what work becomes after they finish their studies.
>> You've been in the STEM fields for quite a while and gone so far in your academic career. Tell me about the changes that you've seen in the curriculum, and do you think you're going to have a chance to influence some of those other skills, like communication? When I was in grad school studying biology, a long time ago, communication was actually part of the program for a semester. I'm just wondering, do you think this is something that a movement like WiDS could help inspire?
>> I think it's important to help people see the skills they are going to need to use down the line. The thing is, I think the technical foundation is really important, and doubling down on it makes sense, particularly when you're young and can concentrate on the nitty-gritty details; I actually think that becomes harder as you get older. So focusing on that for people in their undergrad and early PhD actually makes sense, but you want them to see what the final result is, right?
You want them to see what their career is and how it's different from what they are doing right now. So I think events like WiDS are really great for showcasing that, but I would also like to pull that project work forward, to the extent possible with the skills the students have at any point in their curriculum. In the class that I teach, Big Messy Data, the capstone of the course is a class project where the students tackle a big messy data set that they find on their own. They define the problems, and what they produce is supposed to be a report to their manager, right? The project proposal says, "Manager, this is why I should be allowed to work on this project for the next month: it's so important, it's really going to drive growth in our business, it's going to open up new markets." But they're supposed to describe it in industry terms, not just academic terms, right? Then they try to figure out how to actually solve the problem, and at the end they're supposed to once again write a report describing how what they found will help and impact the business.
>> That element of persuasion is key...
>> That's right, that's right.
>> So the last thing here as we wrap up: this is the fourth annual Women in Data Science conference, as I mentioned in the opening. The impact and the expansion that they have been able to drive in such a short period of time is something I always love seeing. Every year there are a hundred and fifty-plus regional events going on, and they're expected to reach a hundred thousand people. What excites you about the opportunity that you have to present here at Stanford later today?
>> I think it's amazing that there are so many people who are excited about WiDS. I mean, I can't travel to a hundred and fifty locations, certainly not this year, not in many, many years, so the ability to be in touch with so many people in so many different places is really exciting to me. I hope that they will be in touch with me too; that direction is a little bit harder with current technology, but I want to learn from them as well as teach them.
>> Well, Madeleine, thank you so much for sharing some of your time with me this morning on theCUBE. We appreciate that, and we wish you good luck on your WiDS presentation this afternoon.
>> It was really fun to talk with you. Thank you for having me here.
>> Ah, my pleasure. We want to thank you. You're watching theCUBE live from the fourth annual Women in Data Science conference, WiDS, here at Stanford. I'm Lisa Martin. Stick around, I'll be right back after a break with my next guest. (upbeat funky music)