Megan Price, Human Rights Data Analysis Group - Women in Data Science 2017 - #WiDS2017 - #theCUBE
(upbeat music) >> Voiceover: Live from Stanford University. It's the Cube covering the Women in Data Science Conference, 2017. >> Hi, welcome back to the Cube. I'm Lisa Martin and we are at the second annual Women in Data Science Conference at Stanford University. Such an inspiring day that we've had so far and right now we're joined by Megan Price, the executive director of the Human Rights Data Analysis Group. Megan, welcome to the Cube. >> Thank you. >> It's so exciting to have you here. Megan, your background is statistics. You have a PhD as a statistician. The Human Rights Data Analysis Group, HRDAG, is focused on statistical analysis of mass violence. Talk to us about sort of the merger of your biostatistician or your statistician background with human rights. Was that something that you were always interested in? >> Sure. It was and I have to say I was really lucky. I got my Bachelor's and my Master's in statistics from a very technical engineering school in Ohio, where honestly, a lot of people would sort of pat me on the head and say, "That's nice, that you're interested in human rights. You'll outgrow that." And fortunately I had one very thoughtful mentor, who said to me, "You know, I really think Public Health school is the direction you should go in," and so I got my PhD in biostatistics from Public Health school and it was really there that I was exposed to people who kind of said, "Yeah, social justice, human rights, do that as a day job. Get on it," and so that was really great that I was exposed to that as something I can move into as a career. >> Exposed to them, but also you had the confidence. You obviously had a mentor that was very influential, but that takes some courage and some guts to go, you know what, yeah, this is needed. >> It's true, yeah. (laughs) >> So talk to us about some of the ... The HRDAG, we talked about it a little bit before we went live. The evolution. Share with our viewers how it's evolved to what it is today.
>> Sure. So the organization, the name and work started with work that my colleague, Dr. Patrick Ball started doing in El Salvador and in Guatemala in the 90s. And at the time, he was working ... He formed a team to do the work at the American Association for the Advancement of Science. And so that was about 25 years ago. And then the work evolved and the team just kept kind of moving to where the right home was to get that work done and so in the early 2000s, they moved out here to Palo Alto just up the street to Benetech, another technical non-profit. And they provided us a really nice home for our work for nine years. And then in 2013, the time had really come to be the right time for Patrick and I to spin out HRDAG as its own non-profit organization. We're fiscally sponsored right now, but we're our own institution, which we're really excited about. >> So you mentioned some of the projects that Patrick was working on. What are some of the things that were really compelling to you, specifically within human rights, that really are catalysts for the work that you're doing today? >> Sure. I think that there are a lot of quantitative questions that get raised in looking at these questions about widespread patterns of violence, and asking questions about accountability and responsibility for violence. And to answer those questions, you have to look at statistical patterns, and so you need to bring a deep understanding of the data that are available and the appropriate way to analyze and answer those questions. >> How do you from an accuracy perspective, I understand that that's incredibly vital, especially where these important issues are concerned, how does HRDAG eliminate, mitigate inaccuracy issues with respect to data?
>> Yeah, well we're always thinking about each of our projects as taking place in an adversarial environment, because we ultimately assume that at the end of the day our results are going to be either subjected to the kind of deep scrutiny that comes along with any kind of socially and politically sensitive topic, or with the kind of scrutiny that happens in a courtroom. And so that's really what motivates the level of rigor that we require in our work. And we maintain that by maintaining our relationship with mostly academicians, who are really pushing these methods forward and staying on top of what is the most cutting edge approach to this problem and how can we really know that we're being as transparent as possible in the way these data were collected, the way they were analyzed, the way they were processed and the limitations of those analyses. You know, the uncertainty present in any estimates that we put out. >> Give us an example of some of the types of data sources that you're evaluating, say for the conflict in Syria. >> So in the case of Syria, we have relationships with four organizations that are all collecting information about victims who've been killed in the ongoing conflict in Syria. Those groups are the Syrian Center for Statistics and Research, Syrian Network for Human Rights, the Damascus Center for Human Rights Studies, and the Violations Documentation Center. And those are all citizen-led groups that are maintaining networks collecting that information to the best of their ability. And they share with us, largely Excel spreadsheets that contain names of victims and any other information they were able to collect about those victims. >> You mentioned University collaboration a minute ago. From a methodology standpoint. Give me an insight into ... You're getting data from these various sources, largely Excel, where we know with Excel comes humans, comes sometimes, "Oops".
How are you working with universities to help evaluate the data or what are some of the methodologies that they're recommending, given the data sources and the tools that you have? >> So there's really two stages that the data go through and the first one is within the groups themselves, who do that first layer of verification, and that is the human verification prior to, kind of all the risks of data entry problems. And so they're doing the on the ground, making sure that they've collected and confirmed that information, but then you're absolutely right, we get this data that's been hand entered and with all of the risks and potential downsides of hand entered data and so primarily what we do is fairly conventional data processing and data cleaning to just check for things like outliers, contradictory information. We'll do that using Python and using R. And then our friends and colleagues in academia, where they're really helping us out is, because there are these multiple sources collecting names of individual victims, what we have is a record linkage problem. And so we have multiple records that refer to the same individual. >> Okay. >> And so we work a lot with our academic partners to stay on top of the latest ways to de-duplicate databases, that might have multiple entries that refer to the same person. And so that's been really great lately. >> Okay. What are some of the methods that you've used in Syria to quantify mass violence and what have some of the outcomes been to date? >> So we rely primarily on methods from record linkage and that gets us to what we know and can observe. And then from there we need to build an estimate of what we don't know and what we can't observe, because inevitably in conflict violence, some of that violence is hidden. Some of those victims have not been identified or their stories have not been told yet. And it's our job as data scientists to use the tools at our disposal to estimate how much we don't know.
And so for that step we use a class of statistical tools, called multiple systems estimation. And essentially what that does, is it builds on the patterns of data as they're collected by these multiple sources to model what the underlying population must have been to generate what we were able to see. >> Okay. >> And so that's been the primary analysis we've done in Syria. And what we found from that analysis, is that as valuable and important as the documented data are, they often are overwhelmed, for example when violence peaks. It may be too dangerous and it may be impossible to accurately record how many people have been killed. >> Okay. >> And so we need a statistical model that can help us identify when data we observe seem to plateau, but perhaps our estimates tell us no, in fact that was a very violent period. And then we can dig in with field experts and interpret, was that a time when we know that territorial control was in contention. Or was that a time when we know, that there were clashes between certain groups. And so then we can infer further from that about responsibility for violence. >> So applying some additional attributers. Things that are attributing to this. What are some of the differences that you think that this has made so far? >> What I hope this has done so far, is simply to raise awareness about the scale of the violence that's happening in Syria. And what I hope ultimately, is that it helps to attribute accountability to those who are responsible for this violence. >> You've also got some projects going on in Guatemala. Can you share a little bit about that? >> We do. Yeah, we have a couple of projects in Guatemala. The one that I've worked on most closely, is looking at the historic archive of the national police in Guatemala. And that's actually the project that I started working on when I joined HRDAG. And Guatemala suffered an armed internal conflict from 1960 to 1996.
And during that time period, many witnesses came forward and said that the national police force participated in the violence, but at the time that the UN, the United Nations brokered peace treaties, they weren't able to find any documentary evidence of the role the police played. And then in 2005, quite by accident, this archive, that's this cache of the police force's bureaucratic documents was discovered. And so we've been studying it since then. And it's been this really fascinating problem, if you have this building full of millions and millions and millions of pieces of paper, that are not really organized in any way. And how do you go about studying that? And so we partnered with other experts from the American Statistical Association, to design a random sample of the archive, so that we could learn about it as quickly as possible. >> What are some of the learnings that you've discovered so far? >> What we've discovered so far is just the sheer magnitude of the archive and in particular the amount of documents that were generated during the conflict. And then the other thing that we have discovered is the communication flow. The pattern of documents being sent to and from the leadership of the National Police Force. And specifically, Patrick Ball testified about that communication flow, to help establish command responsibility for the former chief of police, for a kidnapping that occurred in 1984. >> Wow, incredibly impactful work. But you've got some things on the domestic frontier. Share with us a little bit about what you're working on stateside. >> We do, yeah. In the past year, we've started our first US based project, which we're really excited about. And it's looking at the algorithms that are being used both in predictive policing and in criminal justice risk assessment. So decisions like whether or not someone should get bail or pretrial hearings, things like that.
And we've been working with partners, primarily lawyers, to help assess, sort of, how are those algorithms working and what's the underlying data that's being fed into those algorithms. And what's the ways in which that data are biased. And so the algorithms are replicating the bias that exists in the data. >> Tell me, how does that conversation go, as a statistician with a lawyer, who is, you know, a business person. What sort of educating do you need to do to them about the impact that this data can make and how imperative it is that it be accurate. >> Yeah, well those conversations are really interesting, because there's so much education going in both directions. Where both we are helping them to turn their substantive question into an analytical question and sort of develop it in a way that we can do an analysis to get at that question, but then they're also helping us to understand, what's the way in which this information needs to be conveyed, so that it holds up in court, and so that it establishes some sort of precedent, so that they can make policy change. >> It makes me think of, sort of the topic or the skill of communication. A number of our guests this morning on the program and those that we've heard speaking today, talk about the traditional data scientist skills. You know hybrid, hacker, someone that has statistics, mathematical skills, but now really looking at somebody who also has to have other behavioral skills. Be able to be creative, interpretive, but also to communicate it. I'd love to get your perspective as you've seen data science evolve in your own career. How have you maybe trained your team on the importance of communicating this information, so that it has a value and it has impact? >> Absolutely.
I think creativity and communication are probably the two most important skills for a data scientist to have these days and that's definitely something that on our team, you know, it's always a painful process, but every time we give a talk, if we're fortunate enough that it's been videoed, we always have to go back and watch that. And I recommend to my teammates to do it quietly at home alone, maybe with their preferred beverage of choice, but that's the way that you learn and you discover, oh I could have said that differently or I could have said that another way, or I could have thought about a different way to present that, because I do think that that's absolutely vital. >> I'm just curious what your perspective is from a curriculum standpoint, we've got a lot of students here, we've got some professors here. Is there something that you would recommend as part of ... Look back to your education. Would you think, you know what, being able to understand statistics is one thing, I need to be able to communicate it. Was that something that was part of your curriculum or something that you think, you know what, that's a vital component of this? >> It's absolutely a vital component. It was not part of my formal curriculum, but it was something that I got out of graduate school, because I was very lucky that I got to teach, essentially statistics 101 to introductory Public Health students. So they were graduate students, but there were a lot of students who maybe hadn't had a math class in a decade and were fairly math phobic. >> Lisa: Sounds like me. (both laughing) >> We could, you know, hold hands and get through it together. >> Okay, oh good. Beverage of my choice, awesome. (laughs) >> Exactly.
And I really feel like that was what improved my communication skills, was experience with those students and thinking about how to convey the information to that class and going in day after day and designing that curriculum and really thinking about how to teach that class, is really the way that I have learned my communication skills. >> Oh that's fab. That real world experience, there's nothing that beats that. What are some of the things that have excited you about participating in (mumbles) this year? >> Oh my gosh, it is so much fun to be in an audience and to speak to an audience, that is so predominantly female. I mean of course, that's not something that we get to do very often. And so young, I mean this audience is really full of very energetic, ready to go tackle the world's problems women and it's very invigorating for me. It helps me to kind of go back and think, alright how can we do more and do bigger and create more opportunities for these folks to fill? >> It's a very symbiotic relationship, I think. They learn so much from you and you're learning so much from them. It's really nice. You can feel it. Right, you can feel it here in this environment. >> Absolutely. >> Well, Megan, thank you so much for joining us on the program today. We wish you the best of luck with HRDAG and your impending new little girl. >> Thank you. (laughs) I appreciate that. >> Absolutely. Well we thank you for watching the Cube. Again, we're live at the Women in Data Science Conference at Stanford University, second annual event. Stick around, we'll be right back. (upbeat music)
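Editor's sketch: in the interview, Megan describes cleaning hand-entered records, linking records that refer to the same person across sources, and then estimating how many victims were never documented. The Python below is a minimal, simplified illustration of that idea and not HRDAG's actual method. It assumes exact matching on a normalized name plus date (real record linkage is typically probabilistic) and uses the two-list Chapman capture-recapture estimator as a stand-in for multiple systems estimation, which in practice uses more lists and models dependence between them. All records, field names, and function names are invented for illustration.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so spelling variants align."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def link_key(record: dict) -> tuple:
    """Hypothetical linkage key: normalized name plus reported date."""
    return (normalize(record["name"]), record["date"])


def deduplicate(records: list) -> set:
    """Collapse records that share a linkage key into one entry per individual."""
    return {link_key(r) for r in records}


def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Two-list capture-recapture estimate of the total population size.

    n1 and n2 are the numbers of unique individuals on each list and
    m is the number of individuals found on both lists.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Invented toy records (not real data), as two groups might report them.
list_a = [
    {"name": "Person  Alpha", "date": "2013-01-05"},
    {"name": "pérson alpha", "date": "2013-01-05"},  # same person, variant spelling
    {"name": "Person Beta", "date": "2013-01-07"},
    {"name": "Person Gamma", "date": "2013-01-09"},
]
list_b = [
    {"name": "Person Beta", "date": "2013-01-07"},
    {"name": "person gamma", "date": "2013-01-09"},
    {"name": "Person Delta", "date": "2013-01-11"},
]

keys_a = deduplicate(list_a)
keys_b = deduplicate(list_b)
overlap = keys_a & keys_b        # individuals documented by both sources
documented = keys_a | keys_b     # everyone documented by at least one source

estimate = chapman_estimate(len(keys_a), len(keys_b), len(overlap))
print(f"documented: {len(documented)}, estimated total: {estimate:.2f}")
```

The gap between the estimated total and the documented count is the quantity of interest: it is a rough indication of how many individuals neither source recorded, under the strong assumption that the two lists were collected independently.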
ENTITIES
Entity | Category | Confidence |
---|---|---|
Megan | PERSON | 0.99+ |
Patrick Ball | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Patrick | PERSON | 0.99+ |
2005 | DATE | 0.99+ |
Ohio | LOCATION | 0.99+ |
Guatemala | LOCATION | 0.99+ |
American Statistical Association | ORGANIZATION | 0.99+ |
Lisa | PERSON | 0.99+ |
El Salvador | LOCATION | 0.99+ |
Patrick Ball | PERSON | 0.99+ |
1984 | DATE | 0.99+ |
Megan Price | PERSON | 0.99+ |
National Police Force | ORGANIZATION | 0.99+ |
Syria | LOCATION | 0.99+ |
American Association for the Advancement of Science | ORGANIZATION | 0.99+ |
Syrian Network for Human Rights | ORGANIZATION | 0.99+ |
2013 | DATE | 0.99+ |
Violations Documentation Center | ORGANIZATION | 0.99+ |
United Nations | ORGANIZATION | 0.99+ |
Damascus Center for Human Rights Studies | ORGANIZATION | 0.99+ |
Excel | TITLE | 0.99+ |
Syrian Center for Statistics and Research | ORGANIZATION | 0.99+ |
1960 | DATE | 0.99+ |
HRDAG | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
nine years | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
1996 | DATE | 0.99+ |
Python | TITLE | 0.99+ |
US | LOCATION | 0.99+ |
each | QUANTITY | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Human Rights Data and Analysis Group | ORGANIZATION | 0.99+ |
millions | QUANTITY | 0.98+ |
UN | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.97+ |
Women in Data Science Conference | EVENT | 0.97+ |
today | DATE | 0.97+ |
past year | DATE | 0.97+ |
Women and Data Science Conference | EVENT | 0.97+ |
#WiDS2017 | EVENT | 0.97+ |
90s | DATE | 0.96+ |
Women in Data Science Conference | EVENT | 0.96+ |
two stages | QUANTITY | 0.96+ |
first one | QUANTITY | 0.95+ |
first layer | QUANTITY | 0.94+ |
both directions | QUANTITY | 0.94+ |
this morning | DATE | 0.93+ |
Stanford University | LOCATION | 0.93+ |
millions of pieces | QUANTITY | 0.91+ |
Benetech | ORGANIZATION | 0.91+ |
Public Health school | ORGANIZATION | 0.9+ |
Women in Data Science 2017 | EVENT | 0.9+ |
this year | DATE | 0.88+ |
2017 | DATE | 0.86+ |
about 25 years ago | DATE | 0.85+ |
Human Rights Data Analysis Group | ORGANIZATION | 0.81+ |
second annual | QUANTITY | 0.81+ |
Public Health school | ORGANIZATION | 0.81+ |
HRDAG | PERSON | 0.8+ |
101 | QUANTITY | 0.78+ |
human rights | ORGANIZATION | 0.77+ |
one thing | QUANTITY | 0.76+ |
Cube | ORGANIZATION | 0.74+ |
Palo Alto | LOCATION | 0.73+ |
2000s | DATE | 0.72+ |
couple | QUANTITY | 0.68+ |
paper | QUANTITY | 0.65+ |
a minute | DATE | 0.64+ |
analysis | ORGANIZATION | 0.55+ |
Dr. | PERSON | 0.53+ |