Fernando Lopez, Quanam | DataWorks 2018

>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.
>> Well hello, welcome to theCUBE. I'm James Kobielus, the lead analyst for the Wikibon team within SiliconANGLE Media, and I'm your host today here at DataWorks Summit 2018 in Berlin, Germany. We have one of Hortonworks' customers in South America with us: Fernando Lopez of Quanam, who is based in Montevideo, Uruguay. Here at the conference, he and his company have won an award, a data science award. So what I'd like to do is ask Fernando to introduce himself, give us his job description and describe the project for which he won the award. Take it from there, Fernando.
>> Hello, and thanks for the chance.
>> Great to have you.
>> I work for Quanam, as you already explained. We are about 400 people in the whole company, spread across Latin America. I come from the headquarters, so to speak, which is located in Montevideo, Uruguay. There we have a business analytics business unit. Within it we are about 70 people, and we have a big data, artificial intelligence and cognitive computing group, which I lead. And yes, we also implement Hortonworks. We are actually partnering with Hortonworks.
>> When you say you lead the group, are you a data scientist yourself, or do you manage a group of data scientists, or a bit of both?
>> Well, a bit of both. You know, you have to do different stuff in this life. So yes, I lead implementation groups. Sometimes the project is more big data, sometimes it's more data science; different flavors. But within this group, we try to cover different aspects that are related in some sense to big data. It could be artificial intelligence. It could be cognitive computing, you know.
>> Yes, so describe how you're using Hortonworks, and describe the project for which you won the award here at this conference. I assume it's one project.
>> All right, yes. We are running several projects, but this one, the one that won the prize, is one I like very much, because I'm actually a bioinformatics student, so I have a special interest in it.
>> James: Okay.
>> It's worth clarifying that this was a joint effort between Quanam and GeneLifes.
>> James: Genelabs?
>> GeneLifes.
>> James: GeneLifes.
>> Yes, it's a genetics and bioinformatics company.
>> Right.
>> They specialize--
>> James: Is that a Montevideo-based company?
>> Yes. In short, they are a startup born from the Institut Pasteur in Montevideo, and they have a lot of people who are specialists in bioinformatics and genetics, with long careers in the subject. We come from the other side, from big data. I was kind of in the middle because of my interest in bioinformatics. So about a year and a half ago, both companies met. There is a research and innovation center, ICT4V (you can visit ICT4V.org), which is a non-profit organization created by an agreement between Uruguay and France,
>> Oh, okay.
>> Both governments.
>> It makes it possible for different private or public organizations to collaborate. We have brainstorming sessions and so on, and this project was born from one of those sessions. After that, we started to discuss ideas for bringing tools to the medical geneticist in order to streamline his work: putting different tools on his desktop that could make his work easier and more productive.
>> Looking for genetic diseases? What are they looking for in the data specifically?
>> Correct, correct.
>> I'm not a geneticist, but I'll try to explain myself as well as I can.
>> James: Okay, that's good. You have a great job.
>> If I am--
>> If I am the doctor, then I will spend a lot of hours researching the literature. Bear in mind that nearly 300 papers that could be related to genetics come out in PubMed each day. That's a lot.
>> These are papers in Spanish that are published in South America?
>> No, just talking about--
>> Or Portuguese?
>> PubMed, from the NIH. Those are papers published in English.
>> Okay.
>> PubMed or MEDLINE or--
>> Different languages, different countries, different sources.
>> Yeah, but most of it, or everything, in PubMed is in English. There is another PubMed in Europe, and we also have SciELO in Latin America. But just to give you an idea, from that source alone there are 300 papers each day that could be related to genetics. So speaking only about literature, there's a huge amount of information. If I am the doctor, it's difficult to process that. Okay, so that's part of the issue. But at the core of the solution, what we want to provide is this: starting from the sequenced genome of one patient, what can we assert, what can we say about the different variations? It is believed that each one of us has around four million mutations. A mutation doesn't mean disease. A mutation actually leads to variation, and variation is not necessarily something negative. We can have a different eye color. We can have more or less hair. Or it could represent some disease, something we need to pay attention to as doctors, okay? So this part of the solution implements heuristics on what comes out of the sequencing process. In short, these heuristics give each variant, each variation, a score for being more or less pathogenic. So if I am the doctor, part of the work is done there. Then I have to decide: okay, my diagnosis is that there is this disease, or not. This can be used in two senses. It can be used for prevention, to predict that something could happen: you have this genetic risk. Or it can be used to explain some disease and find a treatment. So that's the more bioinformatics part. On the other hand, we have the literature. What we do with the literature is ingest these 300 daily papers, well, abstracts, not papers.
Actually, we have about three million abstracts.
>> You ingest text and graphics, all of it?
>> No, only the abstract, which is a few hundred words.
>> James: So just text?
>> Yes.
>> Okay.
>> But from there we try to identify relevant entities: proteins, diseases, phenotypes, things like that. Then we try to infer valid relationships, for example that this phenotype or this disease can be caused by this protein, or by the expression of that gene, which is another entity. This builds up a kind of ontology; we call it the mini-ontology because it's specific to this domain. So we have a kind of mini semantic network with millions of nodes and edges, which is quite easy to interrogate. But the point is, there you have more than just text. You have something that is already enriched: a series of nodes and arrows that you can query in terms of reasoning. What leads to what, you know?
>> So the analytical tools you're using: Hortonworks doesn't make those tools. Are they coming from another partner in South America, or from another Hortonworks partner like IBM? Where does that come from?
>> That's a nice question. Actually, we have an architecture. The core of the architecture is Hortonworks, because we have scalability requirements.
>> James: Yeah, HDP?
>> Yes: HDFS, Hive on Tez, Spark. We have a number of components that need to scale easily, because when we talk about the genome, it's easy to think of one terabyte of work per patient. So that's one thing, regarding storage and computing. On the other hand, we use a graph database; we use Neo4j for that.
>> James: Okay, Neo4j for graph. So Neo4j, and you have Hortonworks.
>> Yes. And for natural language processing we use KNIME, which is based here in Berlin, actually. So we do part of the machine learning with KNIME. Then we have Neo4j for the graph, for building this semantic network.
And for the whole processing we have Hortonworks, for running this analysis and the heuristics, and scoring the variants. We also use Solr for enterprise search on top of the documents, or on the conclusions drawn from the documents that come from the ontology.
>> Wow, that's a very complex and intricate deployment. So, great. In terms of takeaways from this event (we only have a little bit more time), of all the discussions, breakouts and keynotes, what did you find most interesting so far at this show? Data stewardship was a theme for Scott Gnau, with that new solution. In terms of what you're describing as an operational application, have you built something that can be deployed, and is being deployed, by your customers on an ongoing basis? It wasn't a one-time project, right? This is an ongoing application they can use internally. Is there a need in Uruguay, or among your customers, to provide privacy protections on this data?
>> Sure.
>> Will you be using solutions like Data Steward Studio to provide a degree of privacy protection equivalent to what, say, GDPR requires in Europe? Is that something you're considering?
>> Yes. Actually, we are running other projects in Uruguay. Together with other companies, we are helping the National Telecommunications Company, so there are security and privacy topics there. And these days we are also starting a new project, again with ICT4V and another French company. We are in charge of the big data part of an education program based on the One Laptop per Child initiative, from the times of Nicholas Negroponte. Well, that initiative is already 10 years old--
>> James: Oh, from MIT, yes.
>> Yes, from MIT, right. That initiative is already 10 years old in Uruguay, and now it has been extended to retired people as well. So it's kind of moving towards the digital society.
>> Excellent. I have to wrap it up, Fernando. It's great that you have a lot of follow-on work.
This is great. Clearly a lot of very advanced research is being done all over the world. My previous guest was from South Africa, and you're from Uruguay, so really south of the Equator. There's far more activity in big data than we here in the northern hemisphere, in Europe and North America, realize, so I'm very impressed. And I look forward to hearing more from Quanam and from your provider, Hortonworks. Well, thank you very much.
>> Thank you, and thanks for the chance.
>> It was great to have you here on theCUBE. I'm James Kobielus, we're here at DataWorks Summit in Berlin, and we'll be talking to another guest fairly soon. (mood music)
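
The variant-scoring step Fernando describes (heuristics that give each variant a score for being more or less pathogenic, so the doctor can focus on the most suspicious ones first) could look roughly like the following minimal Python sketch. It is a hypothetical illustration, not the Quanam/GeneLifes heuristics, which the interview does not detail. It assumes each variant carries a predicted consequence, named with Sequence Ontology terms such as `missense_variant`, and a population allele frequency; the weights, the 1% rarity cut-off, the `Variant` fields and the gene labels are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical weights per predicted consequence (Sequence Ontology term names).
# The numbers are illustrative assumptions, not clinical values.
CONSEQUENCE_WEIGHT = {
    "stop_gained": 1.0,
    "frameshift_variant": 1.0,
    "missense_variant": 0.5,
    "synonymous_variant": 0.0,
}
UNKNOWN_WEIGHT = 0.25  # assumed weight for consequences not listed above
RARE_AF = 0.01         # assumed cut-off: below 1% population frequency counts as rare


@dataclass
class Variant:
    gene: str             # hypothetical gene label
    consequence: str      # predicted effect of the variant
    population_af: float  # allele frequency in a reference population, 0..1


def pathogenicity_score(v: Variant) -> float:
    """Toy score in [0, 1]; higher means 'review this variant first'."""
    impact = CONSEQUENCE_WEIGHT.get(v.consequence, UNKNOWN_WEIGHT)
    # Common variants are down-weighted, on the assumption that a variant
    # frequent in the general population is unlikely to cause a rare disease.
    rarity = 1.0 if v.population_af < RARE_AF else 0.2
    return round(impact * rarity, 3)


def rank_variants(variants: list[Variant]) -> list[Variant]:
    """Sort variants from most to least suspicious."""
    return sorted(variants, key=pathogenicity_score, reverse=True)


# Hypothetical patient with three variants.
patient = [
    Variant("GENE_A", "synonymous_variant", 0.30),
    Variant("GENE_B", "stop_gained", 0.0001),
    Variant("GENE_C", "missense_variant", 0.002),
]
ranked = rank_variants(patient)
for v in ranked:
    print(v.gene, v.consequence, pathogenicity_score(v))
```

Scoring and ranking, rather than deciding, matches the workflow described in the interview: the heuristics narrow down the list, and the doctor still makes the diagnosis. Real variant interpretation weighs far more evidence than these two features.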

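On the literature side, Fernando describes recognizing entities such as genes, proteins, diseases and phenotypes in abstracts, inferring relationships between them, and storing the result as a mini semantic network that can be queried for "what leads to what". The Python sketch below is a deliberately small, hypothetical stand-in for that pipeline: a toy lexicon plays the role of the NLP entity recognizer, a single regular expression ("X causes Y" or "X leads to Y") plays the role of relationship inference, and an in-memory adjacency map replaces the graph database. The entity names, the `LEADS_TO` relation and the sample abstracts are all invented for illustration.

```python
import re
from collections import defaultdict, deque

# Hypothetical lexicon standing in for a trained entity recognizer:
# lower-cased surface form -> entity type.
LEXICON = {
    "gene_x": "GENE",
    "protein_y": "PROTEIN",
    "disease_z": "DISEASE",
    "phenotype_w": "PHENOTYPE",
}

# One assumed relation pattern: "<term> causes <term>" or "<term> leads to <term>".
RELATION = re.compile(r"\b(\w+) (?:causes|leads to) (\w+)\b", re.IGNORECASE)


def extract_triples(abstract: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples whose ends are known entities."""
    triples = []
    for subj, obj in RELATION.findall(abstract):
        s, o = subj.lower(), obj.lower()
        if s in LEXICON and o in LEXICON:  # drop matches on unrecognized words
            triples.append((s, "LEADS_TO", o))
    return triples


def build_graph(abstracts: list[str]) -> dict[str, set[str]]:
    """Adjacency map: each entity points to the entities it is said to lead to."""
    graph: dict[str, set[str]] = defaultdict(set)
    for text in abstracts:
        for s, _, o in extract_triples(text):
            graph[s].add(o)
    return graph


def causal_path(graph: dict[str, set[str]], start: str, goal: str):
    """Breadth-first search for the shortest 'what leads to what' chain, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


abstracts = [
    "Overexpression of GENE_X causes PROTEIN_Y misfolding in neurons.",
    "Accumulated PROTEIN_Y leads to DISEASE_Z in model organisms.",
    "DISEASE_Z causes PHENOTYPE_W in most reported patients.",
    "The weather causes delays.",  # no known entities, so no triple is kept
]
graph = build_graph(abstracts)
print(causal_path(graph, "gene_x", "phenotype_w"))
```

In a graph database the same question becomes a path query. For Neo4j, which the deployment described uses, it could be something like `MATCH (a {name: 'gene_x'}), (b {name: 'phenotype_w'}), p = shortestPath((a)-[:LEADS_TO*]->(b)) RETURN p`, assuming nodes carry a `name` property and edges use the hypothetical `LEADS_TO` type. A lexicon lookup and one regex are far simpler than real biomedical NLP; they only show how extracted triples become traversable edges.
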
Published Date : Apr 18 2018

ENTITIES

Entity	Category	Confidence
Fernando	PERSON	0.99+
James	PERSON	0.99+
James Kobielus	PERSON	0.99+
Uruguay	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
Fernando Lopez	PERSON	0.99+
Berlin	LOCATION	0.99+
Europe	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Hortonworks'	ORGANIZATION	0.99+
South Africa	LOCATION	0.99+
MIT	ORGANIZATION	0.99+
NIH	ORGANIZATION	0.99+
Scott Gnau	PERSON	0.99+
South America	LOCATION	0.99+
300 papers	QUANTITY	0.99+
Nicholas Negroponte	PERSON	0.99+
10 years	QUANTITY	0.99+
ICT4V	ORGANIZATION	0.99+
GeneLifes	ORGANIZATION	0.99+
both companies	QUANTITY	0.99+
Institut Pasteur	ORGANIZATION	0.99+
PubMed	TITLE	0.99+
Berlin, Germany	LOCATION	0.99+
North America	LOCATION	0.99+
Montevideo	LOCATION	0.99+
Montevideo, Uruguay	LOCATION	0.99+
Latin America	LOCATION	0.99+
one year and a half ago	DATE	0.99+
GDPR	TITLE	0.99+
two senses	QUANTITY	0.99+
Quanam	ORGANIZATION	0.99+
MEDLINE	TITLE	0.98+
Dataworks Summit 2018	EVENT	0.98+
English	OTHER	0.98+
Dataworks Summit	EVENT	0.98+
Wikibon	ORGANIZATION	0.98+
one-time	QUANTITY	0.97+
about 70 people	QUANTITY	0.97+
Portuguese	OTHER	0.97+
Equator	LOCATION	0.97+
one thing	QUANTITY	0.97+
2018	EVENT	0.97+
one project	QUANTITY	0.97+
each variant	QUANTITY	0.97+
National Telecommunications Company	ORGANIZATION	0.97+
millions of nodes	QUANTITY	0.97+
each one	QUANTITY	0.97+
about 400 people	QUANTITY	0.96+
both	QUANTITY	0.96+
one patient	QUANTITY	0.96+
nearly 300 papers	QUANTITY	0.95+
DataWorks Summit	EVENT	0.95+
one laptop	QUANTITY	0.94+
Both governments	QUANTITY	0.94+
