Dr. Mark Ramsey & Bruno Aziza | BigData NYC 2017
>> Live from Midtown Manhattan, it's the Cube, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.
>> Hey, welcome back everyone, live here in New York City for the Cube special presentation of BigData NYC. We're here all week with the Cube in conjunction with the Strata Data event happening around the corner. I'm John Furrier, the host, with James Kobielus. Our next two guests are Dr. Mark Ramsey, chief data officer and senior vice president of R&D at GSK, the Glaxo pharma company, and Bruno Aziza, the CMO at AtScale, both Cube alumni. Welcome back.
>> Thanks for having us.
>> So Bruno, I want to start with you, because I think that Dr. Mark has some great use cases I want to dig into and go deep on with Jim. But AtScale, give us the update on the company. You guys are doing well, what's happening? You had the vision of this data layer we talked about a couple of years ago. It's working, so give us the update.
>> A lot of things have happened since we talked last. I think you might have seen some of the news in terms of growth. Ten X growth since we started, mainly driven by the customer use cases. That's why I'm excited to hear from Mark and share his stories with the rest of the audience here. We have a presentation at Strata tomorrow with Vivint. It's a great IoT use case as well. So what we're seeing is the industry is changing in terms of how it's buying BI platforms. In the past, people would buy BI platforms vertically. They'd buy the visualization, they'd buy the semantics and buy the best-of-breed integration. We now live in a world where there's a multitude of BI tools, and the data platforms are not standardized either. And so what we're kind of riding as a trend is this idea of the need for the universal semantic layer. This idea that you can have a universal set of semantics, in a dictionary or ontology, that can be shared across all types of business users and business use cases.
Or across any data. That's really the trend that's driving our growth. And you'll see it today at this show with the use cases and the customers. And of course some of the announcements that we're doing. We're announcing a new offer with Cloudera and Tableau. And so we're really excited about, again, how the space and the partner ecosystem are embracing our solutions.
>> And you guys really have a Switzerland kind of strategy. You're going to play neutral, play nicely with everybody. Because you're in a different, your abstraction layer is really more on the data.
>> That's right. The whole value proposition is that you don't want to move your data, and you don't want to move your users away from the tools that they already know, but you do want them to be able to take advantage of the data that you store. And this concept of a virtualized layer and a universal semantic layer that enables the use case to happen faster is a big value proposition to all of them.
>> Dr. Mark Ramsey, I want to get your quick thoughts on this. You're obviously a customer, so, I mean, you're not biased. You get vendor pressure every day. Competitive noise out there is high in this area and you're a chief data officer. You run R&D, so you've got that 20-mile stare into the future. You've got experience running data at a wide scale. I mean, there's a lot of other potential solutions out there. What made it attractive for you?
>> Well, it fills a need that we have around really that virtualization. So we can leave the data in the format that it is on the platform and then allow the users, like Bruno was mentioning, to use a number of standardized tools to access that information. And it also gives us an ability to learn how folks are consuming the data. So they will use a variety of tools, they'll interact with the data. AtScale gives us a great capability to really look under the covers, see how they're using the data.
And if we need to physicalize some of that to make easier access in the long term, it gives us that...
>> It's really an agility model, kind of, for data. You're kind of agile.
>> Yeah, it's kind of a way to, you know, so if you're using a dashboarding tool it allows you to interact with the data. And then as you see how folks are actually consuming the information, you can physicalize it and make that readily available. So it gives you those agile cycles to go through.
>> In your use of the solution, what have you seen in terms of usage patterns? What are your users using AtScale for? Have you been surprised by how they're using it? And where do you plan to go in terms of the use cases you're addressing going forward with this technology?
>> This technology allows us to give the users the ability to query the data. So for example, we use standardized ontologies in several of the areas. And standardized ontologies are great because the data is in one format. However, that's not necessarily how the business would like to look at the data, and so it gives us an ability to make the data appear the way the users would like to consume the information. And then we understand which parts of the model they're actually flexing, and then we can make the decision to physicalize that. Because again, it's a great technology, but with virtualization there is a cost, because the machines have to create the illusion of the data being a certain way. If you know it's something that's going to be used day in and day out, then you can move it to a physicalized version.
>> Is there a specific threshold, when you're looking at the metrics of usage, when you know that particular data, particular views need to be physicalized? What is that threshold, or what are those criteria?
>> I think it's normally a combination of the number of connections that you have. So the joins of the data across the number of repositories of data.
And that's balanced with the volume of data, so if you're dealing with thousands of rows versus billions of rows, then that can lead you to make that decision faster. There isn't a defined metric that says, well, we have this number of rows and this many columns and this size, that really will lead you down that path. But the nice thing is you can experiment, and so it does give you that ability to sort of prototype and see whether folks are consuming the data before you expend the energy to make it physical.
>> You know, federated, I use the word federated, but semantic virtualization layers clearly have been around for quite some time. A lot of solution providers offer them. A lot of customers have used them for disparate use cases. One of the raps traditionally against semantic virtualization is that it's simply sort of a stop gap between chaos on the one end, you know, where you have dozens upon dozens of databases with no unified roll-up. That's a stop gap on the way to full centralization or migration to a big data hub. Do you see semantic virtualization as being sort of your target architecture for your operational BI and so forth? Or on some level is it simply, like I said, a stop gap or transitional approach on the way to some more centralized environment?
>> I think you're talking about kind of two different scenarios here. One is in federated, I would agree, when folks attempted to use that to bring disparate data sources together to make it look like it was consolidated, and they happened to be on different platforms, that was definitely a stop gap on a journey to really addressing the problem. The thing that's a little different here is we're talking about this running on a standardized platform. So it's not platform-disparate. It's on the platform; the data is being accessed on the platform. It really gives us that flexibility to allow the consumer of the data to have a variety of views of the data without actually physicalizing each of them.
So I don't know that it's on a journey, because we're never going to get to where we're going to make the data look so many different ways. But it's very different than, you know, ten, 15 years ago, when folks were trying to solve disparate data sources using federation.
>> Would it be fair to characterize what you do as agile virtualization of the data on a data lake platform? Is that what it's essentially about?
>> Yeah, it certainly enables that. In our particular case we use the data lake as the foundation, and then we actually curate the data into standardized ontologies, and then really, the consumer access layer is where we're applying virtualization. In the creation of the environment that we have, we've integrated about a dozen different technologies. So one of the things we're focused on is trying to create an ecosystem, and AtScale is one of the components of that. It gives us flexibility so that we don't have to physicalize.
>> Well, you don't have to stand up any costs. So you have the flexibility with AtScale. Do I get this right? You get the data and people can play with it without actually provisioning. It's like, okay, save some cash, but then also you double down on winners that come in.
>> Things that are a winner, you check the box, you physicalize it. You provide that access.
>> You get crowdsourcing benefits going on in your...
>> You know, exactly.
>> The curation you mentioned. So the curation goes on inside of AtScale? Are you using a different tool or something you hand wrote in house to do that? Essentially it's data governance and data cleansing.
>> That is, we use a technology called Tamr. That is a machine-learning-based data curation tool; that's one of our fundamental tools for curation. So one of the things in the life sciences industry is you tend to have several data sources that are slightly aligned, but they're actually different, and so machine learning is an excellent application.
>> Let's get into the portfolio.
Obviously as a CDO you've got to build a holistic view. You have a tool chest of tools and a platform. How do you look at the big picture? AtScale fits in beautifully, it makes a lot of sense. So good for those guys. But you know, the big picture is, you've got to have a variety of things in your arsenal. How do you architect that tool shed or your platform? If everything's a hammer, everything's a nail. You've got all of them, though, all the things to build.
>> You bring up a great point, because unfortunately a lot of times... We'll use your analogy, it's like a tool shed. So you don't want 12 lawnmowers, right? In your tool shed, right? So one of the challenges is that a lot of the folks in this ecosystem, they start with one area of focus and then they try to grow into other areas of focus. Which means that suddenly everybody starts to be a lawnmower, because they think that's...
>> They start as a hammer and turn into a lawnmower.
>> Right.
>> How did that happen? That's called pivoting.
>> You can mow your lawn with a hammer, but... So it's really that portfolio of tools that all together get the job done. So certainly there's a data acquisition component, there's the curation component. There's visualization, machine learning, there's the foundational layer of the environment. So for all of those things, our approach has been to select the kind of best-in-class tools around that and then work together and... Bruno and the team at AtScale have been part of this. We've actually had partner summits on how do we bring that ecosystem together.
>> Is your stuff mostly on-prem? Obviously there's a lot of pharma IP there. So you guys have to game the whole patent thing, which is well documented. You don't want to open up the kimono and start the clock until it's released. You obviously have to keep things confidential. Mix of cloud and on-prem, or is it 100 percent on-prem? Is there some bursting to the cloud? Is it a private cloud? How do you guys look at the cloud piece?
>> Yeah, the majority of what we're doing is on-prem. The profile for us is that we persist the data. So it's not... In some cases, when we're doing some of the more advanced analytics, we burst to the cloud for additional processors. But the model of persisting the data means that it's much more economical to have an on-prem instance of what we're doing. So it is a combination, but the majority of what we're doing is on-prem.
>> So, hold on Jim, one more question. I mean, obviously everyone's knocking on your door. They want to know how to get in that account. They spend a lot of money. But you're pretty disciplined; it sounds like you've got a good view: you don't want people to come in and turn into something that you don't want them to be. But you also run R&D, so you've got to understand the headroom. How do you look at the headroom of what you need down the road in terms of how you interface with the suppliers that knock on your door? Whether it's AtScale currently working with you now, or people just trying to get in there and sell you a hammer or a lawnmower. Whatever they have, they're going to try; you know, you're dealing with the vendor pressure.
>> Right, well, a lot of that is around what problem we're trying to solve. And we drive all of that based on the use cases and the value to the business. I mean, and so if we identify gaps that we need to address... Some of those are more specific to life sciences types of challenges, where there are very specific types of tools and the population of partners is quite small. And other things. We're building an actual production, operational environment. We're not building a proof of concept, so security is extremely important. We're Kerberos-enabled end to end, at rest and in flight. Which means it breaks some of the tools, and so there's criteria of things that need to be in place in order to...
>> So you've got to think about scale big time? So not just putting a beachhead together.
But foundationally building out a platform. Having the tools that fit general purpose and also specialty, but scale's a big thing, right?
>> And it's also that we're addressing what we see as three different cohorts of consumers of the data. One is more in the guided analytics, the more traditional dashboards, reports. One is more in computational notebooks, more of the scientific, using R, Python, other languages. The third is more kind of almost at the bare-metal level: machine learning, TensorFlow, a number of tools that people directly interact with. People don't necessarily fit nicely into those three cohorts, so we're also seeing that there's a blend. And that's something that we're also...
>> There's a fourth cohort.
>> Yeah, well, you know, someone's using a computational notebook but they want to draw upon a dashboard graphic. And then they want to run a predefined TensorFlow model and pull all that together, so.
>> And what you just said teed up the question I was going to ask. So it's perfect. One of my core focuses as a Wikibon analyst is on deep learning, on AI. So in semantic data virtualization in a life sciences, pharma context, you have undoubtedly a lot of image data, visual data. So in terms of curating that and enabling, you know, virtualized access, to what extent are you using deep learning, TensorFlow, convolutional neural networks to be able to surface the visual patterns that can conceivably be searched using a variety of techniques? Is that a part of your overall implementation of AtScale for your particular use cases currently? Or do you plan to go there in terms of, like, TensorFlow?
>> No, I mean, we're active, very active, in deep learning, artificial intelligence, machine learning. Again, it depends on which problem you're trying to solve, and so, again, there's a number of components that come together when you're looking at image analytics versus using data to drive out certain decisions. But we're active in all of those areas.
Our ultimate goal is to transform the way that R&D is done within a pharmaceutical company. To accelerate the... Right now it takes somewhere between five and 15 years to develop a new medicine. The goal is really to do a lot more analytics to shorten that time significantly. It helps the patients, gets the medicines to market faster.
>> That's your end game. You've got to create an architecture that enables the data to add value.
>> Right.
>> To the business. Dr. Mark Ramsey, thanks so much for sharing the insight from your environment. Bruno, you've got something there to show us. What do you got there? He always brings a prop on.
>> A few years ago I think I had a tattoo on my neck or something like this. But I'm happy that I brought this, because you can see how big Mark's vision is. The reason why he's getting recognized by Cloudera in the data awards and so forth is because he's got a huge vision, and it's a great opportunity for a lot of CDOs out there. I think the average CEO spent 100 million dollars to deploy big data solutions over the last five years. But they're not able to consume all the data they produce. I think in your case you consume about 100 percent of the structured data. And the average in this space is we're able to consume about one percent of the data. And this is essentially the analogy today that you're dealing with if you're in the enterprise. We've spent a lot of time putting data in large systems and so forth. But the tool set that we give the chief data officers and their teams is a cocktail straw like this in order to drink out of it.
>> That's a data lake, actually.
>> It's an actual lake. It's a Slurpee cup. Multiple Slurpees with the same straw.
>> Who has the Hudson River water here?
>> I can't answer that question, I think I'd have to break a few things if I did. But the idea here is that it's not very satisfying. And that's the frustration of business users and business units.
What AtScale's done is we built this; this is the straw you want. So I would kind of help CDOs contemplate this idea of the Slurpee and the cocktail straw. How much money are you spending here and how much money are you spending there? Because the speed at which you can get the insights to the business user...
>> You've got to get that straw, you've got to break it down so it's available everywhere. So I think that's a great innovation, and it makes me thirsty.
>> You know what, you can have it.
>> Bruno, thanks for coming from AtScale. Dr. Mark Ramsey, good to see you again, great to have you come back. Again, anytime, we love to have chief data officers on. Really a pioneering position; it's the critical position in all organizations. It will be in the future and will continue being. Thanks for sharing your insights. It's the Cube, more live coverage after this short break. (tech music)