ON DEMAND R AND D DATA PLATFORM GSK FINAL2

>> Hey, everyone, thanks for taking the time to join this talk. I hope you and your loved ones are safe during these tough times. Let me start by introducing myself. My name is Michelle. I work for GlaxoSmithKline (GSK) as an engineering manager. In my current role, I lead the (indistinct) platform APIs, which is part of the R&D data platform here in GSK R&D Tech. I live in Dallas, Texas. I have a master's degree in computer science and a bachelor's in electronics and communication engineering. I started my career as a software developer, and over these years I have gained a lot of experience in leading and building large-scale enterprise products and solutions. I also have complete accountability for container platforms here at GSK R&D Tech. I've been working very closely with Docker Enterprise, which is now Mirantis, for more than three years to enable container platforms at GSK, mainly in R&D Tech. So that's me. Let me give you a quick overview of the agenda for today's talk. I'll start with what we do here at GSK and what the R&D data platform is. Then I'll give you an overview of the business drivers that motivated us to take this container journey, and some insight into our learnings and accomplishments over these years working with Docker Enterprise on the container platforms. Lately, you must have seen a lot of articles out there which talk about how GSK is leveraging technologies like artificial intelligence, machine learning, and data and analytics for the drug discovery process. I'm very excited to see the progress we have made in technology, but what makes us truly unique is our commitment to the patient. We at GSK help millions of people do more, feel better and live longer. We are a global company that is focused on three verticals: pharmaceuticals, vaccines and consumer healthcare. Our main intent is to lower the burden and the impact of diseases on the patients. Here at GSK, we allow science to drive the technology. 
This helps us to build innovative products that help our scientists make better and faster decisions throughout the drug discovery pipeline. With that, let me give you some context on what the R&D data platform is and how it is enabled at GSK. It started in mid-2016 as what used to be called the R&D Information Platform, whose main focus was to centralize, curate and rationalize all the data produced within the R&D business systems in order to drive strategic business value. Standardization of clinical trials, genome-wide association study analysis (also known as GWAS), and storage and processing of real-world evidence data are some examples of how the R&D platform was used to deliver business value. Four years later, a new set of business drivers is changing our landscape. The R&D Information Platform is evolving to be a hybrid, multi-cloud solution and is now known as the R&D data platform. Referring to GSK's 2019 annual report, these are the four themes that the R&D platform will be mainly focused on. We're expanding our data capabilities to support the new GSK biopharma company, and evolving into a hybrid multi-cloud platform is one of the many steps that we're taking to be future ready. Our key focus will still be making data recommendations better and faster by using the advances we're making in areas like artificial intelligence and machine learning. Now that brings us to why this journey is important. Why are we taking this journey? With that, let me take you to the next topic: the drug discovery process. Drug discovery is not an easy process. Given the recent events over the last few months and the way all our lives have been impacted, there is a lot of talk and information going around about why the drug discovery process is so tough. Working for a global healthcare company, I get asked this question very frequently by many people I interact with. The question is usually: why is that? 
Why is drug discovery so tough, and why does it take so much time? Drug discovery is a complex process that involves multiple different stages, and at each and every stage there are huge amounts of data that the scientists have to process to make decisions. Studies have shown that only 3% of small molecules entering human studies actually become medicines. If you're new to drug discovery, you may ask why that percentage is so low. We humans are a very complex species. Without going into the details of the process, we at GSK have made a lot of investments in technology that enable us to make data-driven decisions throughout the drug discovery pipeline. As we started implementing these tools and technologies to enable the R&D data platform, we started to get a better appreciation of how these tools interact and integrate with each other. Our goal was to make this platform an agile platform that can work at scale, so that we can provide a great user experience and contribute back to the drug discovery pipeline so that the scientists can make faster decisions. We want our R&D users to consume the data and the services available on the platform seamlessly, in a self-service fashion. And we also have to accomplish this by establishing trust. We also have to enable the academic partnerships, acquisitions and collaborations that GSK has, which actually bring a lot of data and value to our scientists. So when we talk about so many collaborations and a lot of these systems, what this brings in is a wide range of systems and platforms that are fundamentally built on different infrastructure. This is where Docker comes into the picture, along with the significance of containers. We have realized the power of containers and how we can simplify this complex ecosystem by using containers and provide faster access to data for our scientists, who then go back and contribute to the drug discovery pipeline. 
>> With that, let me talk to you about the container journey at GSK. We started our container journey in late 2017. We started working with Docker Enterprise to enable the container platform. This is on our on-prem infrastructure. Back then, for the first year or so, we worked through multiple POCs and did a lot of testing to make sure our platform was stable before we onboarded either the data or the user applications. I was part of this complete journey, and the Docker team has worked with us very closely towards the first milestone of establishing a stable container platform at GSK. Now, getting into 2019, we started deploying our applications in the production environment. I cannot go into the details of what these apps are, but they do include both data pipelines as well as web services. In the initial days we worked a lot on Swarm, but 2019 is when we started looking into Kubernetes. In the same year, we enabled Kubernetes orchestration on the Docker Enterprise platform here at GSK and also made it the de facto orchestrator. Coming into 2020, all our microservice applications and data pipelines have been migrated to the container platforms, and all of these are orchestrated by Kubernetes. These are applications that are running in production as of today. We have made the container-first approach an architectural standard across R&D Tech at GSK. We also started deploying our AI/ML training models onto containers, and all this work is happening on our Docker Enterprise platform. Also, as part of our R&D data platform's hybrid multi-cloud journey, we started enabling container and Kubernetes-based platforms on public clouds. Now, going into 2021 and the future, enabling our R&D users to easily access data and applications in a platform-agnostic way is very crucial for our success, because previously we had only on-prem. 
Now we have public clouds getting involved as well. One of the many steps we're taking through this journey is to virtualize the data and ship data in containers or Kubernetes volumes on demand to our R&D users and scientists. This allows us to deliver data to our scientists wherever they want it, in a very secure way, and we're leveraging Docker to do it. So that's our future direction. With that, let's take a deep dive into a few of our accomplishments over these years. I want to start with an on-demand and innovative one, a very interesting use case that we developed on Docker. This is a rapid prototyping capability that enabled our scientists to seamlessly (indistinct) cluster communication. This was one of the biggest challenges which we faced for a long time, and with the help of containers, we were able to solve it and provide it as a capability to our scientists. We actually showcased this capability at one of the Docker conferences before. Next, as I've said before, by migrating all our web services into containers, we not only achieved horizontal scalability for those specific services, but also saved more than 50% in support costs for the applications which we have migrated. By making the Docker image an immutable artifact in our build process, we are now able to deploy our apps or models on any container or Kubernetes-based platform, either on-prem or in a public cloud. We also made significant improvements in process automation by leveraging Docker containers. Containers have played a significant role in keeping us platform agnostic and thus enabling our hybrid multi-cloud journey, which is valuable for our R&D scientists. As I mentioned before, data virtualization is another viewpoint we have in terms of our next steps, of where we want to take Kubernetes and where we want to leverage (indistinct). 
What you see here are just a few of the many accomplishments which we have achieved by using containers for the past three years or so. So with that, before I close, I'd like to take the time to acknowledge all our internal partners who have contributed a lot to this journey, mainly the R&D business, R&D Tech and the broader tech organizations at GSK. I also want to thank Docker, presently Mirantis, for being such a great partner throughout this journey and also for giving us an opportunity to share this success story today. Lastly, thanks to everyone for listening to this talk, and please feel free to reach out if you have any questions or suggestions. Let's stay fit and safe. Thank you.

Published Date : Sep 14 2020

Manish Sood, Reltio | AWS re:Invent 2022

(upbeat intro music) >> Good afternoon, ladies and gentlemen and welcome back to fabulous Las Vegas, Nevada where we are theCUBE covering AWS re:Invent for the 10th year in a row. John Furrier, you've been here for all 10. How does this one stack up? >> It's feeling great. It's just back into the saddle of more people. Everyone's getting bigger and growing up. The companies that were originally on are getting stronger, bigger. They're doing takeovers in restaurants and still new players are coming in. More startups are coming in and taking care of what I call the (indistinct) on classic, all the primitives. And then you starting to see a lot more ecosystem platforms building on top of AWS. I call that NextGen Cloud, NextGen AWS. It's happening. It's happening right now. >> Best thing about all of these startups is they grow up, they mature, and we stay the same age, John. (John laughing) All right. All right. All right. Very excited to introduce you our next guest, he wears a lot of hats as the CEO, founder, and chairman at Reltio, please welcome Manish. Manish, welcome to the show. How is your show going so far? >> Well, thank you so much. You know, this is amazing. Just the energy, the number of people. You know, I was here last year, just after the pandemic, and I think it's almost double, if not more the number of people this year. >> John: Pushing 50,000. The high water mark was 65,000 in 2019. >> We should be doing like a Price Is Right sort of thing here on the show and figure out. >> Yeah, $1. >> Savannah: Yeah, yeah. (laughing) One guest, 80,000 guests. How many guests are here? Just in case the audience is not familiar, we know you're fast growing, very exciting business. Tell us what Reltio does. >> So, Reltio is a SaaS platform for data unification and we started Reltio in 2011. We have been serving some of the largest customers across industries like life sciences, healthcare, financial services, insurance, high tech, and retail. 
Those are, you know, some of the areas that we are focused on. The product capabilities are horizontal because we see the same data problem across every industry. Highly fragmented, highly siloed data that is slowing down the business for every organization out there. And that's the problem that we are solving. We are breaking down these silos, you know, one profile or one record, or one customer product supplier information record at a time, and bringing the acceleration of this unified data to every organization. >> This is the show theme this year. Adam Selipsky is going to be on stage talking about data end to end. Okay. Integrating in all aspects of a company. The word data analyst probably goes away pretty shortly. Everyone is going to be using data. This has been, and he talks about horizontal and vertical use cases. We've been saying that in theCUBE, I think it was about seven years ago, we first said we're going to start to see horizontally scalable data not just compute and cloud. This is now primetime conversation. Making that all work with governance is a real hard problem. Understanding the data. Companies have to put these horizontal and vertical capabilities in place together. >> Absolutely. You know, the data problem may be a horizontal problem, but every industry or vertical that you go into adds its own nuance or flavor to it. And that's why, you know, this has to be a combination of the horizontal and vertical. And we at Reltio thought about this for a while, where, you know, every time we enter a conversation, we are talking about patient data or physician data or client data in financial services or policy and customer information in insurance. But every time it's the number of silos that we encounter that is just an increasing number of applications, increasing number of third party data sources, and bringing that together in a manner where you can understand the semantics of it. Because, you know, every record is not created equal. 
Every piece of information is not created equal. But at the same time, you have to stitch it together in order to create that holistic, you know, the so-called 360 degree view. Because without that, the types of problems that you're trying to solve are not possible. Right? It's not possible to make those breakthroughs. And that's where I think the problem may be horizontal, but the application of the capabilities has to be verticalized. >> John: I'm smiling because, you know, when you're a founder like you are, and Dave, a lot here are at theCUBE, you're often misunderstood before people figure out what you do and why you started the company. And I can imagine, and knowing you and covering your company, that this is not just yesterday you came up with this idea that now everyone's talking about. There was probably moments in your history when you started, you're scratching it, "Hey the future's going to be this horizontal and vertical, especially where machine learning needs to know the data, the linguistics, whatever the data is, it's got to be very particular for the vertical, but you need to expand it." So when did you have the moment where people finally figured out like, what you guys doing is, like, relevant? I mean, now the whole world now sees- >> Savannah: Overnight success 11 years later. >> John: This shows the first time I've heard Amazon and the industry generally agree that horizontally scalable data systems with vertical value, that it's natural. We've been saying it for seven years on theCUBE. You've been doing the startup. >> Yeah. >> As a founder, you were there early. Now people are getting it. What's it like? Tell, take us through. When did you have the moment? When did you tipping point for the world getting it? 
>> Yeah, and you know, the key thing to remember is that, you know, not only have I been in this space for a long time but the experiences that we have gone through starting in 2011, there was a lot of focus on, you know, even AWS was at that point in time in the infancy stages. >> Yeah. >> And we said that we are going to set up a software as a service capability that runs only on public cloud because we had seen what customers had tried to do behind their firewalls and the types of hurdles that they had run into before. And while the concept was still in its nascent stages, but the directional signals, the fact that number of applications that you see in use today across any organization, that's growing. It used to be a case when in early 2000s, you know, this is early part of my career, where having six different applications across the enterprise landscape was considered complex. But now those same organizations are talking about 400, 500, a thousand different applications that they're using to run their business end to end. So, you know, this direction was clear. The need for digital transformation was becoming clear. And the fact that, you know, cloud was the only vehicle that you could use to solve these types of at-scale problems was also becoming clear. But what wasn't yet mainstream was this notion that, you know, if you're doing digital transformation, you need access to clean, consistent, trusted information. Or if you're doing machine learning or any kind of data analytics, you need similar kinds of trusted information. It wasn't a mainstream concept, but people were struggling with it because, you know, the whole notion of garbage in, garbage out was becoming clearer to them as they started running into hurdles. And it's great to see that now, you know, after having gone through the transformation of, yes, we have provided the compute and the storage, but now we really need to unlock the value out of data that goes on this compute and storage. 
You know, it's great to see that even Amazon or AWS is talking about it. >> Well, as a founder, it's satisfying, and congratulations, we've been covering that. I got to ask, you mention this end to end. I like the example of, in the 2000s, six applications considered complex, now hundreds and thousands of workloads are on an enterprise. Today we're going to hear more end to end data services on AWS and off AWS, hybrid or edge or whatever, that happens. Now cross, it sounds like it's going to get more complex still. >> I mean... >> John: Right. I mean, that's not easy. >> Savannah: The gentle understatement of the century. I love that. Yes. >> If Adam's message is end to end, it's going to be more complex. How does it get easier? Because the enterprise, you know, the enterprise vendors love solving complexity with more complexity. That's the wrong answer. >> Well, you're absolutely right that things are going to get more complex. But you know, this is where, whether it is Amazon or you know, us, Reltio as a vendor coming in, the goal should always be what are we going to simplify for the customer? Because they are going to end up with a complex landscape on their hands anyway. Right? >> Savannah: Right. >> So that is where, what can be below the surface and simplified for the customers to use versus bringing their focus to the business value that they can get out of it. Unlocking that business value has to be the key aspect that we have to bring to the front. And, you know, that is where, yes, the landscape complexity may grow, but how is the solution making it simpler, easier, faster for you to get value out of the data that you're trying to work with? >> As a mission, that seems very clear and clean cut, but I'm curious, I can imagine there's so many different things that you're prioritizing when you're thinking about how to solve those problems. What is that decision matrix like for you? 
>> For us, it goes back to the core focus and the core problem that we are in the business of solving which is in a siloed, fragmented landscape, how can we create a single source of truth orientation that your business can depend on? If you're looking for the unified view of the customer, the product, the supplier, the location, the asset, all these are elements that are critical or crucial for you to run your business end to end. And we are there to provide that solution as Reltio to our customers. So, you know, we always, for our decision matrix have to go back to are we simplifying that problem for our customers and how much faster, easier, nimbler can it be, you know, both as a solution and also the time to value that it brings to the equation for the customer. >> Super important, end of the equation. Clearly you are on to something. You are not only a unicorn company, unicorn company being valued at over $1 billion, latest valuation, correct me if I'm wrong, is $1.7 billion as of last year. But you are also a centaur, which is seven times more rare than a unicorn, which for the audience maybe not familiar with the mythical creatures that define the Silicon Valley nomenclature and lexicon. A centaur is a company with a hundred million in annual reoccurring revenue. How does it feel to be able to say that as a CEO or to hear me say that to you? >> Well, as a CEO, it's, you know, something that we have been working towards, the goal that we can deliver value to our customers, help every industry, you know, you just think about the types of products that you touch in a day, whether it's, you know, any healthcare related products that you're looking at. We are working with customers who are solving for the patient record to be unified with our platform. We are working with financial services companies who are helping you simplify how you do banking with them. 
We are working with retailers who are working in the area of, you know, leisure apparel or athletic goods and they are using our capabilities to simplify how they deliver better experience to you. So as I go across these industries, being able to influence and touch and simplify things overall for the customers that these companies are serving, that's an amazing feeling. And, you know, doing this while we are also making sure that we can build a durable business that has substantial revenue behind it- >> Savannah: Substantial. >> Gives us a lot of legs to stand on and talk about how we can change how the companies should run their entire data stack. >> And you're obviously a very efficient team practicing what you teach. You told me how many employees that you have? >> We have 450 employees across the globe. >> 450 employees and a hundred million in reoccurring revenue. It's pretty strong. It's pretty strong. >> Thank you. >> That's a quarter million in rev per employee. They're doing a pretty good job. That's absolutely fantastic. >> The cloud has been very successful, partnering with the cloud, a lot of leverage for the cloud. >> And that's been a part of our thesis from the very beginning that, you know, the capabilities that we build and bring to life have to be built on public cloud infrastructure. That's something that has been core to our innovation cycle because we look at it as a layer cake of innovation that we sit on and we can continue to drive faster value for our customers. >> John: Okay, so normally we do a bumper sticker. Tell me the bumper sticker for the show. We changed it to kind of modernize it called the Insta Challenge, Instagram challenge. Instagram has reels, short videos. What's the Instagram reel from your perspective? You have to do an Instagram reel right now about why this time in history, this time in for Amazon web services, this point for Reltio. Why is this moment in time important in the computer industry? 
Because, you know, we've reported, I put a story out, NextGen Cloud's here. People are seeing their status go from ISV to ecosystem platforms on top of AWS. Your success has continued to grow. Something's going on. What's the Instagram reel about why this year's so important in the history of the cloud? >> Well, you know, just think about the overall macroeconomic conditions. You know, everybody's trying to think about where the next, you know, the set of growth is going to come from or how we are going to tackle, you know, what we have as challenges in front of us. And at the end of the day, most of the efficiency that came from applying new applications or, you know, buying new products in the application space has delivered its value. The next unlock is going to come from data. And that is the key that we have to think about because the traditional model of going across 500 different applications to run your business is no longer going to be a scalable model to work with. If you really want to move faster with your business, you have to think about how to use data as a strategic asset and think about things differently. And we are talking about delivering experience at the edge, delivering, you know, real time type of engagement with the customers that we work with. And that is where the entire data value proposition starts to deliver a whole new set of options to the customers. And that's something that we all have to think about differently. It's going to require a fundamentally different architecture, innovation, leading with data instead of thinking about the traditional landscape that we have been running with. >> Leading with data and transforming architecture. A couple themes we've had on the show lately already. >> John: Well I think there's been a great, I mean this is a great leadership example of what's going on in the industry. As young people are looking at their careers. 
I've talked with a lot of folks under 30, they're trying to figure out what's a good career path and they're looking at all this change in front of them. >> That's a great point, John. >> Whether it's a computer science student or someone in healthcare, these industries are being reinvented with data. What's your advice to those young, this up and coming generation that might not take the traditional path traveled 'cause it might not be there. What's your advice for those people making these career decisions? >> I think there are two things that are relevant to every career option out there. Knowledge and awareness of data and how to apply computing techniques to the data is key and relevant. It's the language that we all have to learn and be familiar with. Without that, you know, you'll be missing a key part of your arsenal that you will be required to bring to work but won't have access to if you're not well-versed or familiar with those two areas. So this is lingua franca that we all have to get used to. >> Data and computer technology applied to business or some application or some problem. >> Manish: Applied to business. You know, figuring out how to apply it to deliver business outcomes is the key thing to keep in mind. >> Okay. >> Yeah. Last question for you to wrap us up. It's obviously an exciting, thrilling, vibrant moment here on the show floor, but I'm curious because I can imagine some of your customers, especially given the scale that they're at, I mean we're talking about some Fortune 100s here, how are you delivering value in this uncertain market? I mean, I know you solved this baseline problem but I can imagine there's a little bit of frantic energy within your customer base. >> Manish: Yeah. You know, with data this has been a traditional challenge. Everybody talks about the motherhood and apple pie. If you have better data, you can drive better outcomes. 
But some of the work that we have been doing is quantifying, measuring those outcomes and translating what the dollar impact of that value is for each one of the customers. And this is where the work that we have done with large, you know, let's say life sciences companies like AstraZeneca or GSK or in financial services with companies like Northwestern Mutual or Fidelity or, you know, common household names like McDonald's where they're delivering their digital transformation with the data capabilities that we are helping build with them. That's the key part that's been, you know, extremely valuable. And that is where in each one of these situations, we are helping them measure what the ROI is at every turn. So being able to go into these discussions with the hard dollar ROI that you can expect out of it is the key thing that we are focused on. >> And that's so mission critical now and at any economic juncture. Just to echo that, I noticed that Forrester did an independent study looking at customers that invested in your MDM solution. 366% ROI and a total net present value of 13 million over three years. So you clearly deliver on what you just promised there with customers and brands that we touch in all of our everyday lives. Manish, thank you so much for being on the show with us today. You and Reltio are clearly crushing it. We can't wait to have you back hopefully for some more exciting updates at next year's AWS re:Invent. John, thanks for- >> Or sooner. >> Yeah, yeah. Or sooner or maybe in the studios or who knows, at one of the other fabulous events we'll all be at. I'm sure you'll be traveling around given the success that the company is seeing. And John, thanks for bringing the young folks into the conversation, was a really nice touch. >> We got skill gaps, we might as well solve that right now. >> Yeah. And I like to think that there are young minds watching theCUBE or at least watching, maybe their parents are- >> We're streaming to Twitch. 
All the gamers are watching this right now. Stop playing the video games. >> We have the hottest stream on Twitch right now if you're not already ready for it. John Furrier, Manish Sood, thank you so much for being on the show with us. Thank all of you at home or at the office or in outer space or wherever you happen to be tuned in to this fabulous live stream. You are watching theCUBE, the leader in high tech coverage. My name is Savannah Peterson. We're at AWS re:Invent here in Las Vegas where we'll have our head in the clouds all week.

Published Date : Nov 29 2022


Keynote Analysis | AWS re:Inforce 2022

>>Hello, everyone. Welcome to theCUBE's live coverage here in Boston, Massachusetts for AWS re:Inforce 2022. I'm John Furrier, host of theCUBE, with Dave Vellante, my co-host and host of the famous Breaking Analysis podcast. Dave, great to see you. Back in Boston. In 2010, we started >>theCUBE. It all started right here in this building, John. >>12 years ago we started here. Just 12 years, and it seems like a marathon with theCUBE. Over the years, we've seen many waves. You call yourself a historian, which you are. We both are now. Historians: security is a do-over. We said that in 2013, when we asked Pat Gelsinger whether security is a do-over. He's now the CEO of Intel; prior to that, he was the CEO of VMware. This is the security show for AWS. It's called re:Inforce. They have re:Invent, which is their big show. Now they have what they call the re: shows: re:MARS, which is machine learning, automation, robotics, and space, and then re:Inforce, which is security. It's all about security in the cloud. So, great show. A lot of talk about the keynotes, which were, I wouldn't say generic on one hand, but specific on the other: a clear AWS posture. We were both watching. What's your take? >>Well, John, actually, looking back to May of 2010, when we started theCUBE at EMC World, that was the beginning of this massive boom run, and finally we're starting to see some cracks in the armor. Of course, there are threats of recession. We're in a recession, most likely, with inflationary pressures and interest rate hikes. And so finally the tech market has chilled out a little bit, and, before we get into the security piece, you have this question of whether the glass is half full or half empty. Coming into this year, budgets were expected to grow at a very robust eight and a half percent. CIOs have tuned that down, but it's still pretty strong at around 6%. And one of the areas that they really have no choice but to focus on is security. 
They moved everything into the cloud, or a lot of stuff into the cloud. >>They had to deal with remote work, and that created a lot of security vulnerabilities. And they're still trying to figure that out and plug the holes with the lack of talent that they have. So it's interesting: at the first re:Inforce that we did, which was also here in 2019, Stephen Schmidt, who at the time was chief information security officer at Amazon Web Services, said the state of cloud security is really strong. All this narrative, like the Pat Gelsinger narrative that security is a do-over, which you just mentioned, that security is broken, doesn't help the industry. The state of cloud security is very strong if you follow the prescription. Well, Stephen Schmidt, as you know, is now chief security officer at Amazon. So he followed >>Jassy. All of Amazon, not just AWS. >>He followed Jassy over, and I asked, well, why no "I"? And they said, well, he's responsible now for physical security, presumably the warehouses. I'm like, well, wait a minute. What about the data centers? Who's responsible for that? So it's kind of funny. CJ Moses is now the CISO at AWS, and these events are good. They're growing. And it's all about best practices and how to apply the practices. A lot of recommendations from AWS, a lot of tooling, and really an ecosystem, because, let's face it, Amazon doesn't have the breadth and depth of tools to do it alone. >>And also the attendance is interesting, because we were just in New York City for the AWS Summit: 19,000 people, massive numbers, certainly in the pandemic. That's probably one of the top-end shows, and it was a summit. This is a different audience. It's security. It's really nerdy. You've got OT, you've got cloud, you've got on-prem. So now you have cloud operations, what we're calling supercloud. Of course, we're having our inaugural pilot event on August 9th. It's called Supercloud; go to thecube.net to check it out. 
But this is the supercloud model evolving with security. And what you're hearing today, Dave, and I want to get your reaction to this, is things like: we've got billions of observational points. Certainly there's no perimeter, right? The perimeter's dead. The new perimeter, if you will, is every transaction at scale. So you have to have a new model. Security posture needs to be rethought. They actually said that directly in the keynote. So security, although the numbers aren't as big as two weeks ago in New York, is still relevant. There are sessions here. There's networking. Very interesting demographic: long hair, a lot of >>T-shirts. >>A lot of nerds doing build-out things over there. So I've got to ask you, what's your reaction to this idea of scale as the new advantage? Is that a tailwind or a headwind? What's your read? >>Well, it is amazing. I mean, Stephen Schmidt talked about quadrillions of events every month. Quadrillions: 15 zeros. What surprised me, John: Amazon talks about five areas, and by the way, at the event they've got five tracks and 125 sessions: data protection and privacy; GRC (governance, risk, and compliance); identity; network security; and threat detection. I was really surprised, given the focus on developers, that they didn't call out container security. I would've thought that would be a separate area of focus. But to your point about scale, it's true. Amazon has a scale where they'll see events every day or every month that you might not see in a generation if you're just running your own data center. So I do think that's a valid statement. Having said that, Amazon's got limited capability in terms of security. That's why they have to rely on the ecosystem. Now it's all about APIs connecting in, and APIs are one of the biggest security vulnerabilities. So I'm having trouble squaring that circle. 
>>Well, just to bring it back to the whole open source and software piece: they did say, they did make a <inaudible>, but at the beginning, Schmidt did say that, besides scale being an advantage for Amazon, with a quadrillion, 15 zeros, don't bolt on security. So that's classic old school. We've heard that before, right? But he said specifically, weave security into the dev cycles and the CI/CD pipeline. That basically means shift left. So Snyk is here, a company we've covered, and their whole thing is shift left. That implies Docker containers; that implies Kubernetes. But this is not a cloud native show per se. It's much more crypto. You heard about the encrypt-everything message in the keynote. You heard about reasoning, quantum >>Skating to the puck. >>Yeah. So, although the <inaudible> Log4j, I heard that little mention, I love the quote from Lewis Hamilton that they put up on stage. CJ Moses said the team behind the scenes makes it happen. So a big emphasis on teamwork, and a big emphasis on don't bolt on security, have it in from the beginning. We've heard that before. A lot of threat modeling discussions, and then really the news around the Cloud Audit Academy. So clearly there's a skills gap, with more threats and more use cases happening than ever before. >>Yeah. And to your point about the teamwork, I think the problem that CISOs have is they just don't have the talent that AWS has. So they have real difficulty applying that talent. And so AWS is saying, well, join us at these shows. We'll show you how to do it, how we do it internally. And again, I think when you look out on this ecosystem, there are still thousands and thousands of tools that practitioners have to apply. Every time there's a tool, there's a separate set of skills needed to really understand that tool, even within AWS's portfolio. 
So there's this notion of a shared responsibility model: Amazon takes care of securing, for instance, the physical nature of S3, and you're responsible for making sure the S3 bucket doesn't have public access. So that shared responsibility model is still very important. And I think practitioners are still struggling with all this complexity in this matrix of tools. >>So they had the layered defense. Just to review the opening keynote with Stephen Schmidt, the new CSO: he talked about weaving security into the dev cycles, shift left, which is the don't-bolt-it-on, have-it-in-from-the-beginning message. On the lessons learned, he talked a lot about how over-permissive access creates chaos, and that you've really got to look at who has access to what and why. Big learnings there. And he brought up the use cases. More use cases are coming on than ever before. Layered defense strategy was his core theme, Dave, and that was interesting. And he also said specifically: don't rely on a single security control, use multiple layers, stronger together. Build it in from the beginning. Basically that was the whole ethos, the posture. He laid that down. >>And he had a great quote on that. He said, I'm sorry to interrupt, "Single controls and binary states will fail, guaranteed." >>Yeah, that's a guarantee. That's not a best practice. That's a mandate. <laugh> And then CJ Moses, who was his deputy in the past, now takes over as CISO: ownership across teams, ransomware mitigation, air gapping, all that kind of in-the-weeds security stuff you want to check the boxes on. And I thought he did a good job. And he did the news. He's the new CISO. Okay. Then you had Lena Smart from MongoDB come on. She was interesting. I liked her talk. Obviously, Mongo is one of the ecosystem partners headlining. How do you read into that? >>Well, it's really interesting, right? You didn't see Snowflake up there. Right? 
You didn't see Databricks up there. You had Mongo up there, and I'm curious, and she's coming on theCUBE tomorrow: is her primary role securing Mongo internally? Is it securing the Mongo that's running across clouds? She's obviously here talking about AWS. So what I make of it is that it's a really critical partner that's driving a lot of business for AWS. But at the same time, it's data. They talked about data security being one of the key areas that you have to worry about, and that's what Mongo does. So I'm really excited to talk to her >>tomorrow. I did like her mention of BigID, a CUBE alumni company. They were part of season one of our AWS Startup Showcase; check out awsstartups.com if you're watching this. We're now in season two, and we're featuring the fastest growing, hottest startups in the ecosystem. Not the big players, the ISVs, but more of the startups. They were mentioned. They have a great product. So I liked the mention of BigID. Security Hub, and she mentioned Config. They're clearly a big customer, and they have a user base with a lot of EC2 and storage going on. People are building on Mongo, so I can see why they're in there. The question I want to ask you is: is Mongo's new stuff in line with all the upgrades in the silicon? You've got Graviton, which has got great stuff, great performance. Do you see that being a key part of things? >>Well, specifically Graviton, I'll tell you what I know. Snowflake, for instance, is optimizing for Graviton for certain workloads. They actually talked about it on their earnings call, how it's lowered the cost for customers and actually hurt their revenue. They still had great revenue, but it hurt their revenue. My sources indicate to me that Mongo is not getting as much out of Graviton2, but they're waiting for Graviton3. 
Now, they don't want to make that widely known, because they don't want to diss AWS. But it's probably because Mongo's more focused on analytics. But to me, Graviton is the future. It's lower cost. >>Yeah. Nobody turns off the database. >>Nobody turns off the database. >><laugh> It's always cranking EC2 cycles. You >>know, the other thing I wanted to bring up: I thought we'd hear more about ransomware. We heard a little bit from Kurt Kufeld, and he talked about all these things you could do to mitigate ransomware. He didn't talk about air gaps, and all you hear is air gaps. David Floyer talks about this all the time: you must have air gaps if you want to cover yourself against ransomware. And they didn't even mention that. Now, maybe we'll hear that from the ecosystem. That was kind of surprising. Then I saw you made a note in our shared doc about encryption, because I think all the talk here is encryption at rest. What about data in motion? >>Well, this is the last guy that came on in the keynote. He brought up encryption. Kurt Kufeld, who I love, by the way. He's VP of platform. I like his mojo. He's got the long hair >>And he's >>geeking out with swagger, but he hit on some really cool stuff. This idea of reasoning, right? Automated reasoning is his little pet project, and it's like killer AI. That's next generation, next level >>stuff. Explain that. >>So machine learning does all kinds of things. It goes and finds patterns, supervised, unsupervised, automates stuff. But true reasoning, like knowing, connecting the dots with software, that's true AI, right? That's really hard. Like in word association: knowing how things are connected, looking at patterns, and deducing things. Predictive analytics, we all know, comes from great machine learning. But when you start getting into deduction, when you say, hey, that EC2 cluster should never be on the same VPC as this one. 
Why is this packet trying to go there? You can see patterns beyond the normal observation space. So if you have a large observation space like AWS, you can really put some killer computer science technology on this. And that's where this reasoning is. It's next-level stuff. You don't hear about it because nobody does it. Yes. I mean, Google does it with metadata. There's meta reasoning. I've been watching this for over two decades now. It's a part of AI that no one's tapped, and if they get it right, this is going to be a killer part of the automation. >>He talked about this basically being advanced math that gets you to provable security. You gave an example. Another example he gave is: is this S3 bucket open to the public? Is that access restricted or unrestricted? Can anyone access my KMS keys? And you can prove the answer to that question using advanced math and automated reasoning. Yeah, exactly. That's a huge leap, because you used to be able to use math, but you didn't have the data, the observation space, and the compute power to be able to do it in near real time or real time. >>It's like when, in the physical world, in real life, you say, hey, that person doesn't belong here. Or you can look at something and say, that doesn't fit. <laugh> >>Yeah. Yeah. >>So you observe it and you take measures on it, or you query that person and say, why are you here? Oh, okay, you're here. It doesn't fit. Right? Think about it: wearing the right clothes, the right look, whatever. You kind of have that data. That's deducing that and getting that information. That's what reasoning is. It's really a killer level. And, you know, there's encrypt everything. It has to be data in movement; at rest is one thing, but you've got to get data in flight. Dave, this is a huge problem. And making that work is a key >>issue. 
The other thing that Kurt Kufeld talked about was quantum-proof algorithms, because basically he put up a quote. You're a hockey guy: Wayne Gretzky. He said the greatest hockey player ever. Do you agree? I do agree. Okay, great. >>Bobby Orr and Wayne Gretzky. >>Yeah, but okay, so we'll give the nod to Gretzky: "I always skate to where the puck is going to be, not to where it's been." And basically his point was, we're skating to where quantum is going, because quantum brings risks that could basically blow away all the existing cryptographic algorithms. My understanding is NIST just came up with new algorithms. I wasn't clear if those were supposed to be quantum proof, but I think they are, and AWS is testing them. And AWS is coming out with some tests to see if quantum can break these new algos. So that's huge. The question is interoperability. Yeah. How is it going to interact with all the existing algorithms and all the tools that are out there today? So I think we're a long way off from solving that problem. >>Well, that was one of Kurt's big points. He was talking about quantum-resistant cryptography, and they introduced hybrid post-quantum key agreements. That means KMS, Certificate Manager, and Secrets Manager can all manage the keys. This is something that gives more flexibility on that quantum resistance argument. I've got to dig into it. I really don't know how it works, or what he meant by that in terms of what hybrid actually means. I think what it means is multi-mode key management, but we'll see. >>So let me come back to the macro for a second. We've got consumer spending under pressure. Walmart just announced not-great earnings. That shouldn't be a surprise to anybody. We have Amazon, Meta, and Alphabet announcing this week, and I think Microsoft. Yep. So everybody's on edge: is this going to ripple through? Now, the flip side of that is, because the economy, yeah, 
is maybe not in such great shape, people are saying maybe the Fed is not going to raise after September. Yeah. So that's why we come back to this half full, half empty. How does that relate to cybersecurity? Well, people are prioritizing cybersecurity, but it's not an unlimited budget. So they may have to steal from other places. >>It's a double whammy, Dave. It's a double whammy on the spend side and also on the macroeconomic side. So, okay, we're going to have a recession that's predicted. The issue >>So that's bad on the one hand, but it's good from the standpoint of not raising interest rates. >>It's one of the double whammies we're talking about here. But as we said on theCUBE two weeks ago at <inaudible> summit in New York, and as we did at re:MARS, this is the first recession where the cloud computing hyperscalers are pumping on all cylinders. So there's a new economic engine called cloud computing that's in place. Unlike data center purchases in the past, which were CapEx, where, when spending was hit, the pause was a complete shutdown and then a reboot, with cloud computing you can pause spending for a little bit. It might make the sales cycle longer, but it can be quickly turned back on. So turning off spending with cloud is not that hard to do. You can hit pause, check things out, and then turn it back on again. So that's just general cloud economics. With security, though, I don't see the spending slowing down. Maybe the sales cycles might go longer, but there's no spending slowdown that I see. And if there's any pause, it's more about refactoring, whether it's the crypto stuff or new things that Amazon has. >>So that's interesting. A couple of things there. I do think you're seeing a slight slowdown in the velocity of the spend. 
When you look at the leaders in spending velocity in ETR data, CrowdStrike, Okta, Zscaler, Palo Alto Networks, they're all showing a slight deceleration in spending momentum, but it's still highly elevated. Yeah. Okay. Now, to your other point, which is really interesting: what you're saying is cloud spending is discretionary. That's one of the advantages. I can dial it down. But correct me if I'm wrong: most of the cloud spending is with reserved instances. So ultimately you're buying those reserved instances and you have to spend over a period of time. So ultimately AWS is going to see that revenue. They just might not see it for this one quarter, as people pull back a little bit, right? >>It might lag a little bit. You might not see it for a quarter or two, so there's an impact, but it's not as severe. So the dialing up, that's a key indicator. I'm going to watch that, because that's going to be something that we've never seen before. So watch that reserve. Now, the wild card in all this, and the dark horse: new services. There are other services besides the classic EC2, like security and others. There are new things coming out. So to me, this is absolutely why we've been saying supercloud is a thing, because what's going on right now in security and cloud native is that there's net new functionality that needs to be in place to handle multiple clouds and multiple abstraction layers, and to do all these supercloud-like capabilities. Like MongoDB, like these vendors, they need to up their game. And we're going to see new cloud native services that haven't existed. Yeah. I'll use some HashiCorp here, I'll use something over here, I've got some VMware, I've got this, but there are gaps. Dave, there'll be gaps that are going to emerge. And I think that's going to be a huge wild >>card. And now I want to bring something up on the Supercloud event. 
So you think about the layers, IaaS, PaaS, and SaaS, and we see supercloud permeating all of those. Somebody asked, because we have Intuit coming on. Yep. If somebody asks, why Intuit in supercloud, here's why. We talked about cloud being discretionary. You can dial it down. We saw that with Snowflake, and sort of with Mongo; similarly, you can dial it down if you want, although transaction databases are harder to do. But the SaaS model is that you pay for it every month. Okay? So I've contended that the SaaS model is not customer friendly. It's not cloud-like, and it's broken for customers. And I think in this decade it's going to get fixed. People are going to say, look, we're going to move SaaS into a consumption model that's more customer friendly. And that's something that we're >>going to explore in the Supercloud event. Yeah. And one more thing on the spend. The other wild card is, okay, if we believe in supercloud, which we just explained, and if you don't, come to the August 9th event and watch the debate happen. But as the spending gets paused, the only reason why spending will be paused in security is the replatforming, the move from tools to platforms. So one of the indicators that we're seeing with supercloud is a flight to best of breed on platforms, meaning the hyperscalers. So on Amazon Web Services, there's a best-of-breed set of services from AWS and the ecosystem. On Azure, they have a few goodies there, and customers are making a choice to use Azure for certain things, if they have Teams or Office, and they run all their dev on AWS. So that's kind of what's happened. Multi-cloud, by our definition, is customers using two clouds. That's not multi-cloud as in things moving around. Now, if you start getting data planes in there, these customers want platforms. If I'm a cybersecurity CSO, I'm moving to platforms, not just tools. 
So maybe CrowdStrike might have it dialed down a little bit, but they're turning into a platform. Splunk is trying to be a platform. Okta is a platform. Zscaler is a platform. It's a platform war right now, Dave, in cyber. >>Right, Ping Identity. They're all platform beachhead products. We've talked about that a lot on theCUBE. >>Yeah. Well, great stuff, Dave. Let's get going. We've got two days of live coverage. theCUBE is here in Boston for re:Inforce '22. I'm John Furrier with Dave Vellante. We'll be back with our guests coming on theCUBE after the short break.

Published Date : Jul 26 2022


Aaron Kalb, Alation | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's theCUBE covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. (dramatic music) >> Welcome back to Cambridge, Massachusetts, everybody. This is theCUBE, the leader in live tech coverage. We go out to the events, and we extract the signal from the noise. And, we're here at the MIT CDOIQ, the Chief Data Officer conference. I'm Dave Vellante with my cohost Paul Gillin. Day two of our wall to wall coverage. Aaron Kalb is here. He's the cofounder and chief data officer of Alation. Aaron, thanks for making the time to come on. >> Thanks so much Dave and Paul for having me. >> You're welcome. So, words matter, you know, and we've been talking about data, and big data, and the three Vs, and data is the new oil, and all this stuff. You gave a talk this week about, you know, "We're maybe not talking the right language "when it comes to data." What did you mean by all that? >> Absolutely, so I get a little bit frustrated by some of these clichés we hear at conference after conference, and the one I, sort of, took aim at in this talk is, data is the new oil. I think what people want to invoke with that is to say, in the same way that oil powered the industrial age, data's powering the information age. Just saying, data's really cool and trendy and important. That's true, but there are a lot of other associations and contexts that people have with oil, and some of them don't really apply as well to data. >> So, is data more valuable than oil? >> Well, I think they're each valuable in different ways, but I think there's a couple issues with the metaphor. One is that oil is scarce and dwindling, and part of its value comes from the fact that it's so rare. Whereas, the experience with data is that it's so plentiful and abundant, we're almost drowning in it. 
And so, what I contend is, instead of talking about data as compared to oil, we should talk about data compared to water. And, the idea is, you know, water is very plentiful on the planet, but sometimes, you know, if you have saltwater or contaminated water, you can't drink it. Water is good for different purposes, depending on its form, and so it's all about getting the right data for the right purpose, like water. >> Well, we've certainly, at least in my opinion, fought wars, Paul, over oil. >> And, over water. >> And, certainly, conflicts over water. Do you think we'll be fighting wars over data? Or, are we already? >> No, we might be. One of my favorite talks from the sessions here was a keynote by the CDO for the Department of Defense, who was talking about, you know, the civic duty about transparency but was observing that, actually, more IP addresses from China and Russia are looking at our public datasets than from within the country. So, you know, it's definitely a resource that can be very powerful. >> So, what was the reaction to your premise from the audience. What kind of questions did you get? >> You know, people actually responded very favorably, including some folks from the oil and gas industry, which I was pleased to find. We have a lot of customers in energy, so that was cool. But, it was nice being here at MIT and just really geeking out about language and linguistics and data with a bunch of CDOs and other people who are, kind of, data intellectuals. >> Right, so if data is not the new oil. >> And, water isn't really a good analogy either, because the supply of water is finite. >> That's true. >> So, what is data? >> Yeah. >> Space? >> Yeah, it's a good point. >> Matter? >> Maybe it is like the universe in that it's always expanding, right, somehow. Right, because anything, anything physical which is on the planet probably won't be growing at that exponential speed. >> So, give us the punchline. 
>> Well, so I contend that water, while imperfect, is, actually, a really good metaphor that helps for a lot of things. It has properties like the fact that if it's a data quality issue, it flows downstream like pollution in a river. It's the fact that it can come in different forms, useful for different purposes. You might have gray water, right, which is good enough for, you know, irrigation or industrial purposes, but not safe to drink. And so, you rely on metadata to get the data that's in the right form. And, you know, the talk is more fun because you've a lot of visual examples that make this clear. >> Yeah, of course, yeah. >> I actually had one person in the audience say that he used a similar analogy in his own company, so it's fun to trade notes. >> So, chief data officer is a relatively new title for you, is it not? In terms of your role at Alation. >> Yeah, that's right, and the most fun thing about my job is being able to interact with all of the other CDOs and CDAOs at a conference like this. And, it was cool to see. I believe this conference doubled since the last year. Is that right? >> No. >> No, it's up about a hundred, though. >> Right. >> Well. >> And, it's about double from three years ago. >> And, when we first started, in 2013, yeah. >> 130 people, yeah. >> Yeah, it was a very small and intimate event. >> Yeah, here we're outgrowing this building, it seems. >> Yeah, they're kicking us out. >> I think what's interesting is, you know, if we do a little bit of analysis, this is a small data, within our own company, you know, our biggest and most visionary customers typically bought Alation. The buyer champion either was a CDO or they weren't a CDO when they bought the software and have since been promoted to be a CDO. And so, seeing this trend of more and more CDOs cropping up is really exciting for us. And also, just hearing all of the people at the conference saying, two trends we're hearing. 
A move from, sort of, infrastructure and technology to driving business value, and a move from defense and governance to, sort of, playing offense and doing revenue generation with data. Both of those trends are really exciting for us. >> So, don't hate me for asking this question, because what a lot of companies will do is, they'll give somebody a CDO title, and it's, kind of, a little bit of gimmick, right, to go to market. And, they'll drag you into sales, because I'm sure they do, as a cofounder. But, as well, I know CDOs at tech companies that are actually trying to apply new techniques, figure out how data contributes to their business, how they can cut costs, raise revenue. Do you have an internal role, as well? >> Absolutely, yeah. >> Explain that. >> So, Alation, you know, we're about 250 people, so we're not at the same scale as many of the attendees here. But, we want to learn, you know, from the best, and always apply everything that we learn internally as well. So, obviously, analytics, data science is a huge role in our internal operations. >> And so, what kinds of initiatives are you driving internally? Is it, sort of, cost initiatives, efficiency, innovation? >> Yeah, I think it's all of the above, right. Every single division and both in the, sort of, operational efficiency and cost cutting side as well as figuring out the next big bet to make, can be informed by data. And, our goal was to empower a curious and rational world, and our every decision be based not on the highest paid person's opinion, but on the best evidence possible. And so, you know, the goal of my function is largely to enable that both centrally and within each business unit. >> I want to talk to you about data catalogs a bit because it's a topic close to my heart. I've talked to a lot of data catalog companies over the last couple years, and it seems like, for one thing, the market's very crowded right now. It seems to me. Would you agree there are a lot of options out there? 
>> Yeah, you know, it's been interesting because when we started it, we were basically the first company to make this technology and to, kind of, use this term, data catalog, in this way. And, it's been validating to see, you know, a lot of big players and other startups even, kind of, coming to that terminology. But, yeah, it has gotten more crowded, and I think our customers who, or our prospects, used to ask us, you know, "What is it that you do? "Explain this catalog metaphor to me," are now saying, "Yeah, catalogs, heard about that." >> It doesn't need to be defined anymore. >> "Which one should I pick? "Why you?" Yeah. >> What distinguished one product from another, you know? What are the major differentiation points? >> Yeah, I think one thing that's interesting is, you know, my talk was about how the metaphors we use shape the way we think. And, I think there's a sense in which, kind of, the history of each company shapes their philosophy and their approach, so we've always been a data catalog company. That's our one product. Some of the other catalog vendors come from ETL background, so they're a lot more focused on technical metadata and infrastructure. Some of the catalog products grew out of governance, and so it's, sort of, governance first, no sorry, defense first and then offense secondary. So, I think that's one of the things, I think, we encourage our prospects to look at, is, kind of, the soul of the company and how that affects their decisions. The other thing is, of course, technology. And, what we at Alation are really excited about, and it's been validating to hear Gartner and others and a lot of the people here, like the GSK keynote speaker yesterday, talking about the importance of comprehensiveness and on taking a behavioral approach, right. We have our Behavioral IO technology that really says, "Let's not look at all the bits and the bytes, "but how are people using the data to drive results?" As our core differentiator. 
>> Do your customers generally standardize on one data catalog, or might they have multiple catalogs for multiple purposes? >> Yeah, you know, we heard a term more last season, of catalog of catalogs, you know. And, people here can get arbitrarily, you know, meta, meta, meta data, where we like to go there. I think the customers we see most successful tend to have one catalog that serves this function of the single source of reference. Many of our customers will say, you know, that their catalog serves as, sort of, their internal Google for data. Or, the one stop shop where you could find everything. Even though they may have many different sources, typically you don't want to have siloed catalogs. It makes it harder to find what you're looking for. >> Let's play a little word association with some metaphors. Data lake. (laughter) >> Data lake's another one that I sort of hate. If you think about it, people had data warehouses and didn't love them, but at least, when you put something into a warehouse, you can get it out, right. If you throw something into a lake, you know, there's really no hope you're ever going to find it. It's probably not going to be in great shape, and we're not surprised to find that many folks who invested heavily in data lakes are now having to invest in a layer over it, to make it comprehensible and searchable. >> So, yeah, the lake is where we hide the stolen cars. Data swamp. >> Yeah, I mean, I think if your point is it's worse than lake, it works. But, I think we can do better than a lake, right. >> How about data ocean? (laughter) >> You know, out of respect for John Furrier, I'll say it's fantastic. But, to us we think, you know, it isn't really about the size. The more data you have, people think the more data the better. It's actually the more data the worse unless you have a mechanism for finding the little bit of data that is relevant and useful for your task and put it to use. >> And, to set that up, enter the catalog. 
So, technically, how does the catalog solve that problem? >> Totally, so if we think about, maybe let's go to the warehouse, for example. But, it works just as well on a data lake in practice. >> Yeah, cool. >> Through the catalog is. It starts with the inventory, you know, what's on every single shelf. But, if you think about what Amazon has done, they have the inventory warehouse in the back, but what you see as a consumer is a simple search interface, where you type in the word of the product you're looking for. And then, you see ranked suggestions for different items, you know, toasters, lamps, whatever, books I want to buy. Same thing for data. I can type in, you know, if I'm at the DOD, you know, information about aircraft, or information about, you know, drug discovery if I'm at GSK. And, I should be able to therefore see all of the different data sets that I have. And, that's true in almost any catalog, that you can do some search over the curated data sets there. With Alation in particular, what I can see is, who's using it, how are they using it, what are they joining it with, what results do they find in that process. And, that can really accelerate the pace of discovery. >> Go ahead. >> I'm sorry, Dave. To what degree can you automate some of that detail, like who's using it and what it's being used for. I mean, doesn't that rely on people curating the catalog? Or, to what degree can you automate that? >> Yeah, so it's a great question. I think, sometimes, there's a sense with AI or ML that it's like the computer is making the decisions or making things up. Which is, obviously, very scary. Usually, the training data comes from humans. So, our goal is to learn from humans in two ways. There's learning from humans where humans explicitly teach you. Somebody goes and says, "This is gold standard data versus this is, "you know, low quality data." And, they do that manually. But, there's also learning implicitly from people. 
So, in the same way on amazon.com, if I buy one item and then buy another, I'm doing that for my own purposes, but Amazon can do collaborative filtering over all of these trends and say, "You might want to buy this item." We can do a similar thing where we parse the query logs, parse the usage logs and BI tools, and can basically watch what people are doing for their own purposes. Not to, you know, extra work on top of their job to help us. We can learn from that and make everybody more effective. >> Aaron, is data classification a part of all this? Again, when we started in the industry, data classification was a manual exercise. It's always been a challenge. Certainly, people have applied math to it. You've seen support vector machines and probabilistic latent semantic indexing being used to classify data. Have we solved that problem, as an industry? Can you automate the classification of data on creation or use at this point in time? >> Well, one thing that came up in a few talks about AI and ML here is, regardless of the algorithm you're using, whether it's, you know, IFH or SVM, or something really modern and exciting that keeps learning. >> Stuff that's been around forever or, it's like you say, some new stuff, right. >> Yeah, you know, actually, I think it was said best by Michael Collins at the DOD, that data is more important than the algorithm because even the best algorithm is useless without really good training data. Plus, the algorithm's, kind of, everyone's got them. So, really often, training data is the limiting reactant in getting really good classification. One thing we try to do at Alation is create an upward spiral where maybe some data is curated manually, and then we can use that as a seed to make some suggestions about how to label other data. And then, it's easier to just do a confirm or deny of a guess than to actually manually label everything. 
So, then you get more training, get it faster, and it kind of accelerates that way instead of being a big burden. >> So, that's really the advancement in the last five to what, five, six years. Where you're able to use machine intelligence to, sort of, solve that problem as opposed to brute forcing it with some algorithm. Is that fair? >> Yeah, I think that's right, and I think what gets me very excited is when you can have these interactive loops where the human helps the computer, which helps the human. You get, again, this upward spiral. Instead of saying, "We have to have all of this, "you know, manual step done "before we even do the first step," or trying to have an algorithm brute force it without any human intervention. >> It's kind of like notes key mode on write, except it actually works. I'm just kidding to all my ADP friends. All right, Aaron, hey. Thanks very much for coming on theCUBE, but give your last word on the event. I think, is this your first one or no? >> This is our first time here. >> Yeah, okay. So, what are your thoughts? >> I think we'll be back. It's just so exciting to get people who are thinking really big about data but are also practitioners who are solving real business problems. And, just the exchange of ideas and best practices has been really inspiring for me. >> Yeah, that's great. >> Yeah. >> Well, thank you for the support of the event, and thanks for coming on theCUBE. It was great to see you again. >> Thanks Dave, thanks Paul. >> All right, you're welcome. >> Thank you, sir. >> All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. Be right back. (upbeat music)

Published Date : Aug 1 2019

SUMMARY :

brought to you by SiliconANGLE Media. Aaron, thanks for making the time to come on. and data is the new oil, and all this stuff. in the same way that oil powered the industrial age, And, the idea is, you know, water is very plentiful Well, we've certainly, at least in my opinion, Do you think we'll be fighting wars over data? So, you know, it's definitely a resource What kind of questions did you get? We have a lot of customers in energy, so that was cool. because the supply of water is finite. Maybe it is like the universe And, you know, the talk is more fun because you've a lot I actually had one person in the audience say So, chief data officer is a relatively Yeah, that's right, and the most fun thing I think what's interesting is, you know, And, they'll drag you into sales, But, we want to learn, you know, from the best, And so, you know, the goal of my function I want to talk to you about data catalogs a bit And, it's been validating to see, you know, "Which one should I pick? Yeah, I think one thing that's interesting is, you know, Or, the one stop shop where you could find everything. Data lake. when you put something into a warehouse, So, yeah, the lake is where we hide the stolen cars. But, I think we can do better a lake, right. But, to us we think, you know, So, technically, how does the catalog solve that problem? maybe let's go to the warehouse, for example. I can type in, you know, if I'm at the DOD, you know, Or, to what degree can you automate that? Not to, you know, extra work on top of their job to help us. Can you automate the classification of data whether it's, you know, IFH or SVM, or something it's like you say, some new stuff, right. Yeah, you know, actually, I think it was said best in the last five to what, five, six years. when you can have these interactive loops I'm just kidding to all my ADP friends. So, what are your thoughts? And, just the exchange of ideas It was great to see you again. We'll be back with our next guest

ENTITIES

Entity	Category	Confidence
Michael Collins	PERSON	0.99+
Paul Gillin	PERSON	0.99+
Paul	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
2013	DATE	0.99+
Aaron Kalb	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Aaron	PERSON	0.99+
five	QUANTITY	0.99+
Department of Defense	ORGANIZATION	0.99+
six years	QUANTITY	0.99+
John Furrier	PERSON	0.99+
amazon.com	ORGANIZATION	0.99+
yesterday	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Alation	PERSON	0.99+
Alation	ORGANIZATION	0.99+
Gartner	ORGANIZATION	0.99+
one item	QUANTITY	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
first step	QUANTITY	0.99+
last year	DATE	0.99+
GSK	ORGANIZATION	0.99+
both	QUANTITY	0.99+
DOD	ORGANIZATION	0.99+
one person	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
130 people	QUANTITY	0.98+
One	QUANTITY	0.98+
first time	QUANTITY	0.98+
MIT	ORGANIZATION	0.98+
one product	QUANTITY	0.97+
three years ago	DATE	0.97+
this week	DATE	0.97+
two	QUANTITY	0.97+
MIT CDOIQ	ORGANIZATION	0.96+
MIT Chief Data Officer and	EVENT	0.96+
one data catalog	QUANTITY	0.96+
each	QUANTITY	0.96+
each company	QUANTITY	0.95+
Both	QUANTITY	0.95+
one thing	QUANTITY	0.95+
first one	QUANTITY	0.94+
one catalog	QUANTITY	0.93+
two trends	QUANTITY	0.93+
theCUBE	ORGANIZATION	0.93+
first	QUANTITY	0.92+
first company	QUANTITY	0.92+
last couple years	DATE	0.92+
CDO	ORGANIZATION	0.91+
about a hundred	QUANTITY	0.91+
single shelf	QUANTITY	0.88+
about 250 people	QUANTITY	0.88+
single source	QUANTITY	0.87+
China	LOCATION	0.87+
2019	DATE	0.86+
Day two	QUANTITY	0.86+
one	QUANTITY	0.85+
each business unit	QUANTITY	0.82+
MIT CDOIQ	EVENT	0.79+
ADP	ORGANIZATION	0.79+
couple issues	QUANTITY	0.76+
Information Quality Symposium 2019	EVENT	0.76+
One thing	QUANTITY	0.7+
single division	QUANTITY	0.69+
one stop	QUANTITY	0.68+
Russia	LOCATION	0.64+
three	QUANTITY	0.61+
double	QUANTITY	0.59+
favorite	QUANTITY	0.5+
CDOIQ	EVENT	0.46+
Chief	PERSON	0.42+

Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019

>> From Cambridge, Massachusetts. It's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage, my name is Dave Vellante. I'm here with my co-host, Paul Gillin. Special coverage of the MIT CDOIQ. The Chief Data Officer event, this is the 13th year of the event, we started seven years ago covering it, Mark Ramsey is here. He's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline. Big pharma, Mark, thanks for coming onto theCUBE. >> Thanks for having me. >> You're very welcome, fresh off the keynote. Fascinating keynote this evening, or this morning. Lot of interest here, tons of questions. And we have some as well, but let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20. I'll sit down with the young guns. But there was some folks in there with 30 plus year careers. How about you, what does your data journey look like? >> Well, my data journey, of course I was able to stand up for the whole time because I was in the front, but I actually started about 32, a little over 32 years ago and I was involved with building. What I always tell folks is that Data and Analytics has been a long journey, and the name has changed over the years, but we've been really trying to tackle the same problems of using data as a strategic asset. So when I started I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry, and that was in the 87, 88 range, and then once I was able to deliver that, I ended up transitioning into being in consulting for IBM and basically spent 18 years with IBM in consulting and services. 
When I joined, the name had evolved from Data Warehousing to Business Intelligence and then over the years it was Master Data Management, Customer 360. Analytics and Optimization, Big Data. And then in 2013, I joined Samsung Mobile as their first Chief Data Officer. So, moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the Data Analytics space and so that made the transition to Samsung quite interesting, very much into consumer electronics, mobile phones, tablets and things of that nature, and then in 2015 I joined GSK as their first Chief Data Officer to deliver a Data Analytics solution. >> So you have long data history and Paul, Mark took us through. And you're right, Mark-o, it's a lot of the same narrative, same wine, new bottle but the technology's obviously changed. The opportunities are greater today. But you took us through Enterprise Data Warehouse which was ETL and then MAP and then Master Data Management which is kind of this mapping and abstraction layer, then an Enterprise Data Model, top-down. And then that all failed, so we turned to Governance which has been very very difficult and then you came up with another solution that we're going to dig into, but is it the same wine, new bottle from the industry? >> I think it has been over the last 20, 30 years, which is why I kind of did the experiment at the beginning of how long folks have been in the industry. I think that certainly, the technology has advanced, moving to reduction in the amount of schema that's required to move data so you can kind of move away from the map and move type of an approach of a data warehouse but it is tackling the same type of problems and like I said in the session it's a little bit like Einstein's phrase of doing the same thing over and over again and expecting a different answer is certainly the definition of insanity and what I really proposed at the session was let's come at this from a very different perspective. 
Let's actually use Data Analytics on the data to make it available for these purposes, and I do think it's a different wine now and so I think it's just now a matter of if folks can really take off and head that direction. >> What struck me about, you were ticking off some of the issues that have failed like Data Warehouses, I was surprised to hear you say Data Governance really hasn't worked because there's a lot of talk around that right now, but all of those are top-down initiatives, and what you did at GSK was really invert that model and go from the bottom up. What were some of the barriers that you had to face organizationally to get the cooperation of all these people in this different approach? >> Yeah, I think it's still key. It's not a complete bottoms up because then you do end up really just doing data for the sake of data, which is also something that's been tried and does not work. I think it has to be a balance and that's really striking that right balance of really tackling the data at full perspective but also making sure that you have very definitive use cases to deliver value for the organization and then striking the balance of how you do that and I think one of the things that becomes a struggle is you're talking about very large breadth and any time you're covering multiple functions within a business it's getting the support of those different business functions and I think part of that is really around executive support and what that means, I did mention it in the session, that executive support to me is really stepping up and saying that the data across the organization is the organization's data. It isn't owned by a particular person or a particular scientist, and I think in a lot of organizations, that gatekeeper mentality really does put barriers up to really tackling the full breadth of the data. >> So I had a question around digital initiatives. 
Everywhere you go, every C-level Executive is trying to get digital right, and a lot of this is top-down, a lot of it is big ideas and it's kind of the North Star. Do you think that that's the wrong approach? That maybe there should be a more tactical line of business alignment with that threaded leader as opposed to this big picture. We're going to change and transform our company, what are your thoughts? >> I think one of the struggles is just I'm not sure that organizations really have a good appreciation of what they mean when they talk about digital transformation. I think there's in most of the industries it is an initiative that's getting a lot of press within the organizations and folks want to go through digital transformation but in some cases that means having a more interactive experience with consumers and it's maybe through sensors or different ways to capture data but if they haven't solved the data problem it just becomes another source of data that we're going to mismanage and so I do think there's a risk that we're going to see the same outcome from digital that we have when folks have tried other approaches to integrate information, and if you don't solve the basic blocking and tackling having data that has higher velocity and more granularity, if you're not able to solve that because you haven't tackled the bigger problem, I'm not sure it's going to have the impact that folks really expect. >> You mentioned that at GSK you collected 15 petabytes of data of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process? About how to unlock value from unstructured data as a result of that? >> Yeah, and I think this is something. 
I think it's extremely important in the unstructured data to apply advanced analytics against the data to go through a process of making sense of that information and a lot of folks talk about or have talked about historically around text mining of trying to extract an entity out of unstructured data and using that for the value. There's a few steps before you even get to that point, and first of all it's classifying the information to understand which documents do you care about and which documents do you not care about and I always use the story that in this vast amount of documents there's going to be, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value, you don't want to look through manually a billion documents to separate those, so you have to apply the technology even in that first step of classification, and then there's a number of steps that ultimately lead you to understanding the relationship of the knowledge that's in the documents. >> Side question on that, so you had discussed okay, if it's a menu, get rid of it but there's certain restrictions where you got to keep data for decades. It struck me, what about work in process? Especially in the pharmaceutical industry. I mean, post Federal Rules of Civil Procedure was everybody looking for a smoking gun. So, how are organizations dealing with what to keep and what to get rid of? >> Yeah, and I think certainly the thinking has been to remove the excess and it's to your point, how do you draw the line as to what is excess, right, so you don't want to just keep every document because then if an organization is involved in any type of litigation and there's disclosure requirements, you don't want to have to have thousands of documents. At the same time, there are requirements and so it's like a lot of things. 
It's figuring out how do you abide by the requirements, but that is not an easy thing to do, and it really is another driver, certainly document retention has been a big thing over a number of years but I think people have not applied advanced analytics to the level that they can to really help support that. >> Another Einstein bromide, you know. Keep everything you must but no more. So, you put forth a proposal where you basically had this sort of three approaches, well, combined three approaches. The crawlers to go, the spiders to go out and do the discovery and I presume that's where the classification is done? >> That's really the identification of all of the source information >> Okay, so find out what you got, okay. >> so that's kind of the start. Find out what you have. >> Step two is the data repository. Putting that in, I thought it was when I heard you I said okay it must be a logical data repository, but you said you basically told the CIO we're copying all the data and putting it into essentially one place. >> A physical location, yes. >> Okay, and then so I got another question about that and then use bots in the pipeline to move the data and then you sort of drew the diagram of the back end to all the databases. Unstructured, structured, and then all the fun stuff up front, visualization. >> Which people love to focus on the fun stuff, right? Especially, you can't tell how many articles are on you got to apply deep learning and machine learning and that's where the answers are, we have to have the data and that's the piece that people are missing. >> So, my question there is you had this tactical mindset, it seems like you picked a good workload, the clinical trials and you had at least conceptually a good chance of success. Is that a fair statement? >> Well, the clinical trials was one aspect. Again, we tackled the entire data landscape. So it was all of the data across all of R&D. 
It wasn't limited to just, that's that top down and bottom up, so the bottom up is tackle everything in the landscape. The top down is what's important to the organization for decision making. >> So, that's actually the entire R&D application portfolio. >> Both internal and external. >> So my follow up question there is so that largely was kind of an inside the four walls of GSK, workload or not necessarily. My question was what about, you hear about these emerging Edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place, so it must be like a snake swallowing a basketball. Thoughts on that? >> I think some of it really does depend on you're always going to have these, IOT is another example where it's a large amount of streaming information, and so I'm not proposing that all data in every format in every location needs to be centralized and homogenized, I think you have to add some intelligence on top of that but certainly from an edge perspective or an IOT perspective or sensors. The data that you want to then make decisions around, so you're probably going to have a filter level that will impact those things coming in, then you filter it down to where you're going to really want to make decisions on that and then that comes together with the other-- >> So it's a prioritization exercise, and that presumably can be automated. >> Right, but I think we always have these cases where we can say well what about this case, and you know I guess what I'm saying is I've not seen organizations tackle their own data landscape challenges and really do it in an aggressive way to get value out of the data that's within their four walls. It's always like I mentioned in the keynote. It's always let's do a very small proof of concept, let's take a very narrow chunk. 
And what ultimately ends up happening is that becomes the only solution they build and then they go to another area and they build another solution and that's why we end up with 15 or 25-- (all talk over each other) >> The conventional wisdom is you start small. >> And fail. >> And you go on from there, you fail and that's not how you get big things done. >> Well that's not how you support analytic algorithms like machine learning and deep learning. You can't feed those just fragmented data of one aspect of your business and expect it to learn intelligent things to then make recommendations, you've got to have a much broader perspective. >> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data and you standardized those into one. How? >> Yeah, I mean we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago which is really, again, it's applying advanced analytics to the data and using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. So if you look across 26 thousand schemas using machine learning, you then can understand what's the consolidated view that gives you one perspective across all of those different schemas, 'cause ultimately when you give people flexibility they love to take advantage of it but it doesn't mean that they're actually doing things in an extremely different way, 'cause ultimately they're capturing the same kind of data. They're just calling things different names and they might be using different formats but in that particular case we use Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available to do the fun stuff. The visualization and the advanced analytics.
>> So Mark, the last question is you well know that the CDO role emerged in these highly regulated industries and I guess in the case of pharma quasi-regulated industries but now it seems to be permeating all industries. We have Goka-lan from McDonald's and virtually every industry is at least thinking about this role or has some kind of de facto CDO, so if you were slotted in to a CDO role, let's make it generic. I know it depends on the industry but where do you start as a CDO for an organization large company that doesn't have a CDO. Even a mid-sized organization, where do you start? >> Yeah, I mean my approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there 'cause they needed someone to be responsible for data quality and data privacy but for me the most critical thing is understanding the strategic objectives of the organization and how will data be used differently in the future to drive decisions and actions and the effectiveness of the business. In some cases, there was a lot of discussion around monetizing the value of data. People immediately took that to can we sell our data and make money as a different revenue stream, I'm not a proponent of that. It's internally monetizing your data. How do you triple the size of the business by using data as a strategic advantage and how do you change the executives so what is good enough today is not good enough tomorrow because they are really focused on using data as their decision making tool, and that to me is the difference that a CDO needs to make is really using data to drive those strategic decision points. >> And that nuance you mentioned I think is really important. 
Inderpal Bhandari, who is the Chief Data Officer of IBM, often says how can you monetize the data and you're right, I don't think he means selling data, it's how does data contribute, if I could rephrase what you said, contribute to the value of the organization, that can be cutting costs, that can be driving new revenue streams, that could be saving lives if you're a hospital, improving productivity. >> Yeah, and I think what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes in to an organization and a year later, the executives are still making decisions on the same data PowerPoints with spinning logos and they said ooh, we've got to have 'em. If they're still making decisions that way then the CDO has not been successful. The executives have to change what their level of expectation is in order to make a decision. >> Change agents, top down, bottom up, last question. >> Going back to GSK, now that they've completed this massive data consolidation project how are things different for that business? >> Yeah, I mean Hal Barron joined as the President of R&D about a year and a half ago and his primary focus is using data and analytics and machine learning to drive the decision making in the discovery of a new medicine and the environment that has been created is a key component to that strategic initiative and so they are actually completely changing the way they're selecting new targets for new medicines based on data and analytics. >> Mark, thanks so much for coming on theCUBE. >> Thanks for having me. >> Great keynote this morning, you're welcome. All right, keep it right there everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)

Published Date : Jul 31 2019


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Paul Gillin	PERSON	0.99+
Mark	PERSON	0.99+
Mark Ramsey	PERSON	0.99+
15 petabytes	QUANTITY	0.99+
Samsung	ORGANIZATION	0.99+
Inderpal Bhandari	PERSON	0.99+
Michael Stonebraker	PERSON	0.99+
2013	DATE	0.99+
Paul	PERSON	0.99+
GlaxoSmithKline	ORGANIZATION	0.99+
Barron	PERSON	0.99+
Ramsey International, LLC	ORGANIZATION	0.99+
26 thousand schemas	QUANTITY	0.99+
GSK	ORGANIZATION	0.99+
18 years	QUANTITY	0.99+
2015	DATE	0.99+
thousands	QUANTITY	0.99+
Einstein	PERSON	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
tomorrow	DATE	0.99+
Samsung Mobile	ORGANIZATION	0.99+
26 thousand	QUANTITY	0.99+
Ramsey International LLC	ORGANIZATION	0.99+
30 plus year	QUANTITY	0.99+
a year later	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Federal Rules of Civil Procedure	TITLE	0.99+
20	QUANTITY	0.99+
25	QUANTITY	0.99+
Both	QUANTITY	0.99+
first step	QUANTITY	0.99+
one petabyte	QUANTITY	0.98+
today	DATE	0.98+
15	QUANTITY	0.98+
one	QUANTITY	0.98+
three approaches	QUANTITY	0.98+
13th year	QUANTITY	0.98+
one aspect	QUANTITY	0.97+
MIT	ORGANIZATION	0.97+
seven years ago	DATE	0.97+
McDonald's	ORGANIZATION	0.96+
MIT Chief Data Officer and	EVENT	0.95+
R&D	ORGANIZATION	0.95+
10 years ago	DATE	0.95+
this morning	DATE	0.94+
this evening	DATE	0.93+
one place	QUANTITY	0.93+
one perspective	QUANTITY	0.92+
about a year and a half ago	DATE	0.91+
over 32 years ago	DATE	0.9+
a lot of talk	QUANTITY	0.9+
a billion documents	QUANTITY	0.9+
CDO	TITLE	0.89+
decades	QUANTITY	0.88+
one statistic	QUANTITY	0.87+
2019	DATE	0.85+
first data	QUANTITY	0.84+
of years ago	DATE	0.83+
Step two	QUANTITY	0.8+
Tamr	OTHER	0.77+
Information Quality Symposium 2019	EVENT	0.77+
PowerPoints	TITLE	0.76+
documents	QUANTITY	0.75+
theCUBE	ORGANIZATION	0.75+
one physical	QUANTITY	0.73+
10 years	QUANTITY	0.72+
87, 88 range	QUANTITY	0.71+
President	PERSON	0.7+
Chief Data Officer	PERSON	0.7+
Enterprise Data Warehouse	ORGANIZATION	0.66+
Goka-lan	ORGANIZATION	0.66+
first Chief Data	QUANTITY	0.63+
first Chief Data Officer	QUANTITY	0.63+
Edge	TITLE	0.63+
tons	QUANTITY	0.62+

Keynote Analysis | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's The Cube! Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome to Cambridge, Massachusetts everybody. You're watching The Cube, the leader in live tech coverage. My name is Dave Vellante and I'm here with my cohost Paul Gillin. And we're covering the 13th annual MIT CDOIQ conference. The Cube first started here in 2013 when the whole industry, Paul, this segment of the industry, was kind of moving out of the ashes of the compliance world and the data quality world and kind of that back office role, and it had this tailwind of the so called big data movement behind it. And the Chief Data Officer was emerging very strongly, as we've talked about many times in theCube, within highly regulated industries like financial services and government and healthcare, and now we're seeing data professionals from all industries join this symposium at MIT, as I say, in its 13th year, and we're now seeing a lot of discussion about not only the role of the Chief Data Officer, but some of what we heard this morning from Mark Ramsey, some of the failures along the way of all these north star data initiatives, and kind of what to do about it. So this conference brings together several hundred practitioners and we're going to be here for two days just unpacking all the discussions, the major trends that touch on data. The data revolution, whether it's digital transformation, privacy, security, blockchain and the like. Now Paul, you've been involved in this conference for a number of years, and you've seen it evolve. You've seen that chief data officer role both emerge from the back office into a c-level executive role, and now spanning a very wide scope of responsibilities. Your thoughts? >> It's been like being part of a soap opera for the last eight years that I've been part of this conference because as you said Dave, we've gone through all of these transitions.
In the early days this conference actually started as an information quality symposium. It has evolved to become about the chief data officer and really about the data as an asset to the organization. And I thought that the presentation we saw this morning, Mark Ramsey's talk, we're going to have him on later, was very interesting about what they did at GlaxoSmithKline to get their arms around all of the data within that organization. Now a project like that would've been unthinkable five years ago, but we've seen all of these new technologies come on board, essentially they've created a massive search engine for all of their data. We're seeing organizations beginning to get their arms around this massive problem. And along the way I say it's a soap opera because along the way we've seen failure after failure, we heard from Mark this morning that data governance is a failure too. That was news to me! All of these promising initiatives that have started and fallen flat or failed to live up to their potential, the chief data officer role has emerged out of that to finally try to get beyond these failures and really get their arms around that organizational data and it's a huge project, and it's something that we're beginning to see some organizations succeed at. >> So let's talk a little bit about the role. So the chief data officer in many ways has taken a lot of the heat off the chief information officer, right? It used to be CIO stood for career is over. Well, when you throw all the data problems at an individual c-level executive, that really is a huge challenge. And so, with the cloud it's created opportunities for CIOs to actually unburden themselves of some of the crapplications and actually focus on some of the mission critical stuff that they've always been really strong at and focus their budgets there. But the chief data officer has had somewhat of an unclear scope. Different organizations have different roles and responsibilities.
And there's overlap with the chief digital officer. There's a lot of emphasis on monetization whether that's increasing revenue or cutting costs. And as we heard today from the keynote speaker Mark Ramsey, a lot of the data initiatives have failed. So what's your take on that role and its viability and its longterm staying power? >> I think it's coming together. I think last year we saw the first evidence of that. I talked to a number of CDOs last year as well as some of the analysts who were at this conference, and there was pretty good clarity beginning to emerge about what the chief data officer role stood for. I think a lot of what has driven this is this digital transformation, the hot buzz word of 2019. The foundation of digital transformation is a data oriented culture. It's structuring the entire organization around data, and when you get to that point when an organization is ready to do that, then the role of the CDO I think becomes crystal clear. It's not so much just an extract transform load discipline. It's not just technology, it's not just governance. It really is getting that data, pulling that data together and putting it at the center of the organization. That's the value that the CDO can provide, I think organizations are coming around to that. >> Yeah and so we've seen over the last 10 years the decrease, the rapid decrease in cost, the cost of storage. Microprocessor performance we've talked about endlessly. And now you've got the machine intelligence piece layering in. In the early days Hadoop was the hot tech, and interestingly now nobody talks even about Hadoop. Rarely. >> Yet it was discussed this morning. >> It was mentioned today. It is a fundamental component of infrastructures. >> Yeah. >> But what it did is it dramatically lowered the cost of storing data, and allowing people to leave data in place. The old adage of ship five megabytes of code to a petabyte of data versus the reverse.
Although we did hear today from Mark Ramsey that they copied all the data into a centralized location so I got some questions on that. But the point I want to make is that was really early days. We've now entered an era and it's underscored by if you look at the top five companies in terms of market cap in the US stock market, obviously Microsoft is now over a trillion. Microsoft, Apple, Amazon, Google and Facebook. Top five. They're data companies, their assets are all data driven. They've surpassed the banks, the energy companies, of course any manufacturing automobile companies, et cetera, et cetera. So they're data companies, and they're wrestling with big issues around security. You can't help but open the paper and see issues on security. Yesterday was the big Capital One. The Equifax issue was resolved in terms of the settlement this week, et cetera, et cetera. Facebook is struggling mightily with how to deal with fake news, how to deal with deep fakes. Recently it shut down likes for many Instagram accounts in some countries because they're trying to protect young people who are addicted to this. Well, they need to shut down likes for business accounts. So what kids are doing is they're moving over to the business Instagram accounts. Well when that happens, it exposes their emails automatically so they've got all kinds of privacy landmines and people don't know how to deal with them. So this data explosion, while there's a lot of energy and excitement around it, brings together a lot of really sticky issues. And that falls right in the lap of the chief data officer, doesn't it? >> We're in uncharted territory and all of the examples you used are problems that we couldn't have foreseen, those companies couldn't have foreseen. A problem may be created but then the person who suffers from that problem changes their behavior and it creates new problems as you point out with kids shifting where they're going to communicate with each other.
So these are all uncharted waters and I think it's got to be scary if you're a company that does have large amounts of consumer data in particular, consumer packaged goods companies for example, you're looking at what's happening to these big companies and these data breaches and you know that you're sitting on a lot of customer data yourself, and that's scary. So we may see some backlash to this from companies that were all bought in to the idea of the 360 degree customer view and having these robust data sources about each one of your customers. Turns out now that that's kind of a dangerous place to be. But to your point, these are data companies, the companies that business people look up to now, that they emulate, are companies that have data at their core. And that's not going to change, and that's certainly got to be good for the role of the CDO. >> I've often said that the enterprise data warehouse failed to live up to its expectations and its promises. And Sarbanes-Oxley basically saved EDW because reporting became a critical component post Enron. Mark Ramsey talked today about EDW failing, master data management failing as kind of a mapping and masking exercise. The enterprise data model which was a top down push for a sort of abstraction layer, that failed. You had all these failures and so we turned to governance. That failed. And so you've had this series of issues. >> Let me just point out, what do all those have in common? They're all top down. >> Right. >> All top down initiatives. And what Glaxo did is turn that model on its head and left the data where it was. Went and discovered it and figured it out without actually messing with the data. That may be the difference that changes the game. >> Yeah and his prescription was basically taking a tactical approach to that problem, start small, get quick hits. And then I think they selected a workload that was appropriate for solving this problem which was clinical trials.
And I have some questions for him. And one of the big things that struck me is the edge. So as you see new data emerging out of the edge, how are organizations going to deal with that? Because I think a lot of what he was talking about was a lot of legacy on-prem systems and data. Think about JEDI, a story we've been following on SiliconANGLE, the Joint Enterprise Defense Infrastructure. This is all about the DOD basically becoming cloud enabled. So getting data out into the field during wartime fast. We're talking about satellite data, you're talking about telemetry, analytics, AI data. A lot of distributed data at the edge bringing new challenges to how organizations are going to deal with data problems. It's a whole new realm of complexity. >> And you talk about security issues. When you have a lot of data at the edge and you're sending data to the edge, you're bringing it back in from the edge, every device in the middle, from the smart thermostat at the edge all the way up to the cloud, is a potential failure point, a potential vulnerability point. These are uncharted waters, right? We haven't had to do this on a large scale. Organizations like the DOD are going to be the ones that are going to be the leaders in figuring this out because they are so aggressive. They have such an aggressive infrastructure in place. >> The other question I had, striking question listening to Mark Ramsey this morning. Again Mark Ramsey was former data God at GSK, GlaxoSmithKline, now a consultant. We're going to hear from a number of folks like him and chief data officers. But he basically kind of pooh-poohed, he used the example of build it and they will come. You know the Kevin Costner movie Field of Dreams. Don't go after the field of dreams. So my question is, and I wonder if you can weigh in on this, everywhere we go we hear about digital transformation. They have these big digital transformation projects, they generally are top down.
Every CEO wants to get digital right. Is that the wrong approach? I want to ask Mark Ramsey that. Are they doing field of dreams type stuff? Is it going to be yet another failure of traditional legacy systems to try to compete with cloud native and born in data era companies? >> Well he mentioned this morning that the research is already showing that most digital transformation initiatives are failing. Largely because of cultural reasons not technical reasons, and I think Ramsey underscored that point this morning. It's interesting that he led off by mentioning business process reengineering which you remember was a big fad in the 1990s, companies threw billions of dollars at trying to reinvent themselves and most of them failed. Is digital transformation headed down the same path? I think so. And not because the technology isn't there, it's because of the challenge of creating a culture where you can break down these silos and you can get everyone oriented around a single view of the organization's data. The bigger the organization the less likely that is to happen. So what does that mean for the CDO? Well, chief information officer at one point we said the CIO stood for career is over. I wonder if there'll be a corresponding analogy for the CDOs at some of these big organizations when it becomes obvious that pulling all that data together is just not feasible. It sounds like they've done something remarkable at GSK, maybe we'll learn from that example. But not all organizations have the executive support, which was critical to what they did, or just the organizational will to organize themselves around that central data store. >> And I also said before I think the CDO is taking a lot of heat off the CIO and again my inference was the GSK use case and workload was actually quite narrow in clinical trials and was well suited to success.
So my takeaway in this, if I were CDO what I would be doing is trying to figure out okay how does data contribute to the monetization of my organization? Maybe not directly selling the data, but what data do I have that's valuable and how can I monetize that in terms of either saving money, supply chain, logistics, et cetera, et cetera, or making money? Some kind of new revenue opportunity. And I would super glue myself to the line of business executive and go after a small hit. You're talking about digital transformations being top down and largely failing. Shadow digital transformations is maybe the answer to that. Aligning with a line of business, focusing on a very narrow use case, and building successes up that way using data as the ingredient to drive value. >> And big ideas. I recently wrote about Experian which launched a service last called Boost that enables the consumers to actually impact their own credit scores by giving Experian access to their bank accounts to see that they are a better credit risk than maybe portrayed in the credit score. And something like 600,000 people signed up in the first six months of this service. That's an example I think of using inspiration, creating new ideas about how data can be applied. And in the process by the way, Experian gains data that they can use in other contexts to better understand their consumer customers. >> So digital meets data. Data is not the new oil, data is more valuable than oil because you can use it multiple times. The same data can be put in your car or in your house. >> Wish we could do that with the oil. >> You can't do that with oil. So what does that mean? That means it creates more data, more complexity, more security risks, more privacy risks, more compliance complexity, but yet at the same time more opportunities. So we'll be breaking that down all day, Paul and myself. Two days of coverage here at MIT, hashtag MITCDOIQ.
You're watching The Cube, we'll be right back right after this short break. (upbeat music)

Published Date : Jul 31 2019


ENTITIES

Entity	Category	Confidence
Mark Ramsey	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Paul	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Paul Gillin	PERSON	0.99+
Google	ORGANIZATION	0.99+
2013	DATE	0.99+
Ramsey	PERSON	0.99+
Kevin Costner	PERSON	0.99+
Enron	ORGANIZATION	0.99+
last year	DATE	0.99+
DOD	ORGANIZATION	0.99+
Experian	ORGANIZATION	0.99+
2019	DATE	0.99+
GlaxoSmithKline	ORGANIZATION	0.99+
Dave	PERSON	0.99+
GSK	ORGANIZATION	0.99+
Glaxo	ORGANIZATION	0.99+
Two days	QUANTITY	0.99+
five megabytes	QUANTITY	0.99+
360 degree	QUANTITY	0.99+
two days	QUANTITY	0.99+
today	DATE	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
Field of Dreams	TITLE	0.99+
billions of dollars	QUANTITY	0.99+
Mark	PERSON	0.99+
Equifax	ORGANIZATION	0.99+
Yesterday	DATE	0.99+
over a trillion	QUANTITY	0.99+
1990s	DATE	0.98+
600,000 people	QUANTITY	0.98+
US	LOCATION	0.98+
this week	DATE	0.98+
SiliconANGLE Media	ORGANIZATION	0.98+
first six months	QUANTITY	0.98+
Instagram	ORGANIZATION	0.98+
The Cube	TITLE	0.98+
five years ago	DATE	0.97+
Capital One	ORGANIZATION	0.96+
first evidence	QUANTITY	0.96+
both	QUANTITY	0.96+
first	QUANTITY	0.95+
MIT	ORGANIZATION	0.93+
this morning	DATE	0.91+
Hadoop	TITLE	0.88+
one point	QUANTITY	0.87+
13th year	QUANTITY	0.86+
MIT CDOIQ conference	EVENT	0.84+
MITCDOIQ	TITLE	0.84+
each one	QUANTITY	0.82+
hundred practitioners	QUANTITY	0.82+
EDW	ORGANIZATION	0.81+
last eight years	DATE	0.81+
MIT Chief Data Officer and	EVENT	0.81+
Sarbanes-Oxley	PERSON	0.8+
top five companies	QUANTITY	0.78+
The Cube	ORGANIZATION	0.75+
Top five	QUANTITY	0.74+
single view	QUANTITY	0.7+
last 10 years	DATE	0.69+
Boost	TITLE	0.68+
a petabyte of data	QUANTITY	0.65+
EDW	TITLE	0.64+
SiliconANGLE	ORGANIZATION	0.64+

Paul Appleby, Kinetica | Big Data SV 2018

>> Announcer: From San Jose, it's theCUBE. (upbeat music) Presenting Big Data, Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to theCUBE. We are live on our first day of coverage of our event, Big Data SV. This is our tenth Big Data event. We've done five here in Silicon Valley. We also do them in New York City in the fall. We have a great day of coverage. We're next to where the Strata Data Conference is going on, at Forager Tasting Room and Eatery. Come on down, be part of our audience. We also have a great party tonight where you can network with some of our experts and analysts. And tomorrow morning, we've got a breakfast briefing. I'm Lisa Martin with my co-host, Peter Burris, and we're excited to welcome to theCUBE for the first time the CEO of Kinetica, Paul Appleby. Hey Paul, welcome. >> Hey, thanks, it's great to be here. >> We're excited to have you here, and I saw something, I'm a marketer, and terms, I grasp onto them. Kinetica is the insight engine for the extreme data economy. What is the extreme data economy, and what are you guys doing to drive insight from it? >> Wow, how do I put that in a snapshot? Let me share with you my thoughts on this because the fundamental principles around data have changed. You know, in the past, our businesses were really validated around data. We reported out how our business performed. We reported to our regulators. Over time, we drove insights from our data. But today, in this kind of extreme data world, in this world of digital business, our businesses need to be powered by data. >> So what are the, let me test this on you, so one of the ways that we think about it is that data has become an asset. >> Paul: Oh yeah. >> It's become an asset. But now, the business has to care for, has to define it, care for it, feed it, continue to invest in it, find new ways of using it. Is that kind of what you're suggesting companies think about? >> Absolutely what we're saying.
I mean, if you think about what Angela Merkel said at the World Economic Forum earlier this year, that she saw data as the raw material of the 21st century. And talking about Germany fundamentally shifting from being an engineering, manufacturing centric economy to a data centric economy. So this is not just about data powering our businesses, this is about data powering our economies. >> So let me build on that if I may because I think it gets to what, in many respects, Kinetica's core value proposition is. And that is that data is a different type of an asset. Most assets are characterized by, you apply it here, or you apply it there. You can't apply it in both places at the same time. And it's one of the misnomers of the notion of data as fuel. Because fuel is still an asset that has certain specificities, you can't apply it to multiple places. >> Absolutely. >> But data, you can, which means that you can copy it, you can share it. You can combine it in interesting ways. But that means that the ... to use data as an asset, especially given the velocity and the volume that we're talking about, you need new types of technologies that are capable of sustaining the quality of that data while making it possible to share it to all the different applications. Have I got that right? And what does Kinetica do in that regard? >> You absolutely nailed it because what you talked about is a shift from predictability associated with data, to unpredictability. We actually don't know the use cases that we're going to leverage for our data moving forward, but we understand how valuable an asset it is. And I'll give you two examples of that. There's a company here, based in the Bay Area, a really cool company called Liquid Robotics. And they build these autonomous aquatic robots. And they carry a vast array of sensors and now we're collecting data. And of course, that's hugely powerful to oil and gas exploration, to research, to shipping companies, et cetera, et cetera.
Even homeland security applications. But what they did, they were selling the robots, and what they realized over time is that the value of their business wasn't the robots. It was the data. And that one piece of data has a totally different meaning to a shipping company than it does to a fisheries company. But they could sell that exact same piece of data to multiple companies. Now, of course, their business has grown and scaled. I think they were acquired by Boeing. But what you're talking about is exactly where Kinetica sits. It's an engine that allows you to deal with the unpredictability of data. Not only the sources of data, but the uses of data, and enables you to do that in real time. >> So Kinetica's technology was actually developed to meet some intelligence needs of the US Army. My dad was a former army ranger airborne. So tell us a little bit about that and kind of the genesis of the technology. >> Yeah, it's a fascinating use case if you think about it, where we're all concerned, globally, about cyber threat. We're all concerned about terrorist threats. But how do you identify terrorist threats in real time? And the only way to do that is to actually consume vast amounts of data, whether it's drone footage, or traffic cameras. Whether it's mobile phone data or social data. But the ability to stream all of those sources of data and conduct analytics on that in real time was, really, the genesis of this business. It was a research project with the army and the NSA that was aimed at identifying terrorist threats in real time. >> But at the same time, you not only have to be able to stream all the data in and do analytics on it, you also have to have interfaces and understandable approaches to acquiring the data, because I have a background, some background in that as well, to then be able to target the threat. So you have to be able to get the data in and analyze it, but also get it out to where it needs to be so an action can be taken.
>> Yeah, and there are two big issues there. One issue is the interoperability of the platform and the ability for you to not only consume data in real time from multiple sources, but to push that out to a variety of platforms in real time. That's one thing. The other thing is to understand that in this world that we're talking about today, there are multiple personas that want to consume that data, and many of them are not data scientists. They're not IT people, they're business people. They could be executives, or they could be field operatives in the case of intelligence. So you need to be able to push this data out in real time onto platforms that they consume, whether it's via mobile devices or any other device for that matter. >> But you also have to be able to build applications on it, right? >> Yeah, absolutely. >> So how does Kinetica facilitate that process? Because it looks more like a database, which is, it's more than that, but it satisfies some of those conventions so developers have an affinity for it. >> Absolutely, so in the first instance, we provide tools ourselves for people to consume that data and to leverage the power of that data in real time in an incredibly visual way with a geospatial platform. But we also create the ability to interface with really commonly used tools, because the whole idea, if you think about providing some sort of ubiquitous access to the platform, the easiest way to do that is to provide that through tools that people are used to using, whether that's something like Tableau, for example, or Esri, if you want to talk about geospatial data. So the first instance, it's actually providing access, in real time, through platforms that people are used to using. 
And then, of course, by building our technology in a really, really open framework with a broadly published set of APIs, we're able to support, not only the ability for our customers to build applications on that platform, and it could well be applications associated with autonomous vehicles. It could well be applications associated with Smart City. We're doing some incredible things with some of the bigger cities on the planet and leveraging the power of big data to optimize transportation, for example, in the city of London. It's those sorts of things that we're able to do with the platform. So it's not just about a database platform or an insights engine for dealing with these complex, vast amounts of data, but also the tools that allow you to visualize and utilize that data. >> Turn that data into an action. >> Yeah, because the data is useless until you're doing something with it. And that's really, if you think about the promise of things like smart grid. Collecting all of that data from all of those smart sensors is absolutely useless until you take an action that is meaningful for a consumer or meaningful in terms of the generational consumption of power. >> So Paul, as the CEO, when you're talking to customers, we talk about chief data officer, chief information officer, chief information security officer, there's a lot, data scientist engineers, there's just so many stakeholders that need access to the data. As businesses transform, there's new business models that can come into development if, like you were saying, the data is evaluated and it's meaningful. What are the conversations that you're having, I guess I'm curious, maybe, which personas are the table (Paul laughs) when you're talking about the business values that this technology can deliver? >> Yeah, that's a really, really good question because the truth is, there are multiple personas at the table. 
Now, we, in the technology industry, are quite often guilty of only talking to the technology personas. But as I've traveled around the world, whether I'm meeting with the world's biggest banks, the world's biggest telcos, the world's biggest auto manufacturers, the people we meet, more often than not, are the business leaders. And they're looking for ways to solve complex problems. How do you bring the connected car alive? How do you really bring it to life? One car traveling around the city for a full day generates a terabyte of data. So what does that really mean when we start to connect the billions of cars that are in the marketplace in the framework of connected car, and then, ultimately, in a world of autonomous vehicles? So, for us, we're trying to navigate an interesting path. We're dragging the narrative out of just a technology-based narrative, speeds and feeds, algorithms, and APIs, into a narrative about, well what does it mean for the pharmaceutical industry, for example? Because when you talk to pharmaceutical executives, the holy grail for the pharma industry is, how do we bring new and compelling medicines to market faster? Because the biggest challenge for them is the cycle times to bring new drugs to market. So we're helping companies like GSK shorten the cycle times to bring drugs to market. So they're the kinds of conversations that we're having. It's really about how we're taking data to power a transformational initiative in retail banking, in retail, in telco, in pharma, rather than a conversation about the role of technology. Now, we always need to deal with the technologists. We need to deal with the data scientists and the IT executives, and that's an important part of the conversation. But you would have seen, in recent times, the conversation that we're trying to have is far more of a business conversation. >> So if I can build on that. 
So do you think, in your experience, and recognizing that you have a data management tool with some other tools that helps people use the data that gets into Kinetica, are we going to see the population of data scientists increase fast enough so our executives don't have to become familiar with this new way of thinking, or are executives going to actually adopt some of these new ways of thinking about the problem from a data risk perspective? I know which way I think. >> Paul: Wow, >> Which way do you think? >> It's a loaded question, but I think if we're going to be in a world where business is powered by data, where our strategy is driven by data, our investment decisions are driven by data, and the new areas of business that we explore to create new paths to value are driven by data, we have to make data more accessible. And if what you need to get access to the data is a whole team of data scientists, it kind of creates a barrier. I'm not knocking data scientists, but it does create a barrier. >> It limits the aperture. >> Absolutely, because every company I talk to says, "Our biggest challenge is, we can't get access to the data scientists that we need." So a big part of our strategy from the get go was to actually build a platform with all of these personas in mind, so it is built on the common principles of a relational database, built around ANSI-standard SQL. >> Peter: It's recognizable. >> And it's recognizable, and consistent with the kinds of tools that executives have been using throughout their careers. >> Last question, we've got about 30 seconds left. >> Paul: Oh, okay. >> No pressure. >> You have said Kinetica's plan is to measure the success of the business by your customers' success. >> Absolutely. >> Where are you on that? >> We've begun that journey. I won't say we're there yet. We announced three weeks ago that we created a customer success organization. 
We've put about 30% of the company's resources into that customer success organization, and that entire team is measured not on revenue, not on project delivered on time, but on value delivered to the customer. So we baseline where the customer is at. We agree what we're looking to achieve with each customer, and we're measuring that team entirely against the delivery of those benefits to the customer. So it's a journey. We're on that journey, but we're committed to it. >> Exciting. Well, Paul, thank you so much for stopping by theCUBE for the first time. You're now a CUBE alumni. >> Oh, thank you, I've had a lot of fun. >> And we want to thank you for watching theCUBE. I'm Lisa Martin, live in San Jose, with Peter Burris. We are at the Forger Tasting Room and Eatery. Super cool place. Come on down, hang out with us today. We've got a cocktail party tonight. Well, you're sure to learn lots of insights from our experts, and tomorrow morning. But stick around, we'll be right back with our next guest after a short break. (CUBE theme music)

Published Date : Mar 7 2018

SUMMARY :

brought to you by Silicon Angle Media the CEO of Kinetica, Paul Appleby. We're excited to have you here, You know, in the past, our businesses so one of the ways that we think about it But now, the business has to care for, that she saw data as the raw material of the 21st century. And it's one of the misnomers of the notion But that means that the ... is that the value of their business wasn't the robots. and kind of the genesis of the technology. but the ability to stream all of those sources of data So you have to be able to get the data in of the platform and the ability for you So how does Kinetica facilitate that process? but also the tools that allow you to visualize Yeah, because the data is useless that need access to the data. is the cycle times to bring new drugs to market. and recognizing that you have a data management tool and the new areas of business So a big part of our strategy from the get go and consistent with the kinds of tools is to measure the success of the business the delivery of those benefits to the customer. for stopping by theCUBE for the first time. We are at the Forger Tasting Room and Eatery.

ENTITIES

Entity	Category	Confidence
Paul	PERSON	0.99+
Peter Burris	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Peter	PERSON	0.99+
Angela Merkel	PERSON	0.99+
San Jose	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Kinetica	ORGANIZATION	0.99+
Paul Appleby	PERSON	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
London	LOCATION	0.99+
New York City	LOCATION	0.99+
Telco	ORGANIZATION	0.99+
tomorrow morning	DATE	0.99+
One issue	QUANTITY	0.99+
US Army	ORGANIZATION	0.99+
NSA	ORGANIZATION	0.99+
21st century	DATE	0.99+
Liquid Robotics	ORGANIZATION	0.99+
tonight	DATE	0.99+
first instance	QUANTITY	0.99+
today	DATE	0.99+
Bay Area	LOCATION	0.99+
CUBE	ORGANIZATION	0.99+
five	QUANTITY	0.99+
two examples	QUANTITY	0.99+
first day	QUANTITY	0.99+
both places	QUANTITY	0.99+
billions of cars	QUANTITY	0.99+
GSK	ORGANIZATION	0.98+
One car	QUANTITY	0.98+
three weeks ago	DATE	0.98+
each customer	QUANTITY	0.98+
two big issues	QUANTITY	0.98+
first time	QUANTITY	0.97+
earlier this year	DATE	0.97+
tenth	QUANTITY	0.96+
Bowing	ORGANIZATION	0.96+
Startup Data	EVENT	0.96+
one	QUANTITY	0.96+
Esri	TITLE	0.95+
Big Data	EVENT	0.94+
about 30 seconds	QUANTITY	0.93+
about 30%	QUANTITY	0.93+
Tablo	TITLE	0.93+
World Economic Forum	EVENT	0.92+
one thing	QUANTITY	0.92+
theCUBE	ORGANIZATION	0.88+
2018	DATE	0.87+
Big Data SV	EVENT	0.84+
a terabyte	QUANTITY	0.81+
one piece of data	QUANTITY	0.77+
Forger Tasting Room and	ORGANIZATION	0.73+
Big Data SV	ORGANIZATION	0.72+
Eatery	LOCATION	0.7+
Tasting	ORGANIZATION	0.67+
Germany	LOCATION	0.67+
data	QUANTITY	0.65+
Forger	LOCATION	0.65+
Room	LOCATION	0.56+
CEO	PERSON	0.55+
Kinetica	COMMERCIAL_ITEM	0.45+
Eatery	ORGANIZATION	0.43+
Scaldon	ORGANIZATION	0.38+

Basil Faruqui, BMC Software | BigData NYC 2017

>> Announcer: Live from Midtown Manhattan it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> His name is Jim Kobielus. >> Jim: That's right, John Furrier is actually how I pronounce his name for the record. But he is Basil Faruqui. >> Basil Faruqui who's the solutions marketing manager at BMC, welcome to theCUBE. >> Basil: Thank you, good to be back on theCUBE. >> So, first of all, I heard you guys had a tough time in Houston, so hope everything's getting better and best wishes. >> Basil: Definitely in recovery mode now. >> Hopefully that can get straightened out. What's going on at BMC, give us a quick update and in context to BigData NYC what's happening, what is BMC doing in the big data space now? The AI space now, the IoT space now, the cloud space? >> Like you said, you know, the data space, the IoT space, the AI space. There are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline: ingestion, storage, processing and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data and the applications that surround it. The rate of change has picked up immensely over the last few years with Hadoop coming into the picture, public cloud providers pushing it. It's obviously created a number of challenges, but one of the biggest challenges that we are seeing in the market and we're helping customers address is the challenge of automating this. And obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex. How do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time. 
Whether it's Hadoop, whether it's bringing in machine learning from a cloud provider. And that is the the issue we've been solving for customers. >> All right, let me jump into it. So first of all you mention some things some things that never change, ingestion storage, and what was the third one? >> Ingestions, storage, processing and eventual analytics. >> So OK, so that's cool, totally buy that. Now if you move and say hey okay so you believe that's standard but now in the modern era that we live in, which is complex, you want breadth of data, and also you want the specialization when you get down the machine learning. That's highly bound, that's where the automation it is right now. We see the trend essentially making that automation more broader as it goes into the customer environments. >> Basil: Correct. >> How do you architect that? If I'm a CXO to I'm a CDO, what's in it for me? How do I architect this because that's really the number one thing is I know what the building blocks are but they've changed in their dynamics to the marketplace. >> So the way I look at it is that what defines success and failure, and particularly in big data projects, is your ability to scale. If you start a pilot and you spend, you know, three months on it and you deliver some results. But if you cannot roll it out worldwide, nationwide, whatever it is essentially the project has failed. The analogy often give is Walmart has been testing the pick up tower, I don't know if you seen, so this is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about hundred stores today. Now that's a success and Walmart wants to roll this out nationwide. How much time do you think their IT departments can have? Is this is a five year project, ten year project? No, the management's going to want this done six months, ten months. 
So essentially, this is where automation becomes extremely crucial because it is now allowing you to deliver speed to market and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> You're describing a very complex automation scenario. How can you automate in a hurry without sacrificing you know, the details of what needs to be, In other words, you seem to call for re purposing or reusing prior automation scripts and rules and so forth. How how can the Walmart's of the world do that fast, but also do it well? >> So we do it we go about it in two ways. One is that out of the box we provide a lot of pre built integrations to some of the most commonly used systems in an enterprise. All the way up from the mainframes, Oracle's, SAP's Hadoop, Tableau's, of the world. They're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, and particularly when we entered the big data space four years ago, was that the automation was something that was considered close to the project becoming operational. And that's where a lot of rework happened because developers have been writing their own scripts, using point solutions. So we said all right, it's time to shift automation left and allow companies to build automation as an artifact very early in the development lifecycle. About a month ago we released what we call Control-M Workbench which is essentially a Community Edition of Control-M targeted towards developers. So that instead of writing their own scripts they can use a Control-M in a completely offline manner without having to connect to an enterprise system. As they build and test and iterate, they're using Control-M to do that. So as the application progresses the development lifecycle, and all of that work can then translate easily into an Enterprise Edition of Control-M. 
>> So quickly, just explain what shift-left means for the folks that might not know software methodologies, left political or left alt-right, this is software development so please take a minute explain what shift-left means, and the importance of it. >> Correct, so if you think of software development as a straight-line continuum you can start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. You know, developers would come in and deliver the application to ops, and ops would say, well hang on a second, all this crontab and all these other point solutions you have been using for automation, that's not what we use in production. And we need you to now. >> To test early and often. >> Test early and often. The challenge was the developers, the tools they use, were not the tools that were being used on the production end of the cycle. And there was good reason for it because developers don't need something really heavy and with all the bells and whistles early in the development lifecycle. Control-M Workbench is a very light version which is targeted at developers and focuses on the needs that they have when they're building and developing as the application progresses through its life cycle. >> How much are you seeing Waterfall and then people shifting-left becoming more prominent now. What percentage of your customers have moved to Agile and shifting-left percentage wise? >> So we survey our customers on a regular basis. The last survey showed that 80% of the customers have either implemented a more continuous integration delivery type of framework, or are in the process of doing it. And that's the other. >> And getting upfront costs as possible, a tipping point is reached. 
>> What is driving all of that is the need from the business, you know, the days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, and iteration. And we have also innovated in that space and the approach we call Jobs-as-Code where you can build entire, complex data pipelines in code formats so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and then I'll let you take the floor and got to learn to get a word in soon. But I have one final question on this BMC methodology thing. You guys have a history obviously BMC goes way back. Remember Max Watson CEO, and then in Palm Beach back in 97 we used to chat with him. Dominated that landscape, but we're kind of going back to a systems mindset, so the question for you is how do you view the issue of the this holy grail, the promised land of AI and machine learning. Where, you know, end-to-end visibility is really the goal, right. At the same time, you want bounded experiences at root level so automation can kick in to enable more activity. So it's a trade off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market because customers want the end-to-end promise, but they don't want to try to get there too fast as a dis-economies of scale potentially. How do you talk about that? >> And that's exactly the approach we've taken with Control-M Workbench the Community Edition. Because early on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build, and test and show value, OK. And they don't need something that, as you know, with all the bells and whistles. We're allowing you to handle that piece in that manner, through Control-M Workbench. 
As things progress, and the application progresses, the needs change as well. Now I'm closer to delivering this to the business, I need to be able to manage this within an SLA. I need to be able to manage this end-to-end and connect this other systems of record and streaming data and click stream data, all of that. So that we believe that there it doesn't have to be a trade off. That you don't have to compromise speed and quality and visibility and enterprise grade automation. >> You mention trade-offs so the Control-M Workbench the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation, when it's when the tool is off line? I mean it simply seems like the more development they do off line, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk. >> Sure, we spent a lot of time observing how developers work and very early in the development stage, all they're doing is working off of their Mac or their laptop and they're not really connecting to any. And that is where they end up writing a lot of scripts because whatever code, business logic, that they've written the way they're going to make it run is by writing scripts. And that essentially becomes a problem because then you have scripts managing more scripts and as the the application progresses, you have this complex web of scripts and CRON tabs and maybe some open source solutions. trying to make, simply make, all of this run. And by doing this I don't know offline manner that doesn't mean that they're losing all of the other controlling capabilities. Simply, as the application progresses whatever automation that they've built in Control-M can seamlessly now flow into the next stage. So when you are ready take an application into production there is essentially no rework required from an automation perspective. 
All of that that was built can now be translated into the enterprise grade Control-M and that's where operations can then go in and add the other artifacts such as SLA management forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives because you're like an analyst here. So Jim, I want you guys to comment, my question to both of you would be you know, looking at this time in history, obviously on the BMC side, mention some of the history. You guys are transforming on a new journey and extending that capability in this world. Jim, you're covering state of the art AI machine learning. What's your take of the space now? Strata Data which is now Hadoop World, which is, Cloudera went public, Hortonworks is now public. Kind of the big, the Hadoop guys kind of grew up, but the world has changed around them. It's not just about Hadoop anymore. So I want to get your thoughts on this kind of perspective. We're seeing a much broader picture in BigData NYC versus the Strata Hadoop, which seems to be losing steam. But, I mean, in terms of the focus, the bigger focus is much broader horizontally scalable your thoughts on the ecosystem right now. >> Let Basil answer first unless Basil wants me to go first. >> I think the reason the focus is changing is because of where the projects are in their life cycle. You know now what we're seeing is most companies are grappling with how do I take this to the next level. How do I scale, how do I go from just proving out one or two use cases to making the entire organization data driven and really inject data driven decision making in all facets of decision making. So that is, I believe, what's driving the change that we're seeing, that you know now you've gone from Strata Hadoop to being Strata Data, and focus on that element. Like I said earlier, these difference between success and failure is your ability to scale and operationalize. Take machine learning for example. 
>> And really it's not a hype market. Show me the meat on the bone, show me scale, I got operational concerns of security and whatnot. >> And machine learning you know that's one of the hottest topics. A recent survey I read which polled a number of data scientists, it revealed that they spent about less than 3% of their time in training the data models and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of the data scientists time, and that is exactly one of the problems we're solving for our customers around the world. >> And it needs to be automated to the hilt to help them to be more productive delivering fast results. >> Ecosystem perspective, Jim whats you thoughts? >> Yes everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. You know it's driving machine learning driven predictions, classifications, you know abstractions and so forth, into the data pipeline, into the application pipeline to drive results in a way that is contextually and environmentally aware of what's going on. The path, the history historical data, what's going on in terms of current streaming data to drive optimal outcomes, you know, using predictive models and so forth, in line to applications. So really, fundamentally then, what's going on is that automation is an artifact that needs to be driven into your application architecture as a re-purposeful resource for a variety of jobs. >> How would you even know what to automate? I mean that's the question. >> You're automating human judgment, your automating effort. Like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more that need can be automated because those are patterned, structured activities that have been mastered by smart people over many years. 
>> I mean we just had a customer on his with a glass company, GSK, with that scale, and his attitude is we see the results from the users then we double down and pay for it and automate it. So the automation question, it's a rhetorical question but this begs the question, which is you know who's writing the algorithms as machines get smarter and start throwing off their own real time data. What are you looking at, how do you determine you're going to need you machine learning for machine learning? You're going to need AI for AI? Who writes the algorithms for the algorithms? >> Automated machine learning is a hot hot, not only research focus, but we're seeing it more and more solution providers like Microsoft and Google and others, are going deep down doubling down and investments in exactly that area. That's a productivity play for data scientists. >> I think the data markets going to change radically in my opinion, so you're starting to see some things with blockchain some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now, because people might have a historical view of BMC. What's the latest, what should they know, what's the new Instagram picture of BMC? What should they know about you guys? >> I think what I would say people should know about BMC is that you know all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take this into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very very mature solution. An example of which is Navistar and their CIO is actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years and have automated virtually every business function through Control-M. 
And when they started their predictive maintenance project where they're ingesting data from about 300 thousand vehicles today, to figure out when this vehicle might break and do predictive maintenance on it. When they started their journey they said that they always knew that they were going to use Control-M for it because that was the enterprise standard. And they knew that they could simply now extend that capability into this area. And when they started about three, four years ago they were ingesting data from about a hundred thousand vehicles, that has now scaled to over 325 thousand vehicles and they have not had to re-architect their strategy as they grow and scale. So, I would say that is one of the key messages that we are taking to market, is that we are bringing innovation that has spanned over 25 years and evolving it. >> Modernizing it. >> Modernizing it and bringing it to newer platforms. >> Congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things. >> Absolutely. >> Thanks for coming and sharing the BMC perspective inside theCUBE here. On BigData NYC this is theCUBE. I'm John Furrier, Jim Kobielus here in New York City, more live coverage the three days we will be here, today, tomorrow and Thursday at BigData NYC. More coverage after this short break.

Published Date : Sep 27 2017

SUMMARY :

Brought to you by SiliconANGLE Media how I pronounce his name for the record. Basil Faruqui who's the solutions marketing manager So, first of all, I heard you guys The AI space now, the IoT space now, the cloud space? And that is the the issue we've been solving So first of all you mention some things some things the specialization when you get down the machine learning. the number one thing is I know what the building blocks are the pick up tower, I don't know if you seen, How how can the Walmart's of the world One is that out of the box we provide for the folks that might not know software methodologies, Correct, so the if you if you think and developing as the application progresses How much are you seeing Waterfall And that's the other. And getting upfront costs as possible, What is driving all of that is the need from At the same time, you want bounded experiences And that's exactly the approach we've taken with I mean it simply seems like the more development and as the the application progresses, Kind of the big, the Hadoop guys kind of grew up, that we're seeing, that you know now you've gone Show me the meat on the bone, show me scale, of the data scientists time, and that is exactly And it needs to be automated to the hilt that many of the core use cases for AI are automation I mean that's the question. Like the judgments that a working data engineer makes So the automation question, it's a rhetorical question and more solution providers like Microsoft What's the latest, what should they know, is that you know all the work that we've done and bringing it to newer platforms. the core things. more live coverage the three days we will be here,

ENTITIES

Entity	Category	Confidence
Jim	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Jim Kobielus	PERSON	0.99+
Basil Faruqui	PERSON	0.99+
John Furrier	PERSON	0.99+
BMC	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Basil	PERSON	0.99+
Google	ORGANIZATION	0.99+
Houston	LOCATION	0.99+
New York City	LOCATION	0.99+
15	QUANTITY	0.99+
80%	QUANTITY	0.99+
Palm Beach	LOCATION	0.99+
one	QUANTITY	0.99+
ten months	QUANTITY	0.99+
five year	QUANTITY	0.99+
ten year	QUANTITY	0.99+
two weeks	QUANTITY	0.99+
six months	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
over 325 thousand vehicles	QUANTITY	0.99+
Mac	COMMERCIAL_ITEM	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
tomorrow	DATE	0.99+
two ways	QUANTITY	0.99+
Thursday	DATE	0.99+
GSK	ORGANIZATION	0.99+
about 300 thousand vehicles	QUANTITY	0.99+
about 80%	QUANTITY	0.99+
today	DATE	0.99+
Midtown Manhattan	LOCATION	0.99+
SAP	ORGANIZATION	0.98+
one quick question	QUANTITY	0.98+
third one	QUANTITY	0.98+
Strata Hadoop	TITLE	0.98+
four years ago	DATE	0.98+
over 25 years	QUANTITY	0.98+
single point	QUANTITY	0.98+
about a hundred thousand vehicles	QUANTITY	0.97+
one final question	QUANTITY	0.97+
About a month ago	DATE	0.96+
Max Watson	PERSON	0.96+
Instagram	ORGANIZATION	0.96+
BigData	ORGANIZATION	0.95+
four components	QUANTITY	0.95+
about hundred stores	QUANTITY	0.95+
first	QUANTITY	0.95+
two use cases	QUANTITY	0.95+
NYC	LOCATION	0.94+
Navistar	ORGANIZATION	0.94+
BMC Software	ORGANIZATION	0.93+
97	DATE	0.93+
Agile	TITLE	0.89+

Dr. Mark Ramsey & Bruno Aziza | BigData NYC 2017

>> Live from Midtown Manhattan, it's the Cube, covering BIGDATA New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hey, welcome back everyone, live here in New York City for the Cube special presentation of BIGDATA NYC. Here all week with the Cube, in conjunction with the Strata Data event happening around the corner. I'm John Furrier, the host, with James Kobielus. Our next two guests: Doctor Mark Ramsey, chief data officer and senior vice president of R&D at GSK (GlaxoSmithKline), the pharma company, and Bruno Aziza, the CMO at AtScale, both Cube alumni. Welcome back. >> Thanks for having us. >> So Bruno, I want to start with you because I think that Doctor Mark has some great use cases I want to dig into and go deep on with Jim. But AtScale, give us the update on the company. You guys doing well, what's happening? You had the vision of this data layer we talked about a couple years ago. It's working, so give us the update. >> A lot of things have happened since we talked last. I think you might have seen some of the news in terms of growth. Ten X growth since we started, and mainly driven around the customer use cases. That's why I'm excited to hear from Mark and share his stories with the rest of the audience here. We have a presentation at Strata tomorrow with Vivens. It's a great IoT use case as well. So what we're seeing is the industry is changing in terms of how it's buying BI platforms. In the past, people would buy BI platforms vertically. They'd buy the visualization, they'd buy the semantics and buy the best-of-breed integration. We now live in a world where there's a multitude of BI tools. And the data platforms are not standardized either. And so what we're kind of riding as a trend is this idea of the need for the universal semantic layer. This idea that you can have a universal set of semantics, in a dictionary or ontology, that can be shared across all types of business users and business use cases. 
Or across any data. That's really the trend that's driving our growth. And you'll see it today at this show with the use cases and the customers. And of course some of the announcements that we're doing. We're announcing a new offer with Cloudera and Tableau. And so we're really excited about, again, how the space and the partner ecosystem are embracing our solutions. >> And you guys really have a Switzerland kind of strategy. You're going to play neutral, play nicely with everybody. Because you're in a different, your abstraction layer is really more on the data. >> That's right. The whole value proposition is that you don't want to move your data. And you don't want to move your users away from the tools that they already know, but you do want them to be able to take advantage of the data that you store. And this concept of a virtualized layer and a universal semantic layer that enables the use case to happen faster is a big value proposition to all of them. >> Doctor Mark Ramsey, I want to get your quick thoughts on this. You're obviously a customer, so, I mean, you're not biased, you're under pressure every day. Competitive noise out there is high in this area and you're a chief data officer. You run R&D so you've got that 20-mile stare into the future. You've got experience running data at a wide scale. I mean, there's a lot of other potential solutions out there. What made it attractive for you? >> Well, it fills a need that we have around really that virtualization. So we can leave the data in the format that it is on the platform. And then allow the users to use, like Bruno was mentioning, a number of standardized tools to access that information. And it also gives us an ability to learn how folks are consuming the data. So they will use a variety of tools, they'll interact with the data. AtScale gives us a great capability to really look under the covers, see how they're using the data. 
And if we need to physicalize some of that to make easier access in the long term, it gives us that... >> It's really an agility model, kind of, for data. You're kind of agile. >> Yeah, it's kind of a way to make, you know, so if you're using a dashboarding tool it allows you to interact with the data. And then as you see how folks are actually consuming the information, then you can physicalize it and make that readily available. So it gives you those agile cycles to go through. >> In your use of the solution, what have you seen in terms of usage patterns? What are your users using AtScale for? Have you been surprised by how they're using it? And where do you plan to go in terms of the use cases you're addressing going forward with this technology? >> This technology allows us to give the users the ability to query the data. So for example we use standardized ontologies in several of the areas. And standardized ontologies are great because the data is in one format. However, that's not necessarily how the business would like to look at the data, and so it gives us an ability to make the data appear the way the users would like to consume the information. And then we understand which parts of the model they're actually flexing and then we can make the decision to physicalize that. 'Cause again, it's a great technology, but with virtualization there is a cost, because the machines have to create the illusion of the data being a certain way. If you know it's something that's going to be used day in and day out then you can move it to a physicalized version. >> Is there a specific threshold when you're looking at the metrics of usage, when you know that particular data, particular views need to be physicalized? What is that threshold or what are those criteria? >> I think it normally is a combination of the number of connections that you have. So the joins of the data across the number of repositories of data. 
And that's balanced with the volume of data, so if you're dealing with thousands of rows versus billions of rows then that can lead you to make that decision faster. There isn't a defined metric that says, well, we have this number of rows and this many columns and this size, that really will lead you down that path. But the nice thing is you can experiment, and so it does give you that ability to sort of prototype and see, are folks consuming the data, before you invest the energy to make it physical. >> You know, federated, I use the word federated, but semantic virtualization layers clearly have been around for quite some time. A lot of solution providers offer them. A lot of customers have used them for disparate use cases. One of the raps traditionally against semantic virtualization is that it's simply sort of a stop gap between chaos on the one end, you know, where you have dozens upon dozens of databases with no unified roll-up. That's a stop gap on the way to full centralization or migration to a big data hub. Do you see semantic virtualization as being sort of your target architecture for your operational BI and so forth? Or on some level is it simply, like I said, a stop gap or transitional approach on the way to some more centralized environment? >> I think you're talking about kind of two different scenarios here. One is in federated, I would agree, when folks attempted to use that to bring disparate data sources together to make it look like it was consolidated. And they happened to be on different platforms; that was definitely a stop gap on a journey to really addressing the problem. The thing that's a little different here is we're talking about this running on a standardized platform. So it's not platform disparate; the data is being accessed on the platform. It really gives us that flexibility to allow the consumer of the data to have a variety of views of the data without actually physicalizing each of them. 
So I don't know that it's on a journey, 'cause we're never going to get to where we're going to make the data look so many different ways. But it's very different than, you know, ten, 15 years ago, when folks were trying to solve disparate data sources using federation. >> Would it be fair to characterize what you do as agile visualization of the data on a data lake platform? Is that what it's essentially about? >> Yeah, it certainly enables that. In our particular case we use the data lake as the foundation and then we actually curate the data into standardized ontologies, and then really, the consumer access layer is where we're applying virtualization. In the creation of the environment that we have, we've integrated about a dozen different technologies. So one of the things we're focused on is trying to create an ecosystem. And AtScale is one of the components of that. It gives us flexibility so that we don't have to physicalize. >> Well, you don't have to stand up any costs. So you have the flexibility with AtScale. Do I get this right? You get the data and people can play with it without actually provisioning. It's like, okay, save some cash, but then also you double down on winners that come in. >> Things that are a winner, you check the box, you physicalize it. You provide that access. >> You get crowd sourcing benefits like going on in your... >> You know, exactly. >> The curation you mentioned. So the curation goes on inside of AtScale? Are you using a different tool or something you hand wrote in house to do that? Essentially it's data governance and data cleansing. >> That is, we use a technology called Tamr. That is a machine learning based data curation tool; that's one of our fundamental tools for curation. So one of the things in the life sciences industry is you tend to have several data sources that are slightly aligned. But they're actually different, and so machine learning is an excellent application. >> Let's get into the portfolio. 
Obviously as a CTO you've got to build a holistic view. You have a tool chest of tools and a platform. How do you look at the big picture? AtScale fits in beautifully, makes a lot of sense. So good for those guys. But you know, big picture is, you've got to have a variety of things in your arsenal. How do you architect that tool shed or your platform? Is everything a hammer, everything's a nail. You've got all of them though. All the things to build. >> You bring up a great point, 'cause unfortunately a lot of times, we'll use your analogy, it's like a tool shed. So you don't want 12 lawnmowers, right? In your tool shed, right? So one of the challenges is that a lot of the folks in this ecosystem, they start with one area of focus and then they try to grow into other areas of focus. Which means that suddenly everybody starts to be a lawnmower, 'cause they think that's... >> They start as a hammer and turn into a lawn mower. >> Right. >> How did that happen? That's called pivoting. >> You can mow your lawn with a hammer, but... So it's really that portfolio of tools that all together get the job done. So certainly there's a data acquisition component, there's the curation component. There's visualization, machine learning, there's the foundational layer of the environment. So all of those things, our approach has been to select the kind of best in class tools around that and then work together and... Bruno and the team at AtScale have been part of this. We've actually had partner summits on how do we bring that ecosystem together. >> Is your stuff mostly on-prem? Obviously a lot of pharma IP there. So you guys have that whole patent thing, which is well documented. You don't want to open up the kimono and start the clock until it's released, so you obviously got to keep things confidential. Mix of cloud, on-prem, is it 100 percent on-prem? Is there some bursting to the cloud? Is it a private cloud, how do you guys look at the cloud piece? 
>> Yeah, the majority of what we're doing is on-prem. The profile for us is that we persist the data. So it's not... In some cases when we're doing some of the more advanced analytics we burst to the cloud for additional processors. But the model of persisting the data means that it's much more economical to have an on-prem instance of what we're doing. But it is a combination, but the majority of what we're doing is on-prem. >> So, hold on Jim, one more question. I mean, obviously everyone's knocking on your door. You know, how to get in that account. They spend a lot of money. But you're pretty disciplined, it sounds like you've got a good view: you don't want people to come in and turn into someone that you don't want them to be. But you also run R&D so you've got to understand the headroom. How do you look at the headroom of what you need down the road in terms of how you interface with the suppliers that knock on your door? Whether it's AtScale, currently working with you now, and then people just trying to get in there and sell you a hammer or a lawn mower. Whatever they have they're going to try, you know, you're dealing with the vendor pressure. >> Right, well, a lot of that is around what problem we're trying to solve. And we drive all of that based on the use cases and the value to the business. I mean, and so if we identify gaps that we need to address, some of those are more specific to life sciences types of challenges, where they're very specific types of tools and the population of partners is quite small. And other things. We're building an actual production, operational environment. We're not building a proof of concept, so security is extremely important. We're Kerberos enabled end to end, data at rest and in flight. Which means it breaks some of the tools, and so there's criteria of things that need to be in place in order to... >> So you're thinking about scale big time? So not just putting a beachhead together. 
But foundationally building out a platform. Having the tools that fit general purpose and also specialty, but scale's a big thing, right? >> And it's also, we're addressing what we see as three different cohorts of consumers of the data. One is more in the guided analytics, the more traditional dashboards, reports. One is more in computational notebooks, more of the scientific, using R, Python, other languages. The third is more kind of almost at the bare metal level: machine learning, TensorFlow, a number of tools that people directly interact with. People don't necessarily fit nicely into those three cohorts so we're also seeing that there's a blend. And that's something that we're also... >> There's a fourth cohort. >> Yeah, well, you know, someone's using a computational notebook but they want to draw upon a dashboard graphic. And then they want to run a predefined TensorFlow model and pull all that together, so. >> And what you just said tied up the question I was going to ask. So it's perfect. One of my core focuses as a Wikibon analyst is on deep learning, on AI. So in semantic data virtualization in a life sciences pharma context, you have undoubtedly a lot of image data, visual data. So in terms of curating that and enabling, you know, virtualized access, to what extent are you using deep learning, TensorFlow, convolutional neural networks to be able to surface up the visual patterns that can conceivably be searched using a variety of techniques? Is that a part of your overall implementation of AtScale for your particular use cases currently? Or do you plan to go there in terms of, like, TensorFlow? >> No, I mean we're active, very active, in deep learning, artificial intelligence, machine learning. Again it depends on which problem you're trying to solve, and so again, there's a number of components that come together when you're looking at the image analytics versus using data to drive out certain decisions. But we're active in all of those areas. 
Our ultimate goal is to transform the way that R&D is done within a pharmaceutical company. To accelerate the, right now it takes somewhere between five and 15 years to develop a new medicine. The goal is really to do a lot more analytics to shorten that time significantly. Helps the patients, gets the medicines to market faster. >> That's your end game, you've got to create an architecture that enables the data to add value. >> Right. >> The business. Doctor Mark Ramsey, thanks so much for sharing the insight from your environment. Bruno, you got something there to show us. What do you got there? He always brings a prop on. >> A few years ago I think I had a tattoo on my neck or something like this. But I'm happy that I brought this because you could see how big Mark's vision is. The reason why he's getting recognized by Cloudera on the data awards and so forth is because he's got a huge vision, and it's a great opportunity for a lot of CTOs out there. I think the average CEO spent 100 million dollars to deploy big data solutions over the last five years. But they're not able to consume all the data they produce. I think in your case you consume about 100 percent of the structured data. And the average in this space is we're able to consume about one percent of the data. And this is essentially the analogy today that you're dealing with if you're in the enterprise. We've spent a lot of time putting data in large systems and so forth. But the tool set that we give chief data officers and their teams is a cocktail straw like this in order to drink out of it. >> That's a data lake, actually. >> It's an actual lake. It's a Slurpee cup. Multiple Slurpees with the same straw. >> Who has the Hudson river water here? >> I can't answer that question, I think I'd have to break a few things if I did. But the idea here is that it's not very satisfying. And that's the frustration of business users and business units. 
What AtScale's done is we built this, this is the straw you want. So I would kind of help CTOs contemplate this idea of the Slurpee and the cocktail straw. How much money are you spending here and how much money are you spending there? Because of the speed at which you can get the insights to the business user. >> You've got to get that straw, you've got to break it down so it's available everywhere. So I think that's a great innovation and it makes me thirsty. >> You know what, you can have it. >> Bruno, thanks for coming from AtScale. Doctor Mark Ramsey, good to see you again, great to have you come back. Again, anytime, love to have chief data officers on. Really a pioneering position, it's the critical position in all organizations. It will be in the future and will continue being. Thanks for sharing your insights. It's the Cube, more live coverage after this short break. (tech music)
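
The virtualize-first, physicalize-on-demand approach Mark Ramsey describes in this interview can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ViewUsage` fields, the threshold values, and the `should_physicalize` helper are all hypothetical and are not AtScale's or GSK's implementation. Ramsey says explicitly that there is no defined metric; the sketch just shows how the factors he names (number of joins, row volume, observed usage) could be weighed together.

```python
from dataclasses import dataclass


@dataclass
class ViewUsage:
    """Observed usage of one virtualized view (all fields are hypothetical)."""
    name: str
    joined_sources: int      # number of repositories joined to build the view
    rows_scanned: int        # typical rows read per query
    queries_per_day: float   # how often consumers actually hit the view


def should_physicalize(view: ViewUsage,
                       max_joins: int = 4,
                       max_rows: int = 1_000_000_000,
                       min_queries_per_day: float = 50.0) -> bool:
    """Return True when a view looks expensive enough to materialize.

    The thresholds are illustrative defaults, not recommended values.
    """
    # Views that nobody uses stay virtual, however expensive they are.
    if view.queries_per_day < min_queries_per_day:
        return False
    # Heavy use plus many joins or very large scans tips the balance.
    return view.joined_sources > max_joins or view.rows_scanned >= max_rows


# Made-up example views, purely for demonstration.
views = [
    ViewUsage("assay_results_by_compound", 6, 2_000_000_000, 120.0),
    ViewUsage("site_lookup", 2, 5_000, 300.0),
    ViewUsage("legacy_trial_archive", 8, 4_000_000_000, 0.5),
]
candidates = [v.name for v in views if should_physicalize(v)]
print(candidates)
```

The usage check comes first on purpose: it mirrors the point that you can prototype a view virtually and see whether folks are consuming the data before investing the energy to make it physical.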

Published Date : Sep 27 2017

SUMMARY :

Brought to you by, And Bruno as he's the CMO at Fscale, So Bruno I want to start with you And of course some of the announcements that we're doing. And you guys really have a Switzerland And you don't want to move your users You run R&D so you got that in the format that it is on the platform. It's really an agility model kind to data. So it is, it gives you that agile cycles to go through. And where do you plan to go and day out then you can move it to a physicalized version. When you know that particular data, particular views But the nice thing is you can experiment You know where you have dozens upon dozens of databases So it's not platformed disparate it's on the platform So one of the things we're focused on So you have the flexibility with at scale. Things that are a winner you check the box, You get crowd sourcing benefits So the curation goes on So one of the things in the life sciences industry you got to have a variety of things in your arsenal. So one of the challenges is that a lot of the folks Bruno and the team at scale have been part of this. So you guys have the game that poll patent thing but the majority of what we're doing is on prime. of what you need down the road and the value to the business. So you got anything about scale big time? more of the scientific using R, Python, other languages. Yeah well you know someone's using to what extent are you using deep learning, Helps the patients, gets the medicines to market faster. that enables the data to add value. Bruno you got something there to show us. that you did officers in their team is a cocktail straw It's a Slurpee cup. Because the speed at which you can get the insights you got to break it down so it's available everywhere. Doctor Mark Ramsey good to see you again

ENTITIES

Entity	Category	Confidence
Jim	PERSON	0.99+
James Kobielus	PERSON	0.99+
Mark	PERSON	0.99+
Bruno	PERSON	0.99+
New York City	LOCATION	0.99+
John Furrier	PERSON	0.99+
20 miles	QUANTITY	0.99+
Mark Ramsey	PERSON	0.99+
100 percent	QUANTITY	0.99+
12 lawnmowers	QUANTITY	0.99+
GSK	ORGANIZATION	0.99+
100 million dollars	QUANTITY	0.99+
Fscale	ORGANIZATION	0.99+
third	QUANTITY	0.99+
dozens	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
One	QUANTITY	0.99+
15 years	QUANTITY	0.99+
Python	TITLE	0.99+
today	DATE	0.99+
Bruno Aziza	PERSON	0.99+
both	QUANTITY	0.99+
one	QUANTITY	0.98+
each	QUANTITY	0.98+
fourth cohort	QUANTITY	0.98+
NYC	LOCATION	0.98+
Cube	ORGANIZATION	0.98+
Hudson river	LOCATION	0.98+
Vivens	ORGANIZATION	0.98+
Switzerland	LOCATION	0.98+
three cohorts	QUANTITY	0.98+
Doctor	PERSON	0.98+
billions of rows	QUANTITY	0.97+
Ten X	QUANTITY	0.97+
tomorrow	DATE	0.97+
two guests	QUANTITY	0.97+
one format	QUANTITY	0.97+
thousands of rows	QUANTITY	0.97+
BIGDATA	ORGANIZATION	0.97+
prime	COMMERCIAL_ITEM	0.96+
one more question	QUANTITY	0.96+
couple years ago	DATE	0.96+
Dr.	PERSON	0.96+
agile	TITLE	0.96+
R&D	ORGANIZATION	0.95+
two different scenarios	QUANTITY	0.95+
about one percent	QUANTITY	0.95+
five	QUANTITY	0.93+
Strata Data	ORGANIZATION	0.93+
three different cohorts	QUANTITY	0.92+
Mid Town Manhattan	LOCATION	0.92+
dozens of databases	QUANTITY	0.92+
Wikibon	ORGANIZATION	0.92+
ten,	DATE	0.89+
about a 100 percent	QUANTITY	0.89+
BigData	ORGANIZATION	0.88+
2017	DATE	0.86+
one area	QUANTITY	0.81+
BIGDATA New York City 2017	EVENT	0.79+
last five years	DATE	0.78+
15 years ago	DATE	0.78+
about a dozen different technologies	QUANTITY	0.76+
A few years ago	DATE	0.76+
one end	QUANTITY	0.74+
Glasgow Pharma	ORGANIZATION	0.7+
things	QUANTITY	0.69+
R	TITLE	0.65+

Bruno Aziza & Josh Klahr, AtScale - Big Data SV 17 - #BigDataSV - #theCUBE1

>> Announcer: Live from San Jose, California, it's The Cube. Covering Big Data, Silicon Valley, 2017. (electronic music) >> Okay, welcome back everyone, live at Silicon Valley for the big Cube coverage, I'm John Furrier, with me Wikibon analyst George Gilbert, Bruno Aziza, who's the CMO of AtScale, Cube alumni, and Josh Klahr, VP at AtScale, welcome to the Cube. >> Welcome back. >> Thank you. >> Thanks, Brian. >> Bruno, great to see you. You look great, you're smiling as always. Business is good? >> Business is great. >> Give us the update on AtScale, what's up since we last saw you in New York? >> Well, thanks for having us, first of all. And, yeah, business is great, we- I think last time I was here on The Cube we talked about the Hadoop Maturity Survey and at the time we'd just launched the company. And, so now you look about a year out and we've grown about 10x. We have large enterprises across just about any vertical you can think of. You know, financial services, your American Express; healthcare, think about Aetna, Cigna, GSK; retail, Home Depot, Macy's and so forth. And, we've also done a lot of work with our partner ecosystem, so Mork's- OEM's AtScale technology which is a great way for us to get you AtScale across the US, but also internationally. And then our customers are getting recognized for the work that they are doing with AtScale. So, last year, for instance, Yellowpages got recognized by Cloudera, on their leadership award. And Macy's got a leadership award as well. So, things are going the right trajectory, and I think we're also benefitting from the fact that the industry is changing, it's maturing on the big data side, but also there's a redefinition of what business intelligence means. This idea that you can have analytics on large-scale data without having to change your visualization tools and make that work with the existing stack you have in place. And, I think that's been helping us in growing- >> How did you guys do it? 
I mean, you know, we've talked many times and there's some secret sauce there, but, at the time when you guys were first starting it was kind of a crowded field, right? >> Bruno: Yeah. >> And all these BI tools were out there, you had front end BI tools- >> Bruno: Yep. >> But everyone was still separate from the whole batch back end. So, what did you guys do to break out? >> So, there's two key differentiators with AtScale. The first one is we are the only platform that does not have a visualization tool. And, so people think about this as, that's a bug, that's actually a feature. Because, most enterprises have already that stuff made with traditional BI tools. And so our ability to talk to MDX and SQL types of BI tools, without any changes, is a big differentiator. And then the other piece of our technology, this idea that you can get the speed, the scale and security on large data sets without having to move the data. It's a big differentiation for our enterprises to get value out of the data they already have in Hadoop as well as non-Hadoop systems, which we cover. >> Josh, you're the VP of products, you have the roadmaps, give us a peek into what's happening with the current product. And, where's the work areas? Where are you guys going? What's the to-do list, what's the check box, and what's the innovation coming around the corner? >> Yeah, I think, to follow up on what Bruno said about how we hit the sweet spot. I think- we made a strategic choice, which is we don't want to be in the business of trying to be Tableau or Excel or be a better front end. And there's so much diversity on the back end if you look at the ecosystem right now, whether it's Spark SQL, or Hive, or Presto, or even new cloud based systems, the sweet spot is really how do you fit into those ecosystems and support the right level of BI on top of those applications. 
So, what we're looking at, from a road map perspective, is how do we expand and support the back end data platforms that customers are asking about? I think we saw a big white space in BI on Hadoop in particular. And that's- I'd say, we've nailed it over the past year and a half. But, we see customers now that are asking us about Google BigQuery. They're asking us about Athena. I think these serverless data platforms are really, really compelling. They're going to take a while to get adoption. So, that's a big investment area for us. And then, in terms of supporting BI front ends, we're kind of doubling down on making sure our Tableau integration is great, Power BI is I think getting really big traction. >> Well, two great products, you've got Microsoft and Tableau, leaders in that area. >> The self-service BI revolution has, I would say, has won. And the business user wants their tool of choice. Where we come in is the folks responsible for data platforms on the back end, they want some level of control and consistency and so they're trying to figure out, where do you draw the line? Where do you provide standards? Where do you provide governance, and where do you let the business loose? >> All right, so, Bruno and Josh, I want you to answer the questions, be a good quiz. So, define next generation BI platforms from a functional standpoint and then under the hood. >> Yeah, there's a few things you can look at. I think if you were at the Gartner BI conference last week you saw that there were 24 vendors in the magic quadrant and I think in general people are now realizing that this is a space that is extremely crowded and it's also sitting on technology that was built 20 years ago. Now, when you talk to enterprises like the ones we work with, like, as I named earlier, you realize that they all have multiple BI tools. So, the visualization war, if you will, kind of has been set up and almost won by Microsoft and Tableau at this point. 
And, the average enterprise has 15 different BI tools. So, clearly, if you're trying to innovate on the visualization side, I would say you're going to have a very hard time. So, you're dealing with that level of complexity. And then, from the back end standpoint, you're now having to deal with databases from the past - that's the Teradata of this world - data sources from today - Hadoop - and data sources from the future, like Google BigQuery. And, so, I think the CIO answer of what is the next gen BI platform I want is something that is enabling me to simplify this very complex world. I have lots of BI tools, lots of data, how can I standardize in the middle in order to provide security, provide scale, provide speed to my business users and, you know, that's really radically going to change the space, I think. If you're trying to sell a full stack that's integrated from the bottom all the way to visualization, I don't think that's what enterprises want anymore. >> Josh, under the hood, what's the next generation- you know, key leverage for the tech, and, just the enabler. >> Yeah, so, for me the end state for the next generation BI platform is a user can log in, they can point to their data, wherever that data is, it's on-prem, it's in the cloud, it's in a relational database, it's a flat file, they can design their business model. We spend a lot of time making sure we can support the creation of business models, what are the key metrics, what are the hierarchies, what are the measures, it may sound like I'm talking about OLAP. You know, that's what our history is steeped in. >> Well, faster data is coming, that's- streaming and data is coming together. >> So, I should be able to just point at those data sets and turn around and be able to analyze it immediately. On the back end that means we need to have pretty robust modeling capabilities. 
So that you can define those complex metrics, so you can functionally do what are traditional business analytics, period over period comparisons, rolling averages, navigate up and down business hierarchies. The optimizations should be built in. It shouldn't be the responsibility of the designer to figure out, do I need to create indexes, do I need to create aggregates, do I need to create summarization? That should all be handled for you automatically. You shouldn't have to think about data movement. And so that's really what we've built in from an AtScale perspective on the back end. Point to data, we're smart about creating optimal data structures so you get fast performance. And then, you should be able to connect whatever BI tool you want. You should be able to connect Excel, we can talk the MDX query language. We can talk SQL, we can talk DAX, whatever language you want to talk. >> So, take the syntax out of the hands of the user. >> Yeah. >> Yeah. >> And getting in the weeds on that stuff. Make it easier for them- >> Exactly. >> And the key word I think, for the future of BI is open, right? We've been buying tools over the last- >> What do you mean by that, explain. >> Open means that you can choose whatever BI tool you want, and you can choose whatever data you want. And, as a business user there's no real compromise. But, because you're getting an open platform it doesn't mean that you have to trade off complexity. I think some of the stuff that Josh was talking about, period analysis, the type of multidimensional analysis that you need, calendar analysis, historical data, that's still going to be needed, but you're going to need to provide this in a world where the business user and IT organization expect that the tools they buy are going to be open to the rest of the ecosystem, and that's new, I think. >> George, you want to get a question in, edgewise? Come on. 
(group laughs)
>> You know, I've been sort of a single-issue candidate, I guess, this week on machine learning and how it's sort of touching all the different sectors. And I'm wondering, how do you see yourselves as part of a broader pipeline of different users adding different types of value to data?
>> I think maybe on the machine learning topic there are a few different ways to look at it. The first is we do use machine learning in our own product. I talked about this concept of auto-optimization. One of the things that AtScale does is it looks at end-user query patterns. And we look at those query patterns and try to figure out how we can be smart about anticipating the next thing they're going to ask, so we can pre-index or pre-materialize that data. So, there's machine learning in the context of making AtScale a better product.
>> Reusing things that are already done, that's been the whole machine-learning-
>> Yes.
>> Demos, we saw at Google Next with the video editing and the video recognition stuff, that's been-
>> Exactly.
>> Huge part of it.
>> You've got users giving you signals; take that information and be smart with it. I think, in terms of the customer workflow - Comcast, for example, a customer of ours - we are in a data discovery phase: there's a data science group that looks at all of their set-top box data, and they're trying to discover programming patterns. Who uses the Yankees' network, for example? And where they use AtScale is what I would call a descriptive element, where they're trying to figure out what the key measures and trends are, and what the attributes are that contribute to them. And then they'll go in and they'll use machine learning tools on top of that same data set to come up with predictive algorithms.
>> So, just to be clear there, they're hypothesizing about, like, say, either the pattern of users that might have an affinity for a certain channel or channels, or they're looking for pathways.
>> Yes.
And I'd say our role in that right now is a descriptive role. We're supporting the descriptive element of that analytics life cycle. I think over time our customers are going to push us to build in more of our own capabilities when it comes to: okay, I discovered something descriptive, can you come up with a model that helps me predict it the next time around? Honestly, right now people want BI. People want very traditional BI on the next-generation data platform.
>> Just continuing on that theme, leaving machine learning aside, I guess, as I understand it, when we talked about the old-school vendors, Teradata, when they wanted to support data scientists they grafted on some machine learning, like a parallel version of R in the core Teradata engine. They also bought Aster Data, which was, you know, for a different audience. So, I guess my question is, will we see from you, ultimately, a separate product line to support a new class of users? Or are you thinking about new functionality that gets integrated into the core product?
>> I think it's more of the latter. So, the way that we view it - and this is really looking at, like I said, what people are asking for today - is, kind of, the basic, traditional BI. What we're building is essentially a business model. So, when someone uses AtScale, they're designing and they're telling us, they're asserting: these are the things I'm interested in measuring, and these are the attributes that I think might contribute to them. And so that puts us in a pretty good position to start using, whether it's Spark on the back end or built-in machine learning algorithms on the Hadoop cluster, our knowledge of that business model to help make predictions on behalf of the customer.
>> So, just a follow-up, and this really leaves out the machine learning part, which is, it sounds like, in terms of big data, we went first to archive it - supported more data retention than we could do affordably with the data warehouse.
Then we did the ETL offload; now we're doing more and more of the visualization, the ad hoc stuff.
>> That's exactly right.
>> So, in a couple of years' time, what remains in the classic data warehouse, and what's in the Hadoop category?
>> Well, I think what you're describing is the pure evolution of, you know, any technology, where you start with the infrastructure. You know, we've been in this for over ten years now; you've got Cloudera. They are going IPO and then going into the data science workbench.
>> That's not official yet.
>> I think we read about this, or at least they filed. But I think the direction is showing: now people are relying on the platform, the Hadoop platform, in order to build applications on top of it. And so I think, just like Josh is saying, the mainstream application on top of the database - and I think this is true for non-Hadoop systems as well - is always going to be analytics. Of course, data science is something that provides a lot of value, but it typically provides a lot of value to a small set of people who will then scale it out to the rest of their organization. I think if you now project out to what this means for the CIO and their environment, I don't think any of these platforms - Teradata, or Hadoop, or Google, or Amazon, or any of those - 100% replace the others. And I think that's where it becomes interesting, because you're now having to deal with a heterogeneous environment, where the business user is up, they're using Excel, they're using their standard net application, they might be using the results of machine learning models, but they're also having to deal with a heterogeneous environment at the data level: Hadoop on-prem, Hadoop in the cloud, non-Hadoop in the cloud, and non-Hadoop on-prem. And, of course, that's a market that I think is very interesting for us as a simplification platform for that world.
>> I think you guys are really thinking about it in a new way, and I think that's kind of a great, modern approach, let the freedom- and by the way, quick question on the Microsoft tool and Tableau: what percentage share do you think they are of the market? 50? Because you mentioned those are the two top ones.
>> Are they?
>> Yeah, I mentioned them, because if you look at the Magic Quadrant, clearly Microsoft Power BI and Tableau have really shot up all the way to the right.
>> Because it's easy to use, and it's easy to work with data.
>> I think so. Look, from a functionality standpoint, you see Tableau's done a very good job on the visualization side. I think, from a business standpoint and a business model execution, and I can talk from my days at Microsoft, it's a very great distribution model to get thousands and thousands of users to use Power BI. Now, there are the guys that we didn't talk about on the last Magic Quadrant, people like Google Data Studio or Amazon QuickSight, and I think that will change the ecosystem as well. Which, again, is great news for AtScale.
>> More muscle coming in.
>> That's right.
>> For you guys, just more- a rising tide floats all boats.
>> That's right.
>> So, you guys are powering it.
>> That's right.
>> Modern BI, would be safe to say?
>> That's the idea. The idea is that visualization is basically commoditized at this point. And what business users want and what enterprise leaders want is the ability to provide freedom and openness to their business users and never have to compromise on security, speed, and also the complexity of those models, which is what we're in the business of.
>> Get people working, get people productive faster.
>> In whatever tool they want.
>> All right, Bruno. Thanks so much. Thanks for coming on. AtScale. Modern BI here in The Cube. Breaking it down. This is The Cube covering Big Data SV, Strata Hadoop. Back with more coverage after this short break. (electronic music)
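
[Editor's sketch] Earlier in the interview, Josh describes auto-optimization: watching end-user query patterns and pre-materializing the data users are likely to ask for next. The Python below is a toy illustration of that idea only. It is not AtScale's algorithm; a simple frequency count stands in for the machine learning he mentions, and every name in it (`pick_aggregates`, `materialize`, the sample rows) is hypothetical.

```python
from collections import Counter, defaultdict


def pick_aggregates(query_log, top_n=1):
    """Return the most frequently requested dimension groupings.

    Each log entry is the collection of dimensions one query grouped by;
    the order of dimensions within an entry is ignored.
    """
    counts = Counter(frozenset(dims) for dims in query_log)
    return [tuple(sorted(dims)) for dims, _ in counts.most_common(top_n)]


def materialize(rows, dims, measure):
    """Pre-compute SUM(measure) grouped by `dims` (keys follow sorted dims)."""
    dims = sorted(dims)
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


# Hypothetical query log: which dimensions each past query grouped by.
query_log = [("region", "month"), ("region",), ("month", "region"), ("channel",)]

# Hypothetical fact rows.
rows = [
    {"region": "east", "month": "jan", "channel": "web", "sales": 10.0},
    {"region": "east", "month": "jan", "channel": "store", "sales": 5.0},
    {"region": "east", "month": "feb", "channel": "web", "sales": 7.0},
]

for dims in pick_aggregates(query_log):
    print(dims, materialize(rows, dims, "sales"))
```

The point of the sketch is the division of labor described in the conversation: the designer declares measures and dimensions, and the platform, not the designer, decides which summaries are worth building.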

Published Date : Mar 15 2017




Search Results for GSK: