Sizzle Reel | RSA Conference 2020
Absolutely. I think if I were to net it out, Jeff, what I'm sensing is there is a whole movement to shift security left, which is this whole idea of IT stepping up as the first line of defense: reduce cyber exposure, take care of patching, multi-factor authentication, reduce the attack surface, intrinsic security, right? So, you know, DevOps and SecOps take care of it right up front, before all the apps even get built. Then there is another movement to shift things right, which is take care of the new aspects of the attack surface. What the hackers always take advantage of are areas where, in a sense, we are unprepared, and for a long time they've seen us being unprepared in terms of reducing the attack surface, and then they go after the new aspects of the attack surface. And what are those? IT, IoT, OT, data as an attack surface, and the edge. So these are areas where there's a lot of activity and a lot of innovation on the floor here if you walk the corners: shifting left, shifting right, as in all the new aspects of the attack surface. I'm seeing a lot of conversations and a lot of innovation in that area.

I think it also boils down to real-world examples. We have to really understand the demographics that we're working for. I think today is the first time really in history that we have four generations working side by side in the workforce, so we have to understand that people learn differently, and training should be adjusted to the type of people that we're teaching. But phishing doesn't just boil down to clicking on links. Phishing also boils down to tricking somebody, getting someone's trust, and it can come in different forms. For example, think of social media: how do people connect? We're connecting across social media on many different platforms. I'll give a very easy example: LinkedIn. LinkedIn is for business; we're all connected on LinkedIn. Why do we connect on LinkedIn? Because that's a social platform that people feel safe on, because we're able to connect to each other in a business forum. Now think of the person who's getting their first job with an organization. Maybe they're a project manager and they're working for Bank A, excited to be working for Bank A: hey, I'm going to list all the projects I'm working on. So here's now my resume on LinkedIn, I'm working on project ABCD, and this is the manager I report to. Perfect, there's some information sitting there on LinkedIn. Now, what else I will tell you is that you might have somebody who's looking to get into that bank. What will they do? Let's look for the lowest-hanging fruit. Who is this new project manager? Oh, I see they're working on these projects and they're reporting in to someone. Well, I'm not a project manager, I'm a senior project manager from a competing bank. I'm going to befriend them and tell them that I'm really excited about the work they're doing. They're social engineering their way into their friendship, into their good graces, into their trust. Once that's done, the individual becomes a trusted source, and people share information freely. So people are putting too much information out there on social, trusting too easily, opening the door for more than a phishing attack.

And things are just rapidly going out of control, right? So my co-founder and I both came from the world of being practitioners, and we saw how limited the space was in actually changing human behavior. I was given some animated PowerPoints: use this to keep the Russians out of your network, which is a practical joke unless your job is on the line. I took a huge step back and I said, there are other fields that have figured this out, behavioral science being one of them; they use positive reinforcement and gamification. Marketing and advertising have figured out how to engage this human element. Just look around the RSA floor, and there are so many learnings about how we make decisions as human beings that can be applied to changing people's behaviors in security. So that's what we did as a venture.

So this is my first early-stage company. We're still seeking Series A, we're a young company, but our mantra is: we are the data value company. So we have this very robust analytics engine that goes into the heart of data; I can track it and map it and make it beautiful. And along came McNealy, who actually sits on our board (oh, does he?), and they said, we need someone who does this, it's all happening. So they asked Scott McNealy, who is the craziest person in privacy and data that you know, and he said, oh my god, get that woman. So they got that woman, and that's what I do now. So I'm taking this analytics value engine and I'm pointing it at the board. As I've always said, and as Grace Hopper said, data value and data risk have to be on the corporate balance sheet, and so what we're building is a data balance sheet for everyone to use to actually value data.

For me it starts with technology. Look, we've only got so many security practitioners in the company. Take your email example: we've got to defend every user from those kinds of problems. So how do I find technology solutions that help take that load off the security practitioners, so they can focus on the niche examples, the really, really well-crafted emails, and help take that load off the user? Because users just aren't going to be able to handle that, right? It's not fair to ask them, and like you said, it was just poorly timed. So how do we help make sure that we're taking that load off with technology, identifying the threats in advance and protecting them? And so I think one of the biggest things that Chris and I talk a lot about is how our solutions help make it easier for people to secure themselves instead of just providing only a technology advantage.

So the virtual analyst is able to sit on premises, so it's localized learning. It collects to understand the nature of those threats, to be able to look at the needles of the needles, if you will, make sense of that, and then automatically generate reports based off of that. So it's really an assist tool that a network admin or a security analyst is able to pick up and virtually save hours and hours of time.

So we have what we call a threat research group within the company, and their job is to take all the data from the sensors we have. I mean, we look at about 25 petabytes of data every day. All our solutions are cloud solutions as well as on-prem, so we get the benefit of basically seeing all the data that's hitting our customers every day. I mean, we block about 1 million attacks every minute, like, every minute, 1 billion attacks every minute, right? We protect over 3 million databases, and, you know, we've mitigated some of the largest DDoS attacks that have ever been reported. So we have a lot of data, right, that we're seeing, and the interesting thing is that you're right, we're using that threat research data to see what's happening, how the threat landscape is changing, therefore guiding us on how we need to augment and add to our products to prevent that. But interestingly, we're also consuming AI and machine learning in our products, because we're able to use those solutions to actually do a lot of attack analytics and a lot of predictive research for our customers that can kind of guide them about, you know, where things are happening. Because what's happening is that before, a lot of the attacks were just sort of fast and furious; now we're seeing a pattern toward low and slow and continuous, if that makes sense. We're seeing all these patterns and threats coming in, so we're fighting against those technologies like AI, but we're also using those technologies to help us, you know, decide where we need to continue to add capabilities to stop it. You know, the whole bad bot thing wasn't a problem, right, a number of years ago, and so it's ever-changing, our world, which frankly speaking makes it an interesting place to be. Yes, who wants to be in a static, in a boring place, right?

Well, I mean, whether you're a good packet or a bad packet, you have to traverse the network to be interesting. We've all, you know, put our phones in airplane mode at Black Hat or events like that because we don't want to be on the network. They're really boring when they're offline, but they're also really boring to attackers when they're offline; as soon as you turn them on, you have a problem, or could have a problem. But as things traverse the network, what better place to see who and what's on your network than on the gear? At the end of the day, we're able to provide that visibility and we're able to provide that enforcement. So as you mentioned, 2020 is now the year of awareness for us, so the threat-aware network: we're able to do things like look at encrypted traffic and do heuristics and analysis to figure out, should that even be on my network? Because as you bring it into a network and you have to decrypt it, (a) there are privacy concerns with that in these times, but also it's computationally expensive to do that, so it becomes a challenge from both a financial perspective as well as a compliance perspective. So we're helping solve, or even kind of offset, that traffic and be able to ensure your network is secure.

So when we started developing our cyber recovery solution about five years ago, we used the NIST cybersecurity framework, which is a very well-known standard that defines really five pillars of how organizations can think about building a cyber resilience strategy. A cyber resilience strategy really encompasses everything from perimeter threat detection and response all the way through incident response after an attack, and everything that happens in between: protecting the data and recovering the data, right, and critical systems. So I think of cyber resilience as that holistic strategy of protecting an organization and its data from a cyberattack.

Yeah, I think the human element is the hardest part. You know, keeping in mind this conference and its theme, the human element, the hardest part about this job is that it's not just mechanical issues and routing issues and networking issues, but it's about dealing with all types of humans: innocent humans that do strange and bad things unknowingly, and malicious people who do very bad things by design. And so the research suggests that no matter what we do in security awareness training, some four percent of our employee base will continually fail security awareness when we phish them actively. And so one of the things that we need to do is use automation and intelligence so that you can comb through all of that data and make a better informed decision about what risks you're going to mitigate, right? And if this four percent are habitually abusing the system and can't be retrained, well, you can isolate them, right, and make sure that they're separated, and then they're not able to do things that may harm the organization.
**Summary and Sentiment Analysis are not shown because of an improper transcript**
ENTITIES
Entity | Category | Confidence |
---|---|---|
Chris | PERSON | 0.99+ |
Grace Hopper | PERSON | 0.99+ |
Scott McNealy | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
Jeff | PERSON | 0.99+ |
four percent | QUANTITY | 0.99+ |
first job | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
over 3 million databases | QUANTITY | 0.99+ |
1 billion attacks | QUANTITY | 0.98+ |
four percent | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
NIST | ORGANIZATION | 0.98+ |
about 1 million attacks | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
Came McNeely | PERSON | 0.95+ |
about 25 petabytes | QUANTITY | 0.95+ |
DevOps | TITLE | 0.93+ |
this week | DATE | 0.93+ |
today | DATE | 0.92+ |
first early stage | QUANTITY | 0.92+ |
first line | QUANTITY | 0.92+ |
five pillars | QUANTITY | 0.9+ |
RSA Conference 2020 | EVENT | 0.87+ |
every minutes | QUANTITY | 0.85+ |
every minute | QUANTITY | 0.83+ |
five years ago | DATE | 0.8+ |
things | QUANTITY | 0.74+ |
every minute | QUANTITY | 0.71+ |
number of years ago | DATE | 0.7+ |
Sizzle | ORGANIZATION | 0.66+ |
RSA | TITLE | 0.63+ |
about | DATE | 0.61+ |
Russians | PERSON | 0.55+ |
every day | QUANTITY | 0.54+ |
four generations | QUANTITY | 0.52+ |
every | QUANTITY | 0.52+ |
Reel | PERSON | 0.47+ |
project | TITLE | 0.46+ |
ABCD | OTHER | 0.38+ |
Barossa | ORGANIZATION | 0.38+ |
Daphne Koller, insitro | Stanford Women in Data Science (WiDS) Conference 2020
>> Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by Silicon Angle Media. >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the Women in Data Science Conference, the fifth annual one. And joining us today is Daphne Koller, who is the co-founder, who, sorry, is the CEO and founder of insitro. Daphne, welcome to theCUBE. >> Nice to be here, Sonia. Thank you for having me. >> So tell us a little bit about insitro, how you got founded, and more about your role. >> So I've been working at the intersection of machine learning and biology and health for quite a while, and it was always a bit of an interesting journey in that the data sets were quite small and limited. We're now in a different world where there are tools that are allowing us to create massive biological data sets that I think can help us solve really significant societal problems. And one of those problems that I think is really important is drug discovery and development, where despite many important advancements, the costs just keep going up and up and up. And the question is, can we use machine learning to solve that problem better? >> And you talk about this more in your keynote, so give us a few highlights of what you talked about. >> So you can think of drug discovery and development in the last 50 to 70 years as being a bit of a glass half full, glass half empty. The glass half full is the fact that there are diseases that used to be a death sentence, or a sentence to a lifelong of pain and suffering, that are now addressed by some of the modern-day medicines, and I think that's absolutely amazing. The other side of it is that the cost of developing new drugs has been growing exponentially, in what's come to be known as Eroom's law, being the inverse of Moore's law, which is the one we're all familiar with, because the number of drugs approved per billion U.S. dollars just keeps going down exponentially. So the question is, can we change that curve? >> And you talked in your keynote about the interdisciplinary culture, so tell us more about that. >> I think in order to address some of the critical problems that we're facing, one needs to really build a culture of people who work together from different disciplines, each bringing their own insights and their own ideas into the mix. So at insitro, we actually have a company that's half life scientists, many of whom are producing data for the purpose of driving machine learning models, and the other half are machine learning people and data scientists who are working on those. But it's not a handoff where one group produces the data and then the other one consumes and interprets it. Really, they start from the very beginning to understand: what are the problems that one could solve together? How do you design the experiment? How do you build the model, and how do you derive insights from that which can help us make better medicines for people? >> And, um, I also wanted to ask you, you co-founded Coursera, so tell us a little bit more about that platform. >> So I founded Coursera as a result of work that I had been doing at Stanford, working on how technology can make education better and more accessible. This was a project that I did here with a number of my colleagues as well. And at some point in the fall of 2011, there was an experiment of, let's take some of the content that we've been developing within Stanford and put it out there for people to just benefit from, and we didn't know what would happen; would it be a few thousand people? But within a matter of weeks, with minimal advertising other than one New York Times article that went viral, we had 100,000 people in each of those courses. And that was a moment in time where, you know, we looked at this and said, can we just go back to writing more papers, or is there an incredible opportunity to transform access to education for people all over the world? And so I ended up taking what was supposed to be a two-year absence from Stanford to go and co-found Coursera, and I thought I'd go back after two years. But at the end of that two-year period, there was just so much more to be done and so much more impact that we could bring to people all over the world, people of both genders, people of different socioeconomic status, in every single country around the world. We just felt like this was something that I couldn't not do. >> And why did you decide to go from an educational platform to then going into machine learning and biomedicine? >> So I had been doing Coursera for about five years; in 2016, the company was on a great trajectory, but it's primarily a content company, and around me, machine learning was transforming the world, and I wanted to come back and be part of that. And when I looked around, I saw machine learning being applied to e-commerce and to natural language and to self-driving cars, but there really wasn't a lot of impact being made in the life science area. I wanted to be part of making that happen, partly because I felt, coming back to your earlier comment, that in order to really have that impact, you need to have someone who speaks both languages. And while there is a new generation of researchers who are bilingual in biology and machine learning, it's still a small group, and there are very few of those in kind of my age cohort, and I thought that I would be able to have a real impact by building a company in this space. >> So it sounds like your background is pretty varied. What advice would you give to women who are just starting college now who may be interested in a similar field? Would you tell them they have to major in math? Or do you think that maybe there are some other majors that may be influential as well? >> I think there are a lot of ways to get into data science. Math is one of them, but there's also statistics or physics. And I would say that, especially for the field that I'm currently in, which is at the intersection of machine learning and data science on the one hand and biology and health on the other, one can, um, get there from biology or medicine as well. But what I think is important is not to shy away from the more mathematically oriented courses in whatever major you're in, because that foundation is a really strong one. There are a lot of people out there who are basically lightweight consumers of data science, and they don't really understand how the methods that they're deploying work, and that limits them in their ability to advance the field and come up with new methods that are better suited, perhaps, to the problems they're tackling. So I think it's totally fine, and in fact there's a lot of value to coming into data science from fields other than, you know, computer science. But I think taking courses in those fields, even while you're majoring in whatever field you're interested in, is going to make you a much better person who lives at that intersection. >> And how do you think having a technology background has helped you in founding your companies and has helped you become a successful CEO? >> In companies that are very strongly R&D focused, like insitro and others, having a technical co-founder is absolutely essential, because it's fine to have an understanding of whatever the user needs and so on and come from the business side of it, and a lot of companies have a business co-founder, but not understanding what the technology can actually do is highly limiting, because you end up hallucinating: oh, if we could only do this and that, it would be great. But you can't, and people often end up making ridiculous promises about what technology will or will not do, because they just don't understand where the land mines sit and where you're going to hit real obstacles in the path. So I think it's really important to have a strong technical foundation in these companies. >> And that being said, where do you see insitro in the future? And how do you see it solving the challenges that you talked about in your keynote? >> So we hope that insitro will be a fully integrated drug discovery and development company that is based on a completely different foundation than a traditional pharma company, where they grew up in the old approach that is very much a bespoke scientific analysis of the biology of different diseases and then going after targets or ways of dealing with the disease that are driven by human intuition. Where I think we have the opportunity to go today is to build a very data-driven approach that collects massive amounts of data and then lets analysis of those data really reveal new hypotheses that might not be the ones that accord with people's preconceptions of what matters and what doesn't. And so hopefully we'll be able to, over time, create enough data and apply machine learning to address key bottlenecks in the drug discovery and development process, so that we can bring better drugs to people, and we can do it faster and hopefully at much lower cost. >> That's great. And you also mention in your keynote that you think the 2020s is like a digital biology era, so tell us more about that. >> So I think if you take a historical perspective on science and think back, you realize that there are periods in history where one discipline has made a tremendous amount of progress in a relatively short amount of time because of a new technology or a new way of looking at things. In the 1870s, that discipline was chemistry, with the understanding of the periodic table and the fact that you actually couldn't turn lead into gold. In the 1900s, that was physics, with understanding the connection between matter and energy and between space and time. In the 1950s, that was computing, where silicon chips were suddenly able to perform calculations that up until that point only people had been able to do. And then in the 1990s, there was an interesting bifurcation. One was the era of data, which is related to computing but also involves elements of statistics and optimization and neuroscience. And the other one was quantitative biology, in which biology moved from a descriptive science of taxonomizing phenomena to really probing and measuring biology in a very detailed and high-throughput way, using techniques like microarrays that measure the activity of 20,000 genes at once, or the sequencing of the human genome, and many others. But these two fields kind of evolved in parallel, and what I think is coming now, 30 years later, is the convergence of those two fields into one field that I like to think of as digital biology, where we are able, using the tools that have been and continue to be developed, to measure biology at entirely new levels of detail, of fidelity, of scale. We can use the techniques of machine learning and data science to interpret what we're seeing, and then use some of the technologies that are also emerging to engineer biology to do things that it otherwise wouldn't do. And that will have implications in biomaterials, in energy, in the environment, in agriculture, and I think also in human health. And it's an incredibly exciting space to be in right now, because just so much is happening, and the opportunities to make a difference and make the world a better place are just so large. >> That sounds awesome. Daphne, thank you for your insight, and thanks for being on theCUBE. >> Thank you. >> I'm Sonia Tagare. Thanks for watching. Stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media. And we're live at Stanford University covering Thank you for having me. And the question is, can we use machine learning to solve that problem So in the last, you can think of drug discovery development in the last 50 to 70 years as being a bit of a glass half full glass, And I think that's absolutely amazing. it is that the cost of developing new drugs has been growing exponentially and the other Halford machine learning people in data scientists who are working And, um, I also wanted to ask you the you co founded coursera, so tell us a little bit more about And at some point in the fall of 2011 there was an experiment the company was on a great trajectory. comment that in order to really have that impact, you need to have someone who speaks both languages. What advice would you give to women who are just starting methods that are better suited, perhaps, of the problems of their tackling. So I think it's really important to have a strong technical And that being said, Where do you see in Teacher in the future? key bottlenecks in the drug discovery development process that we can bring better drugs to people, And you also mention in your keynote that you think the 20 twenties is like the understanding of the periodic table, and that you actually couldn't turn lead into gold in And then in 19 nineties, And the other one was quantitative biology. is the convergence of those two fields into one field that I like to think of a digital biology And thanks for being on the Cube.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Sonia | PERSON | 0.99+ |
Daphne Koller | PERSON | 0.99+ |
Stephanie | PERSON | 0.99+ |
2016 | DATE | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
20,000 genes | QUANTITY | 0.99+ |
100,000 people | QUANTITY | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
18 seventies | DATE | 0.99+ |
Corsair | ORGANIZATION | 0.99+ |
19 fifties | DATE | 0.99+ |
one field | QUANTITY | 0.99+ |
two fields | QUANTITY | 0.99+ |
Moore | PERSON | 0.99+ |
Daphne | PERSON | 0.99+ |
fall of 2011 | DATE | 0.99+ |
20 twenties | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
both genders | QUANTITY | 0.99+ |
each | QUANTITY | 0.98+ |
both languages | QUANTITY | 0.98+ |
30 years later | DATE | 0.97+ |
Taqueria | PERSON | 0.97+ |
One | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
Nash | PERSON | 0.97+ |
two year | QUANTITY | 0.97+ |
third | QUANTITY | 0.97+ |
Stanford | ORGANIZATION | 0.96+ |
Woods Women in Data Science Conference | EVENT | 0.96+ |
19 hundreds | DATE | 0.96+ |
one discipline | QUANTITY | 0.96+ |
Halford | ORGANIZATION | 0.95+ |
2020 | DATE | 0.95+ |
New York Times | ORGANIZATION | 0.94+ |
about five years | QUANTITY | 0.94+ |
Citro | ORGANIZATION | 0.94+ |
70 years | QUANTITY | 0.93+ |
1000 people | QUANTITY | 0.93+ |
Stanford Women in Data Science | EVENT | 0.89+ |
19 nineties | DATE | 0.86+ |
one group | QUANTITY | 0.77+ |
fifth annual one | QUANTITY | 0.76+ |
Citro | TITLE | 0.72+ |
WiDS) Conference 2020 | EVENT | 0.69+ |
three | QUANTITY | 0.66+ |
single country | QUANTITY | 0.65+ |
50 | QUANTITY | 0.64+ |
half full | QUANTITY | 0.62+ |
two years | QUANTITY | 0.6+ |
1,000,000,000 U. S. Dollars | QUANTITY | 0.59+ |
in Citro | ORGANIZATION | 0.53+ |
Rooms | TITLE | 0.52+ |
In | ORGANIZATION | 0.51+ |
Cube | ORGANIZATION | 0.47+ |
Talithia Williams, Harvey Mudd College | Stanford Women in Data Science (WiDS) Conference 2020
>> Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by Silicon Angle Media. >> And welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University, covering the fifth annual WiDS Women in Data Science conference. Joining us today is Talithia Williams, who's an associate professor of mathematics at Harvey Mudd College and host of Nova Wonders at PBS. Talithia, welcome to theCUBE. >> Happy to be here. Thanks for having me. >> So you have a lot of roles. So first, tell us about being an associate professor at Harvey Mudd. >> Yeah, I've been at Harvey Mudd now for 11 years, so it's been really a lot of fun in the math department, but I'm a statistician by training, so I teach a lot of courses in statistics and data science and things like that. >> Very cool. And you're also the host of a PBS show called Nova Wonders. >> Yeah, that came about a couple of years ago. Folks at PBS reached out, they had seen my TED talk, and they said, hey, it looks like you could be a fun host of this science documentary show. So Nova Wonders is a six-episode series. It kind of takes viewers on a journey of what the cutting-edge questions in science are. Um, so I got to host the show with a couple of other co-hosts and really think about, like, you know, what are the animals saying? And so we've got some really fun episodes. What's the Universe Made Of? was one of them. What's Living Inside of Us? That was definitely a gross one, trying to figure out all the different microorganisms that live inside our body. So yeah, it's been fun to host that show as well. >> And you talk about data science and AI and all that stuff on it? >> Yeah, oh yeah, one of the episodes, Can We Build a Brain?, dealt with a lot of data, big data and artificial intelligence, and, you know, how good can we get? How good can computers get, really, sort of compared to what we see in the movies? We're a long way away from that, but it seems like, you know, we're getting better every year, building technology that is truly intelligent. >> And you gave a talk today about mining your own personal data. So give us some highlights from your talk. >> Yeah, so that talk sort of stemmed out of the TED talk that I gave on owning your body's data. And it's really challenging people to think about how they can use data that they collect about their bodies to help make better health decisions, and so ways that you can use, like, your temperature data or your heart rate data. What does that data say over time? What does it say about your body's health? And really challenging the audience to get excited about looking at that data. We have so many devices that collect data automatically for us, and often we don't pause enough to actually look at that historical data. And so that was what the talk was about today: here's what you can find when you actually sit down and look at that data. >> What's the most important data you think people should be collecting about themselves? >> Well, definitely not your weight; I don't want to know what that is. Um, it depends, you know. I think for women who are in the fertile years of life, taking your daily waking temperature can tell you when your body's fertile, when you're ovulating. So that information could give women during that time period really critical information. But in general, I think it's just a matter of being aware of how your body is changing. So for some people, maybe it's your blood pressure or your blood sugar. If you have high blood pressure or high blood sugar, those things become really critical to keep an eye on. And, um, I really encourage people, whatever data they take, to be active in the understanding and interpretation of that data. It's not like if you take this data, you'll be healthy or you'll live to 100. It's really a matter of challenging people to own the data that they have and get excited about understanding the data that they are taking. >> Absolutely, putting people in charge of their own bodies. >> That's right. >> And actually, speaking about that, in your TED talk you mentioned how your doctor told you to have a C-section, and you looked at the data and said, no, I'm going to have this baby naturally. So tell us more about that. >> Yes, you should always listen to your medical practitioners. But in this case, I will say that it was definitely more of a dialogue. And so I wasn't just sort of trying to lean on the fact that, like, I have a PhD in statistics and I know data; it was really kind of objectively, with the on-call doctor at the time, looking at the data and talking about it. And this doctor, this was his first time seeing me. And so I think it would have been different had my personal midwife or my doctor been telling me that. But this person would have only looked at this one chart and was making a decision without thinking about my historical data. And so I tried to bring that to the conversation and say, like, let me tell you more about, you know, my body, and this is pregnancy number three, like, here's how my body works. And I think this person in particular just wasn't really hearing any of that. It was like, here's my advice, we just need to do this. And I'm like, oh, you know, and so as gently as possible I tried to really share that data. Um, and then it got to the point where it was sort of like, either you're going to do what I say or you're going to have to sign a waiver. And we were like, well, we'll sign the waiver. That caused quite a buzz in the hospital that day. But we came back and had a very successful labor and delivery. And so, yeah, I think that at the time, but, you know, with that caveat that you should listen to what your doctors say. >> Yeah, I mean, it's really interesting, like, what's the boundary between what the numbers tell you and what a professional tells you? >> Because I don't have an MD, right? And so, you know, I'm cautious not to overstep that, but I felt like in that case the doctor wasn't really even considering the data that I was bringing. Um, we were actually induced with our first son, but again, that was more of a conversation, more of a dialogue: here's what's happening, here's what we're concerned about, and the data to really back it up. And so I felt like in that case, like, yeah, I'm happy to go with your suggestion, but with number three it was just like, no, this isn't really... >> Great. Um, so you also wrote a book called Power in Numbers: The Rebel Women of Mathematics. So what inspired you to write this book? And what do you hope readers take away from it? >> A couple of different things. I remember when I saw the movie Hidden Figures. And, um, I spent three summers at NASA, working at JPL, the Jet Propulsion Laboratory. And so I had this very fun connection to, you know, having worked at NASA. And, um, when this movie came out and I'm sitting there watching it, I'm, like, bawling, just crying, like, I didn't know that there were Black women who worked at NASA, like, before me, you know. Um, and so it felt, it was just so transformative for me to see these stories just sort of unfold. And I thought, like, well, why didn't I learn about these women growing up? Like, imagine had I known about the Katherine Johnsons of the world; maybe that would have really inspired not just me but, you know, thinking of all the women of color who aren't in mathematics or who don't see themselves working at NASA. And so for me, the book was really a way to leave that legacy to the generation that's coming up and say, like, there have been women who've done mathematics, um, and statistics and data science for years, and there are women who are doing it now. So about a third of the book are women who are still here and, like, active in the field and doing great things. And so I really wanted to highlight sort of where we've been, but also where we're going and the amazing women that are doing work in it. And it's very visual, so some people are like, oh my gosh, women in math. It is really a very picturesque book, showing these beautiful images of the women and their mathematics and their work. And yes, I'm really proud of it. >> That's awesome. And even though there is, like, greater diversity now in the tech industry, there are still very few African American women, especially, who are part of this industry. So what advice would you give to those women who feel like they don't belong? >> Yeah, well, (a) they really do belong. Um, and I think it's also incumbent on people in the industry to sort of recognize ways that they can be advocates for women, and especially for women of color, because often it takes someone who's already at the table to invite other people to the table. You can't just walk up like, move over, get out of the way, I'm here now. But really being thoughtful about who's not represented, how do we get those voices here? And so I think the onus is often more on people who already occupy those spaces to think about how they can be more intentional in bringing diversity into those spaces. >> And going back to your talk a little bit, um, how should people use their data? >> Yeah, so I mean, I think, um, the ways that we've used our data, um, have been to change our lifestyle practices. And so, for example, when I first got a Fitbit, um, it wasn't really that I was like, oh, I have a goal. It was just like, I want something to keep track of my steps. And then I look at it and I feel like, oh gosh, I didn't even do anything today. And so I think having sort of even that baseline data gave me a place to say, okay, let me see if I can hit, you know, 10,000 steps in a day. And so, in some ways, having the data allows you to set goals. Some people come in knowing, like, I've got this goal, I want to hit it. But for me it was just sort of like that, and so I think that's also how I've started to use additional data. So when I take my heart rate data or my pulse, I'm really trying to see if I can get it lower than it was before. So the push is really, like, how is my exercise and my diet changing so that I can bring my resting heart rate down? And so having the data gives me a goal to strive for, and it also gives me that historical information to see, like, oh, this is how far I've come, like, I can't stop there, you know. >> That's a great social impact. >> That's right, yeah, absolutely. >> And, um, do you think that, so from a security and privacy point of view, if you're recording all your personal data on these devices, how do you navigate that? >> Yeah, that's a tough one. I mean, because you are giving up that data privacy. Um, I usually make sure that the data that I'm allowing access to is the sort of data that I wouldn't care if it got published on the cover of, you know, the New York Times. Maybe I wouldn't want everyone to see what my weight is, but, um, and so in some ways, while it is my personal data, there's something that's a bit abstract about it, like it could be anyone's data, as opposed to, say, my DNA. Like, I'm not going to do a DNA test; you know, I don't want my data to be mapped out there for the world. Um, but I think that that has increasingly become a concern, because people are giving access to their information to different companies, and it's not clear how companies would use that information. So if they're using my data to build a product or make a product better, you know, we don't see anything from that; we don't have the benefit of it, but they have access to our data. And so I think in terms of data privacy and data ethics, there's a huge conversation to have around that, and we're only kind of at the beginning of understanding what that is. >> Yeah, well, thank you so much for being on theCUBE. We really loved having you here. >> Thank you. Thanks. >> I'm Sonia Tagare. Thanks so much for watching theCUBE and stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media So you have a lot of rules. the math department, but I'm a statistician by training, so I teach a lot of courses and statistics and data And you're also a host of API s show called Novo Wonders. so I got to host the show with a couple other co host and really think about like, with a lot of data, big data and artificial intelligence, and you know, how good can we get? and you gave a talk today about mining for your own personal data. And so that was what the talk was about today, like, here's what you can find when you actually sit down and look at that data. I don't is. Um, it depends, you know, I think for women who are in That's And actually speaking about that in your Ted talk, you mentioned how you were. And so I wasn't just bring that to the conversation and say, like, let me tell you more about you know, my body and this is pregnancy number Um, and then it got to the point where it was sort of like either you're gonna do what I say or you're gonna have you and what professional And so I felt like in that case, like Yeah, I'm happy to go with your suggestion, And what do you hope readers take away from it? And so I had this very fun connection toe, you know, having worked at NASA. And yes, I'm really proud of it. So what advice would you give to those women who who feel like they don't belong. And so I think the onus and going back to your talk a little bit. me a place to say, Okay, let me see if I hit 10 stuff, you know, 10,000 so I think that's also how I've started to use additional data. Yeah, absolutely. And so I think in terms of data, of at the beginning of understanding what that is. well, thank you so much for being on the Cube.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Tilapia Williams | PERSON | 0.99+ |
Sonia | PERSON | 0.99+ |
Talithia Williams | PERSON | 0.99+ |
PBS | ORGANIZATION | 0.99+ |
Gary | PERSON | 0.99+ |
11 years | QUANTITY | 0.99+ |
NASA | ORGANIZATION | 0.99+ |
10,000 | QUANTITY | 0.99+ |
Siri | TITLE | 0.99+ |
100 | QUANTITY | 0.99+ |
Novo Wonders | TITLE | 0.99+ |
Jet Propulsion Laboratory | ORGANIZATION | 0.99+ |
Power In Numbers | TITLE | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
Katherine Johnsons | PERSON | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
first son | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Harvey Mudd College | ORGANIZATION | 0.99+ |
first time | QUANTITY | 0.99+ |
Dina | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
JPL | ORGANIZATION | 0.99+ |
three summers | QUANTITY | 0.98+ |
six episode | QUANTITY | 0.98+ |
Harvey Mudd | ORGANIZATION | 0.97+ |
So, Nova Wonders | TITLE | 0.97+ |
one | QUANTITY | 0.96+ |
The Rebel Women of Mathematics | TITLE | 0.96+ |
10 stuff | QUANTITY | 0.94+ |
New York Times | ORGANIZATION | 0.94+ |
couple of years ago | DATE | 0.93+ |
Stanford | ORGANIZATION | 0.93+ |
Stanford Women in Data Science | EVENT | 0.92+ |
Woods Women in Data Science conference | EVENT | 0.92+ |
a day | QUANTITY | 0.92+ |
one chart | QUANTITY | 0.91+ |
about 1/3 | QUANTITY | 0.88+ |
Fitbit | ORGANIZATION | 0.86+ |
pregnancy | QUANTITY | 0.81+ |
Ted | TITLE | 0.8+ |
hidden figures | TITLE | 0.79+ |
fifth | QUANTITY | 0.77+ |
Ted talk | TITLE | 0.71+ |
African American | OTHER | 0.7+ |
couple | QUANTITY | 0.7+ |
WiDS) Conference 2020 | EVENT | 0.68+ |
three | QUANTITY | 0.68+ |
number three | QUANTITY | 0.67+ |
Nova Wonders | TITLE | 0.63+ |
co | QUANTITY | 0.63+ |
2020 | DATE | 0.5+ |
Data | EVENT | 0.46+ |
Science | TITLE | 0.42+ |
Cappy | ORGANIZATION | 0.37+ |
Newsha Ajami, Stanford University | Stanford Women in Data Science (WiDS) Conference 2020
>> Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by Silicon Angle Media. >> And welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University, covering the fifth annual WiDS Women in Data Science Conference. Joining us today is Newsha Ajami, who's the director of urban water policy for Stanford. Newsha, welcome to theCUBE. >> Thank you for having me. >> Absolutely. So tell us a little bit about your role. >> So I direct the urban water policy program at Stanford. We focus on building solutions for resilient cities, trying to use data science and also mathematical models to better understand how water use is changing and how we can build future cities and infrastructure to address the needs of the people in the US, in California, and across the world. >> That's great. And you're going to give a talk today about how to build water security using big data. So give us a preview of your talk. >> Sure. So the 20th-century water infrastructure model was very much of a top-down model: we built solutions or infrastructure to bring water to people, but people were not part of the loop. The way that they behaved, their decision-making process, what they used, how they used it, wasn't necessarily part of the process, and the assumption was that there's enough water out there to bring to people, and they can do whatever they want with it. So what we're trying to do is change this paradigm and try to make it more bottom-up, to engage people's decision-making process, and the uncertainty associated with that, as part of the infrastructure planning process. So in my talk I'll talk a little bit about that. >> And where is the most water usage coming from? >> So, interestingly enough, in the developed world, especially in the western United States, 50% of our water is used outdoors for grass and outdoor spaces, which we are not necessarily dependent on for our lives. I'll talk about the statistics in my talk, but grass is the biggest crop we're growing in the US, while we're not really needing it for food consumption, and it also uses four times more water than corn, which is a lot of water. And in California alone, if you just think about some of the spaces where we have grass or green spaces outdoors, in the malls, or institutional buildings, or different outdoor spaces, if we can save some of that water, it can provide water for about a million or two million people a year. So that's a lot of water that we can save and use, or actually repurpose for needs that we really have. >> So does that also boil down to, like, people watering their own lawns? Or is the problem from much bigger grass usage? >> Actually, interestingly enough, that's only 10% of that outdoor water use. The rest of it is actually the residential water use, which is the grass you and I have in our backyards and watering it, so that water is even more than the amount that I mentioned. So we use a lot of water outdoors. And again, some of these green spaces are important for community building, for making sure everybody has access to green spaces and kids can play soccer or play outdoors, but really, our individual lawns and outdoor spaces, if they are not really native, you know, landscaping, it's not something that is valuable enough to justify the amount of water we use for that purpose. >> So taking longer showers and all that stuff is very minimal compared to that? >> No, not at all. Sure, those are also very, very important. That's another 50% of our water that we're using in urban areas. It is important to be mindful of the way we wash dishes, maybe the way we take showers, the way we brush our teeth, that we're not wasting water while we're doing that, and a lot of other individual decisions that we make that can impact water use on a daily basis. >> Right. So tell us a little bit more about right now in California. We just had a dry February, the first in 150 years, and, you know, this is a huge issue for cities, agriculture, and for potential wildfires. So tell us your opinion about that. >> So, um, with the 20th-century infrastructure model I mentioned at the beginning, one of the flaws in that system is that it assumes that we will have enough snow in the mountains that would melt during the spring and summer time and would provide us water. The problem is, climate change has really, really impacted that assumption, and now we're not getting as much snow, which comes back to the fact that this February we have not received any snow. We're still in the winter and we have spring weather, and we don't really have much snow on the mountains, which means that's going to impact the amount of water we have for summer and spring time this year. We had a great year last year; we got enough water in our reservoirs, which means that we can potentially make it through. But when you have consecutive years that are dry and don't receive a lot of precipitation in the form of snow or rain, that will become a very problematic issue for meeting future water demands in California. >> And do you think this issue is, along with not having enough rainfall, also about how we store water? Or do you think there should be a change in that policy? >> Sure, I think it definitely has something to do also with the way we store water, and definitely, we're in the 21st century; we have different problems and challenges. It's good to think about alternative ways of storing water, including using groundwater sources, groundwater as a way of storing excess water, or moving water around faster and making sure we use every drop of water that falls on the ground, and also protecting our water supplies from contamination or pollution. >> And do you see us ever going to desalination to get clean water? >> So, interestingly enough, I think desalination definitely has worth in other parts of the world, where they have a smaller population or they have already tapped out all the other options that are available to them. Desalination is an expensive solution; it costs a lot of money to build this infrastructure, and it also again depends on, you know, this centralized approach where we build something and provide resources to people from that location. So it's very costly to build these kinds of solutions. I think for California, we still have plenty of water that we can save and repurpose, I would say, and also we can still do recycling and reuse. We can capture our storm water and reuse it. So there are so many other, cheaper, more accessible options available before you go ahead and build a desalination plant. >> And you're going to be talking about sustainable water resource management. So tell us a little bit more about that, too. >> So the thing with water management, and occasionally I also use the phrase building a resilient water future, is all about diversifying our water supply and being mindful of how we use our water. Every drop of water that we use is degraded and needs to be cleaned up and put back in the environment, so it always starts from the bottom: the more you save, the less impact you have on the environment. The second thing is you want to make sure that every drop of water we have, we can use as many times as possible, and not take it, use it, and lose it right away, but actually be able to use it multiple times for different purposes. Another point that's very important is that the majority of the water we use on a daily basis doesn't need to be extremely clean, drinking-water quality. For example, if you tell someone that we're flushing drinkable water down our toilets, it would surprise you that we would spend this much time and resources and money and energy to clean that water just to flush it down the toilet rather than using it. So basically rethinking the way we built this infrastructure model is very important, being able to tailor water to the needs that we have and also being mindful of how we use that resource. >> So is your research focused mainly on California or the local community? >> The solutions that we build are not only California-focused; actually, we try to build solutions that can be easily applied to different places. Having said that, because we approach water from the bottom up, you need to have local collaboration and a local perspective to bring to this picture, and a lot of our collaborators have so far been in California. We have had data from them, and we were able to sort of demonstrate some of the assumptions we had in California. But we actually work all over the world. We have collaborators in Europe and in Asia, and they're all trying to do the same thing that we do, and we're trying to collaborate with them on some of the projects in other parts of the world. >> That's awesome. So going forward, what do you hope to see with sustainable water management? >> So, to be honest with you, we often think about technology as a way to solve all our problems and move us out of the challenges we have. I would say technology is great, but we need to really rethink the way we manage our resources, the institutions that we have, and the way we manage the data and information that we have. And I really hope that we can revolutionize that part of the water sector and disrupt that part, because as we disrupt this institutional part of the system and provide more system-level thinking to the water sector, I'm hoping that that would change the way we manage our water and then actually open up space for some of these technologies to come into play as we go forward. >> That's awesome. So before we leave here, you're originally from Tehran, um, and now you're in this data science industry. What would you say to a kid who's abroad, who wants to maybe move here and have a career in data science? >> I would say study hard, don't let anything discourage you. You know, we're all equal; our brains are all made the same way. It doesn't matter what's on the surface. So, um, I encourage all the girls to study hard and not get discouraged, and fail as many times as you can, because failing is an opportunity to become more resilient and learn how to grow. And, um, I really hope to see more girls and women in these engineering and STEM fields, to be more active and become more prominent. >> Have you seen a large growth within the past few years? >> Definitely, the conversation is definitely there, and there are a lot more women, and I love how Margot and her team are sort of trying to highlight the number of people who are out there and working on these issues, because that demonstrates that the field wasn't necessarily empty, it was just not highlighted as much. So for sure, it's very encouraging to see how much growth we have seen over the years. >> Newsha, thank you so much. All the work you do is really inspiring. >> Thank you for having me. >> Absolutely, nice to meet you. I'm Sonia Tagare. Thanks for watching theCUBE and stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media. Thank you for having me. models to better understand how water use is changing So give us a preview of your talk. to do is you want to change this paradigm and try to make it more bottom up at and my talk, but grass is the biggest club you're going in the US So that's a lot of water that we can be able to we can save and use, The rest of it is actually the residential water use, which is what you and I, They're not wasting water while you're doing that. We just had a dry February was the 1st 150 years, and you know, Which means that's going to impact the amount of water we have for summer and spring time this year. Sure, I think that it definitely has something also in the way we store water and be definitely you're And you see it's ever going to desalination or to get clean water. I think for for California we still have plenty of water that we can save and repurpose, So the thing with the needs that we have and also being mindful of Have you use that resource? the bottom up, you need to have a local collaboration and local So going forward, what do you hope to see with sustainable that part of the water sector and disrupt that part because as we disrupt this institutional So before we leave here, you're originally from Tehran. and fail as many times as you can, because failing is an opportunity to become more resilient it's very encouraging to see how much growth you have seen over the years for sure It's really inspiring all the work you do.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Europe | LOCATION | 0.99+ |
California | LOCATION | 0.99+ |
US | LOCATION | 0.99+ |
Sha Ajami | PERSON | 0.99+ |
Tehran | LOCATION | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
Margot | PERSON | 0.99+ |
20th century | DATE | 0.99+ |
50% | QUANTITY | 0.99+ |
21st century | DATE | 0.99+ |
Newsha Ajami | PERSON | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
February | DATE | 0.99+ |
Sonia | PERSON | 0.98+ |
second thing | QUANTITY | 0.98+ |
10% | QUANTITY | 0.98+ |
Asia | LOCATION | 0.98+ |
today | DATE | 0.98+ |
Gary | PERSON | 0.97+ |
Stanford | ORGANIZATION | 0.96+ |
Woods Women in Data Science Conference | EVENT | 0.96+ |
four times | QUANTITY | 0.95+ |
Senator | PERSON | 0.94+ |
western United States | LOCATION | 0.93+ |
1st 150 years | QUANTITY | 0.93+ |
2020 | DATE | 0.92+ |
Stanford Women in Data Science ( | EVENT | 0.9+ |
this year | DATE | 0.86+ |
two million people a year | QUANTITY | 0.85+ |
Cube | ORGANIZATION | 0.82+ |
about a 1,000,000 | QUANTITY | 0.8+ |
WiDS) Conference 2020 | EVENT | 0.77+ |
this February | DATE | 0.75+ |
One | QUANTITY | 0.74+ |
Cube | TITLE | 0.63+ |
past | DATE | 0.55+ |
fifth | EVENT | 0.54+ |
data | TITLE | 0.52+ |
drop | QUANTITY | 0.51+ |
years | DATE | 0.49+ |
annual | QUANTITY | 0.41+ |
Emily Glassberg Sands, Coursera | Stanford Women in Data Science (WiDS) Conference 2020
>> Reporter: Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by SiliconANGLE media. >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS, Women in Data Science conference. Joining us today is Emily Glassberg Sands, the Head of Data Science at Coursera. Emily, welcome to theCUBE. >> Thanks, so great to be on. >> So, tell us a little bit more about what you do at Coursera. >> Yeah, absolutely, so Coursera is the world's largest platform for higher education. We partner with about 160 universities and 20 industry partners, and we provide top learning content, from data science to child nutrition, to about 50 million learners around the world. I lead the end-to-end data team, spanning data engineering, data science, and machine learning. >> Wow, and we just had Daphne Koller on earlier this morning, who is the co-founder of Coursera, and she's also the one who hired you. >> Yeah. >> So tell us more about that relationship. >> Well, I love Daphne, I think the world of her. As I'll talk about shortly, she actually didn't hire me from the start. The first answer I got from Coursera was a no - the company wasn't quite ready for someone who wasn't a full-blown coder. But I eventually talked her into bringing me on board, and she's been an inspiration ever since. One of my first memories of Daphne was when she was painting the vision of what's possible with online education, and she said, "Think about the first movie." The first movie was literally just filming a play on stage. You'll appreciate this, given your background in film. Then fast forward to today and think about what's possible in movies that could never be possible on the brick-and-mortar stage. And the analogy she was drawing was that the first MOOC, the first Massive Open Online Course, was very simply filming a professor in a classroom. But she was thinking forward to today and tomorrow and five years from now, and what's possible in terms of how data and technology can transform how educators teach and how learners learn. >> That's very cool. So, how has Coursera changed from when she started it to now? >> It's evolved a lot. I've been at Coursera about six years; when I joined the company, it had less than 50 people. Today we're 10 times that size - we have 500. There has obviously been dramatic growth in the platform overall, and three main changes to our business model. The first is we've moved from partnering exclusively with universities to recognizing that a lot of the most important education for folks in the labor market is being taught within companies. Google is super incentivized to train people in Google Cloud, Amazon in AWS, and folks need to learn Tableau and a whole host of other software. So we've expanded to include education provided not just by top institutions like Stanford, but also by top institutions that are companies, like Amazon and Google. The second big change is we've recognized that while for many learners an individual course or a MOOC is sufficient, some learners need access to a full degree, a diploma-bearing credential. So we've moved into the degree space; we now have 14 degrees live on the platform - master's in computer science and data science, but also in business, accounting, and so on.
And the third major change, I think, is that as the world has evolved, there's recognition that folks need to be learning throughout their lives. There's also general consensus that it's not just on individuals to learn, but also on their companies and governments to train them, and so we launched Coursera Enterprise, which is about providing learning content through employers and through governments so we can reach a wider swath of individuals who might not be able to afford it themselves. >> And how are you able to use data science to track individual user preferences and user behavior? >> Yeah, that's a great question. So you can imagine, right? 50 million learners, from almost every country in the world, from a range of different backgrounds, with a bunch of different goals. And so I think what you're getting at is that so much of creating the right learning experience for each person is about personalizing that experience. And we personalize throughout the learner journey. In discovery up front, when you first join the platform, we ask you: what's your career goal? What role are you in today? And then we help you find the right content to close the gap. As you're moving through courses, we predict whether or not you need some additional support, whether it's a fully automated intervention like a behavioral nudge emphasizing growth mindset, or a pedagogical nudge like recommending the right review material and providing it to you. And then we also do the same to accelerate support staff on campus. We identify for each individual what type of human touch they might need, and we serve up to support staff recommendations for who they should reach out to, whether it's a counselor reaching out to a degree student who hasn't logged in for a while, or a TA reaching out to a degree student who's struggling with an assignment. So data really powers all of that: understanding someone's goals, their background, the content that's going to close the gap, as well as understanding where they need additional support and what type of help we can provide. >> And how are you able to track this data - are you using A/B testing? >> Yeah, great question. So we collect what we call event-level data, which basically tracks what every learner is doing as they move through the platform. And then we use A/B testing to understand the influence of our big features. So, say we roll out a new search ranking algorithm or a new learning experience - we would A/B test that to understand how learners in the new variant compare to learners in the old variant. But for many of our machine-learned systems, we're actually doing more of a multi-armed bandit approach, where on the margin we're changing a little bit of the experience people have to understand what effect that has on their downstream behavior, separate from a massive hold-in or hold-out A/B test. >> And so today, you're giving a talk about Coursera's latest data products, so give us a little insight about that. >> So, I'm covering three data products that we've launched over the last couple of years. The first two are oriented around really helping learners be successful in the learning experience. The first is predicting when learners are going to need additional nudges and intervening in fully automated ways to get them back on track. The second is about identifying learners who need human support and serving up really easily interpretable insights to support staff so they can reach out to the right learner with the right help.
And then the third is a little bit different. It's about, once learners are out in the labor market, how they can credibly signal what they know, so that they can be rewarded for that learning on the job. This is a product called skill scoring, where we're actually measuring what skills each learner has, up to what level, so I can, for example, compare that to the skills required in my target career, or show it to my employer so I can be rewarded for what I know. >> That can be really helpful when people are creating resumes, by ranking how much of a skill they have. >> Absolutely. It's really interesting when you talk about resumes - so much of what's shown on resumes are traditional credentials, things like: What school did you go to? What did you major in? What jobs have you had? And as you and I both know, there's unequal access to the school you go to or the early jobs you get. So part of the motivation behind skill scoring is to create more equitable, fair, accessible signals for the labor market. We're really excited about that direction. >> And do you think companies are taking that into consideration when they're hiring people who, say, have a five out of five skill score in computer science but didn't go to Stanford? Do you think they're taking that into account? >> Absolutely. I think companies are hungry to find more diverse talent, and the biggest challenge is that when you look at people from diverse backgrounds, it's hard to know who has what skills. So skill scoring provides a really valuable input. We're actually seeing it in use already by many of our enterprise customers, who are using it to identify who among their internal employees is well positioned for new opportunities or new roles. For example, I may have a bunch of backend engineers; if I know who's good in math and machine learning and statistics, I can tap those folks to transition over to machine learning roles. So it's used both as an external signal in the labor market and as an internal signal within companies. >> And just our last question here: what advice would you give to young women who are either out of college or just starting college, who are interested in data science but maybe haven't majored in a typical data science major? >> So, I love that you asked about not having majored in a typical data science major. I'm actually an economist by training, and I think that's probably the reason I was at first rejected from Coursera, because an economist is a very strange background to go into data science. My primary advice to those young women would be to really not get too lost in the data science, in the math, in the algorithms, and instead to remember that those are a means to an end, and the end is impact. Think about the problems in the world that you care about. For me, it's education. For others, it's health care, or personal finance, or a range of other issues. And remember that data science provides this vast set of tools that you can use to solve the problems you care about most. >> That's great, thank you so much for being on theCUBE. >> Thank you. >> I'm Sonia Tagare, thank you so much for watching theCUBE and stay tuned for more. (upbeat music)
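For readers who want to picture the experimentation approach Sands describes - a hold-out A/B test for big changes versus a multi-armed bandit that adjusts the experience on the margin - here is a minimal, illustrative Python sketch. It is not Coursera's implementation: the arm names, epsilon value, and engagement rates are hypothetical assumptions chosen only to show how an epsilon-greedy bandit gradually shifts traffic toward the better-performing variant while still exploring.

```python
import random

# Hypothetical variants of a learning-experience nudge (not real Coursera arms).
ARMS = ["nudge_A", "nudge_B", "nudge_C"]

class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit: explore with probability epsilon,
    otherwise exploit the arm with the best observed average reward."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}    # times each arm was served
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward per arm

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        # Incremental mean update; no need to store the full reward history.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulated_reward(arm):
    # Assumed "true" engagement rates, purely for the demonstration.
    true_rates = {"nudge_A": 0.05, "nudge_B": 0.08, "nudge_C": 0.06}
    return 1.0 if random.random() < true_rates[arm] else 0.0

if __name__ == "__main__":
    random.seed(42)
    bandit = EpsilonGreedyBandit(ARMS, epsilon=0.1)
    for _ in range(10_000):                 # each iteration ~ one learner session
        arm = bandit.choose()
        bandit.update(arm, simulated_reward(arm))
    print("Observed average reward per arm:", bandit.values)
    print("Traffic allocation:", bandit.counts)
```

The design trade-off she hints at is visible even in this toy: unlike a fixed A/B split with a clean hold-out, the bandit reallocates most traffic to the winning arm during the experiment itself, which lowers the cost of testing but makes downstream effects harder to isolate.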
SUMMARY :
Brought to you by SiliconANGLE media. Emily Glassberg Sands, Head of Data Science at Coursera, describes how the platform has grown from partnering exclusively with universities to adding industry partners, full degrees, and Coursera Enterprise. She explains how data personalizes the learner journey, how event-level data feeds continuous A/B testing and multi-armed bandit experimentation, and previews three data products: automated nudges, interpretable insights for support staff, and skill scoring, which gives learners a more equitable signal of what they know in the labor market. Her advice to young women entering the field: don't get lost in the math and algorithms - they are a means to an end, and the end is impact.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Sonia Tagare | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Daphne | PERSON | 0.99+ |
Daphne Koller | PERSON | 0.99+ |
Stanford | ORGANIZATION | 0.99+ |
10 times | QUANTITY | 0.99+ |
Coursera | ORGANIZATION | 0.99+ |
14 degrees | QUANTITY | 0.99+ |
Emily | PERSON | 0.99+ |
five | QUANTITY | 0.99+ |
first movie | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
500 | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
third | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
Today | DATE | 0.99+ |
second | QUANTITY | 0.99+ |
20 industry partners | QUANTITY | 0.99+ |
Emily Glassberg Sands | PERSON | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
less than 50 people | QUANTITY | 0.99+ |
each person | QUANTITY | 0.98+ |
SiliconANGLE | ORGANIZATION | 0.98+ |
today | DATE | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
about 160 universities | QUANTITY | 0.97+ |
first two | QUANTITY | 0.96+ |
first answer | QUANTITY | 0.95+ |
first MOOC | QUANTITY | 0.95+ |
50 million learners | QUANTITY | 0.95+ |
about 50 million learners | QUANTITY | 0.94+ |
Tableau | TITLE | 0.93+ |
about six years | QUANTITY | 0.93+ |
three | QUANTITY | 0.92+ |
each individual | QUANTITY | 0.92+ |
WiDs, Women in Data Science conference | EVENT | 0.91+ |
third major | QUANTITY | 0.9+ |
each learner | QUANTITY | 0.89+ |
one | QUANTITY | 0.89+ |
WiDS | EVENT | 0.88+ |
earlier this morning | DATE | 0.87+ |
Conference 2020 | EVENT | 0.85+ |
last couple of years | DATE | 0.85+ |
first memories | QUANTITY | 0.85+ |
five skills | QUANTITY | 0.83+ |
three data products | QUANTITY | 0.83+ |
Stanford Women in Data Science | EVENT | 0.82+ |
Google Cloud | TITLE | 0.81+ |
five years | QUANTITY | 0.77+ |
first Massive | QUANTITY | 0.72+ |
Stanford Women in Data Science 2020 | EVENT | 0.69+ |
fifth | QUANTITY | 0.54+ |
Ya Xu, LinkedIn | Stanford Women in Data Science (WiDS) Conference 2020
>> Narrator: Live from Stanford University, it's theCUBE! Covering Stanford Women in Data Science 2020, brought to you by SiliconAngle Media. >> Hi, and welcome to theCUBE, I'm your host, Sonia Tagare. And we're live at Stanford University, covering the fifth annual WiDS, Women in Data Science Conference. Joining us today is Ya Xu, the head of data science at LinkedIn. Ya, welcome to theCUBE. >> Thank you for having me. >> So tell us a little bit about your role and about LinkedIn. >> So LinkedIn is, first of all, the biggest professional social network, where we have a massive economic graph that we have been creating, with close to 700 million members, millions of companies and jobs, and of course, you know, skills and schools as part of it as well. And I lead the data science team at LinkedIn. My team really spans the global presence of LinkedIn's offices, and we work on various different areas: thinking about how we can iterate on, understand, and improve the products that we deliver to our members and our customers, and also, at the same time, thinking about how we can make our infrastructure more efficient and how we can make our sales and marketing more efficient as well. So we really span across. >> And how has the use of data science evolved to deliver a better user experience for users of LinkedIn? >> Yeah, so first of all, at LinkedIn in general, we truly believe that everybody can benefit from better data and better data access. So we're certainly using data to continuously better understand what our members are looking for. As a simple example, whenever we launch a new feature, we're not just blindly deciding ourselves what is the better feature for our members - we actually look at how our users react to it, right? So we use data to understand that, and then to make decisions on whether we should eventually launch this feature to all members or not. That's a very prominent way for us to use data. And obviously, we also use data to understand, even before we build certain features, whether this sort of feature is the right feature to build. We do surveys and look at the survey data, but we also look at user behavior data to be able to come up with better features for users. >> And do you use A/B testing as well? >> Oh absolutely, yeah. We do a lot of A/B experiments. I was trying not to use that terminology, but this is what we use to understand the features that we are developing and putting in front of our users - do they enjoy them as much as we think they will? >> Right, so you had a talk today about creating global economic opportunities with responsible data. So give us some highlights from your talk. >> So, first of all, at LinkedIn we truly believe in the vision that we are working towards, which is really creating economic opportunity for every member of the global workforce. And if you start from that, thinking about that as the axiom we're working towards, and then think about how you can do that, obviously the table stakes, the fundamental thing that we have to start with, is being able to preserve the privacy of our members as we leverage the data that our members entrust to us. Right, so how can we do that?
We have some early efforts in using and developing differential privacy as a technique that helps us do a lot better at preserving members' privacy as we're leveraging the data. But at the same time, it doesn't end there, right? Because if you're thinking about creating opportunity, it's not just about preserving privacy, but also, when we are leveraging the data, how can we leverage it in a way that creates opportunity in a fair way? So there is also a lot of effort going into how we can do that. What does fairness mean? What are the ways we can actually turn some of the key concepts that we have into action that really drives the way we develop product, the way we think about responsible design, the way we build our algorithms, and the way we measure in every single dimension? >> And speaking about that bias, at the opening address they mentioned that diversity is really great because it provides many perspectives and also helps reduce this bias. So how have you at LinkedIn been able to create a more diverse team? >> So first of all, I think we all believe that diversity is certainly better as we're building products. If you have a diverse team that is really a representation of the customers and members that you're serving, then you're able to come up with better features that serve the needs of the population of our members. But it's also just the right thing to do. We have all had experiences where we may not feel as much belonging when we walk into a room where we're the only person that we identify with in that room, and we certainly want to create an environment of belonging for all our employees. There are also studies on what makes a high-performing team - some of the studies done at Google on the psychological safety aspects of it - and there's a lot of brain science that says when you make people feel they belong, they will actually be so much more creative and innovative, right? So we have that belief. But tactically, there are many things that we're doing across all the diversity, inclusion, and belonging aspects, starting from hiring: we very much emphasize how we can increase the diversity of individuals that we're bringing to LinkedIn. And when they are at LinkedIn, how can we make them feel more belonging and more included in every aspect? We have different inclusion groups - I'm obviously very much involved in Women in Tech, and at LinkedIn we have many efforts to help women in engineering and in other groups feel they belong to this community. At the same time, there are concrete actions we're taking too: helping women have a much better understanding and awareness of some of the ways we operate that may be slightly different from how our male colleagues operate, and changing current processes - hiring processes, promotion processes - so that we bring more equal footing to the way we're thinking about the gender gap and gender diversity. >> Right, that's great.
And what advice would you give to women who are just starting college, or who are just out of college, who are interested in going into data science? >> I would say the biggest learning for me is to have that can-do attitude. You know, biologically and in every way, women are not any less than men, and you have certainly seen many strong and very talented women in the field. So don't let people's perceptions or biases bring you down. Think about what you want, then just go for it, and go get the advice that you can from people. There are so many, as you can see at the conference today, so many talented women that you can reach out to who are very willing to help you as well. >> And in this age of AI and ML, where do you see data science going in the future? >> That's a really interesting question. Data science, I would say, is a field that is really broad, right? There are things I would consider part of data science that are not necessarily part of AI - some of the causal inference work, for example, which is extremely popular and important. I think the fields will continue to evolve, and they are continually overlapping with each other as well. You cannot do data science without understanding or having strong skills in AI and machine learning, and you also can't do great machine learning without understanding the data science, right? Thinking about some of what Daphne Koller was sharing earlier - you can bring in the wrong algorithm without realizing the bias, where the algorithm is really just detecting which machine took the images rather than actually detecting the difference between broken bones or not, right? So I do see a continuously big overlap, and I think individuals who are involved in both communities should continue to be very comfortable working that way, too. >> Right, great. Thank you so much for being on theCUBE and thank you for your insight. >> Of course, thank you for having me. >> I'm your host, Sonia Tagare. Thank you for watching theCUBE and stay tuned for more. (Upbeat music)
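Xu mentions differential privacy as one technique LinkedIn has been exploring to preserve member privacy while still leveraging data. Purely as an illustration - this is not LinkedIn's implementation, and the epsilon value, record fields, and query are assumptions made up for the example - here is a minimal Python sketch of the classic Laplace mechanism, which answers a count query with calibrated noise so that any single member's presence has a provably bounded effect on the released number.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon=0.5):
    """Return a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one person changes
    the true count by at most 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    sensitivity = 1.0
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(sensitivity / epsilon)

if __name__ == "__main__":
    random.seed(7)
    # Hypothetical member records; fields and values are invented for the demo.
    members = [{"skill": "python"}] * 1200 + [{"skill": "design"}] * 800
    noisy = private_count(members, lambda m: m["skill"] == "python", epsilon=0.5)
    print(f"True count: 1200, privately released count: {noisy:.1f}")
```

The key property worth noticing is that the noise scale depends only on the query's sensitivity and the privacy budget epsilon, not on the size of the data, so aggregate insights stay useful while any one member's contribution is masked.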
SUMMARY :
brought to you by SiliconAngle Media. Ya Xu, head of data science at LinkedIn, describes how her team uses data and A/B experiments to understand and improve products for close to 700 million members. She shares highlights from her talk on creating global economic opportunity with responsible data - preserving member privacy with techniques such as differential privacy, and turning fairness concepts into concrete decisions about product design, algorithms, and measurement - and explains how LinkedIn works on diversity, inclusion, and belonging, from hiring through promotion. Her advice to women entering the field: have a can-do attitude, don't let other people's biases bring you down, and reach out to the many women willing to help.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Sonia Takari | PERSON | 0.99+ |
Sonia Tagare | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
millions | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
both | QUANTITY | 0.98+ |
SiliconAngle Media | ORGANIZATION | 0.98+ |
Stanford University | ORGANIZATION | 0.97+ |
Ya Xu | PERSON | 0.95+ |
Stanford Women in Data Science | EVENT | 0.95+ |
WiDS, Women in Data Science Conference | EVENT | 0.93+ |
both communities | QUANTITY | 0.9+ |
700 million members | QUANTITY | 0.89+ |
WiDS) Conference 2020 | EVENT | 0.79+ |
Stanford Women in Data Science 2020 | EVENT | 0.78+ |
millions of companies | QUANTITY | 0.77+ |
single dimension | QUANTITY | 0.7+ |
XU | PERSON | 0.63+ |
first | QUANTITY | 0.62+ |
fifth annual | QUANTITY | 0.56+ |
theCUBE | TITLE | 0.42+ |
Nhung Ho, Intuit | Stanford Women in Data Science (WiDS) Conference 2020
>>Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by Silicon Angle Media. >>Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University for the fifth annual WiDS Women in Data Science Conference. Joining us today is Nhung Ho, the director of data science at Intuit. Nhung, welcome to theCUBE. >>Thank you for having me here. >>So tell us a little bit about your role at Intuit. >>So I lead the applied machine learning teams for our QuickBooks product lines and also for our customer success organization. Within my team, we do applied machine learning - we specialize in building machine learning products and delivering them into our products for our users. >>Great. Today you're giving a talk about how organizations want to achieve greater flexibility, speed, and cost efficiencies - a technical vision talk about data science in the cloud world. So what should data scientists know about data science in a cloud world? >>Well, I'll just give you a little bit of a preview into my talk later, because I don't want to spoil anything. I think one of the most important things about being a data scientist in a cloud world is that you have to fundamentally change the way you work. A lot of us start on our laptops or on a server and do our work there, but when you move to the cloud, it's like all bets are off - all the limiters are off. So how do you fully take advantage of that? How do you change your workflow? What are some of the things available to you that you may not know about? And in addition to that, what are some things you have to rewire in your brain to operate in this new environment? I'm going to share some experiences that I learned firsthand, and also from my team, in Intuit's cloud migration over the past six years. >>That's great, excited to hear that. And so, Intuit has sponsored WiDS for many years now - last year we spoke with one of your colleagues from Intuit - so tell us about Intuit's sponsorship. >>Yeah, so at Intuit we are a champion of gender diversity, and of all sorts of diversity. When we first learned about WiDS, we said we need to be a champion of the Women in Data Science conference, because for me personally, oftentimes when I'm in a room going over technical details, I'm often the only woman - and not just that, I'm often the only woman executive. So part of the sponsorship is to create this community of women, very technical women in this field, to share our work together, to build this community, and also to show the great diversity of work that's going on across the field of data science. >>And Intuit has always been really great at embracing diversity. Tell us a little bit about your experience being part of Intuit and also about the Tech Women part. >>Yeah, so one of the things at Intuit that I really appreciate is that we have employee groups around specific interests, and one of those groups is Tech Women at Intuit. The goal is to create a community of women who can provide coaching, mentorship, technical development, and leadership development, and one of the unique things about it is that it's not just focused on the technical side, but on helping women develop into leadership positions. For me, when I first started out, there were very few women in executive positions in our field - data science is a brand new field, and it takes time to get there. Now that I'm on the other side, one of the things I want to do is give back and coach the next generation. The Tech Women at Intuit group allows me to do that through a very strong mentorship program that matches me with early-career mentees across multiple different fields, so I can provide that coaching and that leadership development. >>And speaking about diversity, in the opening address we heard that diversity creates perspectives and also takes away bias. So why is gender diversity so important at Intuit, and how does it help take away that bias? >>Yeah, so one of the important things that I think a lot of people don't realize is that when you go and build your products, you bring in a lot of biases in how you build them, and ultimately the people who use your products are the general population. For us, we serve consumers, small businesses, and the self-employed, and if you take a look at the diversity of our customers, it mirrors the general population. So when you think about building products, you need to bring in those diverse perspectives so you can build the best products possible, because the people using those products come from diverse backgrounds as well. >>Right. And Intuit has now gone from a desktop-based application to a cloud-based application, which is a big part of your talk. How do you use data for A/B testing, and why is it important? >>Yeah, A/B testing - that is a personal passion of mine, actually, because as scientists, what we like to do is run a lot of experiments and say, okay, what is the best thing out there, so that ultimately, when you ship a new product or feature, you ship the best thing possible, verified by data, and you know exactly how users are going to react to it. When we were on desktop, that was incredibly difficult, because back in the days - I don't know if you remember - when you had a floppy disk, or even CD-ROMs, that's how we shipped our products. All the changes you wanted to make had to be contained, and you really only shipped once per year. So if there was any type of testing that we did, we'd bring in our users, have them use our products a little bit, and then say, okay, we know exactly what we need to do - ship that out. You only get one chance. Now that we're in the cloud, what that allows us to do is test continuously, A/B testing every new feature that comes out. We have a champion-challenger model, and we can say, okay, the new version that we're shipping is this much better than the previous one, we know it performs in this way, and then we get to make the decision: is this the best thing to do for the customer? So you turn what was once a one-time change management process into one that's distributed throughout the entire year, and at any one time we're running hundreds of tests to make sure that we're shipping exactly the best things for our customers. >>That's awesome. So what advice would you give to the next generation of women who are interested in STEM but maybe feel like, oh, I might be the only woman, I don't know if I should do this? >>Yeah, I think the biggest thing for me was finding mentorship. Initially, when I was very early in my career, and even when I was doing my graduate studies, for me a mentor was someone who was in my field. But when I first joined Intuit, an executive in another group, who is a woman, said, "Hey, I'd like to take you aside, provide you some feedback, and here's some coaching I want to give you." That was when I realized you don't actually need that person to be in your field to guide you through to the next step. So, for women who are early in their journey, I recommend finding a mentor who is at a stage where you want to go, regardless of which field they're in, because everybody has diverse perspectives and things they can teach you as you go along. >>And how do you think WiDS is helping women feel like they can do data science and be a part of the community? >>I think what you'll see in the program today is a huge diversity of speakers and panelists at all different stages of their careers and in all different fields. So what we get to see is not only the timeline of women, from those doing their PhDs all the way to very well-established women - the provost of Stanford University was here today, which is amazing, someone at the very top of her career who's been around the block - but also the diversity of fields. When you think about data science, a lot of us think about just the tech industry, but you see it in healthcare, you see it in academia, and you see that wide diversity of where data science, and the women who practice data science, come from. I think it's really empowering, because you can see yourself in it - representation does matter quite a bit. >>Absolutely. And where do you see data science going forward? >>Oh, that is a tough and interesting question, actually. In the current environment today, we could talk about where it could go wrong or where it could actually open doors. For me, I'm an eternal optimist, and one of the things I think is really, really exciting for the future is that we're getting to a stage where we're building models not just for the general population. We have enough data and enough compute that we can build a model tailored just for you, for your life. I think that is really, really powerful, because we can build exactly the right solution to help our customers and our users succeed. Specifically, working in the personal finance and small business space, that means I can help that cupcake shop owner actually manage her cash flow and help her succeed. To me, that's really powerful, and that's where data science is headed. >>Nhung, thank you so much for being on theCUBE, and thank you for your insight. >>Thank you so much. >>I'm Sonia Tagare. Thanks for watching theCUBE and stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media. Nhung Ho, director of data science at Intuit, previews her talk on data science in a cloud world and how Intuit's six-year cloud migration changed the way her team works. She explains how moving from shipping desktop software once a year to the cloud enabled continuous A/B testing with a champion-challenger model and hundreds of concurrent tests, discusses Intuit's sponsorship of WiDS and the Tech Women at Intuit group, and looks ahead to models tailored to individual customers, such as helping a cupcake shop owner manage her cash flow. Her advice to young women: find mentors at the stage you want to reach, regardless of their field.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Intuit | ORGANIZATION | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
Today | DATE | 0.99+ |
Last year | DATE | 0.99+ |
today | DATE | 0.99+ |
Intuit Group | ORGANIZATION | 0.99+ |
one time | QUANTITY | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
Sonia | PERSON | 0.99+ |
Nhung Ho | PERSON | 0.99+ |
one chance | QUANTITY | 0.99+ |
Taylor | PERSON | 0.98+ |
first | QUANTITY | 0.98+ |
Ho | PERSON | 0.97+ |
QuickBooks | TITLE | 0.97+ |
Intuit None | ORGANIZATION | 0.95+ |
Woods Women in Data Science Conference | EVENT | 0.94+ |
Stanford | ORGANIZATION | 0.93+ |
hundreds of tests | QUANTITY | 0.93+ |
2020 | DATE | 0.93+ |
past six years | DATE | 0.88+ |
Stanford Women in Data Science ( | EVENT | 0.88+ |
DSO | ORGANIZATION | 0.86+ |
one time process | QUANTITY | 0.86+ |
once per year | QUANTITY | 0.86+ |
Woods | PERSON | 0.83+ |
Cube | COMMERCIAL_ITEM | 0.77+ |
WiDS) Conference 2020 | EVENT | 0.75+ |
Woods | EVENT | 0.66+ |
once | QUANTITY | 0.61+ |
fifth | EVENT | 0.55+ |
Cube | ORGANIZATION | 0.51+ |
San Juan | LOCATION | 0.46+ |
annual | QUANTITY | 0.37+ |
Lillian Carrasquillo, Spotify | Stanford Women in Data Science (WiDS) Conference 2020
>>Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by Silicon Angle Media. >>Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University, covering the fifth annual WiDS Women in Data Science Conference. Joining us today is Lillian Carrasquillo, who's the Insights Manager at Spotify. Lillian, welcome to theCUBE. >>Thank you so much for having me. >>So tell us a little bit about your role at Spotify. >>Yeah, so I'm actually one of the few insights managers on the personalization team. Within my little group, we think about the data and algorithms that help power the larger personalization experiences throughout Spotify - from your Daily Mix to Discover Weekly to your year-end Wrapped stories to your experience on Home and in search results. >>That's awesome. Can you tell us a little bit more about the personalization team? >>Yes. We actually have a variety of different product areas that come together to form the personalization mission - a mission is the term we use for a big department at Spotify - and we collaborate across those product areas to understand what foundational data sets and foundational machine learning tools are needed to create features that a user can actually experience in the app. >>Great. And so you're going to be on the career panel today - how do you feel about that? >>I'm really excited. The WiDS team has done a great job of bringing together - "diverse" is a very overused term sometimes - a very diverse group of people with lots of different types of experiences, which I think is core to how I think about data science. It's a wide definition. And so I think it's great to show younger and mid-career women all of the different career paths that we can take. >>And what advice would you give to women who are coming out of college right now about data science? >>Yeah, my big advice is to follow your interests. There are so many different types of data science problems. You don't have to just go into a title that says data scientist or a team that says data science - you can follow your interests into data science, and use your data science skills in ways that might require a lot of collaboration or mixed methods, or work within a team where different types of expertise come together to work on problems. >>And speaking of mixed methods, Insights is a team that's a mixed-methods research group. So tell us more about that. >>Yes, I personally manage a data scientist and a user researcher, and the three of us collaborate closely across our disciplines. We also collaborate across research science - the research science team - right into the product and engineering teams that are actually delivering the different products that users get to see. So it's highly collaborative, and the idea is to understand the problem space deeply together: to be able to understand what it is we're even trying to form in our heads, like the need that a user - a human - has, bringing in research from research scientists and the product side to understand those needs, and then actually produce insights that another human, you know, a product owner, can really think through to understand the current space and the product opportunities. >>And to understand that user insight, do you use A/B testing? >>We use a lot of A/B testing - that's core to how we think about our users at Spotify. We also do a lot of offline experiments to understand the potential consequences or impact that certain interventions can have. But with A/B testing, you know, there's so much to learn about best practices, and when you're talking about a team that does foundational data and foundational features, you also have to think about unintended or second-order effects of an algorithmic A/B test. So it's been a huge area of learning and of very interesting outcomes. With every test that we run, we learn a lot, not just about the individual thing we're testing, but about the process overall. >>And what are some features of Spotify that customers really love? >>Daily Mix is one we know people absolutely love. Every time I make a new friend and tell them what I work on, they're like, "I was just listening to my Daily Mix this morning." Discover Weekly, for people who really want to stay open to new music, is also very popular. But I think the one that really takes it is any of the end-of-year Wrapped campaigns - just the nostalgia that people have, even for the last year. And in 2019 we were actually able to do 10 years, and the amount of nostalgia just went through the roof. People were like, "Oh my goodness, you captured the time that I broke up with someone five years ago," or "when I discovered that I love Taylor Swift, even though I didn't think I liked her," or something like that, you know? >>Are there any surprises or interesting stories that you have about interesting user experiences? >>I can give you an example from my own experience. A few months ago, I was scrolling through my Home feed, and I noticed that one of the highly ranked things for me was "women in country," and I was like, oh, that's kind of weird - I don't consider myself a country fan, right? And I had this moment where I went through this path of: wait, that's weird, why would the home screen recommend women in country, country music, to me? And when I clicked through, it showed a little bit of information about it, because it had, you know, Dolly Parton, it had Margo Price, and it had the Highwomen, and those were all artists I'd been listening to a lot - I just had not formed an identity as a country music fan. And then I clicked through and it was like, oh, this is a great playlist, and I listened to it, and it got me to the point where I realized I really actually do like country music when the stories are centered around women. It was really fun to discover other artists that I wouldn't have otherwise jumped into, based on the fact that I love the storytelling and the songwriting of these other country acts. >>That's so cool that you discovered that. So, you have a degree in industrial mathematics, and you went to a liberal arts college on purpose because you wanted to try out different classes. How has that diversity of education helped you? >>Yes, my undergrad is from Smith College, which is a liberal arts school with a very strong liberal arts foundation. When I went to visit, one of the math professors I met told me that he considers studying math not just something that makes you better at math, but something that makes you a better thinker: you can take in much more information, question assumptions, and try to build a foundation for the problem that you're trying to think through. I just found that extremely interesting. I also have an undergraduate major in Latin American studies, and I studied neuroscience and quantum physics for non-experts and took a film class and all of these other things that I don't know I would have had the same opportunity to do at a more technical school. I found it really challenging and satisfying to push myself to think in different ways. I even took a poetry-writing class - I did not write good poetry, but the experience really stuck with me, because it was about pushing myself outside of my own boundaries. >>And would you recommend that kind of diverse education to young women now who are looking at this field? >>I absolutely love it. I mean, some people believe that instead of talking about STEM, we should be talking about STEAM, which adds the arts education in there, and liberal arts is part of that. And I think that now, in these conversations we have about biases in data and ML and AI, and about understanding fairness and accountability - and accountability particularly is a hard word, apparently - a strong cross-disciplinary, collaborative approach, and even cross-disciplinary education at the individual level, is really the only way we're going to be able to make the connections to understand what kind of second-order effects we're having based on the decisions about parameters for a model. In a local sense, we're optimizing and doing a great job, but what are the global consequences of those decisions? And I think that kind of interdisciplinary approach to education as an individual, and collaboration as a team, is really the only way. >>And speaking about bias: earlier we heard that diversity is great because it brings out new perspectives, and it also helps to reduce that unfair bias. So how has Spotify managed to create a more diverse team? >>Yeah, so it starts with recruiting. It starts with what kind of messaging we put out there, and there's a great team that thinks about that exclusively. And they're really pushing all of us - as managers, as ICs, as leaders - to think about the decisions and the way that we talk about things, and all of these micro-decisions that we make and how they create an inclusive environment. It's not just about diversity; it's also about making people feel like this is where they should be. On a personal level, you know, I talk a lot with younger folks and people who are trying to figure out what their place is in technology - whether it's because they come from a different culture, or they might be gender non-binary, or they might be women who feel like there isn't a place for them. The thing that I think about is: because you're different, your voice is needed even more. You know, your voice matters, and we need to figure that out. And I always ask, how can I highlight your voice more? How can I help? I have a tiny, tiny bit of power and influence, you know, more than some other folks - how can I help other people acquire that as well? >>Lillian, thank you so much for your insight. Thank you for being on theCUBE. >>Thank you. >>I'm your host, Sonia Tagare. Thank you for watching and stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media. Lillian Carrasquillo, Insights Manager on Spotify's personalization team, describes how her mixed-methods group combines data science and user research to understand the user needs behind features like Daily Mix, Discover Weekly, and Wrapped. She discusses A/B testing and the second-order effects of algorithmic experiments, shares how a "women in country" recommendation changed her own listening identity, and argues that a liberal arts education and interdisciplinary collaboration are essential for understanding the global consequences of modeling decisions. Her advice: follow your interests, and remember that your voice is needed even more because you're different.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lillian Carrasquillo | PERSON | 0.99+ |
Lillian Kearse | PERSON | 0.99+ |
Lilian | PERSON | 0.99+ |
Sonia | PERSON | 0.99+ |
Spotify | ORGANIZATION | 0.99+ |
2019 | DATE | 0.99+ |
Ari | PERSON | 0.99+ |
Sonia Atari | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
Smith College | ORGANIZATION | 0.99+ |
10 years | QUANTITY | 0.99+ |
Keo | PERSON | 0.98+ |
last year | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
Dolly Parton | PERSON | 0.98+ |
Margo Price | PERSON | 0.97+ |
Stanford Women in Data Science | EVENT | 0.97+ |
1st 5 years ago | DATE | 0.95+ |
Woods Women in Data Science Conference | EVENT | 0.94+ |
Latin American | OTHER | 0.9+ |
Taylor Swift | PERSON | 0.88+ |
second order | QUANTITY | 0.82+ |
Stanford | ORGANIZATION | 0.82+ |
2020 | DATE | 0.81+ |
WiDS) Conference 2020 | EVENT | 0.8+ |
a few months ago | DATE | 0.77+ |
end | DATE | 0.61+ |
this morning | DATE | 0.6+ |
fifth | EVENT | 0.5+ |
data | TITLE | 0.5+ |
Cube | COMMERCIAL_ITEM | 0.5+ |
annual | QUANTITY | 0.4+ |
Lucy Bernholz, Stanford University | Stanford Women in Data Science (WiDS) Conference 2020
>> Announcer: Live from Stanford University. It's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. (upbeat music) >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare. And we're live at Stanford University covering the fifth annual WiDS Women in Data Science Conference. Joining us today is Lucy Bernholz, who is a Senior Research Scholar at Stanford University. Lucy, welcome to theCUBE. >> Thanks for having me. >> So you've led the Digital Civil Society Lab at Stanford for the past 11 years. So tell us more about that. >> Sure, so the Digital Civil Society Lab actually exists because we don't think digital civil society exists. So let me take that apart for you. Civil society is that weird third space outside of markets and outside of government. It's where we associate together, where we as people get together and do things that help other people. It could be the nonprofit sector, it might be political action, it might be the eight of us just getting together and cleaning up a park, or protesting something we don't like. So that's civil society. But what's happened over the last 30 years, really, is that everything we use to do that work has become dependent on digital systems, and those digital systems - every tier of them, I'm talking gadgets, from our phones to the infrastructure over which data is exchanged - that entire digital system is built by companies and surveilled by governments. So where do we as people get to go digitally where we could have a private conversation to say, "Hey, let's go meet downtown and protest X and Y," or "let's get together and create an alternative educational opportunity 'cause we feel our kids are being overlooked," whatever? All of that information that gets exchanged, all of that associating that we might do in the digital world - it's all being watched. It's all being captured (laughs). And that's a problem, because both history and political science - history and democratic theory - show us that when there's no space for people to get together voluntarily, take collective action, and do that kind of thinking and planning and communicating just between the people they want involved, when that space no longer exists, democracies fall. So the lab exists to try to recreate that space. And in order to do that, we have to first of all recognize that it's being closed in. Secondly, we have to make real technological progress: we need a whole set of different digital devices and norms, we need different kinds of organizations, and we need different laws. So that's what the lab does. >> And how does ethics play into that? >> It's all about ethics. And it's a word I try to avoid, actually, because especially in the tech industry - I'll be completely blunt here - it's an empty term. It means nothing; companies are using it to avoid being regulated. People are trying to talk about ethics, but they don't want to talk about values, and you can't do that. Ethics is a code of practice built on a set of articulated values. And if you don't want to talk about values, you're not really having a conversation about ethics. You're not having a conversation about the choices you're going to make in a difficult situation. You're not having a conversation over whether one life is worth 5000 lives, or whether everybody's lives are equal, or whether you should shift the playing field to account for the millennia of systemic and structural biases that have been built into our system.
There's no conversation about ethics if you're not talking about those things. As long as we're just talking about "ethics" in the abstract, we're not talking about anything. >> And you were actually on the ethics panel just now. So tell us a little bit about what you talked about and what were some highlights. >> So I think one of the key things about the ethics panel here at WiDS this morning was that, first of all, it started the day, which is a good sign. It shouldn't be a separate topic of discussion. We need this conversation about values - about what we're trying to build for, who we're trying to protect, how we're trying to recognize individual human agency - to be built in throughout data science. So it's a good start to have a panel about it at the beginning of the conference, but I'm hopeful that the rest of the conversation will not leave it behind. We talked about the fact that, just as civil society is now dependent on these digital systems that it doesn't control, data scientists are building data sets and algorithmic forms of analysis, and both of those things are just coded sets of values. And if you try to have a conversation about that at just the math level, you're going to miss the social level - you're going to miss the fact that that's humanity you're talking about. So it needs to really be integrated throughout the process: talking about the values of what you're manipulating, and the values of the world that you're releasing these tools into. >> And what are some key issues today regarding ethics and data science? And what are some solutions? >> So I mean, this is the Women in Data Science Conference, which happens because five years ago, or whenever it was, the organizers realized, "Hey, women are really underrepresented in data science, and maybe we should do something about that." That's true across the board. It's great to see hundreds of women here, and women around the world participating in the livestream, right? But as women, we need to make sure that as we're thinking about, again, the data and the algorithm, the data and the analysis, we're thinking about all of the people - all of the different kinds of people, languages, abilities, races, ages, you name it - that are represented in that data set, and understand those people in context. In your data set, they may look like they're just two different points of data. But in the world writ large, we know perfectly well that women of color face a different environment than white men, right? They don't walk through the world in the same way. And it's ridiculous to assume that your shopping algorithm isn't going to reflect that difference that they experience in the real world, that it isn't going to affect that in some way. It's fantasy to imagine it's not going to work that way. So we need different kinds of people involved in creating the algorithms, different kinds of people in power in the companies who can say, "we shouldn't build that," or "we shouldn't use it." We need a different set of teaching mechanisms, where people are actually trained to consider from the beginning: what's the intended positive, what's the intended negative, what are some likely negatives, and then decide how far they go down that path. >> Right, and we actually had on Dr. Rumman Chowdhury from Accenture, and she's really big in data ethics. And she brought up the idea that just because we can doesn't mean that we should.
So can you elaborate more on that? >> Yeah well, just because we can analyze massive datasets and possibly make some kind of mathematical model that based on a set of value statements might say, this person is more likely to get this disease or this person is more likely to excel in school in this dynamic or this person's more likely to commit a crime. Those are human experiences. And while analyzing large data sets, that in the best scenario might actually take into account the societal creation that those actual people are living in. Trying to extract that kind of analysis from that social setting, first of all is absurd. Second of all, it's going to accelerate the existing systemic problems. So you've got to use that kind of calculation over just because we could maybe do some things faster or with larger numbers, are the externalities that are going to be caused by doing it that way, the actual harm to living human beings? Or should those just be ignored, just so you can meet your shipping deadline? Because if we expanded our time horizon a little bit, if you expand your time horizon and look at some of the big companies out there now, they're now facing those externalities, and they're doing everything they possibly can to pretend that they didn't create them. And that loop needs to be shortened, so that you can actually sit down at some way through the process before you release some of these things and say, in the short term, it might look like we'd make x profit, but spread out that time horizon I don't know two x. And you face an election and the world's largest, longest lasting, stable democracy that people are losing faith in. Set up the right price to pay for a single company to meet its quarterly profit goals? I don't think so. So we need to reconnect those externalities back to the processes and the organizations that are causing those larger problems. >> Because essentially, having externalities just means that your data is biased. >> Data are biased, data about people are biased because people collect the data. There's this idea that there's some magic debias data set is science fiction. It doesn't exist. It certainly doesn't exist for more than two purposes, right? If we could, and I don't think we can debias a data set to then create an algorithm to do A, that same data set is not going to be debiased for creating algorithm B. Humans are biased. Let's get past this idea that we can strip that bias out of human created tools. What we're doing is we're embedding them in systems that accelerate them and expand them, they make them worse (laughs) right? They make them worse. So I'd spend a whole lot of time figuring out how to improve the systems and structures that we've already encoded with those biases. And using that then to try to inform the data science we're going about, in my opinion, we're going about this backwards. We're building the biases into the data science, and then exporting those tools into bias systems. And guess what problems are getting worse. That so let's stop doing that (laughs). >> Thank you so much for your insight Lucy. Thank you for being on theCUBE. >> Oh, thanks for having me. >> I'm Sonia Tagare, thanks for watching theCUBE. Stay tuned for more. (upbeat music)
SUMMARY :
brought to you by SiliconANGLE Media. covering the fifth annual WiDS for the past 11 years. So the lab exists to try to recreate that space. for the millennia of systemic and structural biases So tell us a little bit about what you guys talked about but I'm hopeful that the rest of the conversation that they experience to the real world doesn't mean that we should. And that loop needs to be shortened, just means that your data is biased. that same data set is not going to be debiased Thank you so much for your insight Lucy. I'm Sonia Tagare, thanks for watching theCUBE.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lucy Bernholz | PERSON | 0.99+ |
Sonia Tagare | PERSON | 0.99+ |
Lucy | PERSON | 0.99+ |
Digital Civil Society Lab | ORGANIZATION | 0.99+ |
5000 lives | QUANTITY | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
Rumman Chowdhury | PERSON | 0.99+ |
one life | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.98+ |
five years ago | DATE | 0.98+ |
two things | QUANTITY | 0.98+ |
eight | QUANTITY | 0.98+ |
Stanford University | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
theCUBE | ORGANIZATION | 0.96+ |
single company | QUANTITY | 0.96+ |
WiDS Women in Data Science Conference | EVENT | 0.96+ |
today | DATE | 0.95+ |
two different points | QUANTITY | 0.95+ |
Stanford Women in Data Science | EVENT | 0.95+ |
Stanford | LOCATION | 0.95+ |
Secondly | QUANTITY | 0.94+ |
more than two purposes | QUANTITY | 0.93+ |
Women and Data Science Conference | EVENT | 0.93+ |
last 30 years | DATE | 0.92+ |
hundreds of women | QUANTITY | 0.91+ |
Second | QUANTITY | 0.91+ |
first | QUANTITY | 0.87+ |
third space | QUANTITY | 0.81+ |
this morning | DATE | 0.81+ |
Stanford Women in Data Science 2020 | EVENT | 0.76+ |
two | QUANTITY | 0.73+ |
past 11 years | DATE | 0.71+ |
Conference 2020 | EVENT | 0.69+ |
WiDS) | EVENT | 0.67+ |
WiDS | EVENT | 0.62+ |
fifth annual | QUANTITY | 0.58+ |
John Hoegger, Microsoft | Stanford Women in Data Science (WiDS) Conference 2020
>> Announcer: Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering WiDS, the Women in Data Science Conference 2020, and this is the fifth annual one. Joining us today is John Hoegger, who is the principal data scientist manager at Microsoft. John, welcome to theCUBE. >> Thanks. >> So tell us a little bit about your role at Microsoft. >> I manage a central data science team for Microsoft 365. >> And tell us more about what you do on a daily basis. >> Yeah, so we look at it across all the different Microsoft 365 products, Office, Windows, security products, to really try and drive growth, whether it's trying to provide recommendations to customers and to end users to drive more engagement with the products that they use every day. >> And you're also on the WiDS conference planning committee. So tell us about how you joined and what that experience has been like. >> Yeah, actually, I was at Stanford about a week after the very first conference, and I got talking to Karen, one of the co-organizers of that conference, and I found out there was only one sponsor that very first year, which was WalMart Labs. The more that she talked about it, the more that I wanted to be involved, and I thought that Microsoft really should be a sponsor of this initiative. And so I got the details, I went back, and Microsoft became a sponsor. Ever since, I've been on the committee, trying to help with identifying speakers and reviewing the different speakers that we have each year. And it's amazing just to see how this event has grown over the four years. >> Yeah, that's awesome. So when you first started, how many people attended in the beginning? >> So it started off with this conference with 400 people and just a few other regional events, and it was live streamed, but just really to a few universities. And ever since then it's grown, with the WiDS ambassadors and people around the world. >> Yes, and now WiDS is in over 60 countries, on every continent except Antarctica, as they told us in the keynote, as well as having 400-plus attendees here and its live stream. So how do you think WiDS has evolved over the years? >> It's turned from just a conference into a movement. Now there's all these new regional events that have been set up every year, and just people coming together and working together. So Microsoft is hosting different events. We had events in Redmond at our office, and also in New York and Boston and other places as well. >> So as a data scientist manager for many years at Microsoft, I'm sure you've seen an increase in women taking technical roles. Tell us a little bit about that. >> Yeah, and for any sort of company, you have to try and provide that environment. And part of that is even from recruiting, and ensuring that you've got diverse interview panels. So we make sure that we have women on every set of interviews, to be able to really answer the question, "What's it like to be a woman on this team?", rather than all men trying to answer that question. So that helps as far as we try to encourage more women to join these teams. I've now got a team of 30 data scientists, and half of them are women, which is great. >> That's awesome. So what advice would you give to young professional women who are just coming out of college, or who are just starting college, or who are interested in a STEM field, but maybe think, "Oh, I don't know if there'll be anyone like me in the room"? >> Ask the questions when you interview. Go into those interviews and ask, "What's it like to be a woman on this team?", to really ensure that the teams you're joining and the companies you join are inclusive and really value diversity in the workforce. >> And talking about that, as we heard in the opening address, diversity brings more perspectives, and it also helps take away bias from data science. How have you noticed that bias becoming more fair, especially during your time at Microsoft? >> Yeah, and that's what diversity is about. It's just having those diverse sets of perspectives and opinions, and having more people just looking at the data and thinking through how it's being used, bringing their views on it, and ensuring it's being used in the right way. >> Right. And so, going forward, do you plan to still be on the WiDS committee? Where do you see WiDS going? How do you see WiDS in five years? >> Yeah, I love being part of this conference, I've been on the committee, and I just expect it to continue to grow. I think it's going well beyond a conference, as you see in the podcasts and all the other initiatives that are occurring from that. >> Great. John, thank you so much for being on theCUBE, it was great having you here. >> Thank you. >> Thanks for watching theCUBE. I'm your host, Sonia Tagare, stay tuned for more.
SUMMARY :
Brought to you by Silicon Angle Media. So tell us a little bit about your role at Microsoft. I manage a central data science team for myself. Yeah, so we look at it across all the different myself. you joined and how that experience has been like, I got talking to Karen, one of this co organizers of that that conference And it's it's amazing just to see how this event has grown over So when you first started, how many people attended in the beginning? So it started off as we're in this conference with 400 people and just a So how do you think would has evolved over the years? Uh, it's it's term from just a conference to a movement. Tell us a little bit about that. So you know that helps as faras we That's also, um So, uh, um, what advice would you give to Uh, you ask the questions when you interview I go for those interviews and asked, and talking about that as we heard in the opening address that diversity brings more perspectives, Yeah, and that's what the rest is about. Um and so, um, what do you going forward? I just expected to continue to grow. John, Thank you so much for being on the Cube. you here. I'm your host, Sonia, to worry and stay tuned for more.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Karen | PERSON | 0.99+ |
John Hoegger | PERSON | 0.99+ |
Sonia | PERSON | 0.99+ |
Redmond | LOCATION | 0.99+ |
New York | LOCATION | 0.99+ |
Mike | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Ari | PERSON | 0.99+ |
400 people | QUANTITY | 0.99+ |
Dossevi | PERSON | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
WalMart Labs | ORGANIZATION | 0.99+ |
30 data scientists | QUANTITY | 0.99+ |
each year | QUANTITY | 0.99+ |
today | DATE | 0.98+ |
Office | TITLE | 0.98+ |
Weeds Conference Planning Committee | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
first conference | QUANTITY | 0.97+ |
five years | QUANTITY | 0.97+ |
one sponsor | QUANTITY | 0.97+ |
over 60 countries | QUANTITY | 0.97+ |
first | QUANTITY | 0.96+ |
400 plus attendees | QUANTITY | 0.96+ |
first year | QUANTITY | 0.95+ |
half | QUANTITY | 0.94+ |
DC | LOCATION | 0.94+ |
Stanford | ORGANIZATION | 0.94+ |
fifth annual | QUANTITY | 0.93+ |
Stanford Women in Data Science ( | EVENT | 0.88+ |
Women in Data Science Conference 2020 | EVENT | 0.87+ |
Stanford | LOCATION | 0.86+ |
Antarctica | LOCATION | 0.85+ |
four years | QUANTITY | 0.79+ |
3 | OTHER | 0.78+ |
WiDS) Conference 2020 | EVENT | 0.75+ |
Cube | COMMERCIAL_ITEM | 0.74+ |
365 | QUANTITY | 0.71+ |
in data Science 2020 | EVENT | 0.65+ |
about a week | DATE | 0.64+ |
Kino | LOCATION | 0.63+ |
Windows | TITLE | 0.6+ |
Daphne Koller, insitro | WiDS Women in Data Science Conference 2020
>> Announcer: Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. >> Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering WiDS, the Women in Data Science Conference, the fifth annual one. Joining us today is Daphne Koller, who is the CEO and founder of insitro. Daphne, welcome to theCUBE. >> Nice to be here, Sonia, thank you for having me. >> So tell us a little bit about insitro, how you got it founded, and more about your role. >> So I've been working at the intersection of machine learning and biology and health for quite a while, and it was always a bit of an interesting journey in that the data sets were quite small and limited. We're now in a different world where there's tools that are allowing us to create massive biological data sets that I think can help us solve really significant societal problems, and one of those problems that I think is really important is drug discovery and development, where despite many important advancements the costs just keep going up and up and up, and the question is, can we use machine learning to solve that problem better? >> And you talk about this more in your keynote, so give us a few highlights of what you talked about. >> So you can think of drug discovery and development in the last 50 to 70 years as being a bit of a glass half-full, glass half-empty. The glass half-full is the fact that there's diseases that used to be a death sentence, or a sentence to a lifelong of pain and suffering, that are now addressed by some of the modern-day medicines, and I think that's absolutely amazing. The other side of it is that the cost of developing new drugs has been growing exponentially, in what's come to be known as Eroom's Law, being the inverse of Moore's Law, which is the one we're all familiar with, because the number of drugs approved per billion U.S. dollars just keeps going down exponentially. So the question is, can we change that curve? >> And you talked in your keynote about the interdisciplinary culture, so tell us more about that. >> I think in order to address some of the critical problems that we're facing, one needs to really build a culture of people who work together from different disciplines, each bringing their own insights and their own ideas into the mix. So at insitro we actually have a company that's half life scientists, many of whom are producing data for the purpose of driving machine learning models, and the other half are machine learning people and data scientists who are working on those. But it's not a handoff, where one group produces the data and the other one consumes and interprets it; really they start from the very beginning to understand what are the problems that one could solve together, how do you design the experiment, how do you build the model, and how do you derive insights from that that can help us make better medicines for people. >> And I also wanted to ask you, you co-founded Coursera, so tell us a little bit more about that platform. >> So I founded Coursera as a result of work that I'd been doing at Stanford, working on how technology can make education better and more accessible. This was a project that I did here with a number of my colleagues as well, and at some point in the fall of 2011 there was an experiment: let's take some of the content that we've been developing within Stanford and put it out there for people to just benefit from. And we didn't know what would happen, would it be a few thousand people? But within a matter of weeks, with minimal advertising other than one New York Times article that went viral, we had a hundred thousand people in each of those courses. And that was a moment in time where we looked at this and said, can we just go back to writing more papers, or is there an incredible opportunity to transform access to education for people all over the world? And so I ended up taking what was supposed to be a two-year leave of absence from Stanford to go and co-found Coursera, and I thought I'd go back after two years, but at the end of that two-year period there was just so much more to be done, and so much more impact that we could bring to people all over the world, people of both genders, people of different socioeconomic status, every single country around the world. I just felt like this was something that I couldn't not do. >> And why did you decide to go from an educational platform to then going into machine learning and biomedicine? >> So I'd been doing Coursera for about five years in 2016, and the company was on a great trajectory, but it's primarily a content company, and around me machine learning was transforming the world, and I wanted to come back and be part of that. And when I looked around, I saw machine learning being applied to e-commerce and to natural language and to self-driving cars, but there really wasn't a lot of impact being made on the life science area, and I wanted to be part of making that happen, partly because I felt, coming back to our earlier comment, that in order to really have that impact you need to have someone who speaks both languages. And while there's a new generation of researchers who are bilingual in biology and in machine learning, it's still a small group, and there are very few of those in kind of my age cohort, and I thought that I would be able to have a real impact by building a company in the space. >> So it sounds like your background is pretty varied. What advice would you give to women who are just starting college now, who may be interested in a similar field? Would you tell them they have to major in math, or do you think that maybe there are some other majors that may be influential as well? >> I think there's a lot of ways to get into data science. Math is one of them, but there's also statistics or physics, and I would say that especially for the field that I'm currently in, which is at the intersection of machine learning and data science on the one hand and biology and health on the other, one can get there from biology or medicine as well. But what I think is important is not to shy away from the more mathematically oriented courses in whatever major you're in, because that foundation is a really strong one. There's a lot of people out there who are basically lightweight consumers of data science, and they don't really understand how the methods that they're deploying work, and that limits them in their ability to advance the field and come up with new methods that are better suited, perhaps, to the problems that they're tackling. So I think it's totally fine, and in fact there's a lot of value, to coming into data science from fields other than computer science, but I think taking courses in those fields, even while you're majoring in whatever field you're interested in, is going to make you a much better person who lives at that intersection. >> And how do you think having a technology background has helped you in founding your companies and has helped you become a successful CEO? >> In companies that are very strongly R&D focused, like insitro and others, having a technical co-founder is absolutely essential, because it's fine to have an understanding of what the user needs and so on and come from the business side of it, and a lot of companies have a business co-founder, but not understanding what the technology can actually do is highly limiting. You end up hallucinating, oh, if we could only do this, wouldn't that be great, but you can't, and people end up oftentimes making ridiculous promises about what technology will or will not do, because they just don't understand where the landmines sit and where you're going to hit real obstacles in the path. So I think it's really important to have a strong technical foundation in these companies. >> And that being said, where do you see insitro in the future, and how do you see it solving, say, NASH, which you talked about in your keynote? >> So we hope that insitro will be a fully integrated drug discovery and development company that is based on a slightly different foundation than a traditional pharma company, where they grew up in the old approach that is very much bespoke scientific analysis of the biology of different diseases, and then going after targets or ways of dealing with the disease that are driven by human intuition. Where I think we have the opportunity to go today is to build a very data-driven approach that collects massive amounts of data and then lets analysis of those data really reveal new hypotheses that might not be the ones that accord with people's preconceptions of what matters and what doesn't. And so hopefully we'll be able to, over time, create enough data and apply machine learning to address key bottlenecks in the drug discovery and development process, so we can bring better drugs to people, and we can do it faster, and hopefully at much lower cost. >> That's great. And you also mentioned in your keynote that you think the 2020s are like a digital biology era, so tell us more about that. >> So I think if you take a historical perspective on science and think back, you realize that there's periods in history where one discipline has made a tremendous amount of progress in a relatively short amount of time because of a new technology or a new way of looking at things. In the 1870s, that discipline was chemistry, with the understanding of the periodic table and that you actually couldn't turn lead into gold. In the 1900s, that was physics, with understanding the connection between matter and energy and between space and time. In the 1950s, that was computing, where silicon chips were suddenly able to perform calculations that up until that point only people had been able to do. And then in the 1990s there was an interesting bifurcation. One was the era of data, which is related to computing but also involves elements of statistics and optimization and neuroscience, and the other one was quantitative biology, in which biology moved from a descriptive science of taxonomizing phenomena to really probing and measuring biology in a very detailed and high-throughput way, using techniques like microarrays that measure the activity of 20,000 genes at once, or the sequencing of the human genome, and many others. But these two fields kind of evolved in parallel, and what I think is coming now, 30 years later, is the convergence of those two fields into one field that I like to think of as digital biology, where we are able, using the tools that have been and continue to be developed, to measure biology at entirely new levels of detail, of fidelity, of scale. We can use the techniques of machine learning and data science to interpret what we're seeing, and then use some of the technologies that are also emerging to engineer biology to do things that it otherwise wouldn't do. And that will have implications in biomaterials, in energy, in the environment, in agriculture, and I think also in human health, and it's an incredibly exciting space to be in right now, because just so much is happening, and the opportunities to make a difference and make the world a better place are just so large. >> That sounds awesome. Daphne, thank you for your insight, and thank you for being on theCUBE. >> Thank you. >> I'm Sonia Tagare, thanks for watching theCUBE. Stay tuned for more.
SUMMARY :
in the last you can think of drug
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Daphne Koller | PERSON | 0.99+ |
Sonia | PERSON | 0.99+ |
Daphne | PERSON | 0.99+ |
1950s | DATE | 0.99+ |
1990s | DATE | 0.99+ |
Sonia - Garrett | PERSON | 0.99+ |
2016 | DATE | 0.99+ |
20,000 genes | QUANTITY | 0.99+ |
1900s | DATE | 0.99+ |
1870s | DATE | 0.99+ |
two fields | QUANTITY | 0.99+ |
one field | QUANTITY | 0.99+ |
Stanford University | ORGANIZATION | 0.99+ |
Stanford | ORGANIZATION | 0.99+ |
Coursera | ORGANIZATION | 0.98+ |
2020s | DATE | 0.98+ |
both languages | QUANTITY | 0.98+ |
both genders | QUANTITY | 0.98+ |
two | QUANTITY | 0.98+ |
fall of 2011 | DATE | 0.98+ |
two-year | QUANTITY | 0.98+ |
today | DATE | 0.97+ |
about five years | QUANTITY | 0.96+ |
30 years later | DATE | 0.93+ |
every single country | QUANTITY | 0.93+ |
WiDS Women in Data Science Conference 2020 | EVENT | 0.93+ |
one | QUANTITY | 0.91+ |
one discipline | QUANTITY | 0.9+ |
a hundred thousand people | QUANTITY | 0.9+ |
Nash | PERSON | 0.89+ |
sari | PERSON | 0.89+ |
each | QUANTITY | 0.84+ |
Silicon angle media | ORGANIZATION | 0.83+ |
few thousand people | QUANTITY | 0.83+ |
billion u.s. dollars | QUANTITY | 0.83+ |
two years | QUANTITY | 0.82+ |
New York Times | ORGANIZATION | 0.8+ |
one of those problems | QUANTITY | 0.79+ |
Moore's Law | TITLE | 0.79+ |
one group | QUANTITY | 0.79+ |
Coursera | TITLE | 0.78+ |
2020 | DATE | 0.77+ |
70 years | QUANTITY | 0.76+ |
third computer | QUANTITY | 0.74+ |
fifth annual one | QUANTITY | 0.68+ |
each of those courses | QUANTITY | 0.68+ |
science | EVENT | 0.68+ |
lot of people | QUANTITY | 0.66+ |
half | QUANTITY | 0.64+ |
per | QUANTITY | 0.49+ |
last 50 | DATE | 0.46+ |
Arun | TITLE | 0.4+ |
Krishna Cheriath, Bristol Myers Squibb | MITCDOIQ 2020
>> From the Cube Studios in Palo Alto in Boston, connecting with thought leaders all around the world, this is a Cube Conversation. >> Hi everyone, this is Dave Vellante and welcome back to the Cube's coverage of the MIT CDOIQ. God, we've been covering this show since probably 2013, really trying to understand the intersection of data and organizations and data quality and how that's evolved over time. And with me to discuss these issues is Krishna Cheriath, who's the Vice President and Chief Data Officer, Bristol-Myers Squibb. Krishna, great to see you, thanks so much for coming on. >> Thank you so much Dave for the invite, I'm looking forward to it. >> Yeah first of all, how are things in your part of the world? You're in New Jersey, I'm also on the East coast, how you guys making out? >> Yeah, I think these are unprecedented times all around the globe and whether it is from a company perspective or a personal standpoint, it is how do you manage your life, how do you manage your work in these unprecedented COVID-19 times has been a very interesting challenge. And to me, what is most amazing has been, I've seen humanity rise up and so to our company has sort of snap to be able to manage our work so that the important medicines that have to be delivered to our patients are delivered on time. So really proud about how we have done as a company and of course, personally, it has been an interesting journey with my kids from college, remote learning, wife working from home. So I'm very lucky and blessed to be safe and healthy at this time. So hopefully the people listening to this conversation are finding that they are able to manage through their lives as well. >> Obviously Bristol-Myers Squibb, very, very strong business. You guys just recently announced your quarter. There's a biologics facility near me in Devon's, Massachusetts, I drive by it all the time, it's a beautiful facility actually. But extremely broad portfolio, obviously some COVID impact, but you're managing through that very, very well, if I understand it correctly, you're taking a collaborative approach to a COVID vaccine, you're now bringing people physically back to work, you've been very planful about that. My question is from your standpoint, what role did you play in that whole COVID response and what role did data play? >> Yeah, I think it's a two part as you rightly pointed out, the Bristol-Myers Squibb, we have been an active partner on the the overall scientific ecosystem supporting many different targets that is, from many different companies I think. Across biopharmaceuticals, there's been a healthy convergence of scientific innovation to see how can we solve this together. And Bristol-Myers Squibb have been an active participant as our CEO, as well as our Chief Medical Officer and Head of Research have articulated publicly. Within the company itself, from a data and technology standpoint, data and digital is core to the response from a company standpoint to the COVID-19, how do we ensure that our work continues when the entire global workforce pivots to a kind of a remote setting. So that really calls on the digital infrastructure to rise to the challenge, to enable a complete global workforce. And I mean workforce, it is not just employees of the company but the all of the third-party partners and others that we work with, the whole ecosystem needs to work. And I think our digital infrastructure has proven to be extremely resilient than that. From a data perspective, I think it is twofold. 
One is how does the core book of business of data continue to drive forward to make sure that our companies key priorities are being advanced. Secondarily, we've been partnering with a research and development organization as well as medical organization to look at what kind of real world data insights can really help in answering the many questions around COVID-19. So I think it is twofold. Main summary; one is, how do we ensure that the data and digital infrastructure of the company continues to operate in a way that allows us to progress the company's mission even during a time when globally, we have been switched to a remote working force, except for some essential staff from lab and manufacturing standpoint. And secondarily is how do we look at the real-world evidence as well as the scientific data to be a good partner with other companies to look at progressing the societal innovations needed for this. >> I think it's a really prudent approach because let's face it, sometimes one shot all vaccine can be like playing roulette. So you guys are both managing your risk and just as I say, financially, a very, very successful company in a sound approach. I want to ask you about your organization. We've interviewed many, many Chief Data Officers over the years, and there seems to be some fuzziness as to the organizational structure. It's very clear with you, you report in to the CIO, you came out of a technical bag, you have a technical degree but you also of course have a business degree. So you're dangerous from that standpoint. You got both sides which is critical, I would think in your role, but let's start with the organizational reporting structure. How did that come about and what are the benefits of reporting into the CIO? >> I think the Genesis for that as Bristol-Myers Squibb and when I say Bristol-Myers Squibb, the new Bristol-Myers Squibb is a combination of Heritage Bristol-Myers Squibb and Heritage Celgene after the Celgene acquisition last November. So in the Heritage Bristol-Myers Squibb acquisition, we came to a conclusion that in order for BMS to be able to fully capitalize on our scientific innovation potential as well as to drive data-driven decisions across the company, having a robust data agenda is key. Now the question is, how do you progress that? Historically, we had approached a very decentralized mechanism that made a different data constituencies. We didn't have a formal role of a Chief Data Officer up until 2018 or so. So coming from that realization that we need to have an effective data agenda to drive forward the necessary data-driven innovations from an analytic standpoint. And equally importantly, from optimizing our execution, we came to conclusion that we need an enterprise-level data organization, we need to have a first among equals if you will, to be mandated by the CEO, his leadership team, to be the kind of an orchestrator of a data agenda for the company, because data agenda cannot be done individually by a singular CDO. It has to be done in partnership with many stakeholders, business, technology, analytics, et cetera. So from that came this notion that we need an enterprise-wide data organization. So we started there. So for awhile, I would joke around that I had all of the accountabilities of the CDO without the lofty title. So this journey started around 2016, where we create an enterprise-wide data organization. And we made a very conscious choice of separating the data organization from analytics. 
And the reason we did that is when we look at the bowl of Bristol-Myers Squibb, analytics for example, is core and part of our scientific discovery process, research, our clinical development, all of them have deep data science and analytic embedded in it. But we also have other analytics whether it is part of our sales and marketing, whether it is part of our finance and our enabling functions they catch all across global procurement et cetera. So the world of analytics is very broad. BMS did a separation between the world of analytics and from the world of data. Analytics at BMS is in two modes. There is a central analytics organization called Business Insights and Analytics that drive most of the enterprise-level analytics. But then we have embedded analytics in our business areas, which is research and development, manufacturing and supply chain, et cetera, to drive what needs to be closer to the business idea. And the reason for separating that out and having a separate data organization is that none of these analytic aspirations or the business aspirations from data will be met if the world of data is, you don't have the right level of data available, the velocity of data is not appropriate for the use cases, the quality of data is not great or the control of the data. So that we are using the data for the right intent, meeting the compliance and regulatory expectations around the data is met. So that's why we separated out that data world from the analytics world, which is a little bit of a unique construct for us compared to what we see generally in the world of CDOs. And from that standpoint, then the decision was taken to make that report for global CIO. At Bristol-Myers Squibb, they have a very strong CIO organization and IT organization. When I say strong, it is from this lens standpoint. A, it is centralized, we have centralized the budget as well as we have centralized the execution across the enterprise. And the CDO reporting to the CIO with that data-specific agenda, has a lot of value in being able to connect the world of data with the world of technology. So at BMS, their Chief Data Officer organization is a combination of traditional CDO-type accountabilities like data risk management, data governance, data stewardship, but also all of the related technologies around master data management, data lake, data and analytic engineering and a nascent AI data and technology lab. So that construct allows us to be a true enterprise horizontal, supporting analytics, whether it is done in a central analytics organization or embedded analytics teams in the business area, but also equally importantly, focus on the world of data from operational execution standpoint, how do we optimize data to drive operational effectiveness? So that's the construct that we have where CDO reports to the CIO, data organization separated from analytics to really focus around the availability but also the quality and control of data. And the last nuance that is that at BMS, the Chief Data Officer organization is also accountable to be the Data Protection Office. So we orchestrate and facilitate all privacy-related actions across because that allows us to make sure that all personal data that is collected, managed and consumed, meets all of the various privacy standards across the world, as well as our own commitments as a company from across from compliance principles standpoint. >> So that makes a lot of sense to me and thank you for that description. 
You're not getting in the way of R&D and the scientists, they know data science, they don't need really your help. I mean, they need to innovate at their own pace, but the balance of the business really does need your innovation, and that's really where it seems like you're focused. You mentioned master data management, data lakes, data engineering, et cetera. So your responsibility is for that enterprise data lifecycle to support the business side of things, and I wonder if you could talk a little bit about that and how that's evolved. I mean a lot has changed from the old days of data warehouse and cumbersome ETL and you mentioned, as you say data lakes, many of those have been challenging, expensive, slow, but now we're entering this era of cloud, real-time, a lot of machine intelligence, and I wonder if you could talk about the changes there and how you're looking at and thinking about the data lifecycle and accelerating the time to insights. >> Yeah, I think the way we think about it, we as an organization in our strategy and tactics, think of this as a data supply chain. The supply chain of data to drive business value whether it is through insights and analytics or through operation execution. When you think about it from that standpoint, then we need to get many elements of that into an effective stage. This could be the technologies that is part of that data supply chain, you reference some of them, the master data management platforms, data lake platforms, the analytics and reporting capabilities and business intelligence capabilities that plug into a data backbone, which is that I would say the technology, swim lane that needs to get right. Along with that, what we also need to get right for that effective data supply chain is that data layer. That is, how do you make sure that there is the right data navigation capability, probably you make sure that we have the right ontology mapping and the understanding around the data. How do we have data navigation? It is something that we have invested very heavily in. So imagine a new employee joining BMS, any organization our size has a pretty wide technology ecosystem and data ecosystem. How do you navigate that, how do we find the data? Data discovery has been a key focus for us. So for an effective data supply chain, then we knew that and we have instituted our roadmap to make sure that we have a robust technology orchestration of it, but equally important is an effective data operations orchestration. Both needs to go hand in hand for us to be able to make sure that that supply chain is effective from a business use case and analytic use standpoint. So that has led us on a journey from a cloud perspective, since you refer that in your question, is we have invested very heavily to move from very disparate set of data ecosystems to a more converse cloud-based data backbone. That has been a big focus at the BMS since 2016, whether it is from a research and development standpoint or from commercialization, it is our word for the sales and marketing or manufacturing and supply chain and HR, et cetera. How do we create a converged data backbone that allows us to use that data as a resource to drive many different consumption patterns? Because when you imagine an enterprise of our size, we have many different consumers of the data. So those consumers have different consumption needs. 
You have a deep data science population who just needs access to the data, and they have data science platforms, but they are at once programmers as well; at the other end of the spectrum, executives need pre-packaged KPIs. So the effective orchestration of the data ecosystem at BMS, through a data supply chain and the data backbone, does a couple of things for us. One, it drives productivity of our data consumers, the scientific researchers, analytic community or other operational staff. And second, in a world where we need to make sure that the data consumption upholds ethical standards as well as privacy and other regulatory expectations, we are able to build into our systems and processes the necessary controls to make sure that the consumption and the use of data meets our highest trust standards. >> That makes a lot of sense. I mean, converging your data like that, people always talk about stove pipes. I know it's kind of a bromide but it's true, and it allows you to sort of inject consistent policies. What about automation? How has that affected your data pipeline recently and on your journey with things like data classification and the like? >> I think in pursuing a broad data automation journey, one of the things that we did was to operate at two different speed points. Historically, data organizations have been bundled with long-running data infrastructure programs. By the time you complete them, the business context has moved on, and the organization leaders are also exhausted from having to wait for these massive programs to reach their full potential. So what we did very intentionally in our data automation journey is to organize ourselves in two speed dimensions. First, a concept called Rapid Data Lab. The idea is that, recognizing the reality that the data is not well automated and orchestrated today, we need a SWAT team of data engineers and data SMEs to partner with consumers of data to make sure that we can make effective data supply chain decisions here and now, and enable the business to answer the questions of today. Simultaneously, in a longer time horizon, we need to do the necessary work of moving the data automation to a better footprint. So enterprise data lake investments, where we built services based on AWS, which we had chosen as the cloud backbone for data. So how do we use the AWS services? How do we wrap around them the necessary capabilities so that we have a consistent reference and technical architecture to drive the many different functional journeys? So we organized ourselves into speed dimensions; the Rapid Data Lab teams focus on partnering with the consumers of data to help them with data automation needs here and now, and then a secondary team focused on the convergence of data into a better cloud-based data backbone. So that allowed us to, one, make an impact here and now and deliver value from data to the business here and now. Secondly, we also learned a lot from actually partnering with consumers of data on what needs to get adjusted over a period of time in our automation journey. >> It makes sense, I mean again, that whole notion of converged data, putting data at the core of your business. You brought up AWS, I wonder if I could ask you a question. You don't have to comment on specific vendors, but there's a conversation we have in our community.
You have AWS, a huge platform, tons of partners, a lot of innovation going on, and you see innovation in areas like the cloud data warehouse or data science tooling, et cetera, all components of that data pipeline. As well, you have AWS with its own tooling around there. So a question we often have in the community is, will technologists and technology buyers go for kind of best of breed and cobble together different services, or would they prefer to have sort of the convenience of a bundled service from an AWS or a Microsoft or Google, or maybe they even go best of breed across clouds? Can you comment on that, what's your thinking? >> I think, especially for organizations of our size and breadth, having a converged, convenient, all-of-the-above offering from a single provider does not seem practical and feasible, for a couple of reasons. One, the heterogeneity of the data and the heterogeneity of consumption of the data, and we have yet to find a single stack provider who can meet all of the different needs. So I am more in the best of breed camp, with a few caveats, a hybrid best of breed, if you will. It is important to have a converged data backbone for the enterprise. And so whether you invest in a singular cloud or a private cloud or a combination, you need to have a clear, intentional strategy around where you are going to host the data and how the data is going to be organized. But you could have a lot more flexibility in the consumption of data. So once you have the data converged, in our case onto an AWS-based backbone, we allow many different consumptions of the data, because at the analytic and insights layer, the data science community within R&D is different from the data science community in the supply chain context, we have business intelligence needs, we have catered needs, and then there are other data needs that need to be funneled into software-as-a-service platforms like the Salesforces of the world, to be able to drive operational execution as well. So when you look at it from that context, having a hybrid model of best of breed, where you have a lot more convergence from a data backbone standpoint but then allow for best of breed from an analytic and consumption of data standpoint, is more where my heart and my brain is. >> I know a lot of companies would be excited to hear that answer, but I love it because it fosters competition and innovation. I wish I could talk to you forever, but you made me think of another question, which is around self-serve. On your journey, are you at the point where you can deliver self-serve to the lines of business? Is that something that you're trying to get to? >> Yeah, I think so. Self-serve is an absolutely important point, because I think the traditional boundary of what you consider classical IT versus classical business is gray. There is an important gray area in the middle where you have a deep citizen data scientist in the business community who really needs to be able to have access to the data and who has advanced data science and programming skills. So self-serve is important, but in that, companies need to be very intentional and very conscious of making sure that you're allowing that self-serve in a safe containment zone. Because at the end of the day, whether it is a cyber risk or data risk or technology risk, it's all real.
So we need to have a balanced approach between promoting whether you call it data democratization or whether you call it self-serve, but you need to balance that with making sure that you're meeting the right risk mitigation strategy standpoint. So that's how then our focus is to say, how do we promote self-serve for the communities that they need self-serve, where they have deeper levels of access? How do we set up the right safe zones for those which may be the appropriate mitigation from a cyber risk or data risk or technology risk. >> Security pieces, again, you keep bringing up topics that I could talk to you forever on, but I heard on TV the other night, I heard somebody talking about how COVID has affected, because of remote access, affected security. And it's like hey, give everybody access. That was sort of the initial knee-jerk response, but the example they gave as well, if your parents go out of town and the kid has a party, you may have some people show up that you don't want to show up. And so, same issue with remote working, work from home. Clearly you guys have had to pivot to support that, but where does the security organization fit? Does that report separate alongside the CIO? Does it report into the CIO? Are they sort of peers of yours, how does that all work? >> Yeah, I think at Bristol-Myers Squibb, we have a Chief Information Security Officer who is a peer of mine, who also reports to the global CIO. The CDO and the CSO are effective partners and are two sides of the coin and trying to advance a total risk mitigation strategy, whether it is from a cyber risk standpoint, which is the focus of the Chief Information Security Officer and whether it is the general data consumption risk. And that is the focus from a Chief Data Officer in the capacities that I have. And together, those are two sides of a coin that the CIO needs to be accountable for. So I think that's how we have orchestrated it, because I think it is important in these worlds where you want to be able to drive data-driven innovation but you want to be able to do that in a way that doesn't open the company to unwanted risk exposures as well. And that is always a delicate balancing act, because if you index too much on risk and then high levels of security and control, then you could lose productivity. But if you index too much on productivity, collaboration and open access and data, it opens up the company for risks. So it is a delicate balance within the two. >> Increasingly, we're seeing that reporting structure evolve and coalesce, I think it makes a lot of sense. I felt like at some point you had too many seats at the executive leadership table, too many kind of competing agendas. And now your structure, the CIO is obviously a very important position. I'm sure has a seat at the leadership table, but also has the responsibility for managing that sort of data as an asset versus a liability which my view, has always been sort of the role of the Head of Information. I want to ask you, I want to hit the Escape key a little bit and ask you about data as a resource. You hear a lot of people talk about data is the new oil. We often say data is more valuable than oil because you can use it, it doesn't follow the laws of scarcity. You could use data in infinite number of places. You can only put oil in your car or your house. How do you think about data as a resource today and going forward? 
>> Yeah, I think the data as the new oil paradigm, in my opinion, was an unhealthy one, and it prompts different types of conversations around that. I think for certain companies, data is indeed an asset. If you're a company that is focused on information products and data products and that is the core of your business, then of course there's monetization of data, and then data is an asset, just like any other asset on the company's balance sheet. But for many enterprises, to further their mission, I think considering data as a resource is a better focus. So as a vital resource for the company, you need to make sure that there is appropriate care and feeding for it, there is an appropriate management of the resource and an appropriate evolution of the resource. So that's how I would like to consider it; it is a personal, n-of-1 perspective, that data as a resource can power the mission of the company, the new products and services, and I think that's a good, healthy way to look at it. At the center of it, though, for a lot of strategies, whether people talk about a digital strategy or whether people talk about a data strategy, what is important is for a company to have a true north star around what is the core mission of the company and what is the core strategy of the company. For Bristol-Myers Squibb, we are about transforming patients' lives through science. And we think about digital and data as key value levers and drivers of that strategy. So digital for the sake of digital, or data strategy for the sake of data strategy, is meaningless in my opinion. We are focused on making sure that data and digital are an accelerant and a value lever for the company's mission and the company's strategy. So that's why thinking about data as a resource, as a key resource for our scientific researchers or a key resource for our manufacturing team or a key resource for our sales and marketing, allows us to think about the actions and the strategies and tactics we need to deploy to make that effective. >> Yeah, that makes a lot of sense, you're constantly using that North star as your guideline and how data contributes to that mission. Krishna Cheriath, thanks so much for coming on the Cube and supporting the MIT Chief Data Officer community, it was a real pleasure having you. >> Thank you so much, Dave; hopefully you and the audience are safe and healthy during these times. >> Thank you for that, and thank you for watching everybody. This is Vellante for the Cube's coverage of the MIT CDOIQ Conference 2020 gone virtual. Keep it right there, we'll be right back right after this short break. (lively upbeat music)
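Krishna's point about building the controls into the system, classifying data once on the converged backbone and then routing every consumption request through the same policy and audit check, can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual BMS or AWS implementation; the table, tags, roles, and policy rules below are hypothetical.

```python
# Illustrative sketch of policy-checked consumption on a converged data backbone.
# The catalog, tags, roles, and rules are hypothetical, not an actual BMS or AWS policy model.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)   # e.g. {"personal_data", "clinical"}

# Classification is attached to the data once, at ingestion into the backbone.
CATALOG = {
    "patients": [Column("patient_id", {"personal_data"}),
                 Column("diagnosis", {"personal_data", "clinical"}),
                 Column("enrollment_date", set())],
}

# Consumption policy: which tags each consumer role may read.
POLICY = {
    "data_scientist_rnd": {"clinical"},   # clinical data, no direct identifiers
    "bi_analyst":         set(),          # untagged columns only
    "privacy_office":     {"personal_data", "clinical"},
}

AUDIT_LOG = []

def request_columns(role: str, table: str, wanted: list[str]) -> list[str]:
    """Return the subset of requested columns the role may consume, and audit the decision."""
    allowed_tags = POLICY.get(role, set())
    granted = [c.name for c in CATALOG[table]
               if c.name in wanted and c.tags <= allowed_tags]
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "role": role, "table": table,
        "requested": wanted, "granted": granted,
    })
    return granted

# A BI analyst asking for everything only gets the untagged column.
print(request_columns("bi_analyst", "patients",
                      ["patient_id", "diagnosis", "enrollment_date"]))
# ['enrollment_date']
```

In a real deployment the catalog and policy would live in the backbone's governance services rather than in application code, but the shape is the same: classify once, then let every consumption pattern flow through one policy and audit check.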
SUMMARY :
leaders all around the world, coverage of the MIT CDOIQ. I'm looking forward to it. so that the important medicines I drive by it all the time, and digital infrastructure of the company of reporting into the CIO? So that's the construct that we have and accelerating the time to insights. and the data backbone, and allows you to sort of and enable the business to in areas like the cloud data warehouse and how is the data is to the lines of business? in the business community that I could talk to you forever on, that the CIO needs to be accountable for. about data is the new oil. that can power the mission of the company, and supporting the MIT Chief and healthy during these times. of the MIT CDOIQ Conference
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Bristol-Myers Squibb | ORGANIZATION | 0.99+ |
New Jersey | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Devon | LOCATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Rapid Data Lab | ORGANIZATION | 0.99+ |
2013 | DATE | 0.99+ |
Krishna Cheriath | PERSON | 0.99+ |
two sides | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
COVID-19 | OTHER | 0.99+ |
Celgene | ORGANIZATION | 0.99+ |
First | QUANTITY | 0.99+ |
Cube | ORGANIZATION | 0.99+ |
Krishna | PERSON | 0.99+ |
Heritage Bristol-Myers Squibb | ORGANIZATION | 0.99+ |
2018 | DATE | 0.99+ |
both sides | QUANTITY | 0.99+ |
Both | QUANTITY | 0.98+ |
Boston | LOCATION | 0.98+ |
2016 | DATE | 0.98+ |
CDO | TITLE | 0.98+ |
two modes | QUANTITY | 0.98+ |
COVID | OTHER | 0.98+ |
first | QUANTITY | 0.98+ |
Bristol-Myers Squibb | ORGANIZATION | 0.98+ |
last November | DATE | 0.98+ |
Data Protection Office | ORGANIZATION | 0.98+ |
One | QUANTITY | 0.98+ |
two part | QUANTITY | 0.98+ |
Secondly | QUANTITY | 0.98+ |
second | QUANTITY | 0.98+ |
MIT | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
MIT CDOIQ Conference 2020 | EVENT | 0.97+ |
Heritage Celgene | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
COVID-19 times | OTHER | 0.96+ |
today | DATE | 0.96+ |
BMS | ORGANIZATION | 0.96+ |
single provider | QUANTITY | 0.95+ |
single stack | QUANTITY | 0.93+ |
Bristol Myers Squibb | PERSON | 0.93+ |
one shot | QUANTITY | 0.92+ |
Cube Studios | ORGANIZATION | 0.9+ |
one perspective | QUANTITY | 0.9+ |
Bristol-Myers | ORGANIZATION | 0.9+ |
Business Insights | ORGANIZATION | 0.89+ |
two speed | QUANTITY | 0.89+ |
twofold | QUANTITY | 0.84+ |
secondary | QUANTITY | 0.8+ |
Secondarily | QUANTITY | 0.77+ |
MIT CDOIQ | ORGANIZATION | 0.76+ |
Massachusetts | LOCATION | 0.75+ |
MITCDOIQ 2020 | EVENT | 0.74+ |
Vellante | PERSON | 0.72+ |
Data | PERSON | 0.71+ |
Chief Data Officer | PERSON | 0.61+ |
Rich Gaston, Micro Focus | Virtual Vertica BDC 2020
(upbeat music) >> Announcer: It's theCUBE covering the virtual Vertica Big Data Conference 2020 brought to you by Vertica. >> Welcome back to the Vertica Virtual Big Data Conference, BDC 2020. You know, it was supposed to be a physical event in Boston at the Encore. Vertica pivoted to a digital event, and we're pleased that The Cube could participate because we've participated in every BDC since the inception. Rich Gaston this year is the global solutions architect for security risk and governance at Micro Focus. Rich, thanks for coming on, good to see you. >> Hey, thank you very much for having me. >> So you got a chewy title, man. You got a lot of stuff, a lot of hairy things in there. But maybe you can talk about your role as an architect in those spaces. >> Sure, absolutely. We handle a lot of different requests from the global 2000 type of organization that will try to move various business processes, various application systems, databases, into new realms. Whether they're looking at opening up new business opportunities, whether they're looking at sharing data with partners securely, they might be migrating it to cloud applications, and doing migration into a Hybrid IT architecture. So we will take those large organizations and their existing installed base of technical platforms and data, users, and try to chart a course to the future, using Micro Focus technologies, but also partnering with other third parties out there in the ecosystem. So we have large, solid relationships with the big cloud vendors, with also a lot of the big database spenders. Vertica's our in-house solution for big data and analytics, and we are one of the first integrated data security solutions with Vertica. We've had great success out in the customer base with Vertica as organizations have tried to add another layer of security around their data. So what we will try to emphasize is an enterprise wide data security approach, where you're taking a look at data as it flows throughout the enterprise from its inception, where it's created, where it's ingested, all the way through the utilization of that data. And then to the other uses where we might be doing shared analytics with third parties. How do we do that in a secure way that maintains regulatory compliance, and that also keeps our company safe against data breach. >> A lot has changed since the early days of big data, certainly since the inception of Vertica. You know, it used to be big data, everyone was rushing to figure it out. You had a lot of skunkworks going on, and it was just like, figure out data. And then as organizations began to figure it out, they realized, wow, who's governing this stuff? A lot of shadow IT was going on, and then the CIO was called to sort of reign that back in. As well, you know, with all kinds of whatever, fake news, the hacking of elections, and so forth, the sense of heightened security has gone up dramatically. So I wonder if you can talk about the changes that have occurred in the last several years, and how you guys are responding. >> You know, it's a great question, and it's been an amazing journey because I was walking down the street here in my hometown of San Francisco at Christmastime years ago and I got a call from my bank, and they said, we want to inform you your card has been breached by Target, a hack at Target Corporation and they got your card, and they also got your pin. And so you're going to need to get a new card, we're going to cancel this. Do you need some cash? 
I said, yeah, it's Christmastime so I need to do some shopping. And so they worked with me to make sure that I could get that cash, and then get the new card and the new PIN. And being a professional inside the industry, I really questioned, how did they get the PIN? Tell me more about this. And they said, well, we don't know the details, but you know, I'm sure you'll find out. And in fact, we did find out a lot about that breach and what it did to Target. The impact: $250 million of immediate impact, CIO gone, CEO gone. This was a big one in the industry, and it really woke a lot of people up to the different types of threats on the data that we're facing with our largest organizations. Not just financial data; medical data, personal data of all kinds. Flash forward to the Cambridge Analytica scandal, where Facebook is handing off data, making a partnership agreement with someone they think they can trust, and then that data is misused. And who's going to end up paying the cost of that? Well, it's going to be Facebook, to the tune of about five billion on that, plus some other fines that'll come along, and other costs that they're facing. So what we've seen over the course of the past several years has been an evolution from data breaches making the headlines to my customers coming to us and saying, help us neutralize the threat of this breach. Help us mitigate this risk, and manage this risk. What do we need to be doing, what are the best practices in the industry? Clearly what we're doing on the perimeter security, the application security and the platform security is not enough; we continue to have breaches, and we are the experts at answering that. The follow-on fascinating piece has been the regulators jumping in now. First in Europe, but now we see California enacting a law just this year. They came in with a law that is very stringent and has a lot of deep protections that are really far-reaching around the personal data of consumers. Look at jurisdictions like Australia, where fiduciary responsibility now goes to the Board of Directors. That's getting attention. For a regulated entity in Australia, if you're on the Board of Directors, you better have a plan for data security. And if there is a breach, you need to follow protocols, or you personally will be liable. And that is a sea change that we're seeing out in the industry. So we're getting a lot of attention on both: how do we neutralize the risk of breach, but also how can we use software tools to maintain and support our regulatory compliance efforts as we work with, say, the largest money center bank out of New York. I've watched their audit year after year, and it's gotten more and more stringent, more and more specific: tell me more about this aspect of data security, tell me more about encryption, tell me more about key management. The auditors are getting better. And we're supporting our customers in that journey to provide better security for the data, to provide a better operational environment for them to be able to roll new services out with confidence that they're not going to get breached. With that confidence, they're not going to have a regulatory compliance fine or a nightmare in the press. And these are the major drivers that help us and Vertica sell together into large organizations, to say, let's add some defense in depth to your data. And that's really a key concept in the security field, this concept of defense in depth.
We apply that to the data itself by changing the actual data element: Rich Gaston, I will change that name into ciphertext, and that then yields a whole bunch of benefits throughout the organization as we deal with the lifecycle of that data. >> Okay, so a couple things I want to mention there. So first of all, totally a board-level topic; every board of directors should really have cyber and security as part of its agenda, and it does, for the reasons that you mentioned. The other is, GDPR got it all started. I guess it was May 2018 that the penalties went into effect, and that just created a whole domino effect. You mentioned California enacting its own laws, which, you know, in some cases are even more stringent. And you're seeing this all over the world. So I think one of the questions I have is, how do you approach all this variability? It seems to me, you can't just take a narrow approach. You have to have an end-to-end perspective on governance and risk and security, and the like. So are you able to do that? And if so, how so? >> Absolutely. I think one of the key areas of concern in big data, in particular, has been that we have a schema, we have database tables, we have columns, and we have data, but we're not exactly sure what's in there. We have application developers that have been given sandbox space in our clusters, and what are they putting in there? So can we discover that data? We have those tools within Micro Focus to discover sensitive data within your data stores, but we can also protect that data, and then we'll track it. And what we really find is that when you protect, let's say, five billion rows of a customer database, we can now know what is being done with that data on a very fine-grained, granular basis, to say that this business process has a justified need to see the data in the clear, we're going to give them that authorization, they can decrypt the data. SecureData, my product, knows about that and tracks that, and can report on that and say, at this date and time, Rich Gaston did the following thing to be able to pull data in the clear. And that could then be used to support the regulatory compliance responses and the audit, to say, who really has access to this, and what really is that data? Then in GDPR, we're getting down into much more fine-grained decisions around who can get access to the data, and who cannot. And organizations are scrambling. One of the funny conversations that I had a couple years ago, as GDPR came into place, was that a couple of customers were taking this sort of brute-force approach of, we're going to move our analytics and all of our data to Europe, to European data centers, because we believe that if we do this in the U.S., we're going to violate their law, but if we do it all in Europe, we'll be okay. And that simply was a short-term way of thinking about it. You really can't be moving your data around the globe to try to satisfy a particular jurisdiction. You have to apply the controls and the policies and put the software layers in place to make sure that anywhere that someone wants to get that data, we have the ability to look at that transaction and say it is or is not authorized, and that we have a rock-solid way of approaching that for audit and for compliance and risk management. And once you do that, then you really open up the organization to go back and use those tools the way they were meant to be used.
We can use Vertica for AI, we can use Vertica for machine learning, and for all kinds of really cool use cases that are being done with IoT, with other kinds of cases that we're seeing that require data being managed at scale, but with security. And that's the challenge, I think, in the current era: how do we do this in an elegant way? How do we do it in a way that's future-proof when CCPA comes in? How can I lay this on as another layer of audit responsibility and control around my data so that I can satisfy those regulators as well as the folks over in Europe and Singapore and China and Turkey and Australia? It goes on and on. Each jurisdiction out there is now requiring audit. And like I mentioned, the audits are getting tougher. And if you read the news, the GDPR example I think is classic. They told us in 2016, it's coming. They told us in 2018, it's here. They're telling us in 2020, we're serious about this, and here are the fines, and you better be aware that we're coming to audit you. And when we audit you, we're going to be asking some tough questions. If you can't answer those in a timely manner, then you're going to be facing some serious consequences, and I think that's what's getting attention. >> Yeah, so the whole big data thing started with Hadoop, and Hadoop is open, it's distributed, and it just created a real governance challenge. I want to talk about your solutions in this space. Can you tell us more about Micro Focus Voltage? I want to understand what it is, and then get into sort of how it works, and then I really want to understand how it's applied to Vertica. >> Yeah, absolutely, that's a great question. First of all, we were the originators of format-preserving encryption; we developed some of the core basic research out of Stanford University that then became the company Voltage. That's still a brand name that we apply even though we're part of Micro Focus. So the lineage still goes back to Dr. Benet down at Stanford, one of my buddies there, and he's still at it, doing amazing work in cryptography and keeping the industry, and the science of cryptography, moving forward. It's a very deep science, and we all want to have it peer-reviewed, we all want it to be attacked, we all want it to be proved secure, so that we're not selling something to a major money center bank that is potentially risky because it's obscure and private. So we have an open standard. For six years, we worked with the Department of Commerce to get our standard approved by NIST, the National Institute of Standards and Technology. They initially said, well, AES-256 is going to be fine. And we said, well, it's fine for certain use cases, but for your database, you don't want to change your schema, you don't want to have this increase in storage costs. What we want is format-preserving encryption. And what that does is turn my name, Rich, into a four-letter ciphertext. It can be reversed. The mathematics of that are fascinating, and really deep and amazing. But we really make that very simple for the end customer because we produce APIs. So these application programming interfaces can be accessed by applications in C or Java, C#, other languages. But they can also be accessed in a microservice manner via REST and web service APIs. And that's the core of our technical platform.
We have an appliance-based approach, so we take a SecureData appliance, we'll put it on-prem, and we'll make 50 of them if you're a big company like Verizon and you need to have these co-located around the globe, no problem; we can scale to the largest enterprise needs. But our typical customer will install several appliances and get going with a couple of environments like QA and Prod to be able to start getting encryption going inside their organization. Once the appliances are set up and installed, it takes just a couple of days of work for a typical technical staff to get done. Then you're up and running to be able to plug in the clients. Now what are the clients? Vertica's a huge one. Vertica's one of our most powerful client endpoints, because you're able to now take that API and put it inside Vertica, and it's all open on the internet. We can go and look at Vertica.com/secure data. You get all of our documentation on it. You understand how to use it very quickly. The APIs are super simple; they require three parameter inputs. It's a really basic approach to being able to protect and access data. And then it gets very deep from there, because you have data like credit card numbers, very different from a street address, and we want to take a different approach to that. We have data like birthdate, and we want to be able to do analytics on dates. We have deep approaches to managing analytics on protected data like dates without having to put it in the clear. So we've maintained a lead in the industry in terms of being an innovator of the FF1 standard; what we call FF1 is format-preserving encryption. We license that to others in the industry, per our NIST agreement. So we're the owner, we're the operator of it, and others use our technology. And we're the original founders of that, and so we continue to sort of lead the industry by adding additional capabilities on top of FF1 that really differentiate us from our competitors. Then you look at our API presence. We can definitely run in Hadoop, but we also run in open systems. We run on mainframe, we run on mobile. So anywhere in the enterprise or in the cloud, anywhere you want to be able to put secure data and be able to access the protected data, we're going to be there and be able to support you there. >> Okay so, let's say, and I've talked to a lot of customers this week, let's say I'm running in Eon mode. And I've got some workload running in AWS, I've got some on-prem. I'm going to take an appliance or multiple appliances, I'm going to put it on-prem, but that will also secure my cloud workloads as part of a sort of shared responsibility model, for example? Or how does that work? >> No, that's absolutely correct. We're really flexible in that we can run on-prem or in the cloud. As far as our crypto engine: the key management is really hard stuff, cryptography is really hard stuff, and we take care of all that; we've baked that all in, and we can run that for you as a service, either in the cloud or on-prem on your small VMs. So it's really a lightweight footprint for running the infrastructure. When I look at an organization like you just described, it's a classic example of where we fit, because we will be able to protect that data. Let's say you're ingesting it from a third party, or from an operational system; you have a website that collects customer data. Someone has now registered as a new customer, and they're going to do e-commerce with you. We'll take that data, and we'll protect it right at the point of capture.
And we can now flow that through the organization and decrypt it at will on any platform that you have, wherever you need us to be able to operate. So let's say you wanted to take that customer data from the operational transaction system: let's throw it into Eon, let's throw it into the cloud, let's do analytics there on that data, and we may need some decryption. We can place SecureData wherever you want, to be able to service that use case. In most cases, what you're doing is a simple, tiny little atomic fetch across a protected tunnel, your typical TLS pipe. And once that key is then cached within our client, we maintain all that technology for you. You don't have to know about key management or caching; we're good at that, that's our job. And then you'll be able to make those API calls to access or protect the data, and apply the authorization and authentication controls that you need to be able to service your security requirements. So you might have third parties having access to your Vertica clusters. That is a special need, and we can have that ability to say employees can get X, and the third party can get Y, and that's a really interesting use case we're seeing for shared analytics on the internet now. >> Yeah, for sure, so you can set the policy however you want. You know, I have to ask you: in a perfect world, I would encrypt everything. But part of the reason why people don't is because of performance concerns. Can you talk about that, and you touched upon it, I think, a moment ago with your sort of atomic fetch, but can you talk about, and I know it's Vertica, it's a Ferrari, etc., but anything that slows it down is going to be a concern. Are customers concerned about that? What are the performance implications of running encryption on Vertica? >> Great question there as well, and what we see is that we want to be able to apply scale where it's needed. And so if you look at the ingest platforms that we find, Vertica is commonly connected up to something like Kafka. Maybe StreamSets, maybe NiFi; there are a variety of different technologies that can route that data, pipe that data into Vertica at scale. SecureData is architected to go along with that architecture, at the node, or at the executor, or at the lowest-level operator level. And what I mean by that is that we don't have a bottleneck where everything has to go through one process or one box or one channel to be able to operate. We don't put an interceptor in between your data coming and going. That's not our approach, because those approaches are fragile and they're slow. So we typically want to focus on integrating our APIs natively within those pipeline processes that come into Vertica. Within the Vertica ingestion process itself, you can simply apply our protection when you do the COPY command in Vertica. So it's a really basic, simple use case that everybody is typically familiar with in Vertica land: be able to copy the data and put it into Vertica, and you simply say protect as part of that. So my first name is coming in as part of this ingestion; I'll simply put the protect keyword in the syntax, right in SQL. It's nothing other than just an extension of SQL. Very, very simple for the developer: easy to read, easy to write. And then you're going to provide the parameters that you need to say, oh, the name is protected with this kind of a format, to differentiate it between a credit card number and an alphanumeric stream, for example. So once you do that, you then have the ability to decrypt.
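[To make the protect-on-ingest and access-on-query flow described here concrete, a minimal SQL sketch follows. The function names (VoltageSecureProtect, VoltageSecureAccess), the format labels, and the table and column names are illustrative assumptions drawn from this description, not verbatim product syntax; the deployed SecureData-for-Vertica functions and parameters may differ.]

    -- Hypothetical table whose sensitive columns hold format-preserving ciphertext
    CREATE TABLE customers (
        customer_id INT,
        first_name  VARCHAR(64),
        ssn         VARCHAR(11)
    );

    -- Protect at ingest: raw values arrive through FILLER columns and are
    -- encrypted inline as part of the COPY, so cleartext never lands in the table.
    COPY customers (
        customer_id,
        fn_raw  FILLER VARCHAR(64),
        ssn_raw FILLER VARCHAR(11),
        first_name AS VoltageSecureProtect(fn_raw  USING PARAMETERS format='alphanum'),
        ssn        AS VoltageSecureProtect(ssn_raw USING PARAMETERS format='ssn')
    )
    FROM '/data/new_customers.csv' DELIMITER ',';

    -- Access at query time: only an authorized session gets the clear value back.
    SELECT customer_id,
           VoltageSecureAccess(ssn USING PARAMETERS format='ssn') AS ssn_clear
    FROM customers;

[The point of the pattern is that protection and access ride along with ordinary COPY and SELECT statements, which is why the work scales with the Vertica nodes executing the query rather than funneling data through a separate interceptor.]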
Now, on decrypt, let's look at a couple different use cases. First, within Vertica, we might be doing select statements within Vertica; we might be doing all kinds of jobs within Vertica that just operate at the SQL layer. Again, just insert the word "access" into the Vertica select string and provide us with the data that you want to access; that's our word for decryption, that's our lingo. And we will then, at the Vertica level, harness the power of its CPU, its RAM, its horsepower at the node, to be able to operate on that operator, the decryption request, if you will. So that gives us the speed and the ability to scale out. So if you start with two nodes of Vertica, we're going to operate at X number of hundreds of thousands of transactions a second, depending on what you're doing. Long strings are a little bit more intensive in terms of performance, but short strings like social security numbers are our sweet spot. So we operate at very, very high speed on that, and you won't notice the overhead with Vertica, per se, at the node level. When you scale Vertica up and you have 50 nodes, and you have large clusters of Vertica resources, then we scale with you, and we're not a bottleneck at any particular point. Everybody's operating independently, but they're all copies of each other, all doing the same operation: fetch a key, do the work, go to sleep. >> Yeah, you know, I think this is, a lot of the customers have said to us this week that one of the reasons why they like Vertica is it's very mature, it's been around, it's got a lot of functionality, and of course, you know, look, security, I understand, is kind of table stakes, but it can also be a differentiator. You know, big enterprises that you sell to, they're asking for security assessments, SOC 2 reports, penetration testing, and I think I'm hearing, with the partnership here, you're sort of passing those with flying colors. Are you able to make security a differentiator, or is it just sort of everybody's got to have good security? What are your thoughts on that? >> Well, there's good security, and then there's great security. And what I found with one of my money center bank customers, based here in San Francisco, was the concern around insider access when they had a large data store. And the concern was that a DBA, a database administrator who has privilege to everything, could potentially exfiltrate data out of the organization and, in one fell swoop, create havoc for them, because of the amount of data that was present in that data store and the sensitivity of that data in the data store. So when you put Voltage encryption on top of Vertica, what you're doing now is putting a layer in place that would prevent that kind of a breach. So you're looking at insider threats, you're looking at external threats, and you're looking at also being able to pass your audit with flying colors. The audits are getting tougher. And when they say, tell me about your encryption, tell me about your authentication scheme, show me the access control list that says that this person can or cannot get access to something, they're asking tougher questions. That's where SecureData can come in and give you that quick answer of: it's encrypted at rest, it's encrypted and protected while it's in use, and we can show you exactly who's had access to that data, because it's tracked via a different layer, a different appliance. And I would even draw the analogy: many of our customers use a device called a hardware security module, an HSM.
Now, these are fairly expensive devices that were invented for military applications and adopted by banks. And now they're really spreading out, and people say, do I need an HSM? Well, with SecureData, we certainly protect your crypto very, very well. We have very, very solid engineering; I'll stand on that any day of the week. But your auditor is going to want to ask a checkbox question: do you have an HSM, yes or no? Because the auditor understands it's another layer of protection, and it provides another tamper-evident layer of protection around your key management and your crypto. And we, as professionals in the industry, nod and say, that is worth it. That's an expensive option that you're going to add on, but your auditor's going to want it. If you're in financial services, you're dealing with PCI data, you're going to enjoy the checkbox that says, yes, I have HSMs, and not get into some arcane conversation around, well, no, but it's good enough. That's the kind of argument and conversation we get into when folks want to say, Vertica has great security, Vertica's fantastic on security, why would I want SecureData as well? It's another layer of protection, and it's defense in depth for your data. When you believe in that, when you take security really seriously, and you're really paranoid, like a person like myself, then you're going to invest in those kinds of solutions that get you best-in-class results. >> So I'm hearing a data-centric approach to security. Security experts will tell you, you've got to layer it. I often say, we live in a new world. We used to just build a moat around the queen, but the queen, she's leaving her castle in this world of distributed data. Rich, incredibly knowledgeable guest; really appreciate you being on the front lines and sharing with us your knowledge about this important topic. So thanks for coming on theCUBE. >> Hey, thank you very much. >> You're welcome, and thanks for watching, everybody. This is Dave Vellante for theCUBE, with wall-to-wall coverage of the Virtual Vertica BDC, the Big Data Conference. Remotely, digitally, thanks for watching. Keep it right there. We'll be right back right after this short break. (intense music)
Colin Mahony, Vertica at Micro Focus | Virtual Vertica BDC 2020
>> It's theCUBE covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Hello, everybody. Welcome to the new normal. You're watching theCUBE, and this is remote coverage of the Vertica big data event, gone digital, gone virtual. My name is Dave Vellante, and I'm here with Colin Mahony, who's a senior vice president at Micro Focus and the GM of Vertica. Colin, well, strange times, but the show goes on. Great to see you again. >> Good to see you too, Dave. Yeah, strange times indeed. Obviously, safety first for everyone, so we made a decision to go virtual. I think it was absolutely the right call, and we made it in advance of how things have transpired, but we're making the best of it, and appreciate your time here, going virtual with us. >> Well, Colin, we're super excited to be here. As you know, theCUBE has been at every single BDC since its inception. It's a great event. You just presented the keynote to your audience. You know, it was remote, you didn't have that live vibe, and you have a lot of fans in the Vertica community. But could you feel the love? >> Yeah, you know, it's hard to feel the love virtually, but I'll tell you what, the silver lining in all this is that the reach that we have for this event now is much broader than it would have been, as you know. We brought this event back; it's been a few years since we've done it. We were super excited to do it, obviously, you know, in Boston, where it was supposed to be on location, but there wouldn't have been as many people that could participate. So the silver lining in all of this is that I think there's a lot of love out there, and we're getting to have a lot of participants who otherwise would not have been able to participate in this, both live as well as through a lot of these assets that we're gonna have available. So, um, you know, it's out there. We've got amazing customers and practitioners with Vertica. We've got so many that have been with us for a long time. We, of course, have a lot of new customers as well that we're welcoming, so it's exciting. >> Well, it's been a while since you've had the BDC event, and a lot has transpired. You're now part of Micro Focus, but I know you, and I know the Vertica team; you guys have not stopped. You've kept the innovation going. We've been following the announcements, but bridge the gap between the last time we had coverage of this event and where we are today. A lot has changed. >> Oh, yeah, a lot has changed. I mean, you know, it's the software industry, right? So nothing stays the same. We constantly have to keep going. Probably the only thing that stays the same is the name Vertica. And, uh, you know, Vertica 10 is just a phenomenal release for us. So, you know, overall, the organization continues to grow. The dedication and commitment to this great platform of Vertica continues with every single release we do, as you know, and this hasn't changed. It's always about performance and scale and adding a whole bunch of new capabilities on that front. But it's also about our main roadmap and the direction that we're going towards. And I think one of the things that's been great about it is that we've stayed true to that from day one; we haven't tried to deviate too much and get into things that are too far outside of our box. But we've really done, I think, a great job of extending Vertica into places where people need a lot of help.
And with Vertica 10, I know we're going to talk more about that, but we've done a lot of that. It's super exciting for our customers, and all of this, of course, is driven by our customers. But back to the Big Data Conference. You know, everybody has been saying this for years: it was one of the best conferences we've been to, just so real. It's developers giving tech talks, it's customers giving talks. And we had more customers that wanted to give talks than we had slots to fill this year at the event, which is another benefit, a little bit, of going virtual; we can accommodate a little bit more, though obviously it's still a tight schedule. But it really was an opportunity for our community to come together and talk about not just Vertica, but how to deal with data. You know, we know the volumes aren't slowing down. We know the complexity isn't slowing down. The things that people want to do with AI and machine learning are moving forward at a rapid pace as well. There's a lot to talk about and share, and that's really a huge part of what we try to do with the event. >> Well, let's get into some of that. Um, your customers are making bets. Micro Focus is actually making a bet on Vertica. I wanna get your perspective on some of the waves that you're riding and where you are placing your bets. >> Yeah, no, it's great. So, you know, I think about the waves that we've been riding for a long time. Obviously Vertica started out as a SQL platform for analytics, a SQL database engine, a relational engine. But we always knew that was just sort of table stakes for what we wanted to do. People were going to trust us to put enormous amounts of data in our platform, and what we owe everyone is lots of analytics to take advantage of that data, and lots of tools and capabilities to shape that data, to get it into the right format, for operational reporting but also, in this day and age, for machine learning and for some pretty advanced regressions and other techniques. So a huge part of Vertica 10 is just doubling down on that commitment to what we call in-database machine learning and AI. And to do that, you know, we know that we're not going to come up with the world's best algorithms, nor is that our focus. Our advantage is we have this massively parallel platform to ingest, store, manage and analyze the data. So we made some announcements about incorporating PMML models into the product. We continue to deepen our Python integration, building off of a new open source project we started with Uber, which has been a great customer and partner on this; it's one of our great talks here at the event. So, you know, we're continuing to do that, and it turns out that when it comes to anything analytics or machine learning, so much of what you have to do is actually prepare and shape the data: get the data in the right format, apply the model, fit the model, test the model, operationalize the model, and Vertica is a great platform to do that. So that's a huge bet that we're continuing to ride on and take advantage of. And then some of the other things that we've just been seeing: I'll take object storage as an example. I think Hadoop, and what it ultimately went through, was a huge part of this, but there's just a massive disruption going on in the world around object storage. You know, we made several bets on S3 early; we created Vertica Eon mode, which separates compute and storage.
And so for us that separation is not just about being able to take advantage of cloud economics, as we do, or the economics of object storage. It's also about being able to truly isolate workloads and start to set up the sort of platform to be able to do very autonomous things in the database; the database could actually start self-analyzing without impacting the many operational workloads. And so that continues with our partnership with Pure Storage on premise. We just announced that we're supporting Eon on Google Cloud now, in addition to Amazon, and we've got HDFS now being supported by our Eon mode. So we continue to ride on that mega trend as well: just the clouds in general, whether it's a public cloud or a private cloud on premise. Giving our customers the flexibility and choice to run wherever it makes sense for them is something that we are very committed to. From a flexibility standpoint, there are a lot of lock-in products out there. There are a lot of cloud-only products, now more than ever. We're hearing from our customers that they want that flexibility to be able to run anywhere. They want the ease of use and simplicity of native cloud experiences, which we're giving them as well. >> I want to stay on that architectural component for a minute. Talk about separating compute from storage; it's not just about economics. I mean, part of it is that you can granularly scale compute separate from storage, as opposed to in chunks. It's more efficient, but you're saying there are other advantages for operational and workload specificity. Um, what is unique about Vertica in this regard? Many others separate compute from storage; what's different about Vertica? >> Yeah, I think, you know, there are a lot of differences in how we do it. It's one thing if you're a cloud-native company and you do it with a shared catalog, a key-value store, that all of your customers are using and are on the same one. Frankly, it's probably more of a security concern than anything. But it's another thing when you give that capability to each customer on their own: they're fully protected, they're not sharing it with any other customers. And that's something where we hear a lot from our customers. They want to be able to separate compute and storage, but they want to be able to do this in their own environment, so that they know that in their data catalog there's no one else sharing that catalog, and there's no single point of failure. So, um, that's one huge advantage that we have. And frankly, I think it just comes from being a company that's operating on premise and, uh, up in the cloud. I think another huge advantage for us is we don't know what object storage platform is gonna win, nor do we necessarily have to. We designed Eon mode so that it's an SDK. We started with S3, but it could be anything: HDFS, S3, who knows what object storage formats are going to be there. And then finally, beyond just the object storage, we're really one of the only database companies that actually allows our customers to natively operate on data in very different formats, like Parquet and ORC, if you're familiar with those in the Hadoop community. So we not only embrace this kind of object storage disruption, but we really embrace the different data formats. And what that means is our customers have data pipelines that are, you know, fully automated, putting this information in different places.
They don't have to completely reload everything to take advantage of Vertica analytics. We can go where the data is, connect into it, and we offer them a lot of different ways to take advantage of those analytics. So there are a couple of unique differences with Vertica, and again, I think our real advantage, in many ways, from not being a cloud-only platform, is that we're very good at operating in different environments, with different formats, and with changing formats over time. And I think a lot of the other companies out there, particularly many of the SaaS companies, are scrambling. They even have challenges moving from, say, an Amazon environment to a Microsoft Azure environment with their offering, because they've got so much unique Band-Aid, excuse me, in the background just holding the system up that is native to any one of those. >> Good. I'm gonna summarize. I'm hearing from you: you're the Ferrari of databases that we've always known. You're object store agnostic. It's the cloud experience that you can bring on-prem or to virtually any cloud, all the popular clouds, hybrid, you know, AWS, Azure, now Google, or on-prem, and in a variety of different data formats. And that combination, I think, is unique in the marketplace. Um, before we get into the news, I want to ask you about data silos. You mentioned HDFS, where you and I met back in the early days of big data. You know, in some respects, Hadoop helped break down the silos by distributing the data and leaving it in place, and in other respects it created data lakes, which became silos. And so we still have all these other silos, and people are trying to get to, ah, digital transformation, putting data at their core, virtually, obviously, and leaving it in place. What are your thoughts on that, in terms of being a silo buster? How does Vertica play there? >> Yeah, so, you're absolutely right. I think even if you look at Hadoop, for all the new data that gets into Hadoop, in many ways it's created yet another large island of data that many organizations are struggling with, because it's separate from their core traditional data warehouse, it's separate from some of the operational systems that they have. And so there might be a lot of data in there, but they're still struggling with: how do I break it out of that large silo, or combine it again? I think some of the things that Vertica has done, as part of the announcements in 10, is migration tools to make it really easy if you do want to move it from one platform to another into Vertica. But you don't have to move it; you can actually take advantage of a lot of the data where it resides with Vertica, especially in the Hadoop realm, with our external table support and our ability to read ORC and Parquet natively. So we're very pragmatic about how our customers go about this. Many of them tried it with Hadoop and realized that didn't work. But very few customers want to go wholesale and just say, we're going to throw everything out, we're gonna get rid of our data warehouse, we're gonna hit the pause button and we're going to go from there. It's just not possible to do that. So we've spent a lot of time investing in the product to really work with them to go where the data is, and then seamlessly migrate when it makes sense to migrate. You mentioned the performance of Vertica, and you talked about the variety. It definitely is.
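[As a concrete illustration of "go where the data is" rather than reloading it, the external-table pattern being described looks roughly like the sketch below. The bucket path, table name, and columns are assumptions for the example; ORC files and HDFS paths follow the same pattern.]

    -- External table over Parquet files that stay where they are; nothing is reloaded.
    CREATE EXTERNAL TABLE web_events (
        event_time TIMESTAMP,
        user_id    INT,
        page_url   VARCHAR(2048)
    ) AS COPY FROM 's3://example-bucket/events/*.parquet' PARQUET;

    -- Queried like any native table, alongside data already managed by Vertica.
    SELECT user_id, COUNT(*) AS views
    FROM web_events
    GROUP BY user_id
    ORDER BY views DESC
    LIMIT 10;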
And one other thing that we're really proud of is that it actually is not a gas guzzler, either. One of the things that we're seeing with a lot of the other cloud databases: pound for pound, on a tenth of the hardware, with Vertica running up there you get over 10x the performance. We're seeing that a lot, so it's not just about the performance, it's about the efficiency as well. And I think that efficiency is really important when it comes to silos, because there's just only so much horsepower out there. And it's easy for companies to play tricks and throw lots of servers at an environment, but so many organizations that have started up in the cloud and, frankly, are looking at the bills they're getting from these cloud workloads that are running, they're really conscious of that. >> Yeah, the big energy companies love the gas guzzlers; a lot of the cloud databases are just that. But let's get into the news. Vertica 10, which you shared with your audience in your keynote; give us some of the highlights. What do we need to know? >> Yeah, so, you know, again, doubling down on these mega trends, I'll start with machine learning and AI. We've done a lot of work to integrate so that you can take native PMML models, bring them into Vertica, run them massively parallel, and help shape, you know, your data and prepare it, do all the work that we know is required for true machine learning, for all the hype that there is around it. People want to do a lot of unsupervised machine learning, whether it's for healthcare, fraud detection, financial services. So we've doubled down on that. We now also support things like TensorFlow, and, you know, as I mentioned, we're not going to come up with the best algorithms. Our job is really to ensure that the algorithms people are coming up with can be incorporated, and that we can run them against massive data sets super efficiently. So that's number one. Number two, on object storage: we continue to support more object storage platforms for Eon mode. In the cloud, we're expanding to Google GCP, Google's cloud, beyond just Amazon, on premise or in the cloud. Now we're also supporting HDFS with Eon. Of course, we continue to have a great relationship with our partner Pure Storage on premise. So we continue to invest in Eon mode, especially. I'm not gonna go through all the different things here, but it's not just sort of, hey, you support this and then you move on. There are so many different things that we learn about API calls and how to save our customers money, and tricks on performance, and things like that. And the third area: we definitely continue to build on that flexibility of deployment, which is related to Eon mode, as I described, but it's also about simplicity. It's also about some of the migration tools that we've announced to make it easy to go from one platform to another. We have a great roadmap on ease of use, on security, on performance and scale. I mean, for us, those are the things that we're working on every single release. We probably don't talk about them as much as we need to, but obviously they're critically important. And so we constantly look at every component in this product. You know, Version 10 is a huge release for any product, especially an analytic database platform. And so we're just constantly revisiting, you know, some of the code base and figuring out how we can do it in new and better ways. And that's a big part of 10 as well.
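[A minimal sketch of the in-database PMML workflow described above. The file path, model name, and feature columns are hypothetical, and exact parameters can vary by version; the shape is to import the externally trained model once, then score rows in place, in parallel across the cluster.]

    -- Import a PMML model trained outside Vertica (for example, in Python or Spark).
    SELECT IMPORT_MODELS('/models/churn_model.pmml' USING PARAMETERS category='PMML');

    -- Score the data where it lives, distributed across the nodes.
    SELECT customer_id,
           PREDICT_PMML(age, tenure_months, monthly_spend
                        USING PARAMETERS model_name='churn_model') AS churn_score
    FROM customers;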
I'm glad you brought up the machine intelligence, the machine learning and AI piece, because we would agree. One of the things we've noticed is that, you know, the new innovation cocktail is not being driven by Moore's Law anymore. It's really a combination: you've collected all this data over the last 10 years through Hadoop and other data stores, object stores, etcetera, and now you're applying machine intelligence to that. And then you've got the cloud for scale. And of course, we talked about you bringing the cloud experience, whether it's on-prem or hybrid, etcetera. The reason why I think this is important, and I wanted to get your take on this, is because you do see a lot of emerging analytic databases, cloud native. Yes, they do suck up, you know, a lot of compute, but they also add a lot of value. And I really wanted to understand how you guys play in that new trend, that sort of cloud database, high performance, bringing in machine learning and AI and ML tools, and then driving, you know, turning data into insights. And what I'm hearing is you play directly in that, and your differentiation is a lot of the things that we talked about, including the ability to do that on-prem and in the cloud and across clouds. >> Yeah, I mean, I think that's a great point. We are a great cloud database. We run very well on the three major clouds, and you could argue on some of the other platforms as well in other parts of the world. Um, if you talk to our customers, and we have hundreds of customers who are running Vertica in the cloud, the experience is very good. I think it could always be better. We've invested a lot in taking advantage of the native cloud ecosystem, so that provisioning and managing Vertica is seamless when you're in that environment, and we'll continue to do that. But Vertica, excuse me, as a cloud platform is phenomenal. And, um, you know, there's a lot of confusion out there. I think there's a lot of marketing dollars spent; I won't name many of the companies here, you know who they are, you know, the cloud-native data warehouse. And it's true, you know, they're software as a service. But if you talk to a lot of our customers, they're getting very good and very similar experiences with Vertica. We stop short of saying we're software as a service, because ultimately our customers have that control and flexibility there. They're putting Vertica on whichever cloud they want to run it on and managing it. Stay tuned on that; I think you'll hear more from us about, you know, that going even further. But, um, you know, we do really well in the cloud, and I think Eon, so much of Eon, and, you know, this has really been a sort of 2.5-year endeavor for us, so much of Eon was designed around the cloud, was designed around cloud data lakes, S3, separation of compute and storage. And if you look at the work that we're doing around containerization and a lot of these other elements, it just takes that to the next level. And, um, there's a lot of great work there, so I think we're gonna continue to get better at cloud. But I would argue that we're already, and have been for some time, very good at being a cloud analytic data platform.
But I think you're talking about Optionality, maybe different consumption models. That am I getting that right and you share >>your difficult in that right? And actually, I'm glad you wrote something. I think a huge part of Cloud is also has nothing to do with the technology. I think it's how you and seeing the product. Some companies want to rent the product and they want to rent it for a certain period of time. And so we allow our customers to do that. We have incredibly flexible models of how you provision and purchase our product, and I think that helps a lot. You know, I am opening the door Ah, a little bit. But look, we have customers that ask us that we're in offer them or, you know, we can offer them platforms, brawl in. We've had customers come to us and say please take over systems, um, and offer something as a distribution as I said, though I think one thing that we've been really good at is focusing on on what is our core and where we really offer offer value. But I can tell you that, um, we introduced something called the Verdict Advisor Tool this year. One of the things that the Advisor Tool does is it collects information from our customer environments on premise or the cloud, and we run through our own machine learning. We analyze the customer's environment and we make some recommendations automatically. And a lot of our customers have said to us, You know, it's funny. We've tried managed service, tried SAS off, and you guys blow them away in terms of your ability to help us, like automatically managed the verdict, environment and the system. Why don't you guys just take this product and converted into a SAS offering, so I won't go much further than that? But you can imagine that there's a lot of innovation and a lot of thoughts going into how we can do that. But there's no reason that we have to wait and do that today and being able to offer our customers on premise customers that same sort of experience from a managed capability is something that we spend a lot of time thinking about as well. So again, just back to the automation that ease of use, the going above and beyond. Its really excited to have an analytic platform because we can do so much automation off ourselves. And just like we're doing with Perfect Advisor Tool, we're leveraging our own Kool Aid or Champagne Dawn. However you want to say Teoh, in fact, tune up and solve, um, some optimization for our customers automatically, and I think you're going to see that continue. And I think that could work really well in a bunch of different wallets. >>Welcome. Just on a personal note, I've always enjoyed our conversations. I've learned a lot from you over the years. I'm bummed that we can't hang out in Boston, but hopefully soon, uh, this will blow over. I loved last summer when we got together. We had the verdict throwback. We had Stone Breaker, Palmer, Lynch and Mahoney. We did a great series, and that was a lot of fun. So it's really it's a pleasure. And thanks so much. Stay safe out there and, uh, we'll talk to you soon. >>Yeah, you too did stay safe. I really appreciate it up. Unity and, you know, this is what it's all about. It's Ah, it's a lot of fun. I know we're going to see each other in person soon, and it's the people in the community that really make this happen. So looking forward to that, but I really appreciate it. >>Alright. And thank you, everybody for watching. This is the Cube coverage of the verdict. Big data conference gone, virtual going digital. I'm Dave Volante. 
We'll be right back right after this short break.
Ben White, Domo | Virtual Vertica BDC 2020
>> Announcer: It's theCUBE covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Hi, everybody. Welcome to this digital coverage of the Vertica Big Data Conference. You're watching theCUBE and my name is Dave Vellante. It's my pleasure to invite in Ben White, who's the Senior Database Engineer at Domo. Ben, great to see you, man. Thanks for coming on. >> Great to be here. >> You know, as I said, you know, earlier when we were off-camera, I really was hoping I could meet you face-to-face in Boston this year, but hey, I'll take it, and, you know, our community really wants to hear from experts like yourself. But let's start with Domo as a company. Share with us what Domo does and what your role is there. >> Well, if I can go straight to the official line, what Domo does is we provide, we process data at BI scale; we provide BI leverage at cloud scale, in record time. And so what that means is, you know, we are a business operating system where we provide a number of analytical abilities to companies of all sizes. But we do that at cloud scale, and so I think that differentiates us quite a bit. >> So a lot of your work, if I understand it, and just in terms of understanding what Domo does, there's a lot of pressure in terms of being real-time. And you sometimes don't know what's coming at you, so it's ad hoc. I wonder if you could sort of talk about that, confirm that, maybe add a little color to it. >> Yeah, absolutely, absolutely. That's probably the biggest challenge there is to operating Domo: it is an ad hoc environment. And certainly what that means is that you've got analysts and executives that are able to submit their own queries with very few limitations. So from an engineering standpoint, the challenge in that, of course, is that you don't have this predictable dashboard to plan for when it comes to performance planning. So it definitely presents some challenges for us, and we've done some pretty unique things, I think, to address those. >> So it sounds like your background fits well with that. I understand people have called you a database whisperer and an envelope pusher. What does that mean to a DBA in this day and age? >> The whisperer part is probably a lost art, in the sense that it's not really sustainable, right? The idea that, you know, whatever it is I'm able to do with the database, it has to be repeatable. And so that's really where analytics comes in, right? That's where pushing the envelope comes in. And in a lot of ways that's where Vertica comes in, with this open architecture. And so as a person who has a reputation for saying, "I understand this is what our limitations should be, but I think we can do more," having a platform like Vertica, with such an open architecture, kind of lets you push those limits quite a bit. >> I mean, I've always felt like, you know, Vertica, when I first saw the Stonebraker architecture and talked to some of the early founders, I always felt like it was the Ferrari of databases, certainly at the time. And it sounds like you guys use it in that regard. But talk a little bit more about how you use Vertica, and why, you know, why MPP, why Vertica? You know, why can't you do this with an RDBMS? Educate us, a little bit, on, sort of, the basics. >> For us, it was part of what I mentioned when we started, when we talked about the very nature of the Domo platform, where there's an incredible amount of resiliency required.
And so Vertica, the MPP platform, of course, allows us to build individual database clusters that can perform best for the workload that might be assigned to them. So the open, the expandable, the... The-the ability to grow Vertica, right, as your base grows, those are all important factors, when you're choosing early on, right? Without a real idea of how growth would be or what it will look like. If you were kind of, throwing up something to the dark, you look at the Vertica platform and you can see, well, as I grow, I can, kind of, build with this, right? I can do some unique things with the platform in terms of this open architecture that will allow me to not have to make all my decisions today, right? (mutters) >> So, you're using Vertica, I know, at least in part, you're working with AWS as well, can you describe sort of your environment? Do you give anything on-prem, is everything in cloud? What's your set up look like? >> Sure, we have a hybrid cloud environment where we have a significant presence in public files in our own private cloud. And so, yeah, having said that, we certainly have a really an extensive presence, I would say, in AWS. So, they're definitely the partner of our when it comes to providing the databases and the server power that we need to operate on. >> From a standpoint of engineering and architecting a database, what were some of the challenges that you faced when you had to create that hybrid architecture? What did you face and how did you overcome that? >> Well, you know, some of the... There were some things we faced in terms of, one, it made it easy that Vertica and AWS have their own... They play well together, we'll say that. And so, Vertica was designed to work on AWS. So that part of it took care of it's self. Now our own private cloud and being able to connect that to our public cloud has been a part of our own engineering abilities. And again, I don't want to make little, make light of it, it certainly not impossible. And so we... Some of the challenges that pertain to the database really were in the early days, that you mentioned, when we talked a little bit earlier about Vertica's most recent eon mode. And I'm sure you'll get to that. But when I think of early challenges, some of the early challenges were the architecture of enterprise mode. When I talk about all of these, this idea that we can have unique databases or database clusters of different sizes, or this elasticity, because really, if you know the enterprise architecture, that's not necessarily the enterprise architecture. So we had to do some unique things, I think, to overcome that, right, early. To get around the rigidness of enterprise. >> Yeah, I mean, I hear you. Right? Enterprise is complex and you like when things are hardened and fossilized but, in your ad hoc environment, that's not what you needed. So talk more about eon mode. What is eon mode for you and how do you apply it? What are some of the challenges and opportunities there, that you've found? >> So, the opportunities were certainly in this elastic architecture and the ability to separate in the storage, immediately meant that for some of the unique data paths that we wanted to take, right? We could do that fairly quickly. Certainly we could expand databases, right, quickly. More importantly, now you can reduce. Because previously, in the past, right, when I mentioned the enterprise architecture, the idea of growing a database in itself has it's pain. As far as the time it takes to (mumbles) the data, and that. 
Then think about taking that database back down and (telephone interference). All of a sudden, with eon, right, we had this elasticity, where you could, kind of, start to think about auto scaling, where you can go up and down and maybe you could save some money or maybe you could improve performance or maybe you could meet demand, At a time where customers need it most, in a real way, right? So it's definitely a game changer in that regard. >> I always love to talk to the customers because I get to, you know, I hear from the vendor, what they say, and then I like to, sort of, validate it. So, you know, Vertica talks a lot about separating compute and storage, and they're not the only one, from an architectural standpoint who do that. But Vertica stresses it. They're the only one that does that with a hybrid architecture. They can do it on-prem, they can do it in the cloud. From your experience, well first of all, is that true? You may or may not know, but is that advantageous to you, and if so, why? >> Well, first of all, it's certainly true. Earlier in some of the original beta testing for the on-prem eon modes that we... I was able to participate in it and be aware of it. So it certainly a realty, they, it's actually supported on Pure storage with FlashBlade and it's quite impressive. You know, for who, who will that be for, tough one. It's probably Vertica's question that they're probably still answering, but I think, obviously, some enterprise users that probably have some hybrid cloud, right? They have some architecture, they have some hardware, that they themselves, want to make use of. We certainly would probably fit into one of their, you know, their market segments. That they would say that we might be the ones to look at on-prem eon mode. Again, the beauty of it is, the elasticity, right? The idea that you could have this... So a lot of times... So I want to go back real quick to separating compute. >> Sure. Great. >> You know, we start by separating it. And I like to think of it, maybe more of, like, the up link. Because in a true way, it's not necessarily separated because ultimately, you're bringing the compute and the storage back together. But to be able to decouple it quickly, replace nodes, bring in nodes, that certainly fits, I think, what we were trying to do in building this kind of ecosystem that could respond to unknown of a customer query or of a customer demand. >> I see, thank you for that clarification because you're right, it's really not separating, it's decoupling. And that's important because you can scale them independently, but you still need compute and you still need storage to run your work load. But from a cost standpoint, you don't have to buy it in chunks. You can buy in granular segments for whatever your workload requires. Is that, is that the correct understanding? >> Yeah, and to, the ability to able to reuse compute. So in the scenario of AWS or even in the scenario of your on-prem solution, you've got this data that's safe and secure in (mumbles) computer storage, but the compute that you have, you can reuse that, right? You could have a scenario that you have some query that needs more analytic, more-more fire power, more memory, more what have you that you have. And so you can kind of move between, and that's important, right? That's maybe more important than can I grow them separately. Can I, can I borrow it. Can I borrow that compute you're using for my (cuts out) and give it back? 
And you can do that, when you're so easily able to decouple the compute and put it where you want, right? And likewise, if you have a down period where customers aren't using it, you'd like to be able to not use that, if you no longer require it, you're not going to get it back. 'Cause it-it opened the door to a lot of those things that allowed performance and process department to meet up. >> I wonder if I can ask you a question, you mentioned Pure a couple of times, are you using Pure FlashBlade on-prem, is that correct? >> That is the solution that is supported, that is supported by Vertica for the on-prem. (cuts out) So at this point, we have been discussing with them about some our own POCs for that. Before, again, we're back to the idea of how do we see ourselves using it? And so we certainly discuss the feasibility of bringing it in and giving it the (mumbles). But that's not something we're... Heavily on right now. >> And what is Domo for Domo? Tell us about that. >> Well it really started as this idea, even in the company, where we say, we should be using Domo in our everyday business. From the sales folk to the marketing folk, right. Everybody is going to use Domo, it's a business platform. For us in engineering team, it was kind of like, well if we use Domo, say for instance, to be better at the database engineers, now we've pointed Domo at itself, right? Vertica's running Domo in the background to some degree and then we turn around and say, "Hey Domo, how can we better at running you?" So it became this kind of cool thing we'd play with. We're now able to put some, some methods together where we can actually do that, right. Where we can monitor using our platform, that's really good at processing large amounts of data and spitting out useful analytics, right. We take those analytics down, make recommendation changes at the-- For now, you've got Domo for Domo happening and it allows us to sit at home and work. Now, even when we have to, even before we had to. >> Well, you know, look. Look at us here. Right? We couldn't meet in Boston physically, we're now meeting remote. You're on a hot spot because you've got some weather in your satellite internet in Atlanta and we're having a great conversation. So-so, we're here with Ben White, who's a senior database engineer at Domo. I want to ask you about some of the envelope pushing that you've done around autonomous. You hear that word thrown around a lot. Means a lot of things to a lot of different people. How do you look at autonomous? And how does it fit with eon and some of the other things you're doing? >> You know, I... Autonomous and the idea idea of autonomy is something that I don't even know if that I have already, ready to define. And so, even in my discussion, I often mention it as a road to it. Because exactly where it is, it's hard to pin down, because there's always this idea of how much trust do you give, right, to the system or how much, how much is truly autonomous? How much already is being intervened by us, the engineers. So I do hedge on using that. But on this road towards autonomy, when we look at, what we're, how we're using Domo. And even what that really means for Vertica, because in a lot of my examples and a lot of the things that we've engineered at Domo, were designed to maybe overcome something that I thought was a limitation thing. And so many times as we've done that, Vertica has kind of met us. 
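Before the conversation moves on, a quick aside on the "Domo for Domo" idea Ben just described: pointing the platform's own analytics at the Vertica clusters that run it comes down to querying the database's own workload telemetry and acting on what comes back. The sketch below is only an illustration of that loop, not Domo's implementation; it assumes the vertica-python client, a hypothetical hostname and credentials, and the v_monitor.query_requests system table and column names, all of which should be checked against your own Vertica release.

```python
# A minimal "database watching itself" sketch: pull recent query timings from a
# Vertica system table and surface the slowest statements. Connection details and
# the v_monitor.query_requests columns are assumptions; adjust to your deployment.
import vertica_python

CONN_INFO = {
    "host": "vertica.example.internal",   # hypothetical host
    "port": 5433,
    "user": "dbadmin",
    "password": "********",
    "database": "analytics",
}

SLOW_QUERY_SQL = """
    SELECT user_name,
           request_duration_ms,
           LEFT(request, 120) AS request_snippet
    FROM   v_monitor.query_requests
    WHERE  start_timestamp > NOW() - INTERVAL '1 hour'
    ORDER  BY request_duration_ms DESC
    LIMIT  10
"""

def report_slow_queries(threshold_ms: int = 5_000) -> None:
    """Print queries from the last hour that ran longer than threshold_ms."""
    with vertica_python.connect(**CONN_INFO) as conn:
        cur = conn.cursor()
        cur.execute(SLOW_QUERY_SQL)
        for user, duration_ms, snippet in cur.fetchall():
            if duration_ms and duration_ms > threshold_ms:
                print(f"{user}: {duration_ms:.0f} ms  {snippet}")

if __name__ == "__main__":
    report_slow_queries()
```

In a self-monitoring setup like the one Ben sketches, output like this would be fed back into the analytics platform itself rather than printed, so recommendations can be tracked over time.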
Like right after we've kind of engineered our architecture stuff, that we thought that could help on our side, Vertica has a release that kind of addresses it. So, the autonomy idea and the idea that we could analyze metadata, make recommendations, and then execute those recommendations without innervation, is that road to autonomy. Once the database is properly able to do that, you could see in our ad hoc environment how that would be pretty useful, where with literally millions of queries every hour, trying to figure out what's the best, you know, profile. >> You know for- >> (overlapping) probably do a better job in that, than we could. >> For years I felt like IT folks sometimes were really, did not want that automation, they wanted the knobs to turn. But I wonder if you can comment. I feel as though the level of complexity now, with cloud, with on-prem, with, you know, hybrid, multicloud, the scale, the speed, the real time, it just gets, the pace is just too much for humans. And so, it's almost like the industry is going to have to capitulate to the machine. And then, really trust the machine. But I'm still sensing, from you, a little bit of hesitation there, but light at the end of the tunnel. I wonder if you can comment? >> Sure. I think the light at the end of the tunnel is even in the recent months and recent... We've really begin to incorporate more machine learning and artificial intelligence into the model, right. And back to what we're saying. So I do feel that we're getting closer to finding conditions that we don't know about. Because right now our system is kind of a rule, rules based system, where we've said, "Well these are the things we should be looking for, these are the things that we think are a problem." To mature to the point where the database is recognizing anomalies and taking on pattern (mutters). These are problems you didn't know happen. And that's kind of the next step, right. Identifying the things you didn't know. And that's the path we're on now. And it's probably more exciting even than, kind of, nailing down all the things you think you know. We figure out what we don't know yet. >> So I want to close with, I know you're a prominent member of the, a respected member of the Vertica Customer Advisory Board, and you know, without divulging anything confidential, what are the kinds of things that you want Vertica to do going forward? >> Oh, I think, some of the in dated base for autonomy. The ability to take some of the recommendations that we know can derive from the metadata that already exists in the platform and start to execute some of the recommendations. And another thing we've talked about, and I've been pretty open about talking to it, talking about it, is the, a new version of the database designer, I think, is something that I'm sure they're working on. Lightweight, something that can give us that database design without the overhead. Those are two things, I think, as they nail or basically the database designer, as they respect that, they'll really have all the components in play to do in based autonomy. And I think that's, to some degree, where they're heading. >> Nice. Well Ben, listen, I really appreciate you coming on. You're a thought leader, you're very open, open minded, Vertica is, you know, a really open community. I mean, they've always been quite transparent in terms of where they're going. It's just awesome to have guys like you on theCUBE to-to share with our community. 
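To make the distinction Ben draws here concrete before the interview wraps up: a hand-written rule only fires on conditions you anticipated, while even a crude statistical check can flag an hour of query latency that merely looks unlike recent history. The figures below are invented purely for illustration, and production anomaly detection would use far richer models than a single z-score.

```python
# Toy contrast between the two monitoring styles Ben describes: a fixed rule
# ("alert if average latency tops 2s") versus a simple statistical anomaly check
# that flags hours that deviate sharply from recent history.
from statistics import mean, pstdev

hourly_avg_latency_ms = [310, 295, 320, 305, 290, 315, 300, 1450]  # made-up sample

RULE_THRESHOLD_MS = 2_000

def rule_based_alerts(samples):
    """Known-known: alert only when a hand-written threshold is crossed."""
    return [i for i, v in enumerate(samples) if v > RULE_THRESHOLD_MS]

def anomaly_alerts(samples, z_cutoff=3.0):
    """Unknown-unknown: alert when the latest sample sits far outside recent behavior."""
    baseline = samples[:-1]
    mu, sigma = mean(baseline), pstdev(baseline) or 1.0
    z = abs(samples[-1] - mu) / sigma
    return [len(samples) - 1] if z > z_cutoff else []

print("rule alerts:   ", rule_based_alerts(hourly_avg_latency_ms))   # [] -- the rule misses it
print("anomaly alerts:", anomaly_alerts(hourly_avg_latency_ms))      # [7] -- the spike is caught
```

The gap between those two outputs is exactly the "things you didn't know to look for" that Ben says the next step on the road to autonomy has to cover.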
So thank you so much and hopefully we can meet face-to-face shortly. >> Absolutely. Well you stay safe in Boston, one of my favorite towns and so no doubt, when the doors get back open, I'll be coming down. Or coming up as it were. >> Take care. All right, and thank you for watching everybody. Dave Volante with theCUBE, we're here covering the Virtual Vertica Big Data Conference. (electronic music)
SUMMARY :
brought to you by Vertica. of the Vertica Big Data Conference. I really was hoping I could meet you face-to-face And so what that means is, you know, I wonder if you could sort of talk about that, confirm that, is that you don't have this predictable dashboard What does that mean to a DBA in this day and age? The idea that, you know, And it sounds like you guys use it in that regard. that can perform best for the workload that we need to operate on. Some of the challenges that pertain to the database and you like when things are hardened and fossilized and the ability to separate in the storage, but is that advantageous to you, and if so, why? The idea that you could have this... And I like to think of it, maybe more of, like, the up link. And that's important because you can scale them the compute and put it where you want, right? that is supported by Vertica for the on-prem. And what is Domo for Domo? From the sales folk to the marketing folk, right. I want to ask you about some of the envelope pushing and a lot of the things that we've engineered at Domo, than we could. But I wonder if you can comment. nailing down all the things you think you know. And I think that's, to some degree, where they're heading. It's just awesome to have guys like you on theCUBE Well you stay safe in Boston, All right, and thank you for watching everybody.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
AWS | ORGANIZATION | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Ben White | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Atlanta | LOCATION | 0.99+ |
Ferrari | ORGANIZATION | 0.99+ |
Domo | ORGANIZATION | 0.99+ |
Vertica Customer Advisory Board | ORGANIZATION | 0.99+ |
Ben | PERSON | 0.99+ |
two things | QUANTITY | 0.98+ |
this year | DATE | 0.98+ |
Vertica | TITLE | 0.98+ |
theCUBE | ORGANIZATION | 0.97+ |
Vertica Big Data Conference | EVENT | 0.97+ |
Domo | TITLE | 0.97+ |
Domo | PERSON | 0.96+ |
Virtual Vertica Big Data Conference | EVENT | 0.96+ |
Virtual Vertica Big Data Conference 2020 | EVENT | 0.96+ |
first | QUANTITY | 0.95+ |
eon | TITLE | 0.92+ |
one | QUANTITY | 0.87+ |
today | DATE | 0.87+ |
millions of queries | QUANTITY | 0.84+ |
FlashBlade | TITLE | 0.82+ |
Virtual Vertica | EVENT | 0.75+ |
couple | QUANTITY | 0.7+ |
Pure FlashBlade | COMMERCIAL_ITEM | 0.58+ |
BDC 2020 | EVENT | 0.56+ |
MPP | TITLE | 0.55+ |
times | QUANTITY | 0.51+ |
RDBMS | TITLE | 0.48+ |
Joy King, Vertica | Virtual Vertica BDC 2020
>> Announcer: It's theCUBE covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Welcome back, everybody. My name is Dave Vellante, and you're watching theCUBE's coverage of the Vertica Virtual Big Data Conference. theCUBE has been at every BDC, and it's our pleasure in these difficult times to be covering the BDC as a virtual event. This digital program, really excited to have Joy King joining us. Joy is the Vice President of Product and Go-to-Market Strategy at Vertica, and if that weren't enough, she also runs marketing and education curricula. So, Joy, you're a multi-tool player. You've got the technical side and the marketing gene, so welcome to theCUBE. You're always a great guest. Love to have you on. >> Thank you so much, David. The pleasure, it really is. >> So I want to get in. You know, we'll have some time. We've been talking about the conference and the virtual event, but I really want to dig in to the product stuff. It's a big day for you guys. You announced 10.0. But before we get into the announcements, step back a little bit. You know, you guys are riding the waves. I've said to a number of our guests that Vertica has always been good at riding the waves, not only the initial MPP wave, but you embraced HDFS, you embraced data science and analytics and the cloud. So what are the trends that you see, the big waves that you're riding? >> Well, you're absolutely right, Dave. I mean, what I think is most interesting and important is that Vertica is, at its core, a true engineering culture founded by, well, a pretty famous guy, right, Dr. Stonebraker, who embedded that very technical Vertica engineering culture. It means that we don't pretend to know everything that's coming, but we are committed to embracing the technology trends, the innovations, things like that. We don't pretend to know it all. We just do it all. So right now, I think I see three big imminent trends that we are addressing, and as a matter of fact we have been for a while, but that are particularly relevant right now. The first is a combination of, I guess, a disappointment in what Hadoop was able to deliver. I always feel a little guilty, because she's a very reasonably capable elephant. She was designed to be HDFS, a highly distributed file store, but she can't be an entire zoo, so there's a lot of disappointment in the market, but a lot of data in HDFS. You combine that with the explosion of cloud object storage, and you're talking about even more data, but even more data silos. So data growth and data silos is trend one. Then what I would say trend two is the cloud reality. Cloud brings so many benefits. There are so many opportunities that public cloud computing delivers. But I think we've learned enough now to know that there's also some reality. The cloud providers themselves, Dave, don't talk about it much, because, is it more agile? Can you do things without having to manage your own data center? Of course you can. But the reality is it's a little more pricey than we expected, there are some security and privacy concerns, and there are some workloads that can't go to the cloud, so hybrid and also multi-cloud deployments are the next trend that are mandatory. And then maybe the one that is the most exciting in terms of changing the world, and we could use a little change right now, is operationalizing machine learning.
There's so much potential in the technology, but it has somehow been stuck, for the most part, in science projects and the data science lab, and the time is now to operationalize it. Those are the three big trends that Vertica is focusing on right now. >> That's great. I wonder if I could ask you a couple questions about that. I mean, like you I have a soft spot in my heart for Hadoop, and the thing about Hadoop that was, I think, profound was it got people thinking about, you know, bringing compute to the data and leaving data in place, and it really got people thinking about data-driven cultures. It didn't solve all the problems, but it collected a lot of data that we can now take your third trend and apply machine intelligence on top of that data. And then the cloud is really the ability to scale, and it gives you that agility, and it's not really just the cloud itself, it's bringing the cloud experience to wherever the data lives. And I think that's what I'm hearing from you. Those are the three big superpowers of innovation today. >> That's exactly right. So, you know, I have to say I think we all know that data analytics, machine learning, none of that delivers real value unless the volume of data is there to be able to truly predict and influence the future. So the last 7 to 10 years have been, correctly, about collecting the data, getting the data into a common location, and HDFS was well designed for that. But we live in a capitalist world, and some companies stepped in and tried to make HDFS and the broader Hadoop ecosystem be the single solution to big data. It's not true. So now the key is, how do we take advantage of all of that data? And that's exactly what Vertica is focusing on. So as you know, we began our journey with Vertica back in the day in 2007 with our first release, and we saw the growth of Hadoop. So we announced many years ago Vertica SQL on Hadoop, the idea being to be able to deploy Vertica on Hadoop nodes and query the data in Hadoop. We wanted to help. Now, with Vertica 10, we are also introducing Vertica in Eon Mode for HDFS, and we can talk more about that. But Vertica in Eon Mode for HDFS is a way to apply an ACID SQL database management platform to HDFS infrastructure and data in HDFS file storage. And that is a great way to leverage the investment that so many companies have made in HDFS. And I think it's fair to the elephant to treat her well. >> Okay, let's get into the hard news in 10.0. You've got a mature stack, but walk us through some of the highlights, and then we can drill into some of the technologies. >> Absolutely. So, well, in 2018 Vertica announced Vertica in Eon Mode, which is the separation of compute from storage. Now this is a great example of Vertica embracing innovation. Vertica was designed for on-premises data centers and bare metal servers, tightly coupled storage, DL380s from Hewlett Packard Enterprise, Dell, etcetera. But we saw that cloud computing was fundamentally changing data center architectures, and it made sense to separate compute from storage. So you add compute when you need compute, you add storage when you need storage. That's exactly what the cloud introduced, but it was only available in the cloud. So the first thing we did was architect Vertica in Eon Mode, which is not a new product, and this is really important, it's a deployment option.
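Since Joy just introduced Eon Mode for HDFS, here is a rough illustration of what "an ACID SQL platform over data that already lives in HDFS" can look like in practice. The DDL follows Vertica's external-table support for Parquet as I understand it, but the exact syntax, the HDFS path, the table and column names, and the connection details are all assumptions to verify against your Vertica version; this is not presented as the Vertica 10 feature itself.

```python
# Sketch: register Parquet files sitting in HDFS as an external table, then query
# them in place with ordinary SQL through the vertica-python client.
import vertica_python

conn_info = {"host": "vertica.example.internal", "port": 5433,
             "user": "dbadmin", "password": "********", "database": "analytics"}

DDL = """
    CREATE EXTERNAL TABLE web_clicks (
        user_id  INT,
        url      VARCHAR(2048),
        ts       TIMESTAMP
    ) AS COPY FROM 'hdfs:///data/clickstream/*.parquet' PARQUET
"""

QUERY = """
    SELECT url, COUNT(*) AS clicks
    FROM   web_clicks
    WHERE  ts > NOW() - INTERVAL '7 days'
    GROUP  BY url
    ORDER  BY clicks DESC
    LIMIT  5
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(DDL)     # registers the HDFS files; nothing is copied into Vertica
    cur.execute(QUERY)   # the scan runs against the data where it already lives
    for url, clicks in cur.fetchall():
        print(f"{clicks:8d}  {url}")
```

The point Joy is making is exactly this shape of workflow: the Hadoop-era investment in HDFS stays where it is, and the SQL engine comes to it.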
And in 2018, our customers had the opportunity to deploy their Vertica licenses in Eon Mode on AWS. In September of 2019, we then broke an important record: we brought cloud architecture down to earth and we announced Vertica in Eon Mode, so Vertica with communal or shared storage, leveraging Pure Storage FlashBlade. That gave us all the advantages of separating compute from storage, all of the workload isolation, the scale up, scale down, the ability to manage clusters, and we did that with the on-premise data center. And now, with Vertica 10, we are announcing Vertica in Eon Mode on HDFS and Vertica in Eon Mode on Google Cloud. So what we've got here, in summary, is Vertica in Eon Mode, multi-cloud and multiple on-premise storage options, and that gives us the opportunity to help our customers both with the hybrid and multi-cloud strategies they have and with unifying their data silos. But Vertica 10 goes farther. >> Well, let me stop you there, because I just want to mention, we talked to Joe Gonzalez at Mass Mutual, who essentially was brought in, and one of his tasks was to lead the move into Eon Mode. Why? Because, I'm asking, they still had three separate data silos and they wanted to bring those together. They're investing heavily in technology, Joe is an expert, they really put data at their core, and Eon Mode was a key part of that because they're using S3 and so on. So that was a very important step for those guys. Carry on, what else do we need to know about? >> So one of the reasons, for example, that Mass Mutual is so excited about Eon Mode is because of the operational advantages. You think about exactly what Joe told you about multiple clusters serving multiple use cases and maybe multiple divisions. And look, let's be clear, marketing doesn't always get along with finance, finance doesn't necessarily get along with ops, and IT is often caught in the middle. Vertica in Eon Mode allows workload isolation, meaning allocating the compute resources that different use cases need without allowing them to interfere with other use cases, while allowing everybody to access the data. So it's a great way to bring the corporate world together but still protect them from each other. And that's one of the things that Mass Mutual is going to benefit from, as well as so many of our other customers. >> I also want to mention, so when I saw you last year at the Pure Storage Accelerate conference, you said that you are the only company that separates compute from storage that runs on-prem and in the cloud. And I had to think about it, I've researched, and I still can't find anybody else who does. I want to mention you actually beat a number of the cloud players with that capability. So good job, and I think it is a differentiator, assuming that you're giving me that cloud experience and the licensing and the pricing capability. So I want to talk about that a little bit. >> Well, you're absolutely right. So let's be clear. There is no question that the public clouds introduced the separation of compute and storage, and those are advantages that they do not have the ability or the interest to replicate on premise. Vertica was born to be software only. We make no money on underlying infrastructure. We don't charge as a package for the hardware underneath, so we are totally motivated to be independent of that and also to continuously optimize the software to be as efficient as possible.
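Before the pricing thread picks up below, a small aside on the workload-isolation pattern Joy describes above: the practical effect is that each team's traffic is pointed at its own compute (separate Eon subclusters or clusters over the same communal storage) so a heavy finance batch job cannot starve marketing dashboards. The sketch below only illustrates the routing idea; the endpoint hostnames, table names, and credentials are hypothetical, and provisioning the subclusters themselves is a DBA task outside this snippet.

```python
# Route each workload to its own compute pool while everyone reads the same data.
import vertica_python

SUBCLUSTER_ENDPOINTS = {
    "dashboards": {"host": "dash-subcluster.example.internal", "port": 5433},
    "etl":        {"host": "etl-subcluster.example.internal",  "port": 5433},
}

COMMON = {"user": "app_user", "password": "********",
          "database": "analytics", "autocommit": True}

def run(workload: str, sql: str, fetch: bool = True):
    """Execute sql on the compute pool dedicated to this workload."""
    conn_info = {**COMMON, **SUBCLUSTER_ENDPOINTS[workload]}
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall() if fetch else None

# Interactive dashboards and nightly ETL hit different compute, same shared storage.
top_products = run(
    "dashboards",
    "SELECT product_id, SUM(amount) AS total FROM sales "
    "GROUP BY product_id ORDER BY total DESC LIMIT 10",
)
run(
    "etl",
    "INSERT INTO sales_daily SELECT CURRENT_DATE, product_id, SUM(amount) "
    "FROM sales_staging GROUP BY product_id",
    fetch=False,
)
```

That is the "marketing, finance, and ops protected from each other but sharing one copy of the data" arrangement in miniature.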
And we do the exact same thing with your question about licensing. Cloud providers charge per node instance. That's how they charge for their underlying infrastructure. Well, in some cases, if you're talking about a use case where you have a whole lot of data but you don't necessarily have a lot of compute for that workload, it may make sense to pay per node, because then it's unlimited data. But what if you have a huge compute need on a relatively small data set? That's not so good. Vertica offers per-node and per-terabyte licensing for our customers, depending on their use case. We also offer perpetual licenses for customers who want capital expense, but we also offer subscription for companies that have to have operating expense. And while this can certainly cause some complexity for our field organization, we know that it's all about choice, that everybody in today's world wants it personalized just for me. And that's exactly what we're doing with our pricing and licensing. >> So just to clarify, you're saying I can pay by the drink if I want to, you're not going to force me necessarily into a term, or I can choose to have, you know, more predictable pricing. Is that correct? >> Well, it's partially correct. First, Vertica subscription licensing is a fixed amount for the period of the subscription. We do that because many of our customers cannot, and I'm one of them, by the way, cannot tell finance what the budget forecast is going to be for the quarter after the money is spent; you have to say what it's going to be before. So our subscription pricing is a fixed amount for a period of time. However, we do respect the fact that some companies do want usage-based pricing. So on AWS, you can use Vertica by the hour and you pay by the hour, and we are about to launch the very same thing on Google Cloud. So for us, it's about what do you need, and we make it happen natively, directly with us or through AWS and Google Cloud. >> So I want to make sure I understand, so the fixed amount is in a sense a floor, and then if you want to surge above that, you can allow usage pricing if you're on the cloud, correct? >> Well, you actually license your cluster of Vertica by the hour on AWS and you run your cluster there, or you can buy a license from Vertica for a fixed capacity or a fixed number of nodes and deploy it on the cloud. And then, if you want to add more nodes or add more capacity, you can. It's not usage based for the license that you bring to the cloud, but if you purchase through the cloud provider, it is usage based. >> Yeah, okay. And you guys are in the marketplace, is that right? So, again, if I want OpEx, I can do that. I can choose to do that. >> That's right, usage through the AWS marketplace or, yeah, directly from Vertica. >> Because every small business who has gone to a salesforce management system knows this. Okay, great, I can pay by the month. Well, yeah, well, not really, here's our three-year term, right? And it's very frustrating. >> Well, and even in the public cloud you can pay by the hour, by the minute or whatever, but it becomes pretty obvious that you're better off if you have reserved instance types or committed amounts. That's why Vertica offers subscription that says, hey, you want to have 100 terabytes for the next year? Here's what it will cost you. We do interval billing; you want to do monthly, quarterly, bi-annual, we'll do that. But we won't charge you for usage that you didn't even know you were using until after you get the bill. And frankly, that's something my finance team does not like.
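The per-node versus per-terabyte choice Joy lays out above is, at bottom, a small piece of arithmetic: whichever denominator you have less of (compute nodes or data volume) tends to be the cheaper metric. The helper below just encodes that comparison; the prices are parameters you would take from your own quote, and the example figures are arbitrary placeholder units, not actual Vertica list prices.

```python
# Compare the two licensing metrics for a given cluster shape and caller-supplied prices.
def cheaper_metric(nodes: int, data_tb: float,
                   price_per_node: float, price_per_tb: float) -> str:
    per_node_total = nodes * price_per_node
    per_tb_total = data_tb * price_per_tb
    if per_node_total == per_tb_total:
        return f"either (both {per_node_total:,.0f})"
    winner = "per-node" if per_node_total < per_tb_total else "per-terabyte"
    return (f"{winner}: {min(per_node_total, per_tb_total):,.0f} "
            f"vs {max(per_node_total, per_tb_total):,.0f}")

# Lots of data on little compute favors per-node; the reverse flips the answer.
print(cheaper_metric(nodes=6,  data_tb=500, price_per_node=10, price_per_tb=1))   # per-node
print(cheaper_metric(nodes=60, data_tb=20,  price_per_node=10, price_per_tb=1))   # per-terabyte
```

That asymmetry is exactly why Joy frames the whole thing as optionality rather than a single right answer.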
>> Yeah, I think, you know, I know this is kind of a wonky discussion, but so many people gloss over the licensing and the pricing, and I think my takeaway here is optionality. You know, pricing your way. That's great, thank you for that clarification. Okay, so you've got Google Cloud, and I want to talk about storage optionality. If I add them up, I've got S3, I'm presuming Google now, and you've got Pure... >> Which is an S3-compatible object store, yes. >> And Google object store... >> Right: Google object store, Amazon S3 object store, HDFS, and Pure Storage FlashBlade, which is an object store on-prem. And we are continuing on this path, because ultimately we know that our customers need the option of a next-generation data center architecture, which is this sort of shared or communal storage, so all the data is in one place and workloads can be managed independently on that data. That's exactly what we're doing, and that's what we already have in two public clouds and two on-premise deployment options today. And as you said, I did challenge you back when we saw each other at the conference. Today, Vertica is the only analytic data warehouse platform that offers that option on premise and in multiple public clouds. >> Okay, let's go back through the innovation cocktail, I'll call it. So it's the data, applying machine intelligence to that data, and we've talked about scaling in the cloud and some of the other advantages. Let's talk about the machine intelligence, the machine learning piece of it. What's your story there? Give us any updates on your embracing of tooling and the like. >> Well, quite a few years ago, we began building some native in-database machine learning algorithms into Vertica, and the reason we did that was we knew that the architecture of MPP columnar execution would dramatically improve performance. We also knew that a lot of people speak SQL, but at the time, not so many people spoke R or even Python. And so what if we could give access to machine learning in the database via SQL and deliver that kind of performance? So that's the journey we started out on. And then we realized that actually machine learning is a lot more, as everybody knows, than just algorithms. So we then built in the full end-to-end machine learning functions, from data preparation to model training, model scoring and evaluation, all the way through to deployment, and all of this, again, SQL accessible. You speak SQL, you speak to the data. And the other advantage of this approach was we realized that accuracy was compromised if you down-sample, if you moved a portion of the data from a database to a specialty machine learning platform; you were challenged by accuracy and also by what the industry is calling replicability. And that means if a model makes a decision, like, let's say, credit scoring, and that decision is in any way challenged, well, you have to be able to replicate it to prove that you made the decision correctly. And there was a bit of, ah, you know, blow-up in the media not too long ago about a credit scoring decision that appeared to be gender biased. But unfortunately, because the model could not be replicated, there was no way to disprove that, and that was not a good thing. So all of this is built into Vertica, and with Vertica 10 we've taken the next step. Just like with Hadoop, we know that innovation happens within Vertica but also outside of Vertica. We saw that data scientists really love their preferred language.
Like Python, they love their tools and platforms like TensorFlow. With Vertica 10 we now integrate even more with Python, which we have for a while, but we also add TensorFlow integration and PMML. What does that mean? It means that if you build and train a model external to Vertica, using the machine learning platform that you like, you can import that model into Vertica and run it on the full end-to-end process, but run it on all the data. No more accuracy challenges, MPP columnar execution, so it's blazing fast. And if somebody wants to know why a model made a decision, you can replicate that model and you can explain why. Those are very powerful. And it's also another cultural unification, Dave. It unifies the business analyst community, who speak SQL, with the data scientist community, who love their tools like TensorFlow and Python. >> Well, I think, Joy, that's important, because so much of machine intelligence and AI has a black box problem. If you can't replicate the model, then you do run into a potential gender bias. In the example that you're talking about, and there are many, you know, let's say an individual is very wealthy, he goes for a mortgage and his wife goes for some credit, she gets rejected, he gets accepted. This is the same household, but the bias in the model, that may be gender bias, that could be race bias. And so being able to replicate that and open it up and make the machine intelligence transparent is very, very important. >> It really is. And that replicability as well as accuracy is critical, because if you're down-sampling and you're running models on different sets of data, things can get confusing. And yet you don't really have a choice, because if you're talking about petabytes of data and you need to export that data to a machine learning platform and then try to put it back and get the next model the next day, you're looking at way too much time. Doing it in the database, or training the model externally and then importing it into the database for production, that's what Vertica allows, and our customers are all over it. Of course, you know, they are the ones that are sort of the trailblazers, they've always been, and this is the next step in blazing the ML trail. >> So, Joy, customers want analytics, they want functional analytics, full-function analytics. What are they pushing you for now? What are you delivering? What's your thought on that? >> Well, I would say the number one thing that our customers are demanding right now is deployment flexibility. What the CEO or the CFO mandated six months ago, now, whatever that thou shalt was, is different. And what I tell them is it is impossible to know what you're going to be commanded to do or what options you might have in the future. The key is not having to choose, and they are very, very committed to that. We have a large telco customer who is multi-cloud as their commitment. Why multi-cloud? Well, because they see innovation available in different public clouds and they want to take advantage of all of them. They also, admittedly, see that there's the risk of lock-in, right? Like with any vendor, they don't want that either, so they want multi-cloud. We have other customers who say we have some workloads that make sense for the cloud and some that we absolutely cannot put in the cloud, but we want a unified analytics strategy, so they are adamant in focusing on deployment flexibility.
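Before Joy continues with the rest of her list, a brief aside to give the in-database workflow she just described some shape. The sketch trains a model where the data lives and scores new rows with the same engine, which is the "no export, no down-sampling" point she makes. The LOGISTIC_REG and PREDICT_LOGISTIC_REG calls follow Vertica's documented in-database ML interface as I recall it, but the exact signatures, the hostname, and the table and column names are assumptions to check against your version; the PMML and TensorFlow import she mentions for Vertica 10 is not shown here.

```python
# Train and score a model entirely inside Vertica, driven from Python over SQL.
import vertica_python

conn_info = {"host": "vertica.example.internal", "port": 5433,
             "user": "dbadmin", "password": "********", "database": "analytics"}

TRAIN_SQL = """
    SELECT LOGISTIC_REG('churn_model', 'customer_train', 'churned',
                        'tenure_months, monthly_spend, support_tickets')
"""

SCORE_SQL = """
    SELECT customer_id,
           PREDICT_LOGISTIC_REG(tenure_months, monthly_spend, support_tickets
                                USING PARAMETERS model_name='churn_model') AS churn_pred
    FROM   customer_current
    LIMIT  20
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(TRAIN_SQL)   # the model is trained and stored inside the database
    cur.execute(SCORE_SQL)   # scoring runs over all the data, with no export step
    for customer_id, pred in cur.fetchall():
        print(customer_id, pred)
```

Because the model object stays in the database, the same statement can be re-run later to reproduce a decision, which is the replicability property Joy and Dave discuss above.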
That's what I'd say is first. Second, I would say the interest in operationalizing machine learning, but not necessarily forcing the analytics team to hammer the data science team about which tools are the best tools. That's probably number two. And then I'd say number three, and it's because when you look at companies like Uber or The Trade Desk or AT&T or Cerner, it's performance at scale. When they say milliseconds, they think that's slow. When they say petabytes, they're like, yeah, that was yesterday. So performance at scale: good enough, for Vertica, is never good enough, and it's why we're constantly building at the core the next-generation execution engine, database designer, optimization engine, all that stuff. >> I want to also ask you, when I first started following Vertica, and theCUBE covered the BDC, one of the things I noticed in talking to customers and people in the community is that you have a community edition, a free edition, and it's not neutered. Have you maintained that ethos, you know, through the transitions into Micro Focus? And can you talk about that a little bit? >> Absolutely. Vertica Community Edition is Vertica. It's all of the Vertica functionality: geospatial, time series, pattern matching, machine learning, all of it, Vertica in Eon Mode, Vertica in Enterprise Mode. All of Vertica is in the Community Edition. The only limitation is one terabyte of data and three nodes, and it's free. Now, if you want commercial support, where you can file a support ticket and things like that, you do have to buy the license. But it's free, and when people say, well, free for how long, like our field has asked that, I say forever, and they say, what do you mean, forever? Because we want people to use Vertica for use cases that are small, they want to learn, they want to try, and we see no reason to limit that. And what we look for is, when they're ready to grow, when they need the next set of data that goes beyond a terabyte, or they need more compute than three nodes, then we're here for them. And it also brings up an important thing that I should remind you of, or tell you about if you haven't heard it, and that's the Vertica Academy at vertica.com. Well, what is that? That is self-paced, on-demand training as well as Vertica Essentials certification. Training and certification means you have seven days with your hands on a Vertica cluster hosted in the cloud to go through all the certification. And guess what? All of that is free. Why would you give it away for free? Because for us, empowering the market, giving the market the expertise, the learning they need to take advantage of Vertica, just like with Community Edition, is fundamental to our mission, because we see the advantage that Vertica can bring, and we want to make it possible for every company all around the world to take advantage of it. >> I love that ethos of Vertica. I mean, obviously great product, but it's not just the product, it's the business practices and really progressive pricing and embracing of all these trends and not running away from the waves but really leaning in. Joy, thanks so much. Great interview, really appreciate it. And, ah, I wish we could have been face to face in Boston, but I think this was the prudent thing to do. >> I promise you, Dave, we will, because the Vertica BDC in 2021 is already booked. So I will see you there. >> Always a pleasure, Joy King. Thanks so much for coming on theCUBE. And thank you for watching.
Remember, theCUBE is running this program in conjunction with the Virtual Vertica BDC. Go to vertica.com/bdc2020 for all the coverage, and keep it right there. This is Dave Vellante with theCUBE. We'll be right back.
SUMMARY :
Yeah, it's the queue covering the virtual vertical Big Data Conference Love to have you on. Thank you so much, David. So one of the trends that you see the big waves that you're writing Those are the three big trends that vertical is focusing on right now. it's bringing the cloud experience to wherever the data lives. So now that the key is, how do we take advantage of all of that data? And then we can drill into some of the technologies had the opportunity to deploy their vertical licenses in EON mode on Well, let me stop you there, because I just wanna I want to mention So we talked to Joe Gonzalez and past Mutual, And that's one of the things that Mass Mutual is going to benefit from, I want to mention you beat actually a number of the cloud players with that capability. for the hardware underneath, so we are totally motivated to be independent of that So just to clarify, you're saying I can pay by the drink if I want to. So for us, it's about what do you need? And then if you want a surge above that, for the license that you bring to the cloud. And you guys are in the marketplace. directly from vertical I can pay by the month. Well, and even in the public cloud you can pay for by the hour by the minute or whatever, and the pricing, and I think my take away here is Optionality. And as you said, I'll call it So it's It's the data applying machine intelligence to that data. So that's the journey we started And so being able to replicate that in and open up and make the the and get the next at the next day, you're looking at way too much time doing it in the What are they pushing you for now? commanded to do or what options you might have in the future. And can you talk about that a little bit the market, giving the market the expert East, the learning they need to take advantage of vertical, But it's not just the product. So I will see you there. And thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
September of 2019 | DATE | 0.99+ |
Joe Gonzalez | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
2007 | DATE | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
Joy King | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
Joy | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
2018 | DATE | 0.99+ |
Boston | LOCATION | 0.99+ |
Vertical Academy Academy | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
seven days | QUANTITY | 0.99+ |
one terabyte | QUANTITY | 0.99+ |
python | TITLE | 0.99+ |
three notes | QUANTITY | 0.99+ |
Today | DATE | 0.99+ |
Hewlett Packard Enterprises | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
BBC | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
100 terabytes | QUANTITY | 0.99+ |
Ian Mode | PERSON | 0.99+ |
six months ago | DATE | 0.99+ |
Python | TITLE | 0.99+ |
first release | QUANTITY | 0.99+ |
1st 2nd | QUANTITY | 0.99+ |
three year | QUANTITY | 0.99+ |
Mass Mutual | ORGANIZATION | 0.99+ |
Eight | QUANTITY | 0.99+ |
next year | DATE | 0.99+ |
Stone Breaker | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
one | QUANTITY | 0.98+ |
America 10 | TITLE | 0.98+ |
King | PERSON | 0.98+ |
today | DATE | 0.98+ |
four terabyte | QUANTITY | 0.97+ |
John Mode | PERSON | 0.97+ |
Haas | PERSON | 0.97+ |
yesterday | DATE | 0.97+ |
first verdict | QUANTITY | 0.96+ |
one place | QUANTITY | 0.96+ |
s three | COMMERCIAL_ITEM | 0.96+ |
single | QUANTITY | 0.95+ |
first thing | QUANTITY | 0.95+ |
One | QUANTITY | 0.95+ |
both | QUANTITY | 0.95+ |
Tensorflow | TITLE | 0.95+ |
Hadoop | TITLE | 0.95+ |
third trend | QUANTITY | 0.94+ |
MPP Columbia | ORGANIZATION | 0.94+ |
Hadoop | PERSON | 0.94+ |
last last year | DATE | 0.92+ |
three big trends | QUANTITY | 0.92+ |
vertical 10 | TITLE | 0.92+ |
two public clouds | QUANTITY | 0.92+ |
Pure Storage Accelerate conference | EVENT | 0.91+ |
Andy | PERSON | 0.9+ |
few years ago | DATE | 0.9+ |
next day | DATE | 0.9+ |
Mutual | ORGANIZATION | 0.9+ |
Mode | PERSON | 0.89+ |
telco | ORGANIZATION | 0.89+ |
three big | QUANTITY | 0.88+ |
eon | TITLE | 0.88+ |
Verdict | PERSON | 0.88+ |
three separate data | QUANTITY | 0.88+ |
Cube | COMMERCIAL_ITEM | 0.87+ |
petabytes | QUANTITY | 0.87+ |
Google Cloud | TITLE | 0.86+ |
Larry Lancaster, Zebrium | Virtual Vertica BDC 2020
>> Announcer: It's theCUBE! Covering the Virtual Vertica Big Data Conference 2020 brought to you by Vertica. >> Hi, everybody. Welcome back. You're watching theCUBE's coverage of the Vertica Virtual Big Data Conference. It was, of course, going to be in Boston at the Encore Hotel. Win big with big data with the new casino but obviously Coronavirus has changed all that. Our hearts go out and we are empathy to those people who are struggling. We are going to continue our wall-to-wall coverage of this conference and we're here with Larry Lancaster who's the founder and CTO of Zebrium. Larry, welcome to theCUBE. Thanks for coming on. >> Hi, thanks for having me. >> You're welcome. So first question, why did you start Zebrium? >> You know, I've been dealing with machine data a long time. So for those of you who don't know what that is, if you can imagine servers or whatever goes on in a data center or in a SAS shop. There's data coming out of those servers, out of those applications and basically, you can build a lot of cool stuff on that. So there's a lot of metrics that come out and there's a lot of log files that come. And so, I've built this... Basically spent my career building that sort of thing. So tools on top of that or products on top of that. The problem is that since at least log files are completely unstructured, it's always doing the same thing over and over again, which is going in and understanding the data and extracting the data and all that stuff. It's very time consuming. If you've done it like five times you don't want to do it again. So really, my idea was at this point with machine learning where it's at there's got to be a better way. So Zebrium was founded on the notion that we can just do all that automatically. We can take a pile of machine data, we can turn it into a database, and we can build stuff on top of that. And so the company is really all about bringing that value to the market. >> That's cool. I want to get in to that, just better understand who you're disrupting and understand that opportunity better. But before I do, tell us a little bit about your background. You got kind of an interesting background. Lot of tech jobs. Give us some color there. >> Yeah, so I started in the Valley I guess 20 years ago and when my son was born I left grad school. I was in grad school over at Berkeley, Biophysics. And I realized I needed to go get a job so I ended up starting in software and I've been there ever since. I mean, I spent a lot of time at, I guess I cut my teeth at Nedap, which was a storage company. And then I co-founded a business called Glassbeam, which was kind of an ETL database company. And then after that I ended up at Nimble Storage. Another company, EMC, ended up buying the Glassbeam so I went over there and then after Nimble though, which where I build the InfoSight platform. That's where I kind of, after that I was able to step back and take a year and a half and just go into my basement, actually, this is my kind of workspace here, and come up with the technology and actually build it so that I could go raise money and get a team together to build Zebrium. So that's really my career in a nutshell. >> And you've got Hello Kitty over your right shoulder, which is kind of cool >> That's right. >> And then up to the left you got your monitor, right? >> Well, I had it. It's over here, yeah. >> But it was great! Pull it out, pull it out, let me see it. So, okay, so you got that. So what do you do? You just sit there and code all night or what? 
>> Yeah, that's right. So Hello Kitty's over here. I have a daughter and she setup my workspace here on this side with Hello Kitty and so on. And over on this side, I've got my recliner where I basically lay it all the way back and then I pivot this thing down over my face and put my keyboard on my lap and I can just sit there for like 20 hours. It's great. Completely comfortable. >> That's cool. All right, better put that monitor back or our guys will yell at me. But so, obviously, we're talking to somebody with serious coding chops and I'll also add that the Nimble InfoSight, I think it was one of the best pick ups that HP, HPE, has had in a while. And the thing that interested me about that, Larry, is the ability that the company was able to take that InfoSight and poured it very quickly across its product lines. So that says to me it was a modern, architecture, I'm sure API, microservices, and all those cool buzz words, but the proof is in their ability to bring that IP to other parts of the portfolio. So, well done. >> Yeah, well thanks. Appreciate that. I mean, they've got a fantastic team there. And the other thing that helps is when you have the notion that you don't just build on top of the data, you extract the data, you structure it, you put that in a database, we used Vertica there for that, and then you build on top of that. Taking the time to build that layer is what lets you build a scalable platform. >> Yeah, so, why Vertica? I mean, Vertica's been around for awhile. You remember you had the you had the old RDBMS, Oracles, Db2s, SQL Server, and then the database was kind of a boring market. And then, all of a sudden, you had all of these MPP companies came out, a spade of them. They all got acquired, including Vertica. And they've all sort of disappeared and morphed into different brands and Micro Focus has preserved the Vertica brand. But it seems like Vertica has been able to survive the transitions. Why Vertica? What was it about that platform that was unique and interested you? >> Well, I mean, so they're the first fund to build, what I would call a real column store that's kind of market capable, right? So there was the C-Store project at Berkeley, which Stonebreaker was involved in. And then that became sort of the seed from which Vertica was spawned. So you had this idea of, let's lay things out in a columnar way. And when I say columnar, I don't just mean that the data for every column is in a different set of files. What I mean by that is it takes full advantage of things like run length and coding, and L file and coding, and block--impression, and so you end up with these massive orders of magnitude savings in terms of the data that's being pulled off of storage as well as as it's moving through the pipeline internally in Vertica's query processing. So why am I saying all this? Because it's fundamentally, it was a fundamentally disruptive technology. I think column stores are ubiquitous now in analytics. And I think you could name maybe a couple of projects which are mostly open source who do something like Vertica does but name me another one that's actually capable of serving an enterprise as a relational database. I still think Vertica is unique in being that one. >> Well, it's interesting because you're a startup. And so a lot of startups would say, okay, we're going with a born-in-the-cloud database. Now Vertica touts that, well look, we've embraced cloud. You know, we have, we run in the cloud, we run on PRAM, all different optionality. 
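Before Dave finishes his thought below, a quick aside to make Larry's column-store point concrete. The snippet is a toy run-length encoder, not Vertica's actual on-disk format, and the sample column is invented; it just shows why storing a column contiguously lets long runs of repeated values (a sorted "status" or "country" column, say) collapse to almost nothing, which is a big part of the orders-of-magnitude savings he describes.

```python
# Toy run-length encoding over a single column to illustrate columnar compression.
def run_length_encode(column):
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(v, n) for v, n in runs]

status_column = ["ok"] * 9_998 + ["error"] * 2      # 10,000 stored values
encoded = run_length_encode(status_column)

print(encoded)                                      # [('ok', 9998), ('error', 2)]
print(len(status_column), "values ->", len(encoded), "runs")
```

A row store interleaves those values with every other column of every row, so it cannot exploit runs like this; that difference is the architectural bet Larry is crediting to the C-Store lineage.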
And you hear a lot of vendors say that, but a lot of times they're just taking their stack and stuffing it into the cloud. But, so why didn't you go with a cloud-native database and is Vertica able to, I mean, obviously, that's why you chose it, but I'm interested from a technologist standpoint as to why you, again, made that choice given all these other choices around there. >> Right, I mean, again, I'm not, so... As I explained a column store, which I think is the appropriate definition, I'm not aware of another cloud-native-- >> Hm, okay. >> I'm aware of other cloud-native transactional databases, I'm not aware of one that has the analytics form it and I've tried some of them. So it was not like I didn't look. What I was actually impressed with and I think what let me move forward using Vertica in our stack is the fact that Eon really is built from the ground up to be cloud-native. And so we've been using Eon almost ever since we started the work that we're doing. So I've been really happy with the performance and with reliability of Eon. >> It's interesting. I've been saying for years that Vertica's a diamond in the rough and it's previous owner didn't know what to do with it because it got distracted and now Micro Focus seems to really see the value and is obviously putting some investments in there. >> Yeah >> Tell me more about your business. Who are you disrupting? Are you kind of disrupting the do-it-yourself? Or is there sort of a big whale out there that you're going to go after? Add some color to that. >> Yeah, so our broader market is monitoring software, that's kind of the high-level category. So you have a lot of people in that market right now. Some of them are entrenched in large players, like Datadog would be a great example. Some of them are smaller upstarts. It's a pretty, it's a pretty saturated market. But what's happened over the last, I'd say two years, is that there's been sort of a push towards what's called observability in terms of at least how some of the products are architected, like Honeycomb, and how some of them are messaged. Most of them are messaged these days. And what that really means is there's been sort of an understanding that's developed that that MTTR is really what people need to focus on to keep their customers happy. If you're a SAS company, MTTR is going to be your bread and butter. And it's still measured in hours and days. And the biggest reason for that is because of what's called unknown unknowns. Because of complexity. Now a days, things are, applications are ten times as complex as they used to be. And what you end up with is a situation where if something is new, if it's a known issue with a known symptom and a known root cause, then you can setup a automation for it. But the ones that really cost a lot of time in terms of service disruption are unknown unknowns. And now you got to go dig into this massive mass of data. So observability is about making tools to help you do that, but it's still going to take you hours. And so our contention is, you need to automate the eyeball. The bottleneck is now the eyeball. And so you have to get away from this notion of a person's going to be able to do it infinitely more efficient and recognize that you need automated help. When you get an alert agent, it shouldn't be that, "Hey, something weird's happening. Now go dig in." It should be, "Here's a root cause and a symptom." And that should be proposed to you by a system that actually does the observing. That actually does the watching. 
And that's what Zebrium does. >> Yeah, that's awesome. I mean, you're right. The last thing you want is just another alert and it say, "Go figure something out because there's a problem." So how does it work, Larry? In terms of what you built there. Can you take us inside the covers? >> Yeah, sure. So there's really, right now there's two kinds of data that we're ingesting. There's metrics and there's log files. Metrics, there's actually sort of a framework that's really popular in DevOp circles especially but it's becoming popular everywhere, which is called Prometheus. And it's a way of exporting metrics so that scrapers can collect them. And so if you go look at a typical stack, you'll find that most of the open source components and many of the closed source components are going to have exporters that export all their stacks to Prometheus. So by supporting that stack we can bring in all of those metrics. And then there's also the log files. And so you've got host log files in a containerized environment, you've got container logs, and you've got application-specific logs, perhaps living on a host mount. And you want to pull all those back and you want to be able to associate this log that I've collected here is associated with the same container on the same host that this metric is associated with. But now what? So once you've got that, you've got a pile of unstructured logs. So what we do is we take a look at those logs and we say, let's structure those into tables, right? So where I used to have a log message, if I look in my log file and I see it says something like, X happened five times, right? Well, that event types going to occur again and it'll say, X happened six times or X happened three times. So if I see that as a human being, I can say, "Oh clearly, that's the same thing." And what's interesting here is the times that X, that X happened, and that this number read... I may want to know when the numbers happened as a time series, the values of that column. And so you can imagine it as a table. So now I have table for that event type and every time it happens, I get a row. And then I have a column with that number in it. And so now I can do any kind of analytics I want almost instantly across my... If I have all my event types structured that way, every thing changes. You can do real anomaly detection and incident detection on top of that data. So that's really how we go about doing it. How we go about being able to do autonomous monitoring in a way that's effective. >> How do you handle doing that for, like the Spoke app? Do you have to, does somebody have to build a connector to those apps? How do you handle that? >> Yeah, that's a really good question. So you're right. So if I go and install a typical log manager, there'll be connectors for different apps and usually what that means is pulling in the stuff on the left, if you were to be looking at that log line, and it will be things like a time stamp, or a severity, or a function name, or various other things. And so the connector will know how to pull those apart and then the stuff to the right will be considered the message and that'll get indexed for search. And so our approach is we actually go in with machine learning and we structure that whole thing. So there's a table. And it's going to have a column called severity, and timestamp, and function name. And then it's going to have columns that correspond to the parameters that are in that event. And it'll have a name associated with the constant parts of that event. 
And so you end up with a situation where you've structured all of it automatically, so we don't need connectors. It'll work just as well on your home-grown app that has no connectors or no parsers defined or anything. It'll work immediately, just as well as it would work on anything else. And that's important, because you can't be asking people for connectors to their own applications. It just, it becomes, now they've got to stop what they're doing and go write code for you, for your platform, and they have to maintain it. It's just untenable. So you can be up and running with our service in three minutes. It'll just be monitoring those for you. >> That's awesome! I mean, that is really a breakthrough innovation. So, nice. Love to see that hittin' the market. Who do you sell to? Both types of companies and what role within the company? >> Well, definitely there's two main sort of pushes that we've seen, or I should say pulls. One is from DevOps folks, SRE folks. So these are people who are tasked with monitoring an environment, basically. And then you've got people who are in engineering and they have a staging environment. And what they actually find valuable is... Because when we find an incident in a staging environment, yeah, half the time it's because they're tearing everything up and it's not release ready, whatever's in stage. That's fine, they know that. But the other half the time it's new bugs, it's issues, and they're finding issues. So it's kind of diverged. You have engineering users, and they don't have titles like QA, they're Dev engineers or Dev managers, that are really interested. And then you've got DevOps and SRE people there (mumbles). >> And how do I consume your product? Is it SaaS... I sign up and you say within three minutes I'm up and running. I'm paying by the drink. >> Well, (laughs) right. So there's a couple ways. So, right. So the easiest way is if you use Kubernetes. So Kubernetes is what's called a container orchestrator. So these days, you know Docker and containers and all that, so now container orchestrators have become, I wouldn't say ubiquitous, but they're very popular now. So it's kind of on that inflection curve. I'm not exactly sure the penetration, but I'm going to say 30-40% probably of the shops that we're interested in are using container orchestrators. So if you're using Kubernetes, basically you can install our Kubernetes chart, which basically means copying and pasting a URL and so on into your little admin panel there. And then it'll just start collecting all the logs and metrics, and then you just log in on the website. And the way you do that is just go to our website and it'll show you how to sign up for the service, and you'll get your little API key and link to the chart and you're off and running. You don't have to do anything else. You can add rules, you can add stuff, but you don't have to. You shouldn't have to, right? You should never have to do any more work. >> That's great. So it's a SaaS capability and I just pay for... How do you price it? >> Oh, right. So it's priced on volume, data volume. I don't want to go too much into it because I'm not the pricing guy. But what I'll say is that, as far as I know, it's as cheap or cheaper than any other log manager or metrics product. It's in that same neighborhood as the very low priced ones. Because right now, we're not trying to optimize for take. We're trying to make a healthy margin and get the value of autonomous monitoring out there. Right now, that's our priority.
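Staying with the toy example above, here is an equally simplified sketch of the kind of signal that detection can raise once events are structured into per-type tables: flag an event type whose newest value jumps well beyond its own history. The data, the threshold, and the statistics are invented for illustration; Zebrium's actual incident detection is far more sophisticated than a mean-based check.

```python
def flag_anomalies(tables, factor=3.0):
    # Purely illustrative: compare each event type's newest value
    # against the mean of its earlier values.
    alerts = []
    for event_type, rows in tables.items():
        values = [row["params"][0] for row in rows if row["params"]]
        if len(values) < 4:
            continue  # not enough history to judge
        history, latest = values[:-1], values[-1]
        mean = sum(history) / len(history)
        if mean > 0 and latest > factor * mean:
            alerts.append((event_type, latest, round(mean, 2)))
    return alerts

# A tiny structured table in the shape produced by the earlier sketch.
tables = {
    "cache evictions happened <num> times": [
        {"timestamp": t, "params": [v]} for t, v in [(1, 5), (2, 6), (3, 4), (4, 30)]
    ],
}

for event_type, latest, mean in flag_anomalies(tables):
    print(f"possible incident: {event_type!r} jumped to {latest} (historical mean {mean})")
```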
>> And it's running in the cloud, is that right? AWS West-- >> Yeah, that's right. Oh, I should've also pointed out that you can have a free account; if it's less than some number of gigabytes a day we're not going to charge. Yeah, so we run in AWS. We have a multi-tenant instance in AWS. And we have a Vertica Eon cluster behind that. And it's been working out really well. >> And on your freemium, have you used the Vertica Community Edition? Because they don't charge you for that, right? So is that how you do it or... >> No, no. We're, no, no. So, I don't want to go into that because I'm not the bizdev guy. But what I'll say is that if you're doing something that winds up being OEM-ish, you can work out the particulars with Vertica. It's not like you're going to just go pay retail and they won't let you distinguish between test, and prod, and paid, and all that. They'll work with you. Just call 'em up. >> Yeah, and that's why I brought it up, because Vertica, they have a community edition, which is not neutered. It runs Eon, it's just there's limits on clusters and storage. >> There's limits. >> But it's still fully functional though. >> So to your point, we want it multi-tenant. So it's big just because it's multi-tenant. We have hundreds of users on that (audio cuts out). >> And then, what's your partnership with Vertica like? Can we close on that and just describe that a little bit? >> What's it like? I mean, it's pleasant. >> Yeah, I mean (mumbles). >> You know what, so the important thing... Here's what's important. What's important is that I don't have to worry about that layer of our stack. When it comes to being able to get the performance I need, being able to get the economy of scale that I need, being able to get the absolute scale that I need, I've not been disappointed ever with Vertica. And frankly, being able to have ACID guarantees and everything else, like a normal mature database that can join lots of tables and still be fast, that's also necessary at scale. And so I feel like it was definitely the right choice to start with. >> Yeah, it's interesting. I remember in the early days of big data a lot of people said, "Who's going to need these ACID properties and all this complexity of databases?" And of course, ACID properties and SQL became the killer features and functions of these databases. >> Who didn't see that one coming, right? >> Yeah, right. And then, so you guys have done a big seed round. You've raised a little over $6 million and you've got the product-market fit down. You're ready to rock, right? >> Yeah, that's right. So we're doing a launch probably, well, when this airs it'll probably be the day before this airs. Basically, yeah. We've got people... Like literally in the last, I'd say, six to eight weeks, it's just been this sort of peak of interest. All of a sudden, everyone kind of gets what we're doing, realizes they need it, and we've got a solution that seems to meet expectations. So it's like... It's been an amazing... Let me just say this, it's been an amazing start to the year. I mean, at the same time, it's been really difficult for us, but more difficult for some other people that haven't been able to go to work over the last couple of weeks and so on. But it's been a good start to the year, at least for our business. So... >> Well, Larry, congratulations on getting the company off the ground, and thank you so much for coming on theCUBE and being part of the Virtual Vertica Big Data Conference. >> Thank you very much.
>> All right, and thank you everybody for watching. This is Dave Vellante for theCUBE. Keep it right there. We're covering wall-to-wall Virtual Vertica BDC. You're watching theCUBE. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Larry Lancaster | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Larry | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
five times | QUANTITY | 0.99+ |
three times | QUANTITY | 0.99+ |
six times | QUANTITY | 0.99+ |
EMC | ORGANIZATION | 0.99+ |
six | QUANTITY | 0.99+ |
Zebrium | ORGANIZATION | 0.99+ |
20 hours | QUANTITY | 0.99+ |
Glassbeam | ORGANIZATION | 0.99+ |
Nedap | ORGANIZATION | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Nimble | ORGANIZATION | 0.99+ |
Nimble Storage | ORGANIZATION | 0.99+ |
HP | ORGANIZATION | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
a year and a half | QUANTITY | 0.99+ |
Micro Focus | ORGANIZATION | 0.99+ |
ten times | QUANTITY | 0.99+ |
two kinds | QUANTITY | 0.99+ |
two years | QUANTITY | 0.99+ |
three minutes | QUANTITY | 0.99+ |
first question | QUANTITY | 0.99+ |
eight weeks | QUANTITY | 0.98+ |
Stonebreaker | ORGANIZATION | 0.98+ |
Prometheus | TITLE | 0.98+ |
30-40% | QUANTITY | 0.98+ |
Eon | ORGANIZATION | 0.98+ |
hundred of users | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
Vertica Virtual Big Data Conference | EVENT | 0.98+ |
Kubernetes | TITLE | 0.97+ |
first fund | QUANTITY | 0.97+ |
Virtual Vertica Big Data Conference 2020 | EVENT | 0.97+ |
AWB West | ORGANIZATION | 0.97+ |
Virtual Vertica Big Data Conference | EVENT | 0.97+ |
Honeycomb | ORGANIZATION | 0.96+ |
SAS | ORGANIZATION | 0.96+ |
20 years ago | DATE | 0.96+ |
Both types | QUANTITY | 0.95+ |
theCUBE | ORGANIZATION | 0.95+ |
Datadog | ORGANIZATION | 0.95+ |
two main | QUANTITY | 0.94+ |
over $6 million dollars | QUANTITY | 0.93+ |
Hello Kitty | ORGANIZATION | 0.93+ |
SQL | TITLE | 0.93+ |
Zebrium | PERSON | 0.91+ |
Spoke | TITLE | 0.89+ |
Encore Hotel | LOCATION | 0.88+ |
InfoSight | ORGANIZATION | 0.88+ |
Coronavirus | OTHER | 0.88+ |
one | QUANTITY | 0.86+ |
less | QUANTITY | 0.85+ |
Oracles | ORGANIZATION | 0.85+ |
2020 | DATE | 0.85+ |
CTO | PERSON | 0.84+ |
Vertica | TITLE | 0.82+ |
Nimble InfoSight | ORGANIZATION | 0.81+ |
Ron Cormier, The Trade Desk | Virtual Vertica BDC 2020
>> Narrator: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Hello everybody, welcome to this special digital presentation of theCUBE. We're tracking the Vertica Virtual Big Data Conference; it's theCUBE's, I think, fifth year doing the BDC. We've been to every big data conference that they've held, and we're really excited to be helping with the digital component here in these interesting times. Ron Cormier is here, principal database engineer at The Trade Desk. Ron, great to see you. Thanks for coming on. >> Hi, David, my pleasure, good to see you as well. >> So we were talking a little bit about your background. You're basically a Vertica and database guru, but tell us about your role at The Trade Desk, and then I want to get into a little bit about what The Trade Desk does. >> Sure, so I'm a principal database engineer at The Trade Desk. The Trade Desk was one of my customers when I was working at HP, as a member of the Vertica team, and I joined The Trade Desk in early 2016. And since then, I've been working on building out their Vertica capabilities and expanding the data warehouse footprint in an ever-growing database technology, data volume environment. >> And The Trade Desk is an ad tech firm, and you are specializing in real-time ad serving and pricing. And I guess, real time, you know, people talk about real time a lot; we define real time as before you lose the customer. Maybe you can talk a little bit about, you know, The Trade Desk and the business, and maybe how you define real time. >> Totally, so to give everybody kind of a frame of reference: anytime you pull up your phone or your laptop and you go to a website or you use some app and you see an ad, what's happening behind the scenes is an auction is taking place. And people are bidding on the privilege to show you an ad. And across the open Internet, this happens seven to 13 million times per second. And so the ads, the whole auction dynamic and the display of the ad, needs to happen really fast. So that's about as real time as it gets, outside of high-frequency trading, as far as I'm aware. So The Trade Desk participates in those auctions; we bid on behalf of our customers, which are ad agencies, and the agencies represent brands. So the agencies are the Mad Men companies of the world, and they have brands under their guidance, and so they give us budget to spend, to place the ads and to display them. And once the ads get displayed... so we bid on hundreds of thousands of auctions per second. Once we make those bids, anytime we do make a bid some data flows into our data platform, which is powered by Vertica. And so we're getting hundreds of thousands of events per second. We have other events that flow into Vertica as well. And we clean them up, we aggregate them, and then we run reports on the data. And we run about 40,000 reports per day on behalf of our customers. The reports aren't as real time as I was talking about earlier, they're more batch oriented. Our customers like to see big chunks of time, like a whole day or a whole week or a whole month, on a single report. So we wait for that time period to complete and then we run the reports on the results. >> So you have one of the largest commercial infrastructures in the big data sphere. Paint a picture for us. I understand you've got a couple of, like, 320-node clusters; we're talking about petabytes of data. But describe what your environment looks like. >> Sure, so like I said, we've been very good customers for a while.
And we started out with a bunch of enterprise clusters. So the Enterprise Mode is the traditional Vertica deployment, where the compute and the storage is tightly coupled, all RAID arrays on the servers. And we had four of those and we were doing okay, but our volumes are ever increasing, we wanted to store more data, and we wanted to run more reports in a shorter period of time, so we had to keep pushing. And so we had these four clusters, and then we started talking with Vertica about Eon mode, and that's Vertica's separation of compute and storage, where the compute and the storage can be scaled independently: we can add storage without adding compute, or vice versa, or we can add both. So that was something that we were very interested in for a couple reasons. One, our enterprise clusters were running out of disk, and adding disk is expensive. In Enterprise Mode, it's kind of a pain, you've got to add compute at the same time, so you kind of end up in an unbalanced place. So in Eon mode that problem gets a lot better. We can add disk, infinite disk, because it's backed by S3. And we can add compute really easily to scale the number of things that we run in parallel, the concurrency: just add a sub cluster. So there are two, in US East and US West of Amazon, so reasonably diverse. And the real benefit is that we can stop nodes when we don't need them. Our workload is fairly lumpy, I call it. Like, after the day completes, we do the ingest, we do the aggregation; we're ingesting and aggregating all day, but that final hour needs to be completed. And then once that's done, then the number of reports that we need to run spikes up, it goes really high. And we run those reports, we spin up a bunch of extra compute on the fly, run those reports and then spin them down. And we don't have to pay for that for the rest of the day. So Eon has been a nice boon for us for both those reasons. >> I'd love to explore Eon a little bit more. I mean, it's relatively new; I think 2018 Vertica announced Eon mode, so it's only been out there a couple years. So I'm curious, for the folks that haven't moved to Eon mode, and presumably they want to for the same reasons that you mentioned, why buy the storage in chunks when you don't have to, what were some of the challenges that you faced in going to Eon mode? What kind of things did you have to prepare for? Were there any out-of-scope expectations? Can you share that experience with us? >> Sure, so we were an early adopter. We participated in the beta program. I mean, I think it's fair to say we actually drove the requirements in a lot of ways, because we approached Vertica early on. So the challenges were what you'd expect any early adopter to be going through, the sort of getting things working as expected. I mean, there's a number of cases which I could touch upon. Like, we found an inefficiency in the way that it accesses the data on S3; it was accessing the data too frequently, which ended up just being expensive. So our S3 bill went up pretty significantly for a couple of months. So that was a challenge, but we worked through that. Another, where we recently made huge strides with Vertica, was the ability to stop and start nodes, and have them start very quickly, and when they start, to not interfere with any running queries. So when we want to spin up a bunch of compute, there was a point in time when it would break certain queries that were already running.
So that was a challenge. But again, the Vertica team has been quite responsive to solving these issues, and now that's behind us. In terms of those who need to get started, or are looking to get started, there's a number of things to think about. Off the top of my head, there's sort of new configuration items that you'll want to think about, like the instance type. So certainly Amazon has a variety of instances, and it's important to consider; one of Vertica's architectural advantages in this area is that Vertica has this caching layer on the instances themselves. And what that does is, if we can keep the data in cache, what we've found is that the performance is basically the same as the performance of Enterprise Mode. So having a good-sized cache when you need it is important. So we went with the i3 instance types, which have a lot of local NVMe storage, so we can cache data and get good performance. That's one thing to think about. The number of nodes, the instance type, certainly the number of shards is a sort of technical item that needs to be considered. It's how the data gets distributed. It's sort of a layer on top of the segmentation that some Vertica engineers will be familiar with. And probably, I mean, one of the big things that one needs to consider is how to get data into the database. So if you have an existing database, there's no sort of nice tool yet to suck all the data into an Eon database, and I think they're working on that. But we got there. We exported all our data out of the enterprise cluster, dumped it out to S3, and then we had the Eon cluster suck that data in. >> So awesome advice. Thank you for sharing that with the community. So, but at the end of the day, it sounds like you had some learning to do, some tweaking to do, and obviously you had to figure out how to get the data in. At the end of the day, was it worth it? What was the business impact? >> Yeah, it definitely was worth it for us. I mean, right now we have four times the data in our Eon cluster that we have in our enterprise clusters. We still run some enterprise clusters. We started with four at the peak; now we're down to two. So we have the two Eon clusters. So it's been, I think our business would say it's been a huge win; we're doing things that we really never could have done before. Accessing all that data on enterprise would have been really difficult. It would have required non-trivial engineering to do things like daisy-chaining clusters together, and then figuring out how to aggregate data across clusters, which would, again, be non-trivial. So we have all the data we want, we can continue to grow data, we're running reports on seasonality. So our customers can compare their campaigns last year versus this year, which is something we just haven't been able to do in the past. We've expanded that. So we grew the data vertically, we've expanded the data horizontally as well. So we were adding columns to our aggregates. We are enriching the data much more than we have in the past. So while we still have enterprise kicking around, I'd say our Eon clusters are doing the majority of the heavy lifting. >> And the cloud was part of the enablement here, particularly with scale, is that right? And are you running certain... >> Definitely. >> And you are running on prem as well, or are you in a hybrid mode? Or is it all AWS? >> Great question, so yeah. When I've been speaking about enterprise, I've been referring to on prem.
So we have physical machines in data centers. So yeah, we are running a hybrid now, and I mean, it's really hard to get, like, an apples-to-apples direct comparison of enterprise on prem versus Eon in the cloud. One thing that I touched upon in my presentation is, if I try to get apples to apples and I think about how I would run the entire workload on enterprise or on Eon, running the entire thing on both, I tried to think about how many CPU cores we would need to do that. And basically, it would be about the same number of cores, I think, for enterprise on prem versus Eon in the cloud. However, the Eon nodes only need to be running about six hours out of the day. So the other 18 hours I can shut them down and not be paying for them, mostly. >> Interesting, okay. And so I've got to ask you, I mean, notwithstanding the fact that you've got a lot invested in Vertica, and got a lot of experience there, there are a lot of, you know, emerging cloud databases. Did you look... I mean, you know a lot about databases, not just Vertica; you're a database guru in many areas, you know, traditional RDBMS as well as MPP and new cloud databases. What is it about Vertica that works for you in this specific sweet spot that you've chosen? What's really the difference there? >> Yeah, so I think the key difference is the maturity. I am familiar with a number of other database platforms, in the cloud and otherwise, column stores specifically, that don't have the maturity that we're used to and we need at our scale. So being able to specify alternate projections, so different sort orders on my data, is huge. And there's other platforms where we don't have that capability. And so Vertica is, of course, the original column store, and they've had time to build up a lead in terms of their maturity and features, and I think that other column stores, cloud or otherwise, are playing a little bit of catch-up in that regard. Of course, Vertica is playing catch-up on the cloud side. But if I had to pick whether I wanted to write a column store from scratch, or a distributed file system, like a cloud file system, from scratch, I'd probably think it would be easier to write the cloud file system. The column store is where the real smarts are. >> Interesting. Let's talk a little bit about some of the challenges you have in reporting. You have a very dynamic nature of reporting; like I said, your clients want a time series, they don't just want a snapshot of a slice. But at the same time, your reporting is probably pretty lumpy, a very dynamic, you know, demand curve. So first of all, is that accurate? Can you describe that dynamism and how are you handling that? >> Yep, that's exactly right. It is lumpy. And that's the exact word that I use. So like, at the end of the UTC day, when UTC midnight rolls around, that's when we do the final ingest, the final aggregate, and then the queue of reports that need to run spikes. So the majority of those 40,000 reports that we run per day are run in the four to six hours after that; it spikes up. And so that's when we need to have all the compute come online. And that's what helps us answer all those queries as fast as possible. And that's a big reason why Eon is an advantage for us, because the rest of the day we don't necessarily need all that compute, and we can shut it down and not pay for it.
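Ron's apples-to-apples point lends itself to simple back-of-the-envelope arithmetic. The sketch below only borrows the shape of the numbers he gives (roughly the same core count either way, with the Eon compute needed about six of twenty-four hours); the core count and hourly rate are made-up placeholders, and it ignores storage, S3, and instance-pricing differences.

```python
# Placeholder inputs: illustrative only, not The Trade Desk's actual figures.
cores = 1000              # roughly the same core count either way, per Ron
cost_per_core_hour = 0.05 # hypothetical blended rate

always_on_hours = 24      # enterprise-style capacity that is always running
eon_hours = 6             # Eon compute runs ~6 of 24 hours, then shuts down

always_on_daily = cores * always_on_hours * cost_per_core_hour
eon_daily = cores * eon_hours * cost_per_core_hour

print(f"always-on compute: ${always_on_daily:,.0f}/day")
print(f"Eon-style compute: ${eon_daily:,.0f}/day")
print(f"compute-hour reduction: {1 - eon_hours / always_on_hours:.0%}")
```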
>> So Ron, I wonder if you could share with us just sort of the wrap here, where you want to take this. You're obviously very close to Vertica. Are you driving them hard on Eon mode? You mentioned before that the ability to load data into Eon mode would have been nice for you, but I guess you're kind of over that hump. But what are the kinds of things... if Colin Mahony is here in the room, what are you telling him that you want the engineering team at Vertica to work on that would make your life better? >> I think the things that need the most attention sort of near term are just smoothing out some of the edges, in terms of making it a little bit more seamless on the cloud aspects of it. So our goal is to be able to start instances and have them join the cluster in less than five minutes. We're not quite there yet. If you look at some of the other cloud database platforms, they're beating that handily, so I know the team is working on that. Some of the other things are the control. Like I mentioned, while we like control in the column store, we also want control on the cloud side of things, in terms of being able to dedicate some clusters to specific things. We can pin workloads against a specific sub cluster and take advantage of the cache that's over there. We can say, okay, this resource pool... I mean, the sub cluster is a relatively new concept for Vertica. So being able to have control of many things at the sub cluster level: resource pools, configuration parameters, and so on. >> Yeah, so I mean, I personally have always been impressed with Vertica, and their ability to sort of ride the wave and adopt new trends. I mean, they do have a robust stack. It's been around, you know, 10-plus years. They certainly embraced Hadoop, they're embracing machine learning, we've been talking about the cloud. So I actually have a lot of confidence in them, especially when you compare them to other sort of mid-last-decade MPP column stores that came out. You know, Vertica is one of the few remaining, certainly as an independent brand. So I think that speaks to the team there and the engineering culture. But I'll give you the final word. Just final thoughts on your role, the company, Vertica, wherever you want to take it. >> Yeah, no, I mean, we're really appreciative and we value the partners that we have, and so I think it's been a win-win. Like, our volumes are... like, I know that we have some data that got pulled into their test suite. So I think it's been a win-win for both sides, and it'll be a win for other Vertica customers and prospects, knowing that they're working with some of the highest volume, velocity, variety data that (mumbles). >> Well, Ron, thanks for coming on. I wish we could have met face to face at the Encore in Boston. I think next year we'll be able to do that. But I appreciate that technology allows us to have these remote conversations. Stay safe, all the best to you and your family. And thanks again. >> My pleasure, David, good speaking with you. >> And thank you for watching, everybody. This is theCUBE's coverage of the Vertica Virtual Big Data Conference. I'm Dave Vellante. We'll be right back right after this short break. (soft music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Ron | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Ron Cormier | PERSON | 0.99+ |
HP | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
40,000 reports | QUANTITY | 0.99+ |
Boston | LOCATION | 0.99+ |
18 hours | QUANTITY | 0.99+ |
fifth year | QUANTITY | 0.99+ |
US | LOCATION | 0.99+ |
Dave volante | PERSON | 0.99+ |
next year | DATE | 0.99+ |
seven | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
2018 | DATE | 0.99+ |
less than five minutes | QUANTITY | 0.99+ |
this year | DATE | 0.99+ |
10 plus years | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
early 2016 | DATE | 0.98+ |
apples | ORGANIZATION | 0.98+ |
two young clusters | QUANTITY | 0.98+ |
two | QUANTITY | 0.98+ |
both sides | QUANTITY | 0.98+ |
about six hours | QUANTITY | 0.98+ |
Cubes | ORGANIZATION | 0.98+ |
six hours | QUANTITY | 0.98+ |
US East | LOCATION | 0.98+ |
Hp | ORGANIZATION | 0.98+ |
Eon | ORGANIZATION | 0.96+ |
S3 | TITLE | 0.95+ |
13 million times per second | QUANTITY | 0.94+ |
half | QUANTITY | 0.94+ |
prime | COMMERCIAL_ITEM | 0.94+ |
four times | QUANTITY | 0.92+ |
hundreds of thousands of auctions | QUANTITY | 0.92+ |
mid last decade | DATE | 0.89+ |
one thing | QUANTITY | 0.88+ |
One thing | QUANTITY | 0.87+ |
single report | QUANTITY | 0.85+ |
couple reasons | QUANTITY | 0.84+ |
four clusters | QUANTITY | 0.83+ |
first graph | QUANTITY | 0.81+ |
Vertica | TITLE | 0.81+ |
hundreds of thousands of events per second | QUANTITY | 0.8+ |
about 40,000 reports per day | QUANTITY | 0.78+ |
Vertica Big Data conference 2020 | EVENT | 0.77+ |
320 node | QUANTITY | 0.74+ |
a whole week | QUANTITY | 0.72+ |
Vertica virtual Big Data | EVENT | 0.7+ |
Joe Gonzalez, MassMutual | Virtual Vertica BDC 2020
(bright music) >> Announcer: It's theCUBE. Covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Hello everybody, welcome back to theCUBE's coverage of the Vertica Big Data Conference, the Virtual BDC. My name is Dave Vellante, and you're watching theCUBE. And we're here with Joe Gonzalez, who is a Vertica DBA at MassMutual Financial. Joe, thanks so much for coming on theCUBE. I'm sorry that we can't be face to face in Boston, but at least we're being responsible. So thank you for coming on. >> (laughs) Thank you for having me. It's nice to be here. >> Yeah, so let's set it up. We'll talk, you know, a little bit about MassMutual. Everybody knows it's a big financial firm, but what's your role there and kind of your mission? >> So my role is Vertica DBA. I was hired January of last year to come on and manage their Vertica cluster. They'd been on Vertica for probably about a year and a half before that; they started out on an on-prem cluster and then moved to Enterprise mode on AWS in the cloud, and brought me on just as they were considering transitioning over to Vertica's EON mode. And they didn't really have anybody dedicated to Vertica, nobody who really knew and understood the product. And I'd been working with Vertica for probably about six, seven years at that point. I was looking for something new and landed a really good opportunity here with a great company. >> Yeah, you have a lot of experience in Vertica. You had a role in market research, so you're a data guy, right? I mean, that's really what you've been doing your entire career. >> I am. I've worked with Pitney Bowes in the postage industry, I worked with healthcare auditing, after seven years in market research. And then I've been with MassMutual for a little over a year now, yeah, quite a lot. >> So tell us a little bit about kind of what your objectives are at MassMutual, what you're kind of doing with the platform, what applications it's supporting; paint a picture for us if you would. >> Certainly. So MassMutual just decided to make Vertica its enterprise data warehouse. So they've really bought into Vertica. And we're moving all of our data there; probably a good 80, 90% of MassMutual's data is going to be on the Vertica platform, in EON mode. And we have a wide usage of that data across the corporation. Right now we're about 50 terabytes and growing quickly. And a wide variety of users. So there's a lot of ETLs coming in overnight, loading a lot of data, transforming a lot of data. And a lot of reporting tools are using it, so currently Tableau, MicroStrategy. We have Alteryx using it, and we also have APIs running against it throughout the day, 24/7, with people coming in, especially these days with, you know, some financial uncertainty going on. A lot of people coming and checking their 401k's, checking their insurance status and whatnot. So we have to handle a lot of concurrent traffic on top of the normal big queries. So it's quite a diverse cluster. And I'm glad they're really investing in using Vertica as their overall solution for this. >> Yeah, I mean, these days your 401k is like this, right? (laughing) Afraid to look. So I wonder, Joe, if you could share with our audience.
I mean, for those who might not be as familiar with the history of Vertica, and specifically about MPP: historically you had, you know, traditional RDBMS, whether it's Db2 or Oracle, and then you had a spate of companies that came out with this notion of MPP. Vertica is, I think, probably one of the few, if only, brands that survived. But what did that bring to the industry, and why is that important for people to understand, just in terms of whatever it is, scale, performance, cost? Can you explain that? >> To me, it actually brought scale at good cost. And that's why I've been a big proponent of Vertica ever since I started using it. There's a number, like you said, of different platforms where you can load big data and store and house big data. But the purpose of having that big data is not just for it to sit there, but to be used, and used in a variety of ways. And that's from, you know, something small, like the first installation I was on was about 10 terabytes. And, you know, I've worked with data warehouses up to 100 terabytes, and, you know, there are Vertica installations with, you know, hundreds of petabytes on them. You want to be able to use that data, so you need a platform that's going to be able to access that data and get it to the clients, get it to the customers, as quickly as possible, and not pay an arm and a leg for the privilege of doing so. And Vertica allows companies to do that: not only get their data to clients and, you know, in-company users quickly, but save money while doing so. >> So, but why couldn't I just use a traditional RDBMS? Why not just throw it all into Oracle? >> One, cost. Oracle is very expensive, while Vertica's a lot more affordable than that. But the column-store structure of Vertica allows for a lot more optimized queries. Some of the queries that you can run in Vertica in 2, 3, 4 seconds will take minutes and sometimes hours in an RDBMS, like Oracle, like SQL Server. They have the capability to store that amount of data, no question, but the usability really lacks when you start querying tables that are 180 billion rows, tables in Vertica that are over 1,000 columns. Those will take hours to run on a traditional RDBMS, and then running them in Vertica, I get my queries back in a sec. >> You know what's interesting to me, Joe, and I wonder if you could comment: it seems that Vertica has done a good job of embracing, you know, riding the waves, whether it was HDFS and big data in the early part of the big data era, or machine learning, machine intelligence, whether it's, you know, TensorFlow and other data science tools. And the cloud is the other one, right? A lot of times cloud is super disruptive, particularly to companies that started on-prem; it seems like Vertica somehow has been able to adopt and embrace some of these trends. First of all, from your standpoint as a customer, is that true? And why do you think that is? Is it architectural? Is it the engineering mindset? I wonder if you could comment on that. >> It's absolutely true. I started out, again, on an on-prem Vertica data warehouse, and we kind of, you know, rolled along with them. You know, more and more people have been using data; they want to make it accessible to people on the web now.
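Joe's point about column-store structure, where queries that take hours on a row-oriented RDBMS come back in seconds, comes down to reading only the columns a query touches. Here is a toy Python sketch of that difference; real column stores like Vertica add compression, sorted projections, and vectorized execution on top of this basic idea, and the sample table is invented for the example.

```python
# Row store: every row carries all of its columns, so even a single-column
# aggregate drags the whole row through memory.
rows = [
    {"account_id": i, "balance": i * 10.0, "state": "MA", "notes": "x" * 50}
    for i in range(100_000)
]
row_scan_total = sum(r["balance"] for r in rows)

# Column store: each column lives in its own array, so the same aggregate
# reads just the one column it needs.
columns = {
    "account_id": [r["account_id"] for r in rows],
    "balance":    [r["balance"] for r in rows],
    "state":      [r["state"] for r in rows],
    "notes":      [r["notes"] for r in rows],
}
column_scan_total = sum(columns["balance"])

assert row_scan_total == column_scan_total
print("same answer, but the columnar scan touches only the 'balance' column")
```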
And you know, having the option to provide that data from an on-prem solution or from AWS is key, and now Vertica is even offering a hybrid solution, if you want to keep some of your data behind a firewall, on-prem, and put some in the cloud as well. So Vertica has absolutely evolved along with the industry, in ways that no other company really has, that I've seen. And I think the reason for it, and the reason I've stayed with Vertica, and specifically have remained a Vertica DBA for the last seven years, is because of the way Vertica stays in touch with its customers. I've been working with the same people for the seven, eight years I've been using Vertica; they're family. I'm part of their family, and you know, I'm good friends with some of these people. And they really are in tune not only with the customer but with what they're doing. They really sit down with you and have those conversations about, you know, what are your needs? How can we make Vertica better? And they listen to their clients. You know, just having access to the data engineers who develop Vertica, being able to arrange a phone call or whatnot, I've never had that with any other company. Vertica makes that available to their customers when they need it. So the personal touch is huge for them. >> That's good; it's always good to get the confirmation from the practitioners, not just hear it from the vendor. I want to ask you about the EON transition. You mentioned that MassMutual brought you in to help with that. What were some of the challenges that you faced? And how did you get over them? And why EON? You know, what was the goal, the outcome, and some of the challenges maybe that you had to overcome? >> Right. So MassMutual had an interesting setup when I first came in. They had three different Vertica clusters to accommodate three different portions of their business. There were the data scientists, who use the data quite extensively in very large queries, very intense queries, for their work with predictive analytics and whatnot. There was a separate one for the APIs, which needed, you know, sub-second query response times; on the enterprise solution, they weren't always able to get the performance they needed, because the fast queries were being overrun by the larger queries that needed more resources. And then they had a third for starting to develop this enterprise data platform and, you know, looking into their future. The first challenge was, first of all, bringing all those three together back into a single cluster, and allowing our users to have both the heavy queries and the API queries running at the same time, on the same platform, without having to completely separate them out onto different clusters. EON really helps with that, because it allows us to store that data in the S3 communal storage and have the main cluster set up to run the heavy queries. And then you can set up sub clusters that still point to that S3 data, but separate out the compute, so that the APIs really have their own resources to run and not be interfered with by the other processes. >> Okay, so I'm hearing a couple of things. One is you're sort of busting down data silos, so you're able to have a much more coherent view of your data, which I would imagine is critical, certainly. Companies like MassMutual have been around for 100 years, and so you've got all kinds of data dispersed.
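One practical consequence of the sub-cluster layout Joe describes is that routing becomes a connection decision: point latency-sensitive API traffic at one set of nodes and heavy reporting at another, while both see the same data in S3 communal storage. Below is a minimal sketch assuming the open-source vertica_python client; the hostnames, credentials, and the accounts table are hypothetical, and how MassMutual actually routes its traffic was not covered in the interview.

```python
import vertica_python

# Hypothetical endpoints, one per set of compute (not MassMutual's real setup).
API_SUBCLUSTER = {"host": "api-subcluster.example.internal", "port": 5433,
                  "user": "app_user", "password": "***", "database": "edw"}
MAIN_CLUSTER = {"host": "main-cluster.example.internal", "port": 5433,
                "user": "report_user", "password": "***", "database": "edw"}

def run(conn_info, sql):
    # Both connections see the same tables backed by communal storage;
    # only the compute serving the query differs.
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

# Fast, concurrent lookups go to the API sub-cluster's compute...
run(API_SUBCLUSTER, "SELECT balance FROM accounts WHERE account_id = 42;")
# ...while a heavy aggregate runs on the main cluster's compute.
run(MAIN_CLUSTER, "SELECT state, SUM(balance) FROM accounts GROUP BY state;")
```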
So to the extent that you can break down those silos, that's important, but also being able to, I guess, have granular increments of compute and storage is what I'm hearing. What does that do for you? Does it make things more efficient? Were there other business benefits? Maybe you could elucidate. >> Well, one, cost is again a huge benefit. The cost of running three different clusters, even in AWS, in the enterprise solution was a little costly; you know, you had to have your dedicated servers here and there. So you're paying for, like, you know, 12, 15 different servers, for example. Whereas bringing them all back into EON, I can run everything on a six-node production cluster. And you know, when things are busy, I can spin up the three-node sub cluster for the APIs, only pay for it when I need it, and then bring them back into the main cluster when things have slowed down a bit, and they can get the performance that they need. So that saves a ton on resource costs. You know, you're not paying for the storage, you're paying for one S3 bucket, you're only paying for the nodes, the EC2 instances, that are up and running when you need them, and that is huge. And again, like you said, it gives us the ability to silo our data without having to completely separate our data into different storage areas. Which is a big benefit: it gives us the ability to query everything from one single cluster without having to synchronize it to, you know, three different ones. So this one's going to have theirs, that one's going to have theirs, but everyone's still looking at the same data. And we replicate that in QA and Dev so that people can do it outside of production and do some testing as well. >> So EON, obviously a very important innovation. And of course, Vertica touts the difference between it and others who separate compute and storage; you know, they're not the only one that does that, but they are really, I think, the only one that does it for on-prem and virtually across clouds. So my question is, and I think you're doing a breakout session at the Virtual BDC... we were going to be in Boston, now we're doing it online. If I'm in the audience, I'm imagining I'm a junior DBA at an organization that maybe doesn't have a Joe; I haven't been an expert for seven years. How hard is it for me, what do I need to do, to get up to speed on EON? It sounds great, I want it. I'm going to save my company money, but I'm nervous 'cause I've only been a Vertica DBA for, you know, a year, and I'm sort of, you know, not as experienced as you. What are the things that I should be thinking about? Do I need to hire somebody? Do I need to bring in a consultant? Can I learn it myself? What would you advise? >> It's definitely easy enough that if you have at least a little bit of work experience, you can learn it yourself, okay? 'Cause the concepts are still there. There are some, you know, little bits of nuance where you do need to be aware of certain changes between the Enterprise and EON editions. But I would also say consult with your Vertica account manager; you know, let them bring in the right people from Vertica to help you get up to speed, and if you need to, there are also resources available as far as consultants go that will help you get up to speed very quickly. And we did work together with Vertica and with one of their partners, Clarity, in helping us to understand EON better, set it up the right way, you know, how do we pick the number of shards for our data warehouse?
You know, they helped us evaluate all that and pick the right number of shards, the right number of nodes, to get set up and going. And, you know, they helped us figure out the best ways to get our data over from the Enterprise Edition into EON very quickly and very efficiently. >> I wanted to ask you about organizational, you know, issues, because, you know, the practitioners like you always tell me, "Look, the technology comes and goes, that's kind of the easy part, we're good at that. It's the people, it's the processes, the skill sets." What does your, you know, team regime look like? And do you have any sort of ideal team makeup or, you know, ideal advice? Is it two-pizza teams? What kind of skills? What kind of interaction and communication with senior leadership? I wonder if you could just give us some color on that. >> One of the things that makes me extremely proud to be working for MassMutual right now is that they do what a lot of companies have not been doing, and that is investing in IT. They have put a lot of thought, a lot of money, and a lot of support into setting up their enterprise data platform and putting Vertica at the center. And not only did they put the money into getting the software that they needed, like Vertica, you know, MicroStrategy, and all the other tools that we use with it, they put money into the people. Our managers are extremely supportive of us. We hired about 40 to 45 different people within a four-month time frame: data engineers, data analysts, data modelers, a nice mix of people who can help shape your data, bring the data in, and help the users use the data properly, and allow me, as the database administrator, to make sure that they're doing what they're doing most efficiently and to focus on my job. So you have to have that diversity among the different data skills in order to make your team successful. >> That's awesome. Kind of a side question, and it's really not Vertica's wheelhouse, but I'm curious: you know, in the early days of the big data movement, a lot of the data scientists would complain, and they still do, that "80% of my time is spent wrangling data." The tools for the data engineers, the data scientists, the database, you know, experts, they're all different. And is that changing? And to what degree is that changing? Kind of what inning are we in, just in terms of a more facile environment for all those roles? >> Again, I think it depends, company to company, you know, on what resources they make available to the data scientists. And the data scientists, we have a lot of them at MassMutual. And they're very much into doing a lot of machine learning, model training, predictive analytics. And they are, you know, used to doing it outside of Vertica too, you know, pulling that data out into Python and Scala, Spark, and tools like that. And they're also now just getting into using Vertica's in-database analytics and machine learning, which is a skill that, you know, definitely not everybody out there has. So being able to have somebody who understands Vertica, like myself, and being able to train other people to use Vertica in the way that is most efficient for them, is key. But also just having people who understand not only the tools that you're using, but how to model data, how to architect your tables, your schemas, the interaction between your tables and schemas and whatnot... you need to have that diversity in order to make this work.
And our data scientists have benefited immensely from the structure that MassMutual put in place via our data management and delivery team. >> That's great. I think I saw, somewhere in your background, that you've trained about 100 people in Vertica. Did I get that right? >> Yes. Since I started here, I've gone to our Boston location, our Springfield location, and our New York City location and trained, probably at this point, about 120, 140 of our Vertica users. And I'm trying to do, you know, a couple of follow-up sessions per year. >> So adoption, obviously, is a big goal of yours. Getting people to adopt the platform, but then more importantly, I guess, deliver business value and outcomes. >> Absolutely. >> Yeah, I wanted to ask you about encryption. You know, in a perfect world, everything would be encrypted, but there are trade-offs. Are you using encryption? What are you doing in that regard? >> We are actually just getting into that now, due to the New York and CCPA regulations that are now in place. We do have a lot of personally identifiable information in our data store that does require encryption. So we are going through a months-long process that started in December (actually, I think, a bit earlier than that) to start identifying all the columns, not only in our Vertica database, but in, you know, the other databases that we do use; you know, we have Postgres, SQL Server, and Teradata for the time being, until that moves into Vertica. And identify where that data sits, which downstream applications pull that data from the data sources and store it locally as well, and start encrypting that data. And because of the tight relationship between Voltage and Vertica, we settled on Voltage as the major platform to start doing that encryption. So we're going to be implementing that in Vertica probably within the next month or two, and roll it out to all the teams that have data that requires encryption. We're going to start rolling it out to the downstream application owners to make sure that they are encrypting the data as they get it pulled over. And we're also using another product for several other applications that don't mesh as well with Voltage. >> Voltage being Micro Focus's encryption solution, correct? >> Right, yes. >> Yes, of course. Micro Focus, for the audience, owns Vertica, and Vertica is a separate brand. So I want to kind of close on what success looks like. You've been at this for a number of years, coming into MassMutual, which was great to hear. I've had some past experience with MassMutual, it's an awesome company, I've been to the Springfield facility and in Boston as well, and I have great respect for them, and they've really always been a leader. So it's great to hear that they're investing in technology as a differentiator. What does success look like for you? Let's say you're at MassMutual for a few years, you're looking back: what does success look like? Go. >> A good question. It's changing every day, just, you know, with more and more, you know, applications coming onboard, more and more data being pulled in, more uses being found for the data that we have. I think success for me is making sure that Vertica, first of all, is always up and is always running at its most optimal to keep our users happy.
I think when I started, you know, we had a lot of processes that were running, you know, six, seven hours; some of them were taking, you know, almost a day, because they were so complicated. We've got those running in under an hour now, some of them in a matter of minutes. I want to keep that optimization going for all of our processes. Like I said, there's a lot of users using this data, and it's been hard over the first year of me being here to get to all of them. And thankfully, you know, I'm getting a bit of help now; I have a couple of system DBAs that I'm training up to help out with these optimizations, you know, fixing queries, fixing projections to make sure that queries do run as quickly as possible. So getting that to its optimal stage is one. Two, getting our data encrypted and protected so that even if, for whatever reason, somehow somebody breaks into our data, they're not going to be able to get anything at all, because our data is 100% protected. And I think more companies need to be focusing on that as well. And third, I want to see our data science teams using more and more of Vertica's in-database predictive analytics, in-database machine learning products, and really helping make their jobs more efficient by doing so. >> Joe, you're an awesome guest. I mean, as I said, we always love having the practitioners on and getting the straight skinny from the pros. You're welcome back anytime, and as I say, I wish we could have met in Boston; maybe next year at the BDC. But it's great to have you online, and thanks for coming on theCUBE. >> And thank you for having me, and hopefully we'll meet next year. >> Yeah, I hope so. And thank you, everybody, for watching. Remember, theCUBE is running concurrently with the Vertica Virtual BDC; it's vertica.com/bdc2020 if you want to check out all the keynotes and all the breakout sessions. I'm Dave Vellante for theCUBE. We've got more interviews coming, so keep it right there. Thanks for watching. (bright music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Joe Gonzalez | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Dave Volante | PERSON | 0.99+ |
MassMutual | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
December | DATE | 0.99+ |
100% | QUANTITY | 0.99+ |
Joe | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
New York City | LOCATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
12 | QUANTITY | 0.99+ |
80% | QUANTITY | 0.99+ |
seven | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
four-month | QUANTITY | 0.99+ |
vertica.com/bdc2020 | OTHER | 0.99+ |
Springfield | LOCATION | 0.99+ |
2 | QUANTITY | 0.99+ |
next year | DATE | 0.99+ |
two instances | QUANTITY | 0.99+ |
seven hours | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Scalars Bar | TITLE | 0.99+ |
Python | TITLE | 0.99+ |
180 billion rows | QUANTITY | 0.99+ |
Two | QUANTITY | 0.99+ |
third | QUANTITY | 0.99+ |
15 different servers | QUANTITY | 0.99+ |
two piece | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
180 billion column | QUANTITY | 0.98+ |
over 1000 columns | QUANTITY | 0.98+ |
eight years | QUANTITY | 0.98+ |
Voltage | ORGANIZATION | 0.98+ |
three | QUANTITY | 0.98+ |
hundreds of petabytes | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
six-node | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
one single cluster | QUANTITY | 0.98+ |
Vertica Big Data Conference | EVENT | 0.98+ |
MassMutual Financial | ORGANIZATION | 0.98+ |
4 seconds | QUANTITY | 0.98+ |
EON | ORGANIZATION | 0.98+ |
New York | LOCATION | 0.97+ |
about 10 terabytes | QUANTITY | 0.97+ |
first challenge | QUANTITY | 0.97+ |
next month | DATE | 0.97+ |
Keynote Analysis | Virtual Vertica BDC 2020
(upbeat music) >> Narrator: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >> Dave Vellante: Hello everyone, and welcome to theCUBE's exclusive coverage of the Vertica Virtual Big Data Conference. You're watching theCUBE, the leader in digital event tech coverage. And we're broadcasting remotely from our studios in Palo Alto and Boston. And, we're pleased to be covering wall-to-wall this digital event. Now, as you know, originally BDC was scheduled this week at the new Encore Hotel and Casino in Boston. Their theme was "Win big with big data". Oh sorry, "Win big with data". That's right, got it. And, I know the community was really looking forward to that, you know, meet-up. But look, we're making the best of it, given these uncertain times. We wish you and your families good health and safety. And this is the way that we're going to broadcast for the next several months. Now, we want to unpack Colin Mahony's keynote, but, before we do that, I want to give a little context on the market. First, theCUBE has covered every BDC since its inception, since the BDC's inception that is. It's a very intimate event, with a heavy emphasis on user content. Now, historically, the data engineers and DBAs in the Vertica community, they comprised the majority of the content at this event. And, that's going to be the same for this virtual, or digital, production. Now, theCUBE is going to be broadcasting for two days. What we're doing, is we're going to be concurrent with the Virtual BDC. We got practitioners that are coming on the show, DBAs, data engineers, database gurus, we've got security experts coming on, and really a great line up. And, of course, we'll also be hearing from Vertica execs, Colin Mahony himself right off the keynote, folks from product marketing, partners, and a number of experts, including some from Micro Focus, which is the, of course, owner of Vertica. But I want to take a moment to share a little bit about the history of Vertica. The company, as you know, was founded by Michael Stonebraker. And, Vertica started, really they started out as a SQL platform for analytics. It was the first, or at least one of the first, to really nail the MPP column store trend. Not only did Vertica have an early mover advantage in MPP, but the efficiency and scale of its software, relative to traditional DBMS, and also other MPP players, is underscored by the fact that Vertica, and the Vertica brand, really thrives to this day. But, I have to tell you, it wasn't without some pain. And, I'll talk a little bit about that, and really talk about how we got here today. So first, you know, you think about traditional transaction databases, like Oracle or IBM DB2, or even enterprise data warehouse platforms like Teradata. They were simply not purpose-built for big data. Vertica was. Along with a whole bunch of other players, like Netezza, which was bought by IBM, Aster Data, which is now Teradata, Actian, ParAccel, which was the basis for Redshift, Amazon's Redshift, Greenplum was bought, in the early days, by EMC. And, these companies were really designed to run as massively parallel systems that smoked traditional RDBMS and EDW for particular analytic applications. You know, back in the big data days, I often joked that, like an NFL draft, there was a run on MPP players, like when you see a run on pulling guards. You know, once one goes, they all start to fall. And that's what you saw with the MPP columnar stores, IBM, EMC, and then HP getting into the game. 
So, it was like 2011, and Leo Apotheker, he was the new CEO of HP. Frankly, he had no clue, in my opinion, what to do with Vertica, and totally missed one of the biggest trends of the last decade, the data trend, the big data trend. HP picked up Vertica for a song, it wasn't disclosed, but my guess is that it was around 200 million. So, rather than build a bunch of smart tooling around Vertica, which I always call the diamond in the rough, Apotheker basically permanently altered HP for years. He kind of ruined HP, in my view, with a 12 billion dollar purchase of Autonomy, which turned out to be one of the biggest disasters in recent M&A history. HP was forced to spin-merge, and ended up selling most of its software to Microsoft, Micro Focus. (laughs) Luckily, during its time at HP, CEO Meg Whitman was largely distracted with what to do with the mess that she inherited from Apotheker. So, Vertica was left alone. Now, the upshot is Colin Mahony, who was then the GM of Vertica, and still is. By the way, he's really the CEO, and he just doesn't have the title, I actually think they should give that to him. But anyway, he's been at the helm the whole time. And Colin, as you'll see in our interview, is a rockstar, he's got technical and business chops, people love him in the community. Vertica's culture is really engineering driven and they're all about data. Despite the fact that Vertica is a 15-year-old company, they've really kept pace, and not been polluted by legacy baggage. Vertica, early on, embraced Hadoop and the whole open-source movement. And that helped give it tailwinds. It leaned heavily into cloud, as we're going to talk about further this week. And they've got a good story around machine intelligence and AI. So, whereas many traditional database players are really getting hurt, and some are getting killed, by cloud database providers, Vertica's actually doing a pretty good job of servicing its install base, and is in a reasonable position to compete for new workloads. On its last earnings call, the Micro Focus CFO, Stephen Murdoch, he said they're investing 70 to 80 million dollars in two key growth areas, security and Vertica. Now, Micro Focus is running its SUSE play on these two parts of its business. What I mean by that, is they're investing and allowing them to be semi-autonomous, spending on R&D and go to market. And, they have no hardware agenda, unlike when Vertica was part of HP, or HPE, I guess HP, before the spin out. Now, let me come back to the big trend in the market today. And there's something going on around analytic databases in the cloud. You've got companies like Snowflake and AWS with Redshift, as we've reported numerous times, and they're doing quite well, they're gaining share, especially of new workloads that are emerging, particularly in the cloud native space. They combine scalable compute, storage, and machine learning, and, importantly, they're allowing customers to scale compute and storage independent of each other. Why is that important? Because you don't have to buy storage every time you buy compute, or vice versa, in chunks. So, if you can scale them independently, you've got granularity. Vertica is keeping pace. In talking to customers, Vertica is leaning heavily into the cloud, supporting all the major cloud platforms, as we heard from Colin earlier today, adding Google. 
And, while my research shows that Vertica has some work to do in cloud and cloud native to simplify the experience, its more robust and mature stack, which supports many different environments, you know, deep SQL, ACID properties, has the DNA that allows Vertica to compete with these cloud-native database suppliers. Now, Vertica might lose out in some of those native workloads. But, I have to say, my experience in talking with customers, if you're looking for a great MPP column store that scales and runs in the cloud, or on-prem, Vertica is in a very strong position. Vertica claims to be the only MPP columnar store to allow customers to scale compute and storage independently, both in the cloud and in hybrid environments on-prem, et cetera, cross clouds, as well. So, while Vertica may be at a disadvantage in a pure cloud native bake-off, its more robust and mature stack, combined with its multi-cloud strategy, gives Vertica a compelling set of advantages. So, we heard a lot of this from Colin Mahony, who announced Vertica 10.0 in his keynote. He really emphasized Vertica's multi-cloud affinity, its Eon Mode, which really allows that separation, or scaling of compute, independent of storage, both in the cloud and on-prem. Vertica 10, according to Mahony, is making big bets on in-database machine learning, he talked about that, AI, along with some advanced regression techniques. He talked about PMML models, Python integration, which was actually something that they talked about doing with Uber and some other customers. Now, Mahony also stressed the trend toward object stores. And, Vertica now supports, let's see, S3 with Eon, S3 Eon in Google Cloud, in addition to AWS, and then Pure and HDFS, as well, they all support Eon Mode. Mahony also stressed, as I mentioned earlier, a big commitment to on-prem and the whole cloud optionality thing. So 10.0, according to Colin Mahony, is all about really doubling down on these industry waves. As they say, enabling native PMML models, running them in Vertica, and really doing all the work that's required around ML and AI, they also announced support for TensorFlow. So, object store optionality is important, is what he talked about in Eon Mode, with the news of support for Google Cloud, as well as HDFS. And finally, a big focus on deployment flexibility. Migration tools, which are a critical focus really on improving ease of use, and you hear this from a lot of customers. So, these are the critical aspects of Vertica 10.0, and an announcement that we're going to be unpacking all week, with some of the experts that I talked about. So, I'm going to close with this. My long-time co-host, John Furrier, and I have talked some time about this new cocktail of innovation. No longer is Moore's law the, really, mainspring of innovation. It's now about taking all these data troves, bringing machine learning and AI into that data to extract insights, and then operationalizing those insights at scale, leveraging cloud. And, one of the things I always look for from cloud is, if you've got a cloud play, you can attract innovation in the form of startups. It's part of the success equation, certainly for AWS, and I think it's one of the challenges for a lot of the legacy on-prem players. Vertica, I think, has done a pretty good job in this regard. And, you know, we're going to look this week for evidence of that innovation. One of the interviews that I'm personally excited about this week, is a new-ish company, I would consider them a startup, called Zebrium. 
What they're doing, is they're applying AI to do autonomous log monitoring for IT ops. And, I'm interviewing Larry Lancaster, who's their CEO, this week, and I'm going to press him on why he chose to run on Vertica and not a cloud database. This guy is a hardcore tech guru and I want to hear his opinion. Okay, so keep it right there, stay with us. We're all over the Vertica Virtual Big Data Conference, covering in-depth interviews and following all the news. So, theCUBE is going to be interviewing these folks, two days, wall-to-wall coverage, so keep it right there. We're going to be right back with our next guest, right after this short break. This is Dave Vellante and you're watching theCUBE. (upbeat music)
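The keynote's emphasis on in-database machine learning and PMML support lends itself to a short illustration. Below is a hedged sketch, assuming the Vertica 10 functions behave as documented, of importing an externally trained PMML model and scoring rows without moving data out of the database; the table, columns, model file, and connection details are invented for the example and should be adapted to your environment.

```python
# Hypothetical sketch of the Vertica 10 PMML workflow described in the keynote:
# register a PMML model trained elsewhere, then score rows in-database.
# Names are placeholders; verify function syntax against your Vertica release.
import vertica_python

conn_info = {"host": "localhost", "port": 5433, "user": "dbadmin",
             "password": "", "database": "analytics"}

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()
    # Import a PMML model file that already sits on the initiator node.
    cur.execute("""
        SELECT IMPORT_MODELS('/models/churn_model.pmml'
                             USING PARAMETERS category='PMML')
    """)
    # Score a (made-up) customers table in place, no data movement required.
    cur.execute("""
        SELECT customer_id,
               PREDICT_PMML(tenure_months, monthly_spend
                            USING PARAMETERS model_name='churn_model') AS churn_score
        FROM customers
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```

The same pattern extends to the native in-database algorithms and, per the 10.0 announcement, to imported TensorFlow models.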
SUMMARY :
Brought to you by Vertica. and the Vertica brand, really thrives to this day.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Larry Lancaster | PERSON | 0.99+ |
Colin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
HP | ORGANIZATION | 0.99+ |
70 | QUANTITY | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Michael Stonebraker | PERSON | 0.99+ |
Colin Mahony | PERSON | 0.99+ |
Stephen Murdoch | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
EMC | ORGANIZATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Zebrium | ORGANIZATION | 0.99+ |
two days | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
Verica | ORGANIZATION | 0.99+ |
Micro Focus | ORGANIZATION | 0.99+ |
2011 | DATE | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
Mahony | PERSON | 0.99+ |
Meg Whitman | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Aster Data | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
First | QUANTITY | 0.99+ |
12 billion dollar | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
this week | DATE | 0.99+ |
John Furrier | PERSON | 0.99+ |
15-year-old | QUANTITY | 0.98+ |
Python | TITLE | 0.98+ |
Oracle | ORGANIZATION | 0.98+ |
olin Mahony | PERSON | 0.98+ |
around 200 million | QUANTITY | 0.98+ |
Virtual Vertica Big Data Conference 2020 | EVENT | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
80 million dollars | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
two parts | QUANTITY | 0.97+ |
Vertica Virtual Big Data Conference | EVENT | 0.97+ |
Teradata | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
Actian | ORGANIZATION | 0.97+ |
Gabriel Chapman, Pure Storage | Virtual Vertica BDC 2020
>> Yeah, it's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >> Hi, everybody, and welcome to this CUBE special presentation of the Vertica Virtual Big Data Conference. theCUBE is running in parallel with day one and day two of the Vertica Big Data event. By the way, theCUBE has been at every single Big Data event, and it's our pleasure to be here in the virtual slash digital event as well. Gabriel Chapman is here. He's the director of Flash Blade Products Solutions Marketing at Pure Storage. Great to see you. Thanks for coming on. >> Great to see you too. How's it going? >> It's going very well. I mean, I wish we were meeting in Boston at the Encore Hotel, but, uh, you know, hopefully we'll be able to meet at Accelerate at some point in the future, or one of the sub shows that you guys are doing, the regional shows, because we've been covering that show as well. But I really want to get into it. At the last Accelerate, September 2019, Pure and Vertica announced a partnership. I remember somebody ran up to me and said, hey, you've got to check this out: the separation of compute and storage via Eon Mode, now available on Flash Blade. And I believe Vertica is still the only company that can support that separation and independent scaling both on-prem and in the cloud. So I want to ask, what were the trends in analytical databases and cloud that led to this partnership? >> You know, realistically, I think what we're seeing is that there's been kind of a larger shift when it comes to modern analytics platforms towards moving away from the traditional, you know, Hadoop-type architecture, where we were leveraging a lot of direct-attached storage, primarily because of the limitations of how that solution was architected. When we start to look at the larger trends towards, you know, how organizations want to do this type of work on premises, they're looking at solutions that allow them to scale the compute and storage pieces independently, and therefore, you know, the Flash Blade platform ended up being a great solution to support Vertica in their transition to Eon Mode, leveraging it essentially as an S3 object store. >> Okay, so let's circle back on that. You guys, in your announcement of Flash Blade, you make the claim that Flash Blade is the industry's most advanced file and object storage platform ever. That's a bold statement. So defend that. >>
We fall that that falls into our umbrella of what we consider the modern day takes variance. And it's something that we've built into the entire pure portfolio. >>Okay, so I want to get into the architecture a little bit of flash blade and then understand the fit for, uh, analytic databases generally, but specifically for vertical. So it is a blade, so you got compute and network included. It's a key value store based system. So you're talking about scale out. Unlike, unlike, uh, pure is sort of, you know, initial products which were scale up, Um, and so I want on It is a fabric based system. I want to understand what that all means to take us through the architecture. You know, some of the quote unquote firsts that you guys talk about. So let's start with sort of the blade >>aspect. Yeah, the blade aspect of what we call the flash blade. Because if you look at the actual platform, you have, ah, primarily a chassis with built in networking components, right? So there's ah, fabric interconnect with inside the platform that connects to each one of the individual blades. Individual blades have their own compute that drives basically a pure storage flash components inside. It's not like we're just taking SSD is and plugging them into a system and like you would with the traditional commodity off the shelf hardware design. This is very much an engineered solution that is built towards the characteristics that we believe were important with fast filing past object scalability, massive parallel ization. When it comes to performance and the ability to really kind of grow and scale from essentially seven blades right now to 150 that's that's the kind of scale that customers are looking for, especially as we start to address these larger analytics pools. They are multi petabytes data sets, you know that single addressable object space and, you know, file performance that is beyond what most of your traditional scale up storage platforms are able to deliver. >>Yes, I interviewed cause last September and accelerate, and Christie Pure has been attacked by some of the competitors. There's not having scale out. I asked him his thoughts on that, he said Well, first of all, our flash blade is scale out. He said, Look, anything that adds complexity, you know we avoid. But for the workloads that are associated with flash blade scale out is the right sort of approach. Maybe you could talk about why that is. Well, >>realistically, I think you know that that approach is better when we're starting to work with large, unstructured data sets. I mean, flash blade is unique. The architected to allow customers to achieve superior resource utilization for compute and storage, while at the same time, you know, reducing significantly the complexity that has arisen around this kind of bespoke or siloed nature of big data and analytics solutions. I mean, we're really kind of look at this from a standpoint of you have built and delivered are created applications in the public cloud space of dress, you know, object storage and an unstructured data. And for some organizations, the importance is bringing that on Prem. I mean, we do see about repatriation coming on a lot of organizations as these data egress, charges continue to expand and grow, um, and then organizations that want even higher performance and what we're able to get into the public cloud space. They are bringing that data back on Prem They are looking at from a stamp. We still want to be able to scale the way we scale in the cloud. 
We still want to operate the same way we operate in the cloud, but we want to do it within control of our own, our own borders. And so that's, you know, that's one of the bigger pieces to that. And we start to look at how do we address cloud characteristics and dynamics and consumption metrics or models? A zealous the benefits and efficiencies of scale that they're able to afford but allowing customers to do that with inside their own data center. >>So you're talking about the trends earlier. You have these cloud native databases that allowed of the scaling of compute and storage independently. Vertical comes in with eon of a lot of times we talk about these these partnerships as Barney deals of you know I love you, You love me. Here's a press release and then we go on or they're just straight, you know, go to market. Are there other aspects of this partnership that they're non Barney deal like, in other words, any specific engineering. Um, you know other go to market programs? Could you talk about that a little bit? Yeah, >>it's it's It's more than just that what we consider a channel meet in the middle or, you know, that Barney type of deal. It's realistically, you know, we've done some first with Veronica that I think, really Courtney, if they think you look at the architecture and how we did, we've brought to market together. Ah, we have solutions. Teams in the back end who are, you know, subject matter experts. In this space, if you talk to joy and the people from vertical, they're very high on our very excited about the partnership because it often it opens up a new set of opportunities for their customers to leverage on mode and get into some of the the nuance task specs of how they leverage the depot depot with inside each individual. Compute node in adjustments with inside their reach. Additional performance gains for customers on Prem and at the same time, for them, that's still tough. The ability to go into that cloud model if they wish to. And so I think a lot of it is around. How do we partner is to companies? How do we do a joint selling motions? How do we show up in and do white papers and all of the traditional marketing aspects that we bring to the market? And then, you know, joint selling opportunities exist where they are, and so that's realistically. I think, like any other organization that's going to market with a partner on MSP that they have, ah, strong partnership with. You'll continue to see us, you know, talking about are those mutually beneficial relationships and the solutions that we're bringing to the market. >>Okay, you know, of course, he used to be a Gartner analyst, and you go to the vendor side now, but it's but it's, but it's a Gartner analyst. You're obviously objective. You see it on, you know well, there's a lot of ways to skin the cat There, there their strengths, weaknesses, opportunities, threats, etcetera for every vendor. So you have you have vertical who's got a very mature stack and talking to a number of the customers out there who are using EON mode. You know there's certain workloads where these cloud native databases makes sense. It's not just the economics of scaling and storage independently. I want to talk more about that. There's flexibility aspect as well. But Vertical really has to play its its trump card, which is Look, we've got a big on premise state, and we're gonna bring that eon capability both on Prem and we're embracing the cloud now. 
There obviously have been there to play catch up in the cloud, but at the same time, they've got a much more mature stack than a lot of these other cloud native databases that might have just started a couple of years ago. So you know, so there's trade offs that customers have to make. How do you sort through that? Where do you see the interest in this? And and what's the sweet spot for this partnership? You know, we've >>been really excited to build the partnership with vertical A and provide, you know, we're really proud to provide pretty much the only on Prem storage platform that's validated with the yang mode to deliver a modern data experience for our customers together. You know, it's ah, it's that partnership that allows us to go into customers that on Prem space, where I think that there's still not to say that not everybody wants to go there, but I think there's aspects and solutions that worked very well there. But for the vast majority, I still think that there's, you know, the your data center is not going away. And you do want to have control over some of the many of the assets with inside of the operational confines. So therefore, we start to look at how do we can do the best of what cloud offers but on prim. And that's realistically, where we start to see the stronger push for those customers. You still want to manage their data locally. A swell as maybe even worked around some of the restrictions that they might have around cost and complexity hiring. You know, the different types of skills skill sets that are required to bring applications purely cloud native. It's still that larger part of that digital transformation that many organizations are going for going forward with. And realistically, I think they're taking a look at the pros and cons, and we've been doing cloud long enough where people recognize that you know it's not perfect for everything and that there's certain things that we still want to keep inside our own data center. So I mean, realistically, as we move forward, that's, Ah, that better option when it comes to a modern architecture that can do, you know, we can deliver an address, a diverse set of performance requirements and allow the organization to continue to grow the model to the data, you know, based on the data that they're actually trying to leverage. And that's really what Flash was built for. It was built for a platform that could address small files or large files or high throughput, high throughput, low latency scale of petabytes in a single name. Space in a single rack is we like to put it in there. I mean, we see customers that have put 150 flash blades into production as a single name space. It's significant for organizations that are making that drive towards modern data experience with modern analytics platforms. Pure and Veronica have delivered an experience that can address that to a wide range of customers that are implementing uh, you know, particularly on technology. >>I'm interested in exploring the use case. A little bit further. You just sort of gave some parameters and some examples and some of the flexibility that you have, um, and take us through kind of what the customer discussions are like. Obviously you've got a big customer base, you and vertical that that's on Prem. That's the the unique advantage of this. But there are others. It's not just the economics of the granular scaling of compute and storage independently. There are other aspects of take us through that sort of a primary use case or use cases. 
Yeah, you >>know, I mean, I could give you a couple customer examples, and we have a large SAS analyst company which uses vertical on last way to authenticate the quality of digital media in real time, You know, then for them it makes a big difference is they're doing their streaming and whatnot that they can. They can fine tune the grand we control that. So that's one aspect that that we address. We have a multinational car car company, which uses vertical on flash blade to make thousands of decisions per second for autonomous vehicle decision making trees. You know, that's what really these new modern analytics platforms were built for, um, there's another healthcare organization that uses vertical on flash blade to enable healthcare providers to make decisions in real time. The impact lives, especially when we start to look at and, you know, the current state of affairs with code in the Corona virus. You know, those types of technologies, we're really going to help us kind of get of and help lower invent, bend that curve downward. So, you know, there's all these different areas where we can address that the goals and the achievements that we're trying to look bored with with real time analytics decision making tools like and you know, realistically is we have these conversations with customers they're looking to get beyond the ability of just, you know, a data scientist or a data architect looking to just kind of driving information >>that we're talking about Hadoop earlier. We're kind of going well beyond that now. And I guess what I'm saying is that in the first phase of cloud, it was all about infrastructure. It was about, you know, uh, spin it up. You know, compute and storage is a little bit of networking in there. >>It >>seems like the next new workload that's clearly emerging is you've got. And it started with the cloud native databases. But then bringing in, you know, AI and machine learning tooling on top of that Ah, and then being able to really drive these new types of insights and it's really about taking data these bog this bog of data that we've collected over the last 10 years. A lot of that is driven by a dupe bringing machine intelligence into the equation, scaling it with either cloud public cloud or bringing that cloud experience on Prem scale. You know, across organizations and across your partner network, that really is a new emerging workloads. You see that? And maybe talk a little bit about what you're seeing with customers. >>Yeah. I mean, it really is. We see several trends. You know, one of those is the ability to take a take this approach to move it out of the lab, but into production. Um, you know, especially when it comes to data science projects, machine learning projects that traditionally start out as kind of small proofs of concept, easy to spin up in the cloud. But when a customer wants to scale and move towards a riel you know, derived a significant value from that. They do want to be able to control more characteristic site, and we know machine learning, you know, needs toe needs to learn from a massive amounts of data to provide accuracy. There's just too much data retrieving the cloud for every training job. Same time Predictive analytics without accuracy is not going to deliver the business advantage of what everyone is seeking. You know, we see this. 
Ah, the visualization of Data Analytics is Tricia deployed is being on a continuum with, you know, the things that we've been doing in the long in the past with data warehousing, data Lakes, ai on the other end. But this way, we're starting to manifest it and organizations that are looking towards getting more utility and better elasticity out of the data that they are working for. So they're not looking to just build apps, silos of bespoke ai environments. They're looking to leverage. Ah, you know, ah, platform that can allow them to, you know, do ai, for one thing, machine learning for another leverage multiple protocols to access that data because the tools are so much Jeff um, you know, it is a growing diversity of of use cases that you can put on a single platform I think organizations are looking for as they try to scale these environment. >>I think it's gonna be a big growth area in the coming years. Gable. I wish we were in Boston together. You would have painted your little corner of Boston orange. I know that you guys have but really appreciate you coming on the cube wall to wall coverage. Two days of the vertical vertical virtual big data conference. Keep it right there. Right back. Right after this short break, Yeah.
SUMMARY :
Brought to you by vertical. of the vertical of Big Data event. Great to see you too. future or one of the sub shows that you guys are doing the regional shows, but because we've been you know, the flash blade platform ended up being a great solution to support America Okay, so let's let's circle back on that you guys in your in your announcement of the I would like to go beyond that and just say, you know, So we've really kind of looked at this from a standpoint you know, initial products which were scale up, Um, and so I want on It is a fabric based object space and, you know, file performance that is beyond what most adds complexity, you know we avoid. you know, that's one of the bigger pieces to that. straight, you know, go to market. it's it's It's more than just that what we consider a channel meet in the middle or, you know, So you know, so there's trade offs that customers have to make. been really excited to build the partnership with vertical A and provide, you know, we're really proud to provide pretty and some examples and some of the flexibility that you have, um, and take us through you know, the current state of affairs with code in the Corona virus. It was about, you know, uh, spin it up. But then bringing in, you know, AI and machine learning data because the tools are so much Jeff um, you know, it is a growing diversity of I know that you guys have but really appreciate you coming on the cube wall to wall coverage.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Gabriel Chapman | PERSON | 0.99+ |
September 2019 | DATE | 0.99+ |
Boston | LOCATION | 0.99+ |
Barney | ORGANIZATION | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Two days | QUANTITY | 0.99+ |
Veronica | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
last September | DATE | 0.99+ |
thousands | QUANTITY | 0.98+ |
150 | QUANTITY | 0.98+ |
Courtney | PERSON | 0.98+ |
one | QUANTITY | 0.98+ |
one aspect | QUANTITY | 0.98+ |
Day One | QUANTITY | 0.97+ |
day two | QUANTITY | 0.97+ |
seven blades | QUANTITY | 0.97+ |
both | QUANTITY | 0.96+ |
Virtual Vertica | ORGANIZATION | 0.96+ |
over three years | QUANTITY | 0.96+ |
150 flash blades | QUANTITY | 0.95+ |
first | QUANTITY | 0.95+ |
single rack | QUANTITY | 0.94+ |
Corona virus | OTHER | 0.94+ |
single name | QUANTITY | 0.94+ |
first phase | QUANTITY | 0.94+ |
Pure Storage | ORGANIZATION | 0.93+ |
Prem | ORGANIZATION | 0.92+ |
Christie Pure | ORGANIZATION | 0.91+ |
single platform | QUANTITY | 0.91+ |
each individual | QUANTITY | 0.91+ |
this year | DATE | 0.91+ |
firsts | QUANTITY | 0.9+ |
Big Data Conference 2020 | EVENT | 0.9+ |
America | LOCATION | 0.89+ |
Flash Blade Products Solutions | ORGANIZATION | 0.89+ |
couple of years ago | DATE | 0.88+ |
single name | QUANTITY | 0.84+ |
each one | QUANTITY | 0.84+ |
one thing | QUANTITY | 0.83+ |
Tricia | PERSON | 0.82+ |
Pure | ORGANIZATION | 0.81+ |
last 10 years | DATE | 0.8+ |
Hadoop | TITLE | 0.75+ |
single addressable | QUANTITY | 0.74+ |
second | QUANTITY | 0.72+ |
Veronica | ORGANIZATION | 0.7+ |
Encore Hotel | LOCATION | 0.68+ |
Big Data | EVENT | 0.67+ |
Cube | COMMERCIAL_ITEM | 0.66+ |
SAS | ORGANIZATION | 0.65+ |
Flash Blade | TITLE | 0.62+ |
petabytes | QUANTITY | 0.62+ |
eon | ORGANIZATION | 0.59+ |
couple customer | QUANTITY | 0.55+ |
EON | ORGANIZATION | 0.53+ |
single big | QUANTITY | 0.5+ |
Big | EVENT | 0.49+ |
years | DATE | 0.48+ |
sub | QUANTITY | 0.46+ |
2020 | DATE | 0.33+ |
The Road to Autonomous Database Management: How Domo is Delivering SLAs for Less
Hello everybody, and thank you for joining us today at the Virtual Vertica BDC 2020. Today's breakout session is entitled "The Road to Autonomous Database Management: How Domo is Delivering SLAs for Less." My name is Sue LeClair, I'm the director of marketing at Vertica, and I'll be your host for this webinar. Joining me is Ben White, senior database engineer at Domo. But before we begin, I want to encourage you to submit questions or comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to during that time, and any questions that we aren't able to address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, as a reminder, you can maximize your screen by clicking the double arrow button in the lower right corner of the slide. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Ben, over to you. Greetings everyone, and welcome to our Virtual Vertica Big Data Conference 2020. Had we been in Boston, the song you would have heard playing in the intro would have been "Boogie Nights" by Heatwave. If you've never heard of it, it's a great song. To fully appreciate that song the way I do, you have to believe that I am a genuine database whisperer. Then you have to picture me at 3 a.m. on my laptop, tailing a Vertica log, getting myself all psyched up. Now, as cool as they may sound, 3 a.m. boogie nights are not sustainable. They don't scale. In fact, today's discussion is really all about how Domo engineers the end of 3 a.m.
boogie nights. Again, I am Ben White, senior database engineer at Domo, and as we heard, the topic today is the road to autonomous database management: how Domo is delivering SLAs for less. The title is a mouthful. In retrospect, I probably could have come up with something snazzier, but it is, I think, honest. For me, the most honest word in that title is "road." When I hear that word, it evokes for me thoughts of the journey, and how important it is to just enjoy it. When you truly embrace the journey, often you look up and wonder: how did we get here? Where are we? And of course, what's next? Now, I don't intend to come across as too deep, so I'll submit there's nothing particularly prescient in simply noticing the elephant in the room when it comes to database autonomy. My opinion is, then, merely, and perhaps more accurately, my observation. To offer context, imagine a place where thousands and thousands of users submit millions of ad-hoc queries every hour. Now imagine someone promised all these users that we could deliver BI leverage at cloud scale in record time. I know what many of you must be thinking: who in the world would do such a thing? Of course, that news was well received, and after the cheers from executives and business analysts everywhere, and the chants of "keep calm and query on," finally started to subside, someone turns and asks, "that's possible, we can do that, right?" Except this is no imaginary place. This is a very real challenge we face at Domo. Through imaginative engineering, Domo continues to redefine what's possible. The beautiful minds at Domo truly embrace the database engineering paradigm that one size does not fit all. That little philosophical nugget is one I would pick up while reading the white papers and books of some guy named Stonebraker. So to understand how I, and by extension Domo, came to truly value analytic database administration, look no further than that philosophy and what embracing it would mean. It meant, really, that while others were engineering skyscrapers, we would endeavor to build data neighborhoods with a diverse topology of database configurations. This is where our journey at Domo really gets under way, without any purposeful intent to define our destination, not necessarily thinking about database as a service or anything like that. We had planned this ecosystem of clusters capable of efficiently performing varied workloads. We achieve this with custom configurations for node count, resource pool configuration parameters, etc. But it also meant concerning ourselves with the unintended consequences of our ambition: the impact of increased DDL activities on the catalog, system overhead in general. What would be the management requirements of an ever-evolving infrastructure? We would be introducing multiple points of failure. What are the advantages, the disadvantages? Those types of discussions and considerations really helped to define what would be the basic characteristics of our system. The database itself needed to be trivial, redundant, potentially ephemeral, customizable, and above all scalable. And we'll get more into that later. With this knowledge of what we were getting into, automation would have to be an integral part of development. One might even say automation would become the first point of interest on our journey. Now, using popular DevOps tools like SaltStack, Terraform, and ServiceNow, everything would be automated. I mean, it included everything from larger multi-step tasks like database designs, database cluster creation, and reboots, to smaller routine tasks like license updates, moveout,
and projection refreshes. All of this cool automation certainly made it easier for us to respond to problems within the ecosystem. These methods alone still left our database administration reactionary, and reacting to an unpredictable stream of slow query complaints is not a good way to manage a database. In fact, that's exactly how 3 a.m. boogie nights happen, and again, I understand there was a certain appeal to them, but ultimately managing that level of instability is not sustainable. Earlier I mentioned an elephant in the room, which brings us to the second point of interest on our road to autonomy: analytics, more specifically, analytic database administration. Why are analytics so important, not just in this case, but generally speaking? I mean, we have a whole conference set up to discuss it. Domo itself is self-service analytics. The answer is curiosity. Analytics is the method in which we feed the insatiable human curiosity, and that really is the impetus for analytic database administration. Analytics is also the part of the road I like to think of as a bridge, the bridge, if you will, from automation to autonomy. And with that in mind, I say to you, my fellow engineers, developers, administrators, that as conductors of the symphony of data we call analytics, we have proven to be capable producers of analytic capacity. We take pride in that, and rightfully so. The challenge now is to become more conscientious consumers. In some way, shape, or form, many of you already employ some level of analytics to inform your decisions. Far too often we are using data that would be categorized as lagging. Perhaps you're monitoring slow queries in the Management Console. Better still, maybe you consult the Workload Analyzer. How about a logging and alerting system like Sumo Logic? If you're lucky, you do have Domo, where you monitor and alert on query metrics like this. All examples of analytics that help inform our decisions. Being at Domo, the incorporation of analytics into database administration is very organic, in other words, pretty much company mandated. As a company that provides BI leverage at cloud scale, it makes sense that we would want to use our own product to be better at the business of Domo. Domo adoption stretches across the entire company, and everyone uses Domo to deliver insights into the hands of the people that need it, when they need it most. So it should come as no surprise that we have, from the very beginning, used our own product to make informed decisions as it relates to the application back end. In engineering, we call it our internal system Domo for Domo. Domo for Domo, in its current iteration, uses a rules-based engine with elements of machine learning to identify and eliminate conditions that cause slow query performance. Pulling data from a number of sources, including our own, we could identify all sorts of issues, like global query performance, actual query count, success rate, for instance, as a function of query count, and of course environment timeout errors. This was a foundation, right, this recognition that we should be using analytics to be better conductors of curiosity. These types of real-time alerts were a legitimate step in the right direction for the engineering team, though we saw ourselves in an interesting position. As far as Domo for Domo, we started exploring the dynamics of using the platform to not only monitor and alert, of course, but to also triage and remediate. Just how much autonomy could we give the application? What were the pros and cons of that? Trust is a big part of that equation: trust in the
decision-making process, trust that we can mitigate any negative impacts, and trust in the very data itself. Still, much of the data comes from systems that interacted directly, and in some cases indirectly, with the database. By its very nature, much of the data was past tense and limited, you know, things that had already happened, without any reference or correlation to the conditions that mattered to those events. Fortunately, the Vertica platform holds a tremendous amount of information about the transactions it has performed, its configurations, the characteristics of its objects like tables, projections, containers, resource pools, etc. This treasure trove of metadata is collected in the Vertica system tables and the appropriately named Data Collector tables. As of version 9.3, there are over 190 tables that define the system tables, while the Data Collector is a collection of 215 components. A rich collection can be found in the Vertica system tables. These tables provide a robust, stable set of views that let you monitor information about your system resources, background processes, workload, and performance, allowing you to more efficiently profile, diagnose, and correlate historical data such as load streams, query profiles, Tuple Mover operations, and more. Here you see a simple query to retrieve the names and descriptions of the system tables, and an example of some of the tables you'll find. The system tables are divided into two schemas: the catalog schema contains information about persistent objects, and the monitor schema tracks transient system states. Most of the tables you find there can be grouped into the following areas: system information, system resources, background processes, and workload and performance. The Vertica Data Collector extends system table functionality by gathering and retaining, aggregating, information about your database. The Data Collector makes its information available in system tables. A moment ago I showed you how you get a list of the system tables and their descriptions, but here we see how to get that information for the Data Collector tables. With data from the Data Collector tables and the system tables, we now have enough data to analyze that we would describe as conditional, or leading, data that will allow us to be proactive in our system management. This is a big deal for Domo, and particularly Domo for Domo, because from here we took the critical next step, where we analyze this data for conditions we know or suspect lead to poor performance, and then we can suggest the recommended remediation. Really, for the first time, we were using conditional data to be proactive in our database management, in record time. We track many of the same conditions the Vertica support team analyzes via scrutinize, like tables with too many projections, or non-partitioned fact tables, which can negatively affect query performance and life in Vertica. Vertica suggests, if the table has a date or timestamp column, we recommend partitioning by the month. We also can track catalog size as a percentage of total memory, and alert on thresholds and trigger remediations. Requests per hour is a very important metric in determining when to trigger our scaling solution. Tracking memory usage over time allows us to adjust resource pool parameters to achieve the optimal performance for the workload. Of course, the Workload Analyzer is a great example of analytic database administration. I mean, from here one can easily see the logical next step, where we are able to execute these recommendations manually or automatically via some configuration
parameter. Now, when I started preparing for this discussion, this slide made a lot of sense as far as the logical next iteration of the Workload Analyzer. Now, I left it in because, together with the next slide, it really illustrates how firmly Vertica has its finger on the pulse of the database engineering community. In the 10.0 Management Console, ta-da, we have the updated Workload Analyzer. We've added a column to show tuning commands. The Management Console allows the user to select and run certain recommendations, currently tuning commands that run analyze statistics, but you can see where this is going. For us, using Domo with our Vertica connector, we were able to then pull the metadata from all of our clusters. We constantly analyze that data for any number of known conditions. We build these recommendations into scripts that we can then execute immediately, or we can save them to a later time for manual execution. And as you would expect, those actions are triggered by thresholds that we can set. From the moment Eon Mode was released to beta, our team began working on a serviceable auto-scaling solution. The elastic nature of Eon Mode's separation of storage and compute clearly lent itself to our ecosystem's requirement for scalability. In building our system, we worked hard to overcome many of the obstacles that came with the more rigid architecture of Enterprise Mode. But with the introduction of Eon Mode, we now have a practical way of giving our ecosystem at Domo the architectural elasticity our model requires. Using analytics, we can now scale our environment to match demand. What we've built is a system that scales without adding management overhead or unnecessary cost, all the while maintaining optimal performance. Well, really, this is just our journey up to now, which begs the question: what's next? For us, we expand the use of Domo for Domo within our own application stack. Maybe more importantly, we continue to build logic into the tools we have by bringing machine learning and artificial intelligence to our analysis and decision making. To further illustrate those priorities, we announced support for Amazon SageMaker Autopilot at our Domopalooza conference just a couple of weeks ago. For Vertica, the future must include in-database autonomy. The enhanced capabilities in the new Management Console, to me, are a clear nod to that future. In fact, with a streamlined and lightweight database design process, all the pieces should be in place for Vertica to deliver autonomous database management itself. We'll see. Well, I would like to thank you for listening, and now, of course, we will have a Q&A session, hopefully a very robust one. Thank you. [Applause]
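To make the system-table and Data Collector discussion concrete, here is a minimal sketch, assuming a vertica-python connection, that pulls the same kinds of "conditional" metadata described above and asks the Workload Analyzer for tuning recommendations. Connection details are placeholders, and table and column names should be verified against your Vertica version.

```python
# Minimal sketch of a Domo-for-Domo-style metadata pull: system tables,
# Data Collector components, and Workload Analyzer recommendations.
# Host, credentials, and database name are placeholders.
import vertica_python

conn_info = {"host": "vertica.example.internal", "port": 5433,
             "user": "dbadmin", "password": "********", "database": "domo"}

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()

    # Names and descriptions of the system tables (catalog + monitor schemas).
    cur.execute("SELECT table_schema, table_name, table_description "
                "FROM v_catalog.system_tables ORDER BY table_schema, table_name")
    print(f"system tables: {len(cur.fetchall())}")

    # The Data Collector components that retain historical operational data.
    cur.execute("SELECT component, description FROM v_monitor.data_collector "
                "ORDER BY component")
    print(f"data collector rows: {len(cur.fetchall())}")

    # Workload Analyzer: the built-in source of tuning recommendations that a
    # rules engine could execute immediately or queue for manual review.
    cur.execute("SELECT ANALYZE_WORKLOAD('')")
    for recommendation in cur.fetchall():
        print(recommendation)
finally:
    conn.close()
```

In a setup like the one described in the session, the recommendation rows would feed a rules engine that decides which tuning commands to run automatically and which to hold for a DBA.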
SUMMARY :
conductors of the symphony of data we
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Boston | LOCATION | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
thousands | QUANTITY | 0.99+ |
Domo | ORGANIZATION | 0.99+ |
3 a.m. | DATE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
first time | QUANTITY | 0.98+ |
this week | DATE | 0.97+ |
over 190 tables | QUANTITY | 0.97+ |
two schemas | QUANTITY | 0.96+ |
second point | QUANTITY | 0.96+ |
215 components | QUANTITY | 0.96+ |
first point | QUANTITY | 0.96+ |
three a.m. | DATE | 0.96+ |
Boogie Nights | TITLE | 0.96+ |
millions of ad-hoc queries | QUANTITY | 0.94+ |
Domo | TITLE | 0.93+ |
Vertica Big Data conference 2020 | EVENT | 0.93+ |
Ben white | PERSON | 0.93+ |
10 | QUANTITY | 0.91+ |
thousands of users | QUANTITY | 0.9+ |
one size | QUANTITY | 0.89+ |
saltstack | TITLE | 0.88+ |
4/2 | DATE | 0.86+ |
a couple of weeks ago | DATE | 0.84+ |
Datta | ORGANIZATION | 0.82+ |
end of 3 a.m. | DATE | 0.8+ |
Boogie Nights | EVENT | 0.78+ |
double arrow | QUANTITY | 0.78+ |
every hour | QUANTITY | 0.74+ |
ServiceNow | TITLE | 0.72+ |
DevOps | TITLE | 0.72+ |
Database Management | TITLE | 0.69+ |
su LeClair | PERSON | 0.68+ |
many questions | QUANTITY | 0.63+ |
SLA | TITLE | 0.62+ |
The Road | TITLE | 0.58+ |
Vertica BBC | ORGANIZATION | 0.56+ |
2020 | EVENT | 0.55+ |
database management | TITLE | 0.52+ |
Domo Domo | TITLE | 0.46+ |
version 9 3 | OTHER | 0.44+ |
Jeff Healey, Vertica at Micro Focus | CUBEConversations, March 2020
>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with top leaders all around the world, this is theCUBE Conversation. >> Hi everybody, I'm Dave Vellante, and welcome to the Vertica Big Data Conference virtual. This is our digital presentation, wall-to-wall coverage actually, of the Vertica Big Data Conference. And with me is Jeff Healey, who directs product marketing at Vertica. Jeff, good to see you. >> Good to see you, Dave. Thanks for the opportunity to chat.
You won't want to miss it if you're on the fence or if you're trying to figure out if you want to register for this event. Best part about it, it's all free, and if you can't attend it live, it will be live Q&A chat on every single one of those sessions, we promise we'll answer every question if we don't get it live, as we always do. They'll all be available on demand. So no reason not to register and attend or watch later. >> Thinking about the content over the years, in the early days of the Big Data Conference, of course Vertica started before the whole Big Data Conference meme really took off and then as it took off, plugged right into it, but back then the discussion was a lot of what do I do with big data, Gartner's three Vs and how do I wrangle it all, and what's the best approach and this stuff is, Hadoop is really complicated. Of course Vertica was an alternative to RDBMS that really couldn't scale or give that type of performance for analytical databases so you had your foot in that door. But now the conversation that's interesting your theme, it's win big with data. Of course, the physical event was at the Encore, which is the new Casino in Boston. But my point is, the conversation is no longer about, how to wrangle all this data, you know how to lower the cost of storing this data, how to make it go faster, and actually make it work. It's really about how to turn data into insights and transform your organizations and quote and quote, win with big data. >> That's right. Yeah, that's great point, Dave. And that's why I mean, we chose the title really, because it's about our customers and what they're able to do with our platform. And it's we know, it's not just one platform, all of the ecosystem, all of our incredible partners. Yeah it's funny when I started with the organization about seven years ago, we were closing lots of deals, and I was following up on case studies and it was like, Okay, why did you choose Vertica? Well, the queries went fast. Okay, so what does that mean for your business? We knew we're kind of in the early adopter stage. And we were disrupting the data warehouse market. Now we're talking to our customers that their volumes are growing, growing and growing. And they really have these analytical use cases again, talk to the value at the entire organization is gaining from it. Like that's the difference between now and a few years ago, just like you were saying, when Vertica disrupted the database market, but also the data warehouse market, you can speak to our customers and they can tell you exactly what's happening, how it's moving the needle or really advancing the entire organization, regardless of the analytical use case, whether it's an internet of things around predictive maintenance, or customer behavior analytics, they can speak confidently of it more than just, hey, our queries went faster. >> You know, I've mentioned before the Micro Focus investment, I want to drill into that a bit because the Vertica brand stands alone. It's a Micro Focus company, but Vertica has its own sort of brand awareness. The reason I've mentioned that is because if you go back to the early days of MPP Database, there was a spate of companies, startups that formed. And many if not all of those got acquired, some lived on with the Codebase, going into the cloud, but generally speaking, many of those brands have gone away Vertica stays. 
And so my point is that we've seen Vertica have staying power throughout. I think it's a function of the architecture that Stonebraker originally envisioned; you guys were early to the market, had a lot of good customer traction, and you've been very responsive to a lot of the trends. Colin Mahony will talk about how you adopted and really embraced cloud, for example, and different data formats. And so you've really been able to participate in a lot of the new emerging waves that have come out to the market. And I would imagine some of that's cultural. I wonder if you could just address that in the context of the BDC. >> Oh, yeah, absolutely. You hit on all the key points here, Dave. So, a lot of changes in the industry. We're in the hottest industry, the tech industry, right now. There's lots of competition. But one of the things we'll say in terms of, hey, who do you compete with? You compete with these players in the cloud, open source alternatives, traditional enterprise data warehouses. That's true, right. And one of the things we've stayed true to, and Colin has really kind of led the charge for the organization here, is that we know who we are, right. So we're an analytical database platform. And we're constantly just working on that one sole source code base, to make sure that we don't provide a bunch of different technologies and databases and different types of technologies that need to be stitched together. This platform just has unbelievable universal capabilities, everything from running analytics at scale, to in-database machine learning with a different approach, to all the different types of deployment models that are supported, right. We don't go to our companies and say, yeah, we take care of all your problems, but you have to stitch together all these different types of technologies. It's all based on that core Vertica engine, and we've expanded it to meet all these market needs. So what Colin believes and what he tells the team is that we lead with that one core platform that can address all these analytical initiatives. So we know who we are, and we continue to improve on it, regardless of the pivots and the drastic measures that some of the other competitors have taken. >> You know, I've got to ask you, we're in the middle of this global pandemic with coronavirus and COVID-19, and things change daily, by the hour, sometimes by the minute. I mean, every day you get up to something new. So you see a lot of forecasts, you see a lot of probability models, best case, worst case, likely case, even though nobody really knows what that likely case looks like. So there's a lot of analytics going on and a lot of data that people are crunching, and new data sources come in every day. Are you guys participating directly in that, specifically your customers? Are they using your technology? You can't use a traditional data warehouse for this. It's just, you know, too slow, too asynchronous, the process is cumbersome. What are you seeing in the customer base as it relates to this crisis? >> Sure, well, I mean, naturally we have a lot of customers that are healthcare technology companies, companies like Cerner, companies like Philips, right, that are kind of leading the charge here. And of course, our whole motto has always been, don't throw away any of the data, there's value in that data, and you don't have to with Vertica, right. So you've got petabyte-scale types of analytics across many of our customers. Again, just a few years ago, we called those customers the petabyte club.
Now a majority of our large enterprise software companies are approaching those petabyte volumes. So it's important to be able to run those analytics at that scale and that volume. The other thing we've been seeing from some of our partners is really putting that analytics to use with visualizations. So one of the customers that's going to be presenting as part of the Vertica Big Data Conference is Domo. Domo has a really nice, stout demo around being able to track the coronavirus outbreak and how we're getting care and things like that in a visual manner, and you're seeing more of those. Well, Domo embeds Vertica, right. So that's another customer of ours. So think of Vertica as that embedded analytical engine that supports those visualizations so that anyone in the world can track this. And hopefully, as we see over time, cases go down and we overcome this. >> Talk a little bit more about that. Because again, the BDC has always been engineers presenting to audiences. You just mentioned the demo by Domo, and you have a lot of brand names that we've interviewed on theCUBE before, but maybe you could talk a little bit more about some of the customers that are going to be speaking at the virtual event, and what people can expect. >> Sure, yeah, absolutely. So we've got Uber that's presenting, and just a quick fact around Uber: really, the analytical data warehouse is all Vertica, right, and it works very closely with open source or what have you. Just a quick stat on Uber, 14 million rides per day. What Uber is able to do is connect the riders with the drivers so that they can determine the appropriate pricing. So Uber is going to be a great session that everyone will want to tune in on. Others like the Trade Desk, right, a massive ad tech company, 10 billion ad auctions daily, it may even be per second or per minute; the amount of scale and analytical volume that they have, that they are running the queries across, can really only be accomplished with a few platforms in the world, and that's Vertica, so that's another hot one with the Trade Desk. Philips is going to be presenting IoT analytical workloads. We're seeing more and more of those, across not only telematics, which you would expect within automotive, but predictive maintenance that cuts across all the original equipment manufacturers, and Philips has got a long history of being able to handle sensor data and apply it to those business cases where you can improve customer satisfaction and lower costs related to services. So around their MRI machines and predictive maintenance initiative, again, Vertica is kind of that heartbeat, that analytical platform that's driving those initiatives. So the list goes on and on. Again, the conversation is going to continue with the Data Disruptors and the Under the Hood webcast series. Any customers that weren't able to present, and we had a few that just weren't able to do it, they've already signed up for future months. So we're already booked out six months out; more and more customer stories you're going to hear from Vertica.com.
Appreciate it. >> Alright, and thank you for watching, everybody. Keep it right here for all the coverage of the virtual Big Data Conference 2020. You're watching theCUBE. I'm Dave Vellante, and we'll see you soon.
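The predictive maintenance workloads Healy describes above come down to scanning large volumes of sensor readings for early warning signs. The sketch below is a deliberately simple, stand-alone Python illustration of that idea, a rolling-average check over hypothetical readings; it is not Vertica's in-database machine learning, and the data, names, and thresholds in it are assumptions made only for the example.

```python
# Toy illustration of a predictive-maintenance style check:
# flag sensor readings that drift far from their recent average.
# This is not Vertica's in-database ML; it is a stand-alone sketch
# with hypothetical data and thresholds chosen only for the example.

from collections import deque
from statistics import mean, stdev

def flag_anomalies(readings, window=12, z_threshold=3.0):
    """Yield (index, value) pairs that deviate strongly from the trailing window."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history) or 1e-9  # avoid division by zero on flat data
            if abs(value - mu) / sigma > z_threshold:
                yield i, value
        history.append(value)

if __name__ == "__main__":
    # Hypothetical MRI coil temperature readings, with one obvious spike.
    temps = [21.0 + 0.1 * (i % 5) for i in range(50)]
    temps[40] = 35.0
    for idx, val in flag_anomalies(temps):
        print(f"reading {idx}: {val:.1f} looks anomalous, schedule an inspection")
```

In a real deployment this kind of check would run inside the analytical platform over billions of readings rather than in a Python loop, but the shape of the logic, compare each new reading against its recent history and alert on outliers, is the same.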
SUMMARY :
Dave Vellante talks with Jeff Healy, who directs product marketing at Vertica, about moving the Vertica Big Data Conference online. Micro Focus is investing roughly $70 million across security and Vertica, registrations for the virtual event have already passed 1,000, and the sessions remain engineer- and customer-led with no sales pitches. Healy previews presentations from Uber, the Trade Desk, Philips, and Domo, describes Vertica's single-engine analytical platform strategy, and notes how customers are applying analytics at petabyte scale, including visualizations tracking the coronavirus outbreak.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeff Healy | PERSON | 0.99+ |
Philips | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Jeff Healey | PERSON | 0.99+ |
Colin Mahony | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
five | QUANTITY | 0.99+ |
Dave | PERSON | 0.99+ |
Microfocus | ORGANIZATION | 0.99+ |
Jeff | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
$70 million | QUANTITY | 0.99+ |
Colin | PERSON | 0.99+ |
20 sessions | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Boston | LOCATION | 0.99+ |
March 2020 | DATE | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
six months | QUANTITY | 0.99+ |
Domo | ORGANIZATION | 0.98+ |
one platform | QUANTITY | 0.98+ |
Big Data Conference | EVENT | 0.98+ |
two areas | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
CUBE | ORGANIZATION | 0.98+ |
Vertica Big Data Conference | EVENT | 0.98+ |
Coronavirus | OTHER | 0.98+ |
Stonebraker | ORGANIZATION | 0.98+ |
about 40 plus sessions | QUANTITY | 0.97+ |
COVID-19 | OTHER | 0.96+ |
BDC | ORGANIZATION | 0.96+ |
one core platform | QUANTITY | 0.95+ |
Vertica BDC 2020 | EVENT | 0.95+ |
1,000 | QUANTITY | 0.95+ |
Vertica Big Data | EVENT | 0.95+ |
one time | QUANTITY | 0.95+ |
Micro Focus | ORGANIZATION | 0.94+ |
few years ago | DATE | 0.93+ |
about 1,000 | QUANTITY | 0.93+ |
Codebase | ORGANIZATION | 0.93+ |
Phillips | ORGANIZATION | 0.93+ |
Cerner | ORGANIZATION | 0.92+ |
10 billion ad auctions | QUANTITY | 0.91+ |
14 million rides per day | QUANTITY | 0.9+ |
Coronavirus | EVENT | 0.89+ |
first one | QUANTITY | 0.89+ |
Under Hood | TITLE | 0.86+ |
Hadoop | TITLE | 0.85+ |
BDC | EVENT | 0.83+ |
seven years ago | DATE | 0.8+ |
outbreak | EVENT | 0.79+ |
Pete Gerr, Dell EMC | RSAC USA 2020
>> Announcer: Live from San Francisco, it's theCUBE, covering RSA Conference 2020 San Francisco, brought to you by SiliconANGLE Media. >> Okay, welcome back, everyone, to theCUBE's coverage here in San Francisco at RSA Conference 2020. I'm John Furrier, your host. You know, the cybersecurity industry's changing. Enterprises are now awake to the fact that it's a bigger picture around securing the enterprise, 'cause it's not only the data center. It's cloud, it's the edge, a lot of great stuff. We've got a great guest here from Dell EMC. Peter Gerr is a consultant in cyber resilience solutions and services marketing at Dell EMC. Great to see you. >> You too, John. >> Thanks for coming on. >> Good to see you again, thank you. >> So, you know, I was joking with Dave Vellante just this morning around the three waves of cloud: public cloud, hybrid cloud, multicloud. And we see obviously the progression. Hybrid cloud is where everyone spends most of their time. That's from ground to cloud, on-premises to cloud. So pretty much everyone knows-- >> Peter: On-ramp, kind of. >> That on-prem is not going away, validated by all the big cloud players. But you've got to nail the equation down for on-premises to the cloud, whether it's Amazon to Amazon, Azure to Azure, whatever, all those clouds. But multicloud will be a next-generation wave. That as an industry backdrop is very, very key. Plus AI and data are huge inputs into solving a lot of what is going to be new gaps, blind spots, whatever, in security. So, you know, Dell has a history with a huge client base, traditional enterprises transforming. You're in the middle of all this, so you've got the airplane at 30,000 feet, and the companies have to swap out their engines and reboot their teams, and it's a huge task. What's going on with cyber and the enterprises? What are some of the key things? >> Well, so I like to keep it pretty simple. I've been in this industry over 20 years and I've really consistently talked about data as the global currency, right? So it's beautifully simple. Whatever industry you're in, whatever size company you're in, enterprise or even now small to medium businesses, their businesses are driven by data: connectivity to that data, availability of the data, integrity of the data, and confidentiality of the data. And so the area of the world that I focus upon is protecting customers' most valuable data assets, whether those are on-prem, in the cloud, or in a variety of modalities, ensuring that those assets are protected and isolated from the attack surface, and then the ability to recover those critical assets quickly so they can resume business operations. That's really the area that I work in. Now, that data, as you pointed out, could start on-prem. It could live in multicloud. It can live in a hybrid environment. The key is really to understand that not all data is created equal. If you were to have a widespread cyber attack, really the key is to bring up those critical applications, systems, and data sets first to return to business operations. >> Yeah, it's funny-- >> Peter: It's really challenging. >> You know, it's not funny, it's actually just ironic, but it's really kind of indicative of where the industry is now: EMC was bought by Dell, and the idea of disruption has always been a storage concept. We don't want a lot of disruption when we're doing things, right? >> Peter: None, we can't, yeah.
>> So whether it's backup and recovery or cyber ransomware, whatever it is, the idea of non-disruptive operations-- >> Absolutely. >> Has been a core tenet. Now, that's obviously the same for cyber, as you can tell. So I've got to ask you, what is your definition and view of cyber resilience? Because, well, that's what we're talking about here, cyber resilience. What's your view on that? >> So when we started developing our cyber recovery solution about five years ago, we used the NIST cybersecurity framework, which is a very well-known standard that defines really five pillars of how organizations can think about building a cyber resilience strategy. A cyber resilience strategy really encompasses everything from perimeter threat detection and response all the way through incident response after an attack, and everything that happens in between, protecting the data and recovering the data, right? And critical systems. So I think of cyber resilience as that holistic strategy of protecting an organization and its data from a cyber attack. >> That's great insight. I want to get your thoughts on how that translates into the ecosystem, because this is an ecosystem around cyber resilience. >> Peter: Absolutely. >> And let's just say, and you may or may not be able to comment on this, but RSA is now being sold. >> Peter: Yeah, no, that's fair. >> So that's going out of the Dell family. But you guys have obviously VMware and Secureworks. But it's not just you guys. It's an ecosystem. >> It really is. >> How does Dell now, with and without RSA, fit into the ecosystem? >> So as I mentioned, cyber resilience is really thought of as a holistic strategy. RSA and other Dell assets like Carbon Black fit in somewhere in that continuum, right? So RSA is really more on threat detection and response, perimeter protection. The area of the business that I work on, data protection and cyber recovery, really doesn't address the prevention of attacks. We really start with the premise that preventing a cyber attack is not 100% possible. If you believe that, then you need to look at protecting and recovering your assets, right? And so whether it's RSA, whether it's Carbon Black, whether it's Secureworks, which is about cyber incident response, we really work across those groups. It's about technology, processes, and people. It's not any one thing. We also work outside of the Dell Technologies umbrella. So we integrate; our cyber recovery solution is integrated with Unisys Stealth. So there's an example of how we're expanding and extending the cyber recovery solution to bring in other industry standards. >> You know, it's interesting. I talk to a lot of people, like, I'm on theCUBE here at RSA. Everyone wants better technology, but there's also a shift back to best-of-breed, 'cause you want to have the best new technology, but at the same time, you've got to have proven solutions. >> Peter: That's the key. >> So what are you guys selling, what is the best-of-breed from Dell that you guys are delivering to customers? What are some of the areas? >> So I'm an old EMC guy myself, right? Back from the days of disaster recovery and business continuity, right? More traditional data protection and backup. The reality is that the modern threats of cyber hackers, breaches, insider attacks, whatever you like, those traditional data protection strategies weren't built to address those types of threats. So along with transformation and modernization, we need to modernize our data protection. That's what cyber recovery is.
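For reference, the "five pillars" Gerr mentions are the five functions of the NIST Cybersecurity Framework: Identify, Protect, Detect, Respond, and Recover. The short sketch below shows one way a team might track its coverage against them; the function names are NIST's, while the example control areas and the data structure itself are illustrative assumptions for this sketch, not anything NIST or Dell prescribes.

```python
# The five NIST Cybersecurity Framework functions, with illustrative
# (made-up) examples of where a control might map. The function names
# are NIST's; everything else here is an assumption for the sketch.

NIST_CSF_FUNCTIONS = {
    "Identify": ["asset inventory", "classify critical data sets"],
    "Protect": ["multi-factor authentication", "isolated backup vault"],
    "Detect": ["endpoint telemetry", "anomaly analytics"],
    "Respond": ["incident response runbooks", "forensics"],
    "Recover": ["restore from an isolated copy", "resume business operations"],
}

def coverage_report(controls_in_place):
    """Return, per CSF function, which example control areas are still uncovered."""
    in_place = {c.lower() for c in controls_in_place}
    return {
        function: [area for area in areas if area not in in_place]
        for function, areas in NIST_CSF_FUNCTIONS.items()
    }

if __name__ == "__main__":
    gaps = coverage_report(["multi-factor authentication", "endpoint telemetry"])
    for function, missing in gaps.items():
        print(f"{function}: missing {missing if missing else 'nothing'}")
```

The point of laying it out this way is the one Gerr makes: resilience is the whole span from Identify through Recover, not any single tool, and the data-protection and recovery work he describes sits mostly in the last two functions.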
Cyber recovery is a modern solution to a modern threat. And what it does is augment your disaster recovery and your backup environment with a purpose-built, isolated, air-gapped digital vault, which is built around our proven Data Domain and PowerProtect DD platforms that have been around for over a decade. But what we've done is added intelligence and analytics, we've hardened that system, and we isolate it, so customers can protect really their most valuable assets in that kind of a vault. >> So one of the things I've been doing some research on and digging into is cyber resilience, which you just talked about, cyber security, which is the industry trend, and you're getting at cyber recovery, okay? >> Peter: Correct. >> Can you talk about some examples of how this all threads together? What are some real recent wins or examples? >> Sure, sure. So think of cyber recovery as a purpose-built digital vault to secure your most valuable assets. Let me give you an example. One of our customers is a global paint manufacturer, okay? And when we worked with them to try to decide which of their apps and data sets should go into this cyber recovery vault, we said, "What is the most critical intellectual property that you have?" So in their case, and, you know, some customers might say my Oracle financials or my Office 365 environment, for this customer it was their proprietary paint matching system. They generate $80 to $100 million every day based upon this proprietary paint matching system, which they've developed and which they use every day to run their business. If that application, if those algorithms were destroyed, contaminated, or posted on the public internet somewhere, that would fundamentally change that company. So that's really what we're talking about. We're working with customers to help them identify their most critical assets, data, systems, applications, and isolate those from the threat vector. >> Obviously all verticals are impacted by cyber security. >> Every vertical is data-driven, that's right. >> And so obviously the low-hanging fruit, are they the normal suspects, financial services? Is there a particular one that's hotter than the others? Obviously financial services has got fraud and all that stuff on it, but is that still number one, or-- >> So I think there's two sides to the coin. One, if you look at the traditional enterprise environments, absolutely financial services and healthcare, 'cause they're both heavily regulated, therefore that data has very high value and is a very attractive target to the would-be hackers. If you look on the other end of the spectrum, though, the small to medium businesses that all rely on the internet for their business to run, they're the ones that are most susceptible, because they don't have the budgets, the infrastructure, or the expertise to protect themselves from a sophisticated hacker. So we work across all verticals. Obviously the government is also very susceptible to cyber threats. But it's every industry, any business that's data-driven. >> I mean, everyone's been breached so many times, no one even knows how many times. I've got to ask you about some cool trends we're reporting on here. Homomorphic encryption is getting a lot of traction here because financial services and healthcare are two-- >> Peter: Homomorphic? >> Homomorphic, yeah. Did I say that right? >> It's the first time I've ever heard that term, John. >> It's encryption of data in use. So you have data at rest, data in flight, and data in use.
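Since the term is new to one of the speakers, a concrete way to see what "encryption of data in use" means is an additively homomorphic scheme, a narrower cousin of fully homomorphic encryption, where two encrypted values can be added without ever decrypting them. The block below is a textbook, toy-sized Paillier implementation in plain Python, included purely for intuition; the primes are tiny and there is no padding or hardening, so it is an assumption-laden sketch and not a model of any product mentioned in the interview.

```python
# Toy Paillier cryptosystem: additively homomorphic, so Enc(a) * Enc(b) mod n^2
# decrypts to a + b. Tiny primes and no hardening -- intuition only, never
# production use. Requires Python 3.9+ (math.lcm, pow(x, -1, n)).

import math
import secrets

def generate_keys():
    p, q = 10007, 10009                  # toy primes; real keys use ~1024-bit primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                            # a common simple choice of generator
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    r = secrets.randbelow(n - 1) + 1     # random blinding factor
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

if __name__ == "__main__":
    pub, priv = generate_keys()
    c1, c2 = encrypt(pub, 120), encrypt(pub, 34)
    c_sum = (c1 * c2) % (pub[0] ** 2)    # compute on "data in use": add under encryption
    print(decrypt(pub, priv, c_sum))     # -> 154, without ever decrypting c1 or c2
```

That last step is the whole point the conversation circles around: the party doing the arithmetic never sees 120 or 34, only ciphertexts, which is why the technique is attractive for sharing regulated financial or healthcare data. Production systems use vetted libraries, far larger keys, and often hardware support rather than anything like this sketch.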
So it's encryption when you're doing all your... protecting all your transactional data. So it's full implementation with Discovery. Intel's promoting it. We discovered a startup that's doing that, as well. >> Peter: Yeah, that's new for me, yeah. >> But it allows for more use cases. But data in use, not just in motion, or in-flight, whatever they call it. >> Peter: I get it, yeah, static. >> So that's opening up these other things. But it brings up the why, why that's important, and the reason is that financial services and healthcare, because they're regulated, have systems that were built many moons ago or generations ago. >> Absolutely. >> So there were none of these problems that you were mentioning earlier; like, they weren't built for that. >> Correct. >> But now you need more data. AI needs sharing of data. Sharing is a huge deal. >> Real-time sharing, too, right? >> Real-time sharing. >> And I think that's where the homomorphic encryption comes in. >> That's exactly right. So you mentioned that. So these industries, how can they maintain their existing operations and then get more data sharing? Do you have any insight into how you see that? Because that's one of those areas that's becoming like, okay, HIPAA, we know why that was built, but it's also restrictive. How do you maintain the purity of a process-- >> If your infrastructure is old? That is a challenge, healthcare especially, because, I mean, if I'm running a health system, every dollar that I have should really go into improving patient care, not necessarily into my IT infrastructure. But the more that every industry moves towards a real-time, data-driven model for how we give care, right, the more that companies need to realize that data drives their business. They need to do everything they can to protect it and also ensure that they can recover it when and if a cyber attack happens. >> Well, I really appreciate the insight, and it's going to be great to see Dell Technologies World coming up. We'll dig into a lot of that stuff. While we're here and talking about some of these financial services and banking topics, I want to get your thoughts. I've been hearing this term Sheltered Harbor being kicked around. What is that about? What does that mean? >> Sheltered Harbor, you're right, I think you'll hear a lot more about it. So Sheltered Harbor is a financial industry group, and it's also a set of best practices and specifications. And really, the purpose of Sheltered Harbor is to protect consumers' and financial institutions' data, and public confidence in the US financial system. So the use case is this. You can imagine that a bank having a cyber attack and being unable to process transactions could cause problems for customers of that bank. But just like we were talking about, the interconnectedness of the banking system means that one financial institution failing because of a cyber attack could trigger a cascade and a panic and a run on the US banks, and therefore the global financial system. Sheltered Harbor was developed to really protect public confidence in the financial system by ensuring that banks, brokerages, and credit unions are protecting their customer data, their account records, their most valuable assets from cyber attack, and that they can recover them and resume banking operations quickly. >> So this is an industry group? >> It's an industry group. >> Or is it a Dell group, or-- >> No, Sheltered Harbor is a US financial industry group. It's a non-profit. You can learn more about it at shelteredharbor.org.
The interesting thing for Dell Technologies is that we're actually the first member of the Sheltered Harbor solution provider program. We'll be announcing that shortly, in fact, this week, and we'll have a cyber recovery for Sheltered Harbor solution in the market very shortly. >> Cyber resilience, great topic, and you know, it just goes to show storage is never going away: the basic concepts of IT, recovery, continuous operations, non-disruptive operations. Cloud scale changes the game. >> Peter: It's all about the data. >> It's all about the data. >> Still is, yes, sir. >> Thanks for coming on and sharing your insights. >> Thank you, John. >> RSA coverage here on theCUBE, day two of three days of coverage. I'm John Furrier here on the ground floor in Moscone in San Francisco. Thanks for watching. (electronic music)
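The vault Gerr describes earlier in the conversation follows a repeating rhythm: open a connection just long enough to copy the critical data, re-establish the air gap, lock the copy so it cannot be altered, then analyze it inside the vault for signs of tampering. The Python below is only a hypothetical sketch of that rhythm; every object and function name in it is invented for illustration and does not correspond to any Dell, PowerProtect, or Sheltered Harbor API.

```python
# Hypothetical sketch of an air-gapped cyber recovery cycle.
# None of these functions exist in any real product API; they stand in
# for the steps described in the interview: replicate, isolate, lock, analyze.

import datetime

def run_vault_cycle(critical_datasets, vault, replication_link):
    """One protection cycle for the most critical data sets."""
    replication_link.open()                      # air gap opens only for the copy window
    try:
        for dataset in critical_datasets:
            copy_id = vault.ingest(dataset)      # copy the data into the isolated vault
            vault.lock(copy_id,                  # retention lock: the copy becomes immutable
                       until=datetime.datetime.utcnow() + datetime.timedelta(days=30))
    finally:
        replication_link.close()                 # re-isolate the vault no matter what happens

    # Analytics run inside the vault, off the attack surface.
    for copy_id in vault.recent_copies():
        report = vault.analyze(copy_id)          # e.g. change-rate or entropy checks for tampering
        if report.suspicious:
            vault.alert_security_team(report)    # clean point-in-time copies remain available
```

The design choice worth noticing is that the lock and the analysis both happen on the isolated side of the air gap, which is what lets an organization, or a bank following the Sheltered Harbor model, trust that at least one recoverable copy of its account records survives even a successful attack on production systems.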
SUMMARY :
John Furrier talks with Peter Gerr of Dell EMC at RSA Conference 2020 about cyber resilience, framed by the five pillars of the NIST cybersecurity framework, and about cyber recovery as a purpose-built, air-gapped digital vault that augments traditional disaster recovery and backup. Gerr describes a global paint manufacturer protecting its proprietary paint matching system, explains how Dell works across the ecosystem with RSA, Carbon Black, Secureworks, and Unisys Stealth, touches on homomorphic encryption for data in use, and previews Dell's role as the first member of the Sheltered Harbor solution provider program.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Pete Gerr | PERSON | 0.99+ |
Peter Gerr | PERSON | 0.99+ |
$80 | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
two sides | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
Sheltered Harbor | ORGANIZATION | 0.99+ |
$100 million | QUANTITY | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Dell EMC | ORGANIZATION | 0.99+ |
30,000 feet | QUANTITY | 0.99+ |
three days | QUANTITY | 0.99+ |
RSA | ORGANIZATION | 0.99+ |
Moscone | LOCATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Dell Technologies | ORGANIZATION | 0.99+ |
shelteredharbor.org | OTHER | 0.99+ |
Unisys Stealth | ORGANIZATION | 0.99+ |
CUBE | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
RSA Conference 2020 | EVENT | 0.99+ |
this week | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
VMware | ORGANIZATION | 0.99+ |
five pillars | QUANTITY | 0.98+ |
Office 365 | TITLE | 0.98+ |
EMC | ORGANIZATION | 0.98+ |
over 20 years | QUANTITY | 0.98+ |
Secureworks | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.97+ |
first time | QUANTITY | 0.97+ |
US | LOCATION | 0.97+ |
Dell Technologies World | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.96+ |
Dell Storage | ORGANIZATION | 0.95+ |
HIPAA | TITLE | 0.95+ |
NIST | ORGANIZATION | 0.95+ |
RSA Conference 2020 San | EVENT | 0.94+ |
this morning | DATE | 0.91+ |
over a decade | QUANTITY | 0.9+ |
one thing | QUANTITY | 0.9+ |
Azure | ORGANIZATION | 0.89+ |
RSAC | ORGANIZATION | 0.84+ |
first member | QUANTITY | 0.84+ |
Black | OTHER | 0.84+ |
first | QUANTITY | 0.83+ |
five years ago | DATE | 0.81+ |
day two | QUANTITY | 0.79+ |
Carbon Black | ORGANIZATION | 0.78+ |
Intel | ORGANIZATION | 0.76+ |
three waves | EVENT | 0.71+ |
Discovery | ORGANIZATION | 0.7+ |
RSA | TITLE | 0.7+ |
Sheltered Harbor | OTHER | 0.68+ |
PowerProtect | ORGANIZATION | 0.65+ |
many moons | DATE | 0.64+ |
Vittorio Viarengo, McAfee | RSAC USA 2020
>> Announcer: Live from San Francisco, it's theCUBE, covering RSA Conference 2020, San Francisco. Brought to you by SiliconANGLE Media. >> Welcome back everybody, Jeff Frick with theCUBE. We're at RSA 2020. It's day four, it's Thursday. This is a crazy long conference, 40,000 people. Even with the challenges presented by coronavirus, and there's a lot of weird stuff going on, the team pulled it together and they went forward. And even though there were dropouts here and there, I think all in all, most people will tell you it's been a pretty successful conference. And we're excited to be joined by really one of the top-level sponsors here that's still here and still doing good things. It's Vittorio Viare... Viarengo, sorry, the new interim CMO of McAfee. >> Yeah. >> Vittorio, I just call you Vittorio all the time. I never look past your first name. Great to see you. >> Likewise. It's always a pleasure to be here with an institution of Silicon Valley-- >> Oh thank you, thank you. So, interim CMO, I always think of interim football coaches who get pulled in halfway through the season, so the good news is you kind of got the job and all the responsibilities. The bad news is, you still have that interim thing, but you don't care, you just go to work, right? >> No, whenever you have an interim job, you have to just do the job, and that's the best way to operate. >> Yeah, so again, I couldn't help but go back and look at that conversation that we had at Xerox PARC, which is interesting. That's pretty foundational to everything that happens in Silicon Valley, so many discoveries up there. And you touched on some really key themes in the way you manage your teams, but I think they're really much more valuable, and worth bringing back up again. And the context was using Scrum as a way to manage people, but more importantly, what you said is it forced you as a leader to set first priorities and have great communication, and to continually do that on this two-week pace, to keep everybody moving down the road. I think that is so powerful and so lacking, unfortunately, in a lot of organizations today. >> Yeah, look, I think that when you hire smart people, if you just make sure that they understand what their priorities are, and then remove the obstacles and get out of the way, magical things happen. And I'll give you an example that is very close to your heart. When I took over a great team at Skyhigh, which got bought by McAfee, they had content marketing down to a science, but they were lacking videos. So I brought that in. I said, "Guys, people watch videos, people engage with videos, we need to start telling the story through videos." And I started pushing, pushing, pushing, and then I pulled back, and these guys took it to a whole new level. And now they're doing videos, they're very creative, they're crisp. And I'm like, "Yeah, my job is done." >> It is really wild how video has become such an important way for education. I mean, it used to be... I remember the first time I ever saw an engineer use Google to answer a question on writing code. I had never seen that before. I'm not a coder. Wow, I thought it was just for finding my local store or whatever. And now to see what really... I think YouTube has pushed people to expect that the answer to any question should be in a video.
And the genesis of that was the sales people ask me, "Hey, we're selling container security and all that," but I don't even understand what containers are. Okay, sure. So I shot a video and I'm the CMO, I was the vice president. I think you have to put your face on your content. It doesn't matter how senior you are, you're not in a corner office, you're down there with the team. So I got into the studio, based on my background at VMware, I knew virtual machine, and I said, "Okay, how do you explain this "to somebody who's not technical?" And next thing you know, it makes its way out there, not just to our sales force, but to the market at large. That's fantastic. >> Right, and let me ask you to follow up on that because it seems like the world is very divergent as to those who kind of want their face, and more their personality to be part of their business culture and their business messaging, and those that don't. And you know, as part of our process, we always are looking at people's LinkedIn, and looking at people's Twitter. I get when people don't have Twitter, but it really surprises me when professionals, senior professionals within the industry aren't on LinkedIn. And is just like, wow! That is such a different kind of world. >> LinkedIn right now is... and I'm stealing this from Gary on the Chuck, as a big believer in this. LinkedIn right now is like Facebook 10 years ago. You get amazing organic distribution, and it's a crime not to use it. And the other thing is if you don't use it, how are you going to inspire your team to do the right thing? Modern marketing is all about organic distribution with a great content. If you're not doing it yourself... I grew up in a bakery. I used to look at my mom, we have a big bakery. We had eight people working, and I said, "Ma, why are you workin' so hard? "Your first day, last hour?" And she said, "Look, you cannot ask your people, "to work harder than you do." That was an amazing lesson. So it's not just about working hard, and harder than your team, it's about are you walking the walk? Are you doing the content? Are you doing the modern marketing things that work today, if you expect your people to also do it? >> Yeah, it's just funny 'cause, when we talk to them, I'm like, "If you don't even have a LinkedIn account, "we shouldn't even be talking to you "because you just won't get what we do. "You won't see the value, you won't understand it "and if you're not engaging at least "a little bit in the world then..." And then you look at people say like Michael Dell, I'll pick on or Pat Gelsinger who use social media, and put their personalities out there. And I think it's, people want to know who these people are, they want to do business with people that they they like, right? >> Absolutely. You know what's the worst to me? I can tell when an executive as somebody else manages their account, I can tell from a mile away. That's the other thing. You have to be genuine. You have to be who you are on your social and all your communication because people resonate with that, right? >> Right. All right, so what are you doing now? You got your new title, you've got some new power, you've got a great brand, leading brand in the industry, been around for a while, what are some of your new priorities? What's some of the energy that you're bringing in and where you want to to go with this thing? 
>> Well, my biggest priority right now is to get the brand and our marketing to catch up with where the products and the customers already are, which is cloud, cloud, cloud. So when we spun off from Intel two years ago, we had this amazing heritage in endpoint security. And then we bought Skyhigh, and Skyhigh was transformational for us because it became the foundation for us to move to become a cloud-first organization. And in the process of becoming a cloud-first organization, we're creating a business that is growing really fast. We also brought along the endpoint, which is now all delivered from the cloud, to this cloud-first, open, unified approach, which is exciting. >> And we see edge as just an extension of endpoints, I would assume. It just changes the game. >> Yeah, so if you think about it, today modern work gets done with the backend in the cloud, accessing those backends from the device, right? >> Right. >> And so our strategy is to secure data where modern work gets done, and that's on the device, in the cloud, and at the edge, because data moves in and out of the cloud, and that's kind of the edge of the cloud. That's what we launched this week at RSA: we launched Unified Cloud Edge, which is our... Gartner calls it SASE, and we believe we have the most complete and unified security platform in the SASE world. >> Okay, I just laugh at Gartner and the trough of disillusionment, and Jeff and I always go back to Amara's law. Amara does not get enough credit for Amara's law. We've got a lot of laws, but Amara's law: we tend to overestimate the impact of these technologies in the short term, and completely underestimate the long tail of these technology improvements, and we see it here. So let's shift gears a little bit. When you have your customers coming in here, and they walk into RSA for the first time, how do you tell people to navigate this crazy show, with the 5,000 vendors and more kinds of solutions and spin vocabulary than is probably safe for anyone to consume over three days? >> Look, security is tough, because you look around and say, you have six, seven hundred vendors here. It's hard to stand out from the crowd. So what I tell our customers is, use this as a way to meet with your strategic vendors in the booths upstairs. That's where you conduct business and all that. And then walk around to see things from the ground up; send your more junior team out there to see what's happening, because some of these smaller companies that are out here will be the big transformational companies of the future, like Skyhigh was three, four years ago, and now we're part of McAfee, and leading the charge there.
I guess it just got announced today that Facebook pulled F8, their developer conference. We're in the conference business. You go to a lot of conferences. Did you have some thought process? There were some big sponsors that pulled out of this thing. How did you guys kind of approach the situation? >> It's a tough one. >> It's a really tough one. >> It's a very tough one 'cause last thing you want to do is to put your employees and your customers at risk. But the way we looked at it was there were zero cases of coronavirus in San Francisco. And we saw what the rest of the industry was doing, and we made the call to come here, give good advice to our employees, wash their hands, and usual and this too will pass. >> Yeah, yeah. Well Vittorio, it's always great to catch up with you. >> Likewise. >> I just loved the energy, and congratulations. I know you'll do good things, and I wouldn't be at all surprised if that interim title fades away like we see with most great coaches. >> Good. >> So thanks for stopping by. >> My pleasure. >> All right, he's Vittorio, I'm Jeff. You're watching theCUBE, we're at RSA 2020 in San Francisco. Thanks for watching, we'll see you next time. (upbeat music)
SUMMARY :
Jeff Frick talks with Vittorio Viarengo, interim CMO of McAfee, at RSA 2020 about leading marketing teams by setting clear priorities and getting out of the way, putting your own face on content through video, and taking advantage of LinkedIn's organic reach. Viarengo explains McAfee's cloud-first shift after the Skyhigh acquisition, the launch of Unified Cloud Edge for the SASE market, his advice for navigating a show with hundreds of vendors, and how McAfee weighed the coronavirus situation and decided to stay at the conference.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeff Frick | PERSON | 0.99+ |
Pat Gelsinger | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
Michael Dell | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
YouTube | ORGANIZATION | 0.99+ |
San Francisco | LOCATION | 0.99+ |
McAfee | ORGANIZATION | 0.99+ |
Vittorio | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Skyhigh | ORGANIZATION | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Thursday | DATE | 0.99+ |
Vittorio Viarengo | PERSON | 0.99+ |
yesterday | DATE | 0.99+ |
Gary | PERSON | 0.99+ |
two week | QUANTITY | 0.99+ |
40,000 people | QUANTITY | 0.99+ |
first time | QUANTITY | 0.99+ |
RSA Conference 2020 | EVENT | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
eight people | QUANTITY | 0.99+ |
RSA | ORGANIZATION | 0.99+ |
Mobile World Congress | EVENT | 0.99+ |
six, 700 vendors | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
SiliconANGLE Media | ORGANIZATION | 0.98+ |
RSA 2020 | EVENT | 0.98+ |
first day | QUANTITY | 0.98+ |
10 years ago | DATE | 0.98+ |
5,000 vendors | QUANTITY | 0.98+ |
this week | DATE | 0.98+ |
over three days | QUANTITY | 0.97+ |
coronavirus | OTHER | 0.97+ |
Viarengo | PERSON | 0.97+ |
one | QUANTITY | 0.96+ |
first | QUANTITY | 0.96+ |
two years ago | DATE | 0.96+ |
VMware | ORGANIZATION | 0.96+ |
ORGANIZATION | 0.95+ | |
ORGANIZATION | 0.94+ | |
theCUBE | ORGANIZATION | 0.93+ |
three four years ago | DATE | 0.91+ |
Mar | ORGANIZATION | 0.91+ |
zero cases | QUANTITY | 0.89+ |
Xerox Parc | ORGANIZATION | 0.84+ |
first name | QUANTITY | 0.83+ |
Mars | LOCATION | 0.79+ |
Vittorio Viare | PERSON | 0.79+ |
RSAC USA 2020 | ORGANIZATION | 0.78+ |
Cloud Edge | TITLE | 0.77+ |
day four | QUANTITY | 0.76+ |
first organization | QUANTITY | 0.72+ |
a mile | QUANTITY | 0.64+ |
RSA | TITLE | 0.63+ |
F8 | COMMERCIAL_ITEM | 0.62+ |
Chuck | PERSON | 0.54+ |
level | QUANTITY | 0.52+ |
McAfee | PERSON | 0.49+ |
Edge | TITLE | 0.44+ |
Mars | TITLE | 0.43+ |