Coco Brown, The Athena Alliance | CUBE Conversation, August 2020

>> Narrator: From theCube studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCube Conversation. >> Hey, welcome back, everybody. Jeff Frick here with theCube. We're still in our Palo Alto studios, we're still getting through COVID and we're still doing all of our remotes, all of our interviews via remote and I'm really excited to have a guest we had around a long time ago. I looked it up, it was 2016, April 2016. She's Coco Brown, the founder and CEO of the Athena Alliance. Coco, it's great to see you. >> It's great to see you as well. We actually formally started in April of 2016. >> I know, I saw, I noticed that on LinkedIn. So we were at the Girls in Tech Catalyst Conference in Phoenix, I remember it was a really cool conference, met a ton of people, a lot of them have turned out that are on your board. So yeah, and you formally on LinkedIn, it says you started in May. So that was right at the very, very beginning. >> Yeah, that's right. >> So for people that aren't familiar with the Athena Alliance give them the quick overview. >> Okay. Well, it's a little different than it was four years ago. So Athena first and foremost is a digital platform. So you literally log in to Athena. And we're a combination of community, access to opportunity, and learning. And so you can kind of envision it a little bit like a walled garden around the LinkedIn, meets Khan Academy for senior executives, meets Hollywood agency for women trying to get into the boardroom and senior level roles in the c-suite as advisors, et cetera. And then the way that we operate is you can have a self-service experience of Athena, you can have a concierge experience with Athena with real humans in the loop making key connections for you and you can add accelerators where we build brand packages and bios and give you executive coaching. So... >> Wow. >> Kind of a... >> You've built out your services portfolio over the last several years. 
>> Yes, we have. >> But still the focus is boards, right? Still the focus is getting women on public boards, or is that no longer still the focus? >> No, that's a big piece of it for sure. I mean, one of the things that we discovered, that was the very first mission of Athena, was to bring more women into the boardroom. And as we were doing that we discovered that once you get into a senior realm of leadership in general, there's more things that you want to do than just get into the boardroom. Some of it may be wanting to be an investor or an LP in a fund or become a CEO, or certainly join outside boards but also be relevant to your own inside board. And so we started to look at Athena as a more holistic experience for senior leaders who are attempting to make sure that they are the best they can be in this very senior realm of overarching stewardship of business. >> Awesome. And have you seen, so obviously your focus shifted 'cause you needed to add more services based on the demand from the customers. But have you seen the receptiveness to women board members change over the last four years? How have you seen kind of the marketplace change? >> Yeah, it's changed a lot, I would say. First of all I think laws like the California law and Goldman Sachs coming out saying they won't take companies public unless they have diverse board data. The statements by big entities that people are paying attention to made the boardroom dynamics a conversation around the dinner table in general. So it became more of a common conversation and common interest as opposed to just the interest of a few people who are trying to get in there. And so that's created a lot of momentum as well as sort of thoughtfulness from leaders and from employees and from larger stakeholders to say the diversity at the top business has to mimic the demographics of society as a whole. And that's become a little bit more accepted as opposed to grudgingly sort of taken in. >> Right. 
So one of the big problems always it's like the VC problem, right? Is the whole matchmaking problem. How do you, how do qualified people find qualified opportunities? And I wonder if you can speak a little bit as to how that process has evolved, how are you really helping because there's always people that are looking for quality candidates, and there's great quality candidates out there that just don't know where to go. How are you helping bridge kind of that kind of basic matchmaking function? >> Yeah. I mean, there's a couple of different ways to go about it. One is certainly to understand and have real connections into the parts of the leadership ecosystem that influences or makes the decision as to who sits around that table. So that would be communities of CEOs, it's communities of existing board directors, it's venture capital firms, it's private equity firms, and as you get really entrenched in those organizations and those ecosystems, you become part of that ecosystem and you become what they turn to to say, "Hey, do you know somebody?" Because it still is a "who do you know" approach at the senior most levels. So that's one way. The other mechanism is really for individuals who are looking for board seats who want to be on boards to actually be thinking about how they proactively navigate their way to the kinds of boards that they would fit to. I liken it very much to the way our children go after the schools that they might want to when it's time for university. You'll figure out who your safeties, your matches, your reaches are, and figure out how you're going to take six degrees of separation and turn them into one through connections. So those are that's another way to go about it. >> You know, it's interesting, I talked to Beth Stewart from Trewstar, they also help place women on boards. And one of the issues is just the turnover. 
And I asked that just straight up, are there formal mechanisms to make sure that people who've been doing business from way before there were things like email and the internet eventually get swapped out. And she said, that's actually a big part of the problem is there isn't really a formal way to keep things fresh and to kind of rotate the incumbents out to enable somebody who's new and maybe has a different point of view to come in. So I'm curious when someone is targeting their A-list and B-list and C-lists, how do they factor in kind of the age of the board composition of the existing board, to really look for where there's these opportunities where a spot opens up, 'cause if there's not a spot open up clearly, there's really not much opportunity there. >> Yeah, I mean, you have to look at the whole ecosystem, right? I mean, there's anything from let's say series A, venture backed private companies all the way up to the mega cap companies, right? And there's this continuum. And it's not, there's not one universal answer to what you're talking about. So for example, if you're talking about smaller private companies, you're competing against, not somebody giving up their seat, but whether or not the company feels real motivation to fill that particular independent director seat. So the biggest competition is often that that seat goes unfilled. When you're talking about public companies, the biggest competition is really the fact that as my friend Adam Epstein of the small cap Institute will tell you, that 80% of public companies are actually small cap companies. And they don't have the same kinds of pressures that large caps do to have turnover. 
But yeah, it takes a big piece of the challenge is really boards having the disposition collectively to see the board as a competitive advantage for the business as a very necessary and productive piece of the business and when they see that then they take more proactive measures to make sure they have an evolving and strong board that does turnover as it needs to. >> Right. So I'm curious when you're talking to the high power women, right, who are in operational roles probably most of the time, how do you help coach them, how should they be thinking, what do they have to do different when they want to kind of add board seats to their portfolio? Very different kind of a role than an operational role, very different kind of concerns and day to day tasks. So, and clearly, you've added a whole bunch of extra things to your portfolio. So how do you help people, what do you tell women who say, "Okay, I've been successful, I'm like a successful executive, but now I want to do this other thing, I want to take this next step in my career"? What are usually the gaps and what are the things that they need to do to prepare for that? >> Well, I'm going to circle in then land a little bit. Autodesk was actually a really great partner to us back when you and I first met. They had a couple of women at the top of the organization that were part of Athena, specifically because they wanted to join boards. They are on boards now, Lisa Campbell, Amy Bunszel, Debbie Clifford. And what they told us, as they were experiencing everything that we were offering in terms of developing them, helping them to position themselves, understand themselves, navigate their way, was that they simply became better leaders as a result of focusing on themselves as that next level up, irrespective of the fact that it took them two to three years to land that seat. They became stronger in their executive role in general and better able to communicate and engage with their own boards. 
So I think, now I'm landing, the thing that I would say about that is don't wait until you're thinking oh, I want to join a board, to do the work to get yourself into that ecosystem, into that atmosphere and into that mindset, because the sooner you do that as an executive, the better you will be in that atmosphere, the more prepared you will be. And you also have to recognize that it will take time. >> Right. And then how has COVID impacted it, I mean, on one hand, meeting somebody for coffee and having a face to face is a really important part of getting to know someone and a big part of I'm sure, what was the recruitment process, and do you know someone, yeah, let's go meet for a cup of coffee or dinner or whatever. Can't do that anymore, but we can all meet this way, we can all get on virtually and so in some ways, it's probably an enabler, which before you could grab an hour or you didn't have to fly cross-country or somebody didn't have to fly cross-country. So I'm kind of curious in this new reality, which is going to continue for some time. How has that impacted kind of people's ability to discover and get to know and build trust for these very, very senior positions. >> HBR just came out with a really great article about the virtual board meeting. I don't know if you saw it but I can send you a link. I think that what I'm learning from board directors in general and leaders in general is that yes, there's things that make it difficult to engage remotely, but there's also a lot of benefit to being able to get comfortable with the virtual world. So it's certainly, particularly with COVID, with racial equity issues, with the uncertain economy, boards are having to meet more often and they're having, some are having weekly stand ups and those are facilitated by getting more and more comfortable with being virtual. And I think they're realizing that you don't have to press flesh, as they say, to actually build intimacy and real connection. 
And that's been a hold up, but I think as the top leadership gets to understand that and feel that for themselves, it becomes easier for them to adopt it throughout the organization that the virtual world is one we can really embrace, not just for a period of time. >> It's funny we had John Chambers on early on in this whole process, really talking about leadership and leading through transition. And he used the example, I think had been that day or maybe a couple days off from our interview where they had a board meeting, I think they were talking about some hamburger restaurant, and so they just delivered hamburgers to everybody's office and they had the board meeting. But that's really progressive for a board to actually be doing weekly stand ups. That really shows a pretty transformative way to manage the business and kind of what we think is the stodgy old traditional get together now and then, fly and then get some minutes and fly out, that's super progressive. >> Yeah. I mean, I was on three different board meetings this week with a company I'm on the board of in Minnesota. And we haven't seen each other in person in, I guess since January. (woman laughs) >> So final tips for women that want to make this move, who, they've got some breathing space, they're not homeschooling the kids all day while they're trying to get their job done and trying to save their own business, but have some cycles and the capabilities. What do you tell them, where should they begin, how should they start thinking about, kind of taking on this additional responsibility and really professional growth in their life? >> Well, I mean, I think something very important for all of us to think about with regard to board service and in general as we get into a very senior level point in our careers is managing an impact portfolio. 
People get into a senior point and they don't just want to be an executive for one company, they want to have a variety of ways that they're delivering impact, whether it's as an investor or as a board member or as other things as well as being an operator. And I think the misnomer is that people believe that you have to add them up and they, one plus one plus one equals three, and it's just not true. The truth is that when you add a board seat, when you add that other thing that you're doing it makes you better as a leader in general. Every board meeting I have with [Indistinct] gives me more than I bring back to Athena as an example. And so I think we tend to think of not being able to take on one more thing and I say that we all have a little more space than we think we have to take on the things we want to do. >> Right? That's a good message to me. It is often said if you want to get something done, give it to the busiest person in the room. It's more likely to get it done 'cause you got to be efficient and you just have that kind of get it done attitude. >> That's right. >> All right, Coco. Well, thank you for sharing your thoughts. >> Congratulations, so I guess it's your four year anniversary, five year anniversary [Indistinct] about right? >> Yes, four. >> That's terrific. And we look forward to continuing to watch the growth and hopefully checking in face to face at some point in the not too distant future. >> I would like that. >> All right. Thanks a lot Coco. >> Great talking to you. >> Already. >> She's Coco, I'm Jeff. You're watching theCube. Thanks for watching, we'll see you next time. (upbeat music)

Published Date : Aug 3 2020


ENTITIES

Entity	Category	Confidence
Lisa Campbell	PERSON	0.99+
Amy Bunszel	PERSON	0.99+
Adam Epstein	PERSON	0.99+
Coco Brown	PERSON	0.99+
two	QUANTITY	0.99+
Jeff Frick	PERSON	0.99+
Beth Stewart	PERSON	0.99+
Coco	PERSON	0.99+
Minnesota	LOCATION	0.99+
August 2020	DATE	0.99+
Athena Alliance	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Jeff	PERSON	0.99+
80%	QUANTITY	0.99+
2016	DATE	0.99+
Debbie Clifford	PERSON	0.99+
April of 2016	DATE	0.99+
May	DATE	0.99+
Autodesk	ORGANIZATION	0.99+
January	DATE	0.99+
Khan Academy	ORGANIZATION	0.99+
Phoenix	LOCATION	0.99+
three	QUANTITY	0.99+
John Chambers	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
six degrees	QUANTITY	0.99+
Boston	LOCATION	0.99+
theCube	ORGANIZATION	0.99+
LinkedIn	ORGANIZATION	0.99+
April 2016	DATE	0.98+
four years ago	DATE	0.98+
an hour	QUANTITY	0.98+
one	QUANTITY	0.98+
Athena	ORGANIZATION	0.98+
this week	DATE	0.97+
Athena	LOCATION	0.97+
First	QUANTITY	0.96+
first	QUANTITY	0.96+
small cap Institute	ORGANIZATION	0.95+
five year anniversary	QUANTITY	0.95+
four year anniversary	QUANTITY	0.95+
one company	QUANTITY	0.95+
one more thing	QUANTITY	0.95+
HBR	ORGANIZATION	0.94+
The Athena Alliance	ORGANIZATION	0.93+
Trewstar	ORGANIZATION	0.93+
first mission	QUANTITY	0.9+
one way	QUANTITY	0.89+
COVID	OTHER	0.88+
last four years	DATE	0.87+
COVID	ORGANIZATION	0.87+
four	QUANTITY	0.87+
series A	OTHER	0.86+
California	LOCATION	0.84+
Hollywood	ORGANIZATION	0.84+
Girls in Tech Catalyst Conference	EVENT	0.81+
one of	QUANTITY	0.77+
years	DATE	0.74+
three different board meetings	QUANTITY	0.74+
Conversation	EVENT	0.7+
last	DATE	0.66+
days	QUANTITY	0.64+
couple	QUANTITY	0.6+
people	QUANTITY	0.53+
COVID	TITLE	0.48+

Dan Sonke, Campbell Soup and David Sypnieski, Athena Intelligence - Food IT 2017 - #FoodIT #theCUBE

>> Announcer: Live from the Computer History Museum, in the heart of Silicon Valley, it's theCUBE, covering Food IT: Fork to Farm. Brought to you by Western Digital. >> Hi, welcome back, I'm Lisa Martin with theCUBE, we are at the Farm IT event. This is an incredible opportunity to talk with folks that are experts in agriculture, food and agriculture, academia, farmers, producers, those all across the food chain. The theme of this event is Fork to Farm, and I'm excited to be joined by my next two guests, we have Dan Sonke, the Director of Sustainable Agriculture from Campbell's Soup, welcome. >> Thank you. >> And you can't say this, but Dan has Campbell Soup tennis shoes on and they're awesome. And David Sypnieski, the Founder and CEO of Athena Intelligence, welcome gentlemen. >> Thank you. >> Thank you, good to be here. >> So this has been, before we went on we were kind of talking about kind of my thoughts on Ag-Tech, and this is a really interesting and unique opportunity for theCUBE, to really look at the influences of Big Data and Analytics, Cloud Computing, Open-source Software, Blockchain, and how this all can be very influential across the food chain and you know, from the event's theme perspective, it's really been a lot this morning, talking about the tech-enabled food consumer really driving a lot of this change, expectation-wise. But Dan, first question to you, knowing, growing up on Campbell's Soup as a kid, founded in 1869, how is Campbell's Soup taking action to implement not only support-sustainable agriculture, but also, what were the drivers? >> Well, we definitely see consumers driving interest in where the food comes from, where ingredients that go into Campbell's Soup come from. 
We, a few years ago, decided that we wanted to be a company that makes real food that matters for life's moments, so that's our mission, that's our purpose, and so we want to connect to consumers with the information that supports that claim, that the food is trustworthy, that it's authentic, and that it resonates with the emotional side of how it's consumed in families, and the moments that matter. >> And also probably from a brand perspective, this is a historic brand in the United States, and that's probably quite important to meet those needs. >> Absolutely, we want to be the most transparent food company, we want to be open and honest with our consumers, and satisfy their desire for real food. >> So talk to us about kind of the genesis of the sustainability in agriculture at Campbell, when did that start? And really, besides the consumers, maybe some on the customer side, who was really driving this initiative? >> Well, we drive it internally, so six years ago, we decided to venture into sustainable agriculture in a formal way. We did a stakeholder assessment, so we talked to customers, we talked to investors, we talked to farmers, suppliers, folks inside the company, outside the company, North America, Europe, Australia, and asked them a series of questions, and said where should we focus, what are the crops, what are the subject areas we should focus on in agriculture sustainability? And we came up with a focus on tomatoes and other vegetables that people think of when they think of Campbell's Soup, we're largely a vegetable nutrition, and whole-grain nutrition company, so we wanted to focus there. 
And we focused on water, fertilizer, greenhouse gases, soil and pesticides, so that was our focus area, and we really took a measure-to-manage approach, so intentionally going to farmers, starting with tomatoes, with a limited set of questions that capture a lot of information and would be information growers would have, so we asked them how much water did you apply to make the crop, how much fertilizer do you use, what was the irrigation system, what are some of the decision tools that you used to make informed decisions? And so we started collecting that data. We also started capturing the geographic locations of the fields, believing that the technology would come to enable us to put that together, and lo and behold, fast-forward five years, now we have five years of data. We've tracked some really great stuff that our farmers have done. For example, last year water use per pound of tomato grown, was down by 20% over our first year of tracking that data. >> Wow. >> Huge gains, and efficiency and, you know, especially since it's a California crop, that was in the period of a five-year drought, so very encouraging to see that growers can do that kind of thing, and very proud of our growers for doing that. >> Absolutely, and on the technology side, so we've got David here. Athena Intelligence, talk to us a little bit about the genesis of Athena Intelligence, and how you're working in partnership with Campbell's Soup. >> Sure, so I've got a storied background in agricultural tech work with production, growers, ag-tech companies, processors like Campbell's and others. And several years ago I kind of realized the fact that while all of this technology is from Silicon Valley and around the world, it's starting to, kind of make its way into agriculture. An assumption that everyone makes is that the data is ready to be used in some sort of technology. >> Right. 
>> Alright, so kind of the running joke in the field is that, you know, that a lot of technology has built a lot of solutions that are desperately looking for a problem to solve. And the problem, while it sounds simple, it's not so easy to put together. But the problem is that, as Campbell's Soup for example, was collecting all of that data, you know, the entire industry has never really been familiar with the structure of how do you actually use data in any kind of meaningful kind of data science or analytical way and so, just being able to compile it all from various different formats and sources was a burden, so while you had all this data, it actually couldn't be used at all. And so Athena Intelligence was about basically, me coming to the realization, and collaborating with Dan, and Campbell's has been a great partner of saying, you know, we're going to solve that one problem, the unglamorous, the unsexy, problem of building a piece of technology which can efficiently and automatically begin to clean up, and normalize, and standardize data sets from multiple different sources and-- >> And we're talking about like data from weather sources, sensors, satellite imagery-- >> Right, so it's a fusion of public and private data, so the public data, everything from satellite imagery to soil, to weather stations, river flows, 98 different attributes of the weather, and water-related data. And then of course all of the private data, both Campbell's internal processing data, and then all the data that they're collaborating with their suppliers so, it's a pretty broad assortment which comes from, I mean the formats are everything from a hand-written notebook, to a PDF, to Excel to-- >> Wow. >> It's all over the board. >> So this is really Big Data and Analytics, being able to bring and aggregate data from different sources, facilitate data discovery. >> We're making data efficient right now, because the problem is that it's so, it's such a laborious effort. 
You know, 90% of the time people are putting in, just trying to clean and organize it. >> Right. >> Leaving very little time to be able to analyze it, let alone make any decisions or collaborate on it. So we're addressing that 90% of the time that people spend on trying to put the stuff together in the first place. >> Okay so Dan, walk us through kind of a use-case example of how you're implementing, or have implemented, Athena Intelligence software, and what some of the outcomes have been so far. >> Right, so the goal has been to take the quality data that comes in to our systems, and that is one area where we do use data historically quite a bit, we have tons of data on every load of tomatoes that comes into our processing plants. But then we're marrying that data to the publicly available weather, soil, water data, and the data that the growers report on sustainability practices. And the goal is to find the win, win, win, the win for the environment, the win for the farm profitability, and the win for Campbell's Soup quality, and sustainability drivers as well. And the example that we're currently pursuing is tomato solids, so that's an obscure term for most people, but it's an industry measurement of how much sugar is in the tomatoes basically. >> Okay. >> The solids of the tomatoes coming in, affect how they process into our ingredients, the higher solids, the easier they are for us to process, and the less energy it requires for us to do that. So it's a sustainability win as well. We already pay growers for higher solids. We know a few things that can generate higher solids on the farm, but we think there are more pieces of information that have been hiding in that Big Data set. So can we tease out what soils produce higher solids, or what irrigation practices drive higher solids, or whatever it is, so we're in the process right now. 
We've got a project going between our research innovation fund, Athena, and that's the target that we're going after this summer is to dig into five years of data, and find that win. >> Wow. So it sounds like Athena Intelligence has really enabled Campbell's Soup to become a data-driven company? >> Well, we certainly are a data-driven company, but this is extending the reach of the data outside the four walls of our factory-- >> And also into the farmer, so you're really enabling the farmers to embrace data, evaluate what they have. Have you seen any...? So one of the things we were talking about earlier today, or was being talked about was the labor shortages, as well as attrition. So you mentioned you know, things in ledgers and hard copy. Are you also seeing an influence maybe, that Campbell's having to your farmers, becoming much more, less paper-driven, and maybe more modern in terms of the way that they're collecting and storing data? >> Well, I can't say that we can take credit for that, but we certainly want to be one of the many voices at events such as this one, to be a beacon, calling the industry to solve this problem. David really mentioned it. The challenge is, growers don't have the resources to capture data easily. If they were you know, if that was their mindset, they'd probably be accountants and not farmers right? Farm they have, you know, they're in farming for all the attributes of a farm lifestyle, not a data-capture lifestyle. >> Right. >> So capturing that farm data, and making it easy for them to get the data into systems that they can then use, is one of my passions right? A lot of companies are out there saying, "Oh, we can create a platform that will help Campbell's get information out of the farms." And I keep telling them, "No, if you create the system that makes it easier for farmers to use their own data, to get more efficient and more profitable, they'll put the data in." >> Okay. 
>> That's not-- >> So you think that's really where the sweet spot is, and the next step is really-- >> And that's how we drive sustainability. >> Because if they, if the tools can help them with the data to make more informed decisions that's, that's what we want to get out of our sustainability programs, it's not just data for reports say, for Campbell's, it's how do we drive progress on the farm, and we do that by creating the systems that everybody can use more easily. >> Well, it's so neat to hear that a company that so many of us know and have grown up with, has evolved so much to be very focused, and have sustainability really, as a core, and it's also great to know that there are technologists out there that have that Ag-Tech experience, that are enabling companies to leverage the power of Big Data, so gentlemen, I want to thank you so much for stopping by theCUBE and sharing your insights with us, we wish you the best of luck, and look forward to seeing what happens in the next few years. >> Thank you. >> Thank you very much. >> My pleasure. And we want to thank you for watching theCUBE again, I'm Lisa Martin, and we are at the Farm IT event From Fork to Farm, or Food IT event. We will be back with some more great guests, so stick around. (techno music)

Published Date : Jun 28 2017


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Lisa Martin	PERSON	0.99+
David Sypnieski	PERSON	0.99+
Dan Sonke	PERSON	0.99+
Athena	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
90%	QUANTITY	0.99+
Europe	LOCATION	0.99+
20%	QUANTITY	0.99+
Athena Intelligence	ORGANIZATION	0.99+
Dan	PERSON	0.99+
five years	QUANTITY	0.99+
Australia	LOCATION	0.99+
Western Digital	ORGANIZATION	0.99+
five-year	QUANTITY	0.99+
Excel	TITLE	0.99+
United States	LOCATION	0.99+
1869	DATE	0.99+
first question	QUANTITY	0.99+
North America	LOCATION	0.99+
last year	DATE	0.99+
Campbell	ORGANIZATION	0.99+
Campbell's Soup	ORGANIZATION	0.99+
first year	QUANTITY	0.99+
Campbell	PERSON	0.97+
several years ago	DATE	0.96+
California	LOCATION	0.96+
one	QUANTITY	0.96+
one area	QUANTITY	0.95+
six years ago	DATE	0.95+
Campbell Soup	ORGANIZATION	0.95+
both	QUANTITY	0.95+
Farm IT	EVENT	0.95+
one problem	QUANTITY	0.95+
Fork to Farm	EVENT	0.95+
two guests	QUANTITY	0.94+
90% of	QUANTITY	0.93+
this morning	DATE	0.89+
few years ago	DATE	0.87+
98 different attributes	QUANTITY	0.86+
earlier today	DATE	0.85+
theCUBE	ORGANIZATION	0.85+
Food IT	EVENT	0.83+
Campbell's	ORGANIZATION	0.81+
first place	QUANTITY	0.75+
Fork to Farm,	EVENT	0.72+
#FoodIT	EVENT	0.71+
next few years	DATE	0.69+
Campbell Soup	PERSON	0.66+
this summer	DATE	0.65+
to Farm	TITLE	0.65+
time	QUANTITY	0.59+
tons	QUANTITY	0.58+
Food IT:	TITLE	0.57+
2017	DATE	0.45+
Museum	LOCATION	0.43+
Computer History	ORGANIZATION	0.42+
Fork	EVENT	0.41+

Coco Brown, The Athena Alliance | Catalyst Conference 2016

>> From Phoenix, Arizona, theCUBE. At Catalyst Conference, here's your host Jeff Frick. (soft music) >> Hey, Jeff Frick here with theCUBE. We're in Phoenix, Arizona at the Girls in Tech Catalyst Conference. About 400 people. The fourth year of the conference. Really getting together, talking about women in tech issues. Something in the water, here in Phoenix. We were here two years ago at the Grace Hopper Celebration of Women in Computing, also just down the road. So we're happy to be here and really get a feel. And bring to you some of the leaders here, that are making things happen. We're really excited by our next guest, Coco Brown, the founder and CEO of the Alena Alliance, or Athena Alliance, excuse me, welcome. >> Thank you. >> So the Athena Alliance, what's it all about? >> Well so the Athena Alliance is an organization of executive women who've achieved great success in their careers. And they have a vision collectively of women operating at their highest level of impact. And within the context of a business leadership realm, that highest level really is the boardroom. And so our mission is to help women obtain board seats and be successful in the role. >> So there's a lot of conversations about board. It seems to be kind of the new hot button topic about inequality. There's certainly a ton of conversations about inequality and pay highlighted recently by the women's national soccer team, which got a lot of buzz. And I think everyone knows that conversation that's been going on for a while. But the boardroom conversation is kind of new. It's kind of bubbling up. Or at least that's my sense of it, that barely have cracked the surface in terms of historical numbers in getting women representation on boards. >> Yeah. >> Why does that continue to be a problem? Is it a pipeline issue? Is it a match making issue? Is it a networking issue? Is it just, I just don't know? What is the issue? >> It's not a pipeline issue. 
And so what's happened in this discussion is there were some, sort of, pretty notable examples of situations where women raised their hands and said, hey where are the women on these boards. And the response was, well where are the women? Which kind of created this energy around the topic a lot more strongly more recently. Which is to say, there are a lot of qualified women out there who would be great board directors. And yet the positions of board director are gate kept by largely men. This is just the circumstance. Men are the ones who back companies. They're the VCs, they're the founders, they're the CEOs. And within their networks, they don't have a lot of women. Executive women. Likewise, executive women tend to seek each other out too. So we're not in each other's realms. So a lot of the conversation has been around raising awareness to the issue. There's been great tracking of exactly where is the issue. And how are we making progress. And then there's been a lot of great organizations that have been helping women get ready for board positions, training them. And thirdly, there's a lot of great organizations out there who are, essentially, identifying qualified women, and cataloging them, putting them in databases and saying, hey no excuses, here they are. But the key missing element and my feeling as to why the problem continues to persist, part of it is just time. It's just going to take time. But part of it is also, really networking, what you said. It is about networking. It is that the women who want these positions and who are qualified for these positions need to know the men who are looking for board directors. And when you actually connect, make those two connections happen, you get incredible success. And we're seeing it already. >> Or as the age old advice, it's not who you know, but who knows you. >> Yes! >> It used to always be the other way around. But it's really who knows you. 
And we live in such a time of personal branding and external communication via LinkedIn, Twitter, blogs, Medium, however you choose to externalize your professional position. And it kind of gets intermingled with your personal position. There really is not much excuse, at least, to make the attempt, to get yourself out there. >> Exactly, it's why. So there's 16 of the speakers here at this conference, are Athena Alliance women. And part of the reason we're here, we're here because this is such a noble and important and fantastic event for us to participate in. The other reason we're here is because this is a part of our way of getting known too, right. Of becoming more visible. Of making our brand, personal brand known. So this is one of those key things about who knows you that we should and need to be doing. >> So how many Athena foundation women are in executive boards now? >> So Athena Alliance is relatively new. So we're just getting started. About 50% of, 47% of the women associated with Athena Alliance are already on boards. >> That's pretty good, 47%. >> Yes, largely those are non-profit boards. >> Okay. >> They also are on a fair number of advisory boards. And they're now looking for the private boards and corporate boards and they're looking for public boards as well. >> And do you see that as kind of a logical stepping stone between an advisory board, a non-profit board, potentially a private company board, a VC company and then to a larger public entity. Is that kind of? >> Yeah I see it two ways. On the one hand, it's stepping stones and on the other, we have a variety of careers. So let's take me for example. I ran and was an owner of a privately held company. We reached about 50 million dollars in revenue before I sold my ownership, moved on. I'm qualified for a certain kind of a board. I'm qualified for a private board of a certain type of growth, sort of trajectory or stage. 
Others like Yvonne, who you spoke with, she's qualified for public boards of a different size. So some of it is what we're qualified for and what we can really contribute to and some of it is stepping stones. So for example, advisory boards are a great stepping stone. You get absolutely zero board credit for being on an advisory board, 'cause it doesn't have fiduciary responsibilities. >> No fiduciary responsibility. >> Right. But it's incredible network experience. It's a great way to get to know CEOs, to get to know VCs, to make yourself known as a candidate for other aspects of that company. >> Where do you see the natural networking opportunities? 'Cause clearly there's networks that exist around where you went to school. There's networks around, increasingly alumni groups, within companies, especially a big company like an Intel or an HP, where you got these huge alumni groups, 'cause they've been around for so long. Where are some of the other natural alumni groups that then cross over that are going to allow rubbing of shoulders with the old school guy board members with some of these women that are trying to break through? >> Yeah it's interesting. I think that is a really good opportunity space because I do see that mostly, the networking pods, if you will, are within school alumni groups, or corporate alumni groups, or organizations that women belong to. But that are largely then just women organizations. Or maybe some industry organizations. And industry boards are a great way to make that connection point. But I don't think that women do have opportunities of overlap with men in those organizations and those networking communities. So the way it has to happen is, I think we have to make it happen. So it's almost like, creating mixers. We need some mixers, right? Male VCs mixed with Athena Alliance women. Let's get together. We actually have an event coming up like that. Where you can have some men and women in the same room. 
They get to get a sense of each other. Though you do start seeing more of that going on and it's kind of essential. >> 'Cause you really need that right? I mean, they are networks. And everything going on today is all about networks, whether it's IOT or social media or whatever. It's networks and they're all naturally bound by something but how do you get that overlap from one network to the other when there's not enough overlap to really make the activity that you're seeking. Of course, there's always CUBE alumni, which is a terrific network. So we'll use that as a founding point. >> Absolutely. Well and Dan Scholnick, who is a general partner with Trinity, he's on a number of boards. He's speaking at an event for the Athena Alliance on a panel coming up. And he's got board openings in the variety of boards that he's on. Those are the kinds of connections. Make opportunities for Dan to be in the same room as a number of these great women. I think we just have to create it. >> It's interesting, interesting. 'Cause it is all about the connection, right. You got to know people and you got to put the word out. Nobody ever got a board seat sending out a resume. I don't know. How many come from executive head hunters? I never got a job from executive head hunters. It's really more about who you know. >> And executive recruiters only actually fill about one to two percent of board seats. It's only the top companies with the deepest pockets or the greatest pressure that can do that. >> Okay so what are your priorities for the next six months, nine months, what are the top things you guys are working on at the Alliance? >> So we're relatively new, so big, big priority for us is funding. We're also scaling. So scaling is one of the important things. In other words, scaling our relationships with those VCs, with CEOs, and starting to create great linkages through these networking events. >> All right, well Coco, thank you for taking a few minutes. >> Thank you. 
>> Absolutely and good luck with the Alliance. It sounds like you guys are on your way. We see increasingly, we did a show at SAP in conjunction with MAKERS and they got a great movie about some of the women who just broke down barriers in advertising, fashion, finance, tech, et cetera. Meg Whitman, among many women highlighted there. And it's tough to break down that door. When the first one gets through, hopefully they leave a little space for somebody else to scooch in behind them. >> Yeah, yeah. >> Absolutely. All right, Jeff Frick here with Coco Brown. We are at the Girls in Tech Catalyst Conference, Phoenix, Arizona. You're watching theCUBE. See you next time. (soft music)

Published Date : Apr 22 2016

SUMMARY :

here's your host Jeff Frick. And bring to you some of the leaders here, and be successful in the role. that barely have cracked the surface It is that the women Or as the age old advice, And it kind of gets intermingled And part of the reason we're here, About 50% of, 47% of the women associated are non-profit boards. for the private boards And do you see that as kind and on the other, we have for other aspects of that company. Where are some of the So the way it has to happen is, And everything going on Those are the kinds of connections. It's really more about who you know. It's only the top companies So scaling is one of the important things. you for taking a few minutes. about some of the women who We are the Girls in Tech

ENTITIES

Entity	Category	Confidence
Dan Scholnick	PERSON	0.99+
Coco Brown	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Meg Whitman	PERSON	0.99+
Yvonne	PERSON	0.99+
Phoenix	LOCATION	0.99+
Athena	ORGANIZATION	0.99+
Athena Alliance	ORGANIZATION	0.99+
Intel	ORGANIZATION	0.99+
Alena Alliance	ORGANIZATION	0.99+
Coco	PERSON	0.99+
Dan	PERSON	0.99+
HP	ORGANIZATION	0.99+
Athena Alliance	ORGANIZATION	0.99+
Phoenix, Arizona	LOCATION	0.99+
LinkedIn	ORGANIZATION	0.99+
fourth year	QUANTITY	0.99+
47%	QUANTITY	0.99+
about 50 million dollars	QUANTITY	0.99+
CUBE	ORGANIZATION	0.98+
one	QUANTITY	0.98+
today	DATE	0.97+
nine months	QUANTITY	0.97+
two ways	QUANTITY	0.97+
About 400 people	QUANTITY	0.97+
two years ago	DATE	0.97+
two connections	QUANTITY	0.97+
About 50%	QUANTITY	0.97+
Twitter	ORGANIZATION	0.97+
16 of the speakers	QUANTITY	0.96+
two percent	QUANTITY	0.94+
SAP	ORGANIZATION	0.93+
Grace Hopper Celebration of Women and Computing	EVENT	0.92+
Tech Catalyst Conference	EVENT	0.92+
first one	QUANTITY	0.91+
theCUBE	ORGANIZATION	0.89+
next six months	DATE	0.89+
Catalyst Conference 2016	EVENT	0.87+
Catalyst Conference	EVENT	0.87+
Alliance	ORGANIZATION	0.85+
one network	QUANTITY	0.85+
thirdly	QUANTITY	0.84+
zero board credit	QUANTITY	0.79+
Tech Catalyst Conference	EVENT	0.78+
MAKERS	ORGANIZATION	0.75+
Girls in	EVENT	0.74+
about one	QUANTITY	0.71+
Trinity	PERSON	0.69+
ton of	QUANTITY	0.61+
soccer team	ORGANIZATION	0.51+
in	EVENT	0.46+

James Forrester | AWS Summit New York 2022

(light music) >> Hello, welcome back everybody to theCUBE's coverage in New York City of AWS Summit 2022. I'm John Furrier, host of theCUBE. We had Dave Vellante, Lisa Martin here earlier. I'm going to wrap it up here with James Forrester, last interview of the day here in New York. Wish we would have had another day. It's a packed house, 10,000 people. James Forrester's the VP Worldwide Technical Leader for VMware's Cloud on AWS. On AWS is a big distinction. James, welcome to theCube. Thanks for coming on. >> Thank you so much, John. It's great to be here. >> So I think it's been like six years since the announcement of VMware's Cloud on AWS, which is a separate instance, separate hardware, but it's changed the game for VMware. You guys have done a lot of work, successful traction with customers. Clarified, I remember at that time, it really clarified VMware's Cloud play. Which then gave VMware more time to work on what it's doing now, which is, you know, using all their assets and their operations with Tanzu, Monterey, Cloud Native, Cross Cloud. What they call you guys call Cross Cloud, I call Super Cloud, action, a lot of stuff happening. So thanks for coming on. Okay. So first question is, what's the future look like for VMware's Cloud on AWS? >> Super bright, super bright. And there's a couple of great reasons for that. I think firstly, what we're seeing is that customers have now made enough progress in their cloud journeys. Many of them have chosen AWS and they're going full force. We're going to help them go faster. We're going to help them get there and get native to those adjacent services much quicker with more confidence and more resiliency. So it's a super exciting time to be doing what we do. >> You know, VMware has had a steady install base, okay. I mean basically it's like almost ingrained in the operations. What do you guys see as that next level step up function? Because you know, obviously Broadcom is buying VMware. 
Obviously that utility will be in place, but there's more. There's more there that customers can tap into. This is the promise of the cross-cloud. How do you talk about that when you got the AWS action? How does that all integrate? >> Yeah, absolutely. And of course, because so many customers are going to AWS on their own cloud journeys right now, what we get to have the conversation about is how they can get there more confidently. And so for customers who are just starting out, who are looking at their application portfolios, who have a ton of skilled IT professionals who they want to bring into that cloud journey, they can use the skills they already have. For those folks who are a little bit further along but they may be finding that refactoring their applications is more complex, more difficult than they anticipated, we give them a way of moving with confidence and with much less risk so they can do those cloud journeys that they anticipated. >> You know, James, I want to get your thoughts on what the state of the current situation is, vis-à-vis, your customers and your customers' appetite for AWS services. 'Cause one of the promises of the original deal was clarifying messaging but more importantly, customers can get the VMware Cloud and take advantage of the higher level services on AWS. What's the update there? What's the current state of the art? What's some of the patterns that you're seeing on the uptake of services and how they're working together? >> Yeah, it's a great call out. And honestly, one of the misconceptions that I address right out of the gate is that somehow going VMware Cloud takes you away from those services. It doesn't, it gets you closer to them. Full, direct, native access to all of those hundreds of great AWS services. So what we often find is that customers have their enterprise data, inside data workloads in their data centers. 
But what they want to do is get that up next to the AWS services that can use it, like Redshift and Athena and Glue. They can move those workloads right adjacent to those services to start using them right away. So it's a great way to look at the platform. >> So one of the observations that's pretty well understood right now by most people, I'd say 90%, if not more, not a hundred percent 'cause I've heard people like not get it, but it's pretty clear that the operating model for the enterprise will be hybrid as a steady state. I don't think there's any debate on that unless you think there is. >> Do you feel the same- >> No debate. No debate. >> Okay. Hybrid's a steady state. What does that mean as clients start to think about edge and their data centers. 'Cause now the private cloud is back in the game. So I've heard people talk about private cloud, which we, I think we coined the term with Dave, Wikibon years ago, but it kind of went away because that was not the public cloud. So public cloud won, on premise didn't go away. We saw Amazon with Outpost. So now they're like, I can still have stuff on prem and run it in a cloud operations. So they're calling that private cloud, I think. So you're starting to hear the same things. What it means basically is that hybrid is winning. It's the standard. What does the hybrid environment look like from a VMware perspective as you guys look at that and have been building that out 'cause you have customers that are on premises. >> Yeah. >> Is it just to the cloud and back? Is it, is there any changes? Is there new connective tissue? Is there a glue layer? What's the operating model for VMware customers? >> Well, customers wanted those same benefits from public cloud agility, cost benefits, elasticity, innovation, sovereignty, sustainability, but they wanted to be able to do that everywhere. They wanted it in their data centers. They wanted it at the edge. 
And as you've pointed out public cloud delivered that for customers. AWS first out there delivering that for customers. Now with innovations like VMware Cloud and AWS Outposts, we're able to bring that back into the data center. We're able to bring those same benefits of public cloud into the customers on-prem environment. And you're right. We see hybrid just rolling and rolling and being able to offer our solution across all of it. >> Yeah, we're big fans of VMware because theCube's 12 years old, we've been at every VMworld. Now they're calling it VMware Explorer, the event's coming up. So the folks watching, plug for VMware Explorer, formerly VMworld, it's on the schedule. Content catalog just came out last week. It's looking pretty good. So put a plug out there. We'll be there with theCube, two sets. So you know, if you're going to VMworld, now Explorer, go register, get up there. It's in San Francisco, always a great event. vSphere and vSAN, always great products. But you got Carbon Black, you got Security. So these things have all been working kind of pistons for VMware. Tanzu, I know Raghu and those guys are doing it. Craig McLuckie and team, they're working on that. You got Tanzu, you got Monterey. That's the new cloud native thing. How is that tracking vis-à-vis, the operating model of the core engine, vSphere, vSAN and others. And then with the native services of Cloud. So you got AWS Cloud with VMware Cloud, vSphere, vSAN, Carbon Black, and Security. And then you got the Tanzu over here. How are those three things coming together? >> Well, the services that customers know and love first and foremost that they've been running the mission critical workloads on, vSphere, vSAN, NSX. What VMware cloud and AWS is, is a packaging together of those services. So customers don't have to configure it all themselves and do the heavy lifting. We manage and run it on their behalf. 
What we are adding to that most recently with Tanzu is now the ability to run containers within the same environment. 'Cause customers tell us they've got parts of their organization that are very much on vSphere VMs. Parts of their organization are moving to containers. We want to be able to provide a single operating model, a single layer, a single way of managing all of that. No matter where it's deployed. >> You know, remember back in the day, when Raghu wasn't the CEO, Carl Eschenbach was there, Sanjay Poonen was there. Carl's now at Sequoia Capital, Raghu's a CEO. Sanjay's kind of looking for a next gig. I always said, why doesn't vSphere and NSX become that abstraction layer and commoditize the network so that white boxes and Dell and HP could all play in that layer? It just never happened yet. Is that something you guys talk about at all? Like, I mean in the, in the smokey room, in the execs, is that happening? What's the vision? >> Well, we always work backwards from customers, right? (John laughing) And what customers are telling us is they want us to help them with that undifferentiated heavy lifting. So who knows where that could take us, but right now we're very focused on helping those customers move with confidence to the cloud. >> You didn't take the bait on that one. I appreciate that. (James laughing) Okay. So let's get some perspective. You're out with customers. What are the big things that you're seeing right now from your customers right now? 'Cause you look behind us here, 10,000 people at this event. This is not a no-show. This is not a throwaway event in, you know, somewhere in the corner of the world. This is New York City, only one summit. This is bigger than Snowflake Summit and that was packed. So from an event standpoint, this is pretty a big game statement here for AWS. These companies are not experiencing headwinds, they're changing. 
So what are your customers telling you around what they're looking at for the cloud native architecture? I mean obviously the digital transformation is continuing, obviously cloud's here. And again, we were saying earlier, this is the first time in history that the cloud hyperscalers have been in market during a so-called downturn. So there's no other data. 2008, I wouldn't call 'em up and running. They were building, but AWS, Azure, others, these cloud players they're in market. And so you're starting to see kind of some data coming out saying, Hey, this thing's still working, the engine of innovation is cranking out and it's not slowing down the digital transformation. It might change the capital markets and valuations but it's not changing customers. That's what I'm hearing. Now, you probably would agree with that, right? >> James: I think that's exactly right. >> Okay. So let's stay with that. If you believe that, then it's like, okay, what are they doing? So what are customers doubling down on? What are some of the patterns you're seeing in the environment today that you could share with the audience? >> Yeah, so I think first and foremost is that steady transition to the cloud to deliver all of those benefits, agility, cost, elasticity, innovation, sovereignty, sustainability that hasn't gone away at all. In fact, it's only accelerated. With workloads like virtual desktops, which became so critical during COVID the need to be able to provide that kind of scalable elastic capacity has only increased. Now, coupled with that, most of these customers are already on a cloud journey. And while some folks may have had the luxury of letting that go a little bit more slowly, nowadays the urgency is pervasive across all of the industries that we get to talk to in New York. Everyone needs to go faster. Everybody's not seeing the progress that they expected that we think we can help them deliver. 
So the opportunity I think that's come out of COVID is more workloads, different use cases, disaster recovery, ransomware- >> Is that more of an awareness or reality or both? >> Both. Absolutely. >> Okay. So let me ask the next question. 'Cause this is a good conversation, I think. I agree a hundred percent. We're seeing the same exact thing. Now let's talk about how companies are thinking about the real opportunity that's emerged, which is refactoring the business model without actually changing the makeup of the organization per se, to take on new territories and potentially take over categories. >> James: Mm hmm. >> So I mean a data warehouse and a data cloud's kind of the same thing. Snowflake probably wouldn't like me saying that they're a data warehouse because they call themselves a data cloud, but it's kind of the same thing, just refactored on AWS. >> James: Yep. >> That's a super cloud. So that's an opportunity for everyone to do that in every vertical. How many customers are actually thinking that way and actually taking steps to pursue that, capture that opportunity? Or do you agree it's the opportunity? >> No, I think that that is an opportunity and I love that idea of super cloud in that what I think customers have started to realize, over the last couple of years in particular, is it's very difficult to take advantage of all of those great cloud services if your applications are still behind a whole lot of different layers of firewalls and so forth. So getting the application close to those services, in proximity to those services is that first step in modernization. Then it doesn't have to be a change the wings on the plane while it's flying conversation, which- >> John: Yeah. >> You know, is very risky for a lot of organizations. >> John: Exactly. >> It's a let's get the plane going a little bit faster. Let's get the plane going a little bit smoother, and let's get the plane to its destination with less risk. 
>> You know, James, that reminds me of the old school conversations of non disruptive operations. Remember those days? >> James: I do, yeah. >> Mostly around storage and, and servers. But that's what basically what you're saying. Transform while operating, right? >> James: Exactly. >> So this is, you can do both. You got to make time and it's a talent question too. I'd love to get your thoughts on how customers are thinking about who do you put on which task. 'Cause you want your A players on both areas. You don't want all your A players, what I hear, CSOs and CIOs telling me is that, I put all my A players on transformation, I got no one running the business. >> James: Mm hmm. >> So you got to kind of balance. That's a cultural team decision. >> It's a cultural team decision. It's also a skills marketplace decision. >> John: Yeah. >> And there's a practical reality to the skills that are available and how fast you can hire them. So a big part of the conversation that we have is when customers have existing skills sets, plug those into their transformation, plug those into their business outcomes. I like to use the phrase, "Let's make heroes out of IT" because they can be a much more critical player than they think they can be. >> Yeah, IT basically is not even around anymore. It's part of the organization. And then you have data science and data engineering coming in. So it's, you know, IT is not a department anymore, it's the company. >> Exactly right. >> If you're kind of going down that road, yeah. >> Yeah. Alright, so final question. What's the biggest change you've seen and observed in your current year and a half? You know, we're coming out of COVID, knowing what was before, what sea change, what inflection point are we in now? How would you describe this current market? 'Cause again, we're kind of in a unique market. 
You know, you got crypto around the corner, people getting attracted to that, little bubbly obviously, reality of cloud and 2.0 or super cloud emerging. On premise is not going away. Edge exploding on the industrial side, especially with machine learning coming along. So this operating model is clearly in sight. What's the biggest observation you've noticed. >> I think it's the sense of urgency over the last couple of years in that most customers I talked to are no longer relaxed about the timing of delivering cloud capabilities to their organizations. Most customers are on sort of a transformation journey of their own and digital transformation and cloud transformation are absolutely fundamental to that. >> One more real quick follow up question if you don't mind, 'cause I appreciate your time. One of the things that's come up a lot in our conversations is the role of the ecosystem. Not only as a part of the business model but also validation of the enablement that cloud offers companies. You have an enabling platform, your ecosystem is well known. And so your customers are starting to develop ecosystems. So if the cloud model kind of trickles like downstream, ecosystem is kind of a proof of something. >> James: Mm hmm. >> What's your view of all this ecosystem discussion as we transform this next generation? >> Yeah, I think it touches on a couple of things. So obviously there is a technology ecosystem, which is evolving very rapidly in support of cloud and cloud transformation. But what's interesting, I think is the business ecosystem that's evolving around it. We're seeing our customers evolve their own businesses to assume that those cloud capabilities will be available to them. And if the cloud capabilities are not available to them in a timely fashion, then the ecosystem starts to have a domino effect. So the ecosystems are interdependent between business, and technology, and skills, and talent. 
And I think that's a great to be >> James Forrester, they're going to shut us down. The speakers are on, they're going to pull the plug. Thanks for being our last interview here in New York City and bringing us home. Really appreciate you taking the time to come on theCube. >> John, thanks so much. Great to be here, really enjoyed it. >> Okay. We are wrapping it up here in New York City. I'm John Furrier with theCube, great day. For Lisa Martin, Dave Vellante, and the entire crew of theCube here on the ground. Live in person events are back. theCube hybrid, get online, check out our coverage there. The SiliconANGLE and thecube.net. I'm John Furrier signing off from New York City. See you next time. (light music)

Published Date : Jul 14 2022

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Lisa Martin	PERSON	0.99+
James	PERSON	0.99+
Sanjay Poonen	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
Sanjay	PERSON	0.99+
John Furrier	PERSON	0.99+
New York	LOCATION	0.99+
San Francisco	LOCATION	0.99+
New York City	LOCATION	0.99+
Dell	ORGANIZATION	0.99+
2008	DATE	0.99+
90%	QUANTITY	0.99+
Dave	PERSON	0.99+
Carl Eschenbach	PERSON	0.99+
both	QUANTITY	0.99+
HP	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Both	QUANTITY	0.99+
10,000 people	QUANTITY	0.99+
Snowflake Summit	EVENT	0.99+
Broadcom	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
James Forrester	PERSON	0.99+
Craig McLuckie	PERSON	0.99+
first question	QUANTITY	0.99+
six years	QUANTITY	0.99+
Sequoia Capital	ORGANIZATION	0.99+
last week	DATE	0.99+
two sets	QUANTITY	0.99+
one	QUANTITY	0.99+
Outpost	ORGANIZATION	0.99+
Tanzu	PERSON	0.99+
Tanzu	ORGANIZATION	0.98+
both areas	QUANTITY	0.98+
first time	QUANTITY	0.98+
first	QUANTITY	0.98+
Cloud Native	ORGANIZATION	0.97+
vSAN	TITLE	0.97+
first step	QUANTITY	0.97+
hundred percent	QUANTITY	0.97+
One	QUANTITY	0.97+
AWS Summit	EVENT	0.96+
vSphere	TITLE	0.96+
three things	QUANTITY	0.95+

Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments has led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is many, if not most, customers are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't abandon their data warehouse estate. As such we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade offs of various approaches and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight. 
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and the data science. So they were adding SQL extensions, the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon. 
And then you also mentioned, the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was basically, it was all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory, to all the data stewards, who are basically concerned about the sprawl and the lack of control in governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counter action. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. 
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage or I should say HDFS and then with the cloud then going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on to your point, Dave, a long time ago, long before the term was invented. For example, in Uber, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes, excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that it's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves. 
To believe the Databricks hype, it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine and they had to essentially pretty much abandon Spark SQL because really, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature because these are all basically kind of, even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with say, the same basically data targets say cloud storage, cloud object storage that you might not apportion the task to different compute engines. 
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery, perform basically the types of the analytics, the SQL analytics that would be associated with the data warehouse and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to let's say the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses have still a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock in. They want to transform their data into any number of use cases, especially data science, machine learning use case. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses they provide you an excellent user experience. That is the main reason why Snowflake took off. 
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem. I just want to mention, and that is lack of standards. All these open source vendors, they're running, what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but end user doesn't care. End user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12 plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well in a word, you can't be best of breed at all things. I think the upshot of the cogent analysis from Sanjeev there, the database, the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse when it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads and maybe some of these SQL engines are good for ad hoc, minimize data movement. But really when you get to the deep service level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files or how do you do your upserts basically? So different focus, at the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug is Iceberg in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lake in any world but Databricks, but I'm hearing a lot of buzz about Apache Iceberg. 
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement, you had MapR was the most closed. You had Hortonworks the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, open source project, but very much associated with MongoDB, which basically, pretty much controlled most of the contributions, made decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that's breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform is kind of not really relevant to this discussion totally. But in the sense it is because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed, obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it means it really comes down to it, sort that type of perception. The fact is, is that, traditionally with analytics, it was very SQL oriented and that basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. It's really a great leveler today, which is that, I mean basic Python it's become arguably one of the most popular programming languages, depending on what month you're looking at, at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent with all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL centric companies and shops are going to do more data science on their database centric platform. The data science driven companies might be doing more BI on their lakes with those vendors and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you Sanjeev. 'Cause Snowflake and Databricks are such great examples 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that Sanjeev? >> I think semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally even technically I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be, a business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, the translytical databases, that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed to Doug's earlier point. 
You can't be best of breed at everything, wouldn't data mesh advocate, data lakes do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of, the silos of governance that could happen and the silo views of the world, how we redefine. And that's why and I want to go back to something what Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating basically their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing though, that raises the importance of and this is where the best of breed comes in, is the data fabric. My contention is that and whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management, this would be something that would be more what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning to kind of, so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's got nowhere in the market, no market share, but we've seen a lot of these benchmarks from Oracle. How real is that bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before about basically kind of like the operational analytics, that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they had to battle against is that even though they own MySQL and run the open source project, everybody else, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption, obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to, of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept is that, I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other Trino let's say or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. That then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been a sort of a victim of its own success in some ways, they made it so easy to spin up single node instances, multi node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery that the cost surprises are going to follow as they make it easy to spin up new instances. 
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there're way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you, you talked to them quite frequently. I think they have quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environment. Lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done the most successful has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They also have been trying to position themselves as being the most price friendly in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. Nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata to do some machine learning or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be, under the microscope, for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data-related and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and not many months ago they called going on-prem a distraction, but clearly there's enough demand, and certainly for government contracts or any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent, sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. And they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy, who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claiming to support lakehouse, from both sides of the camp. For Databricks, the top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually, of all things. And certainly, I'll also be looking for similar things to what Doug is saying, but I think, sort of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer-the-world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks, and I'm very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing the HTAP use case, both online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022


AWS Heroes Panel | AWS Startup Showcase S2 E2 | Data as Code

>>Hi, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. The theme of this episode is Data as Code, and this is season two, episode two of the ongoing series covering exciting startups from the ecosystem in cloud and the future of data analytics. I'm your host, John Furrier. We've got a great featured panel here with AWS Heroes: Lynn Langit, the CEO of Lynn Langit Consulting, Peter Hanssens, founder of Cloud Shuttle, and Alex DeBrie, principal of DeBrie Advisory. Great to see all of you here remotely, and I look forward to seeing you in person at the next re:Invent or other event. >>Thanks for having us. >>So Lynn, you're doing a lot of work in healthcare. Peter, you're in the middle of all the action with data as code. Alex, you're in deep on the databases. We've got a good roundup of topics here, ranging from healthcare to getting under the hood on databases. So Alex, we'll start with you. What are you working on right now? What trends do you see in the database space? >>Yeah, sure. So I do a lot of consulting work, working with different people, often with DynamoDB or just general serverless technology type stuff. If you want to talk about trends that I'm seeing right now, I would say you're seeing a lot more serverless-native databases or cloud-native databases, where you're seeing these cool databases come out that really take advantage of this new cloud environment, right? Where you have scalability, you have the elasticity of the cloud. So you're not having, you know, instance-based environments anymore. You're paying for capacity, you're paying for throughput. You're able to scale up and down. You're not managing individual instances. 
So a lot of cool stuff that we're seeing with this new generation of infrastructure, and in particular databases taking advantage of this new cloud world. >>And there's really a lot deep on the database side in terms of cloud-native impact, diversity of database types, and when to use certain databases. That's also a big deal. >>Yeah, absolutely. I totally agree. I love seeing the different types of databases and, you know, AWS has this whole purpose-built database strategy. And I think that makes a lot of sense. You know, I wouldn't go too far with it. I would more think about purpose-built categories and things like that: specialize in an OLTP database within your organization, whether that's DynamoDB or DocumentDB or a relational database like Aurora or something like that. But then also choose some sort of analytics database, you know, if it's Druid or Redshift or Athena. And then, if you have some specialized needs, if you want to show some real-time stuff to your users, check out Rockset. If you want to do some graph analytics or fraud detection, check out TigerGraph. A lot of cool stuff that we're seeing from the Startup Showcase here. >>Looking forward to unpacking that. Lynn, you've been involved now in a lot of healthcare action with cloud ops, and the pandemic pushed hard on everybody. What are you working on? >>Yeah, it's all COVID data all the time. Before the pandemic, I was supporting research groups for cancer genomics, which I still do, but what's impactful is the explosive data volumes. You know, there's big data and then there's genomic data. I've worked with clients that have broken data centers, broken public cloud provider data centers, because of the daily volume they're putting in. So there's this volume aspect. And then there's collaboration, particularly around COVID research because of the pandemic. 
And so you have this explosive volume, you have this need for computational complexity. And that means cloud. The challenge is, you know, put the pedal to the metal. So you've got all these bioinformatics researchers that are used to a single machine. Suddenly they have to deal with distributed compute. So it's a wild time to be in this space. >>What was the big change that you've seen with the pandemic, in cloud genomics specifically? What's the big change that has happened? >>The amount of data that is being put into the public cloud. Previously people would have their data on their local capacity, and then they would publish their paper, and the data may or may not become available for reproducing the research. To accelerate drug discovery and even variant identification, the data sets are being pushed to public cloud repositories, which is a whole new set of concerns. You're not only dealing with the volume and cost, but security. You know, federated security is non-trivial and not well understood by this domain. So there's so much work available here. >>Awesome. Peter, you're doing a lot with the data-as-a-platform kind of view, and platform engineering. Data as code is something that's being kicked around. What are you working on and how does platform engineering change as data becomes so much more prevalent in its value proposition? >>Yeah. So I'm the founder of Cloud Shuttle, and we sort of built this company out, this consultancy, all around the challenges that a lot of companies have with getting their data sorted, getting it organized, getting it ready for other use cases such as analytics and machine learning, AI workloads and the like. 
So typically a platform engineering team will look after the organization of a company's infrastructure, making sure that it's coherent across the company, and a data platform engineering team is doing something similar in that sense, where they're looking at making sure that data teams have a solid foundation to build upon, that everything's quite predictable. And what that enables is a faster velocity and the ability to use data as code as a way of specifying and onboarding data, building that, translating it, transforming it out into its specific domains and then on to data products. >>I have to ask you while you're here. There's a big trend around data meshes right now. We've had a lot of stuff on theCUBE. What are practical ways that people are using data mesh? First of all, is it relevant, and how are people looking at this data mesh conversation? >>I think it becomes more and more relevant the bigger the organization that you're dealing with. So, you know, oftentimes in the enterprise, you've got projects with timelines of five to 10 years, often outlasting technology life cycles. The technology that you're building on is probably irrelevant by the time that you complete it. And what we're seeing is that data engineering teams, and data teams more broadly, are this organizational bottleneck, and data mesh is all about breaking down that bottleneck and decentralizing the work, shifting that work back onto development teams, who oftentimes have got more of the context than a centralized data engineering team. And we're seeing a lot of velocity increases as a result of that. >>It's interesting. There's so many different aspects of how data is changing the world. Lynn talks about the volume with the cloud and genomics. We're hearing data engineering at a platform level. You're talking about slicing and dicing and real-time information. You mentioned Rockset, Alex. 
So I'd like to ask each of you to answer this next question, which is how have the team dynamics changed with data engineering, because every single company's impacted. So if you're researchers, Lynn, you're pumping more data into the cloud, that's got a little bit of data engineering to it. Do they even understand that? Is that impacting them? So how has data changed the responsibilities or roles in this new emerging area of data engineering, or whatever you want to call it? Lynn, we'll start with you. What do you see as the impact? >>Well, you know, DevOps becomes DataOps and MLOps, and this is a whole emergent area of work, and it starts with an understanding of container technologies, which in different verticals like fintech is a given, right? But in bioinformatics, building an appropriately optimized Docker container is something I'm still working with customers on now, because they have the concept that a Docker container is just a virtual machine, which obviously it isn't, or shouldn't be. So you have, again, as I mentioned previously, this humongous skill gap. Concepts like CI/CD, which are prevalent in ad tech and fintech, are not available yet for most of my customers. So those are the things that I'm building. So the whole ops space is this wide open area. And really it's a question of practicality. You know, I have a lot of experience with data lakes and containerizing and using the data lake platform. But a lot of my customers are going to move to interim PaaS-based solutions. If they're using Spark, for example, they might use a managed Spark solution as an interim step up to the cloud before they build their own containers, because the amount of knowledge to do that effectively is non-trivial. >>Peter, you mentioned data lakes, onboarding data into lakehouse architectures, for instance, something that you're familiar with. 
This is not obvious to some verticals, obvious to others. What do you see as this data engineering impact from a personnel standpoint? And then ultimately how things get built? >>You know, are you directing that to me? >>Peter. >>Yeah. So I think, first and foremost, the workload that data engineering teams are dealing with is ever increasing. Usually there's a 10x ratio of software engineers to data engineers within a business, and usually double the amount of analysts to data engineers again. And so they're fighting an ever-increasing backlog of tasks to do and tickets to churn through. And so what we're seeing is that data engineering teams are becoming data platform engineering teams, where they're building capability instead of constantly spinning on the hamster wheel, if you will. And so with that in mind, with onboarding data into a lakehouse architecture or a data lake, where data engineering teams are getting wins is developing a very good baseline of structure, where they're getting the categorization, the data tagging, whether this data is of a particular domain, does it contain some PII data, for instance, and then the security aspects, and also the mechanisms with which to do the data transformations. >>Alex, on the database side, those are known personas in an enterprise, the database team, but now the scale is so big, and there's so much going on in databases. How does data engineering impact organizations from your standpoint? >>Yeah, absolutely. I think definitely, you know, gone are the days where you have a single relational database that is serving operational queries for your users, and you can also serve analytics queries for your internal teams. It's now split up into those purpose-built databases, like we've said. 
But now you've got two different teams managing it, and they're designing their data model for different things. You know, so OLTP might have a more denormalized model, something that works for very fast operations and is optimized for that, but now you need to suck that data out and get it elsewhere so that your PM or your business analyst or whoever can crunch through some of that. And now it needs to be in a more normalized format. How do you sort of bridge that gap? That's a tough one. I think you need to build empathy on each side for what each side is doing, and build the tools to say, hey, this is going to help you, OLTP team, if we know what users are actually doing, and if you can get us into the right format there, then we can analyze it on the backend. >>So I think building empathy across those teams is helpful. >>Lynn, I want to come back to you. You mentioned health and informatics. But it's interesting, you know, I look at the database world and you look at the solutions that are out there. A lot of companies that build data solutions don't have a data problem. They're not swimming in a lot of data. But then you look at the field that you're working in right now, with genomics and health and quantum, they're dealing with data all the time. So you have people who deal with a lot of data all the time who are breaking through new ceilings. People who don't have that experience are now becoming data full, right? So for people now, either it's a first-time problem, or they've always been swimming in a ton of data. So it's more of, what's the new playbook? And then, wow, I've never had to deal with a lot of data before. What's your take? >>It's interesting. Because, you know, bioinformatics hires grad students. 
So grad students, you know, use their R scripts with their file on their laptop. And so to get those folks to understand distributed container-based computing is, like I said, a non-trivial problem. What's been really interesting with the money pouring in to COVID research is, when I first started, some of the workflows would take literally 500 hours and that was just okay. And coming out of fintech, I was blown away. Fintech is like, could that please take a millisecond rather than a second? Right. And so what has now happened, which makes it, like I said, even more fun to work in this domain, is the research dollars have really gone up because of the pandemic. And so there's this blending of people like me, with more of a big data background, coming into bioinformatics and working side by side. >>So it's this interesting sort of translation, because you have the whole taxonomy of bioinformatics with genomics and sequencers and all the weird file types that you get. And then you have the whole taxonomy of DevOps, DataOps, you know, containers and Kubernetes and all that. And trying to get that into pipelines that can actually be efficient, given the constraints. Of course, on the tech side, we always want to make it super optimized. I had a customer where we got it down from 500 hours to minutes, but they wanted to stay with the PaaS solution because it was easier for them. To go from 500 hours to five hours was good enough, but you know, the techies want to get it down to five minutes. >>We've seen this movie before: DevOps, edge and operations, you know, the IoT world has seen the convergence of cultures. Now you have data and then old-school operations kind of coming up. So this kind of supports the thesis that data as code is the next infrastructure as code. What's the reaction there for you guys? 
What do you think about that? What does data as code mean? If infrastructure as code was cloud and DevOps, what is data as code? What does that mean? >>I could take it if you like. I think data teams have long been this bottleneck within the organization, and there's like this dark matter of untapped energy and potential waiting to be unleashed. Data teams, with the advent of open source projects like dbt, have been slowly embracing software development lifecycle practices. And they're really seeing a big, steep increase in their velocity. And this is only going to increase and improve as we're seeing data teams embrace data as code. I think the future is bright for data. So I'm very excited. >>Lynn, Alex, reaction? I mean, agility, data as code is a developer concept, the CI/CD pipeline. You mentioned it. New operational workflows coming into traditional operations. Reaction? >>Yeah. I mean, I think Peter's right on there. I'd say some of those tools we're seeing come in from software, like dbt, are basically giving you that infrastructure as code, but applied to the data realm. Also there have been a few Git-for-data type things, Pachyderm, I believe, is one, and a few other ones, where you bring that in, and you also see a lot of immutability concepts flowing into the data realm. So I think just seeing some of those software engineering concepts come over to the data world has been pretty interesting. >>Well, literally just versioning datasets and the identification of what's in a data set and what's not in a data set. Some of this is around ethical AI as well, which is a whole area that has come out of research groups, mostly AI research groups, but is being applied to medical data, and needs to be, obviously. So this metadata and versioning around data sets is really, I think, a very of-the-moment area. 
>>Yeah, I think you guys are bringing up a really good direction that's happening in data. And that is something that you're seeing on the software side, with open source and now DevOps, and now going to data: the supply chain challenges. We've been talking about it here on theCUBE. You know, we've seen with the Ukraine war some open source, you know, malware hitting datasets. Is data secure? What is that going to look like? So you're starting to get into, what's the supply chain? Is it verified data sets? If data sets have to be managed, a whole other level of data supply chain comes up. What do you guys think about that? >>I'll jump in. Oh, sorry. I'll jump in again. I think that some of the compliance requirements around financial data are going to be applied to other types of data, probably health data. So immutability, reproducibility, that is legally required. Also some of the privacy requirements that originated in Europe with GDPR are going to be replicated for more and more types of data. And again, I'm always going to speak for health, but there are other types as well, coming out of personal devices and that kind of stuff. So I think this idea of data as code goes down to versioning and controlling, and that's sort of a real succinct way to say it. We didn't used to think about that. We just put it in our, you know, relational database and we were good to go. But versioning and controlling in the global ecosystem is kind of where I'm focusing my efforts. >>It brings up a good question. If data is going to be part of the development process, it has to be addressable, which means horizontally scalable. That means it has to be accessible and open. How do you make that work and not foreclose it with a lot of restrictions? 
>>I think the use of data catalogs and appropriate tagging and categorization. You know, I think everyone's heard of the term data swamp, and I think that just came about because everyone saw, oh, wow, S3, you know, infinite storage. We just throw whatever in there for as long as we want. And I think at times, with the proliferation of S3 buckets and the like, we've just seen perhaps security not maintained as well as it could have been. And I think that's kind of where data platform engineering teams have really come to the fore, creating a governed set of buckets with Lake Formation on top. But I think that's kind of where we need to see a lot more work, with appropriate tags and also the automatic publishing of metadata into data catalogs, so that folks can easily search and address particular data sets and also control the access. You know, for instance, you've got some PII data, perhaps really only your marketing folks should be looking at email addresses and the like, not perhaps your finance folks. So I think there's a lot to be leveraged there in Lake Formation and other solutions. >>Alex, let's back up and talk about what's in it for the customer, right? Let's zoom back and say the reality is, I've just got to make sure my data is secure, always on and not going to be hackable. And I've just got to get my data available and deliver performance. So then I've got to start thinking about, okay, how do I intersect it? So what should teams be thinking about right now as they look at all their data options or databases across their enterprise? >>Yeah, it's a good question. You know, I think Peter made some good points there, and you can think of history as sort of ebbing and flowing between centralization and decentralization a lot of times. 
And you know, when storage was expensive, data was going to be sort of centralized and maintained by the people that are in charge of it. But then when S3 comes along, it really decreases storage costs. Now we can do a lot more experiments on it. We can store a lot more of our data, keep it around and do different things on it. You know, now we've got regulations again, we've got to be more realistic about keeping that data secure and make sure we're doing the right things with it. So we're probably going to go through a period of centralization as we work out some of this tooling around tagging and ethical AI that both Peter and Lynn were talking about here, and maybe that gets us into that next world of decentralization again. But I think that ebb and flow is going to be natural in response to the problems of the other extreme. >>Where are we in the market right now from a progress standpoint? Because data lakes don't want to be data swamps. You're seeing Lake Formation as a data architecture, as an example. Where are we with customers? What are they doing right now? Where would you put them in the progress bar of evolution towards the nirvana of having this data sovereignty and this data-as-code environment? Are they just now in the data lake, store everything, real-time and historical? >>Well, I can jump in there. SQL on files is the driver. And so we know when Amazon got Athena, that really drove a lot of the customers to realistically look at data lake technologies, but data warehouses are not going away. And the integration between the two is not seamless. Now, we are partners with AWS, but we don't work for them. So we can tell you the truth here. 
There's work to it, but for my customers it really upped the ante around data lakes, because Athena and technologies like it, the serverless SQL queries and the familiar query libraries, really drove a movement away from either OLTP or OLAP, the more expensive, more cumbersome structures. >>But they still need that OLTP. If they have high-latency issues, they want to be low latency. Can they have the best of both worlds? That's the question. >>I would say we're getting closer. The technology is always going to be moving forward, and then we'll just move the goalposts again in terms of what we're asking from it. But with the technology that's out there, you can do really well. I work in the DynamoDB world, so you can get really great low latency, single-digit-millisecond OLTP response times, on that. I think some of the analytics stuff has been a problem with that. There are different solutions out there: you can export DynamoDB to S3 and then do SQL on your files with Athena, like Lynn's talking about, or now you see Rockset, a partner here, that'll just ingest your DynamoDB data and pick up all those changes. So if you're making a lot of changes to your data in DynamoDB, it's going to be reflected in Rockset, and then you can do analytics queries, complex filters, different things like that. So I think we continue to push the envelope, and then we move the goalposts again. But I think we're in a lot better place than we were a few years ago, for sure. >>Where do you guys see this going relative to the next level, if data as code becomes that next agile, software-defined environment with open source?
With all of these new tools, with serverless things happening, with data lakes built in with nice architectures alongside data warehouses, where does it go next? What happens next? If this becomes an agile environment, what's the impact? >>Well, I don't want to be so dominant, but I feel strongly, so I'm going to jump in here. Now, for my most computationally intensive workloads, I'm using GPUs; I'm bursting to GPU for TensorFlow neural networks. So I've been doing quite a bit of exploration around Amazon Braket for QPUs, and it's early. And it's specialty; it's not for everybody. And the learning curve, again, is pretty daunting, but there are some use cases out there. I got ahold of a paper where some people did a QCNN, a quantum convolutional neural network, for lung cancer images from COVID patients, and the QPU algorithm pipeline performed more accurately and faster. So I think bursting to quantum is something to pay attention to. >>Awesome. Peter, what's your take on what's next? >>Well, that was absolutely fascinating from Lynn, but I think there's also some more low-hanging fruit available in the data stack. There are still a lot of challenges around transformation, getting our data from raw landed data into business domains, and that speaks to a lot of what data mesh is all about. It would help if we could somehow make that a little more frictionless, because that's really where the labor-intensive work is. That's what's dominating data engineering teams, and it's where we're trying to push that workload back onto software engineering teams. >>Alex, we'll give you the final word. What's the impact? What's the next step? What does it look like in the future?
>>Yeah, for sure. I've never had the breaking-a-data-center problem that Lynn's had, or the bursting-to-quantum problem, for sure. But if you're in the pool I swim in, of terabytes of data and below, I think it's a good time. It's just like what we saw with DevOps, allowing software engineers to handle more of the operations stuff. I think the same thing can happen with data, where software engineering teams handle not just their code, not just deploying and operating it, but also think about their data around the code. That doesn't mean you won't have people to assist you within your organization, or that you won't have some specialists in there, but I think we'll be pushing more stuff onto the individual development teams, where they have ownership of it and they're thinking about it through the whole life cycle. I'm pretty bullish on that, and I think it's an exciting development. >>With that shift left, what's left? Shift left is security. What does that mean? >>We've shifted so much stuff left that now the things that were at the end are back at the end again. But at least we can think about that stuff early in the process, which is good. >>Great conversation, very provocative, very realistic, and with great impact on the future. Data as code is real, the developers, I do believe, will have a great operational role, and the data stack concept and impacts like quantum are all lining up nicely. It's a great opportunity to be in this field from a science and policy standpoint. Data engineering is legit. It's going to continue to grow, and thanks for unpacking that here on theCUBE. Appreciate it. Okay. Great panel, the AWS Heroes. They work with AWS and the ecosystem independently out there.
They're in the trenches, on the front lines, cracking the code here with data as code: season two, episode two of the ongoing series of the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching.
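
Peter's example of tag-based access to PII, earlier in this panel, can be illustrated with a minimal sketch. The column names, tags, and roles below are hypothetical, and this is plain Python rather than the Lake Formation API; it only shows the idea of deriving a different view per role from tags on a single copy of the data.

```python
# Hypothetical catalog metadata: each column carries a sensitivity tag.
COLUMN_TAGS = {
    "customer_id": "internal",
    "email": "pii",
    "order_total": "internal",
}

# Hypothetical grants: which sensitivity tags each role may read.
ROLE_GRANTS = {
    "marketing": {"internal", "pii"},
    "finance": {"internal"},
}


def visible_columns(role):
    """Return the columns a role may read, based only on tags and grants."""
    allowed = ROLE_GRANTS.get(role, set())  # unknown roles see nothing
    return [col for col, tag in COLUMN_TAGS.items() if tag in allowed]


def secure_view(role, rows):
    """Project each row down to the columns the role is allowed to see."""
    cols = visible_columns(role)
    return [{col: row[col] for col in cols} for row in rows]


rows = [{"customer_id": 1, "email": "a@example.com", "order_total": 42.0}]
print(secure_view("marketing", rows))  # includes the email column
print(secure_view("finance", rows))    # email column is filtered out
```

Because access is decided from tags rather than from per-table rules, tagging a new column as `pii` in this sketch is enough to hide it from every role without that grant.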

Published Date : Apr 5 2022


ENTITIES

Entity	Category	Confidence
Lynn	PERSON	0.99+
Peter	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
New Zealand	LOCATION	0.99+
Peter Hanson	PERSON	0.99+
five hours	QUANTITY	0.99+
500 hours	QUANTITY	0.99+
five	QUANTITY	0.99+
Alex	PERSON	0.99+
two	QUANTITY	0.99+
Alice	PERSON	0.99+
each side	QUANTITY	0.99+
Lynn Peter	PERSON	0.99+
each	QUANTITY	0.99+
Athena Lakeland	ORGANIZATION	0.99+
five minutes	QUANTITY	0.99+
John	PERSON	0.99+
pandemic	EVENT	0.98+
FinTech	ORGANIZATION	0.98+
GDPR	TITLE	0.98+
first	QUANTITY	0.98+
both	QUANTITY	0.98+
both worlds	QUANTITY	0.97+
single machine	QUANTITY	0.96+
10 years	QUANTITY	0.96+
first time	QUANTITY	0.96+
10 X	QUANTITY	0.96+
CICB	ORGANIZATION	0.94+
single	QUANTITY	0.94+
John furry	PERSON	0.93+
Lynn blankets	PERSON	0.93+
80	QUANTITY	0.91+
Lindbergh Lega consulting	ORGANIZATION	0.9+
LLTP	ORGANIZATION	0.89+
one	QUANTITY	0.87+
two different teams	QUANTITY	0.87+
terabytes	QUANTITY	0.86+
S3	TITLE	0.81+
COVID	ORGANIZATION	0.79+
Alex	TITLE	0.78+
Lakehouse	ORGANIZATION	0.77+
few years ago	DATE	0.77+
a millisecond	QUANTITY	0.77+
single digit	QUANTITY	0.76+
D AWS	ORGANIZATION	0.76+
Startup Showcase S2 E2	EVENT	0.73+
a second	QUANTITY	0.73+
Kubernetes	TITLE	0.72+
Athena	ORGANIZATION	0.71+
season two	QUANTITY	0.7+
SQL	TITLE	0.69+
OLTB	ORGANIZATION	0.69+
Redshift	ORGANIZATION	0.69+
CNN	ORGANIZATION	0.68+
Cedar	ORGANIZATION	0.66+
Hugh	PERSON	0.66+
dynamo	ORGANIZATION	0.65+
episode	QUANTITY	0.63+
Q	ORGANIZATION	0.63+
episode two	OTHER	0.6+
Maine	LOCATION	0.6+

Rahul Pathak Opening Session | AWS Startup Showcase S2 E2

>>Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase, season two, episode two. The theme is data as code, the future of analytics. I'm John Furrier, your host. We have a great day lined up for you: fast-growing startups, a great lineup of companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak, VP of analytics at AWS and a CUBE alumnus. Rahul, thank you for coming on and being the opening keynote for this awesome event. >>Yeah, it's great to see you, and it's great to be part of this event. I'm excited to help showcase some of the great innovation that startups are doing on top of AWS. >>Yeah. We last spoke at AWS re:Invent, and a lot's happened there, with lots of serverless at the center of the action. But all these startups, Rockset, Dremio, Cribl, monks next Liccardo, Ahana, Imply, are all doing great stuff. Data as code has a lot of traction, so there's still a lot of momentum going on in the marketplace. Pretty exciting. >>No, it's awesome. I think there's so much innovation happening, and the wonderful part of working with data is that the demand for services and products that help customers drive insight from data is just skyrocketing and shows no sign of slowing down. So it's a great time to be in the data business. >>It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software: taking things out of branches, working on them, putting them back in. Machine learning is getting iterated on, you're seeing more models being trained differently with better insights, more actionable ones, and it's all kind of working like code. This is a whole other way people are reinventing their businesses. This has been a big, huge wave. What's your reaction to that?
>>I think it's spot on. The idea of data as code, and bringing some of the repeatability of processes from software development into how people build data applications, is absolutely fundamental, and especially so in machine learning, where you need to think about the explainability of a model: what version of the world was it trained on? When you build a better model, you need to be able to explain and reproduce it. So I think your insights are spot on, and these ideas are showing up in all stages of the data workflow, from ingestion to analytics to ML. >>This next wave is about modernization and going to the next level with cloud scale. Thank you so much for coming on and being the keynote presenter here for this great event. I'll let you take it away: reinventing businesses with AWS analytics. Rahul, take it away. >>Okay, perfect. Well, folks, we're going to talk about reinventing your business with data. If you think about it, the first wave of reinvention was really driven by the cloud, as customers were able to transform how they thought about technology, and that's well on its way. Although if you stop and think about it, I think we're only about 5 to 10% of the way done in terms of IT spend being on the cloud, so there's lots of work to do there. But we're seeing another wave of reinvention, which is companies reinventing their businesses with data, really using data to transform what they're doing, to look for new opportunities, and to look for ways to operate more efficiently. And I think the past couple of years of the pandemic have really only accelerated that trend. So what we're seeing is that it's really about the survival of the most informed: folks with the best data are able to react more quickly to what's happening.
>>We've seen customers able to scale up if they were in, say, the delivery business, or scale down if they were in the travel business at the beginning of all of this, and then use data to find new opportunities and new ways to serve customers. So it's really foundational, and we're seeing this across the board. It's great to see the innovation that's happening to help customers make sense of all of this. Our customers are really looking at ways to put data to work. It's about making better decisions, finding new efficiencies, and really finding new opportunities to succeed and scale. When it comes to good examples of this, FINRA is a great one. You may not have heard of them, but they're the U.S. equities regulator. They keep track of all trading that happens in equities, and they're looking at about 250 billion records per day. >>They're using Amazon EMR, which is our Spark and Hadoop service, and they're processing 20 terabytes of data running across tens of thousands of nodes, looking for fraud and bad actors in the market. So it's been a huge transformation journey for FINRA over the years, a customer I've gotten to work with personally since really 2013 onward, and it's been amazing to see their journey. Pinterest is another great customer I'm sure everyone's familiar with. They're about visual search, discovery, and commerce, and they're able to scale their daily log searches by really a factor of three X or more and drive down their costs, using the Amazon OpenSearch Service. What we're trying to do at AWS is give our customers the most comprehensive set of services for the end-to-end journey around data, from ingestion to analytics and machine learning. We want to provide a comprehensive set of capabilities for ingestion, cataloging, analytics, and then machine learning.
And all of these are things that our partners, and the startups that run on us, have available to build on as they build and deliver value for their customers. >>The way we think about this is we want customers to be able to modernize what they're doing and their infrastructure, and we provide services for that. It's about unifying data wherever it lives and connecting it, so that customers can build a complete picture of their customers and business. And then it's about innovation, really using machine learning to bring all of this unified data to bear on driving new innovation and new opportunities for customers. What we're trying to do at AWS is provide a scalable and secure cloud platform that customers and partners can build on. Unifying is about connecting data, and it's also about providing well-governed access to data. One of the big trends we see is customers looking for the ability to make self-service data available to their customers and end users, and the key to that is good foundational governance. >>Once you can define good access controls, you're more comfortable setting data free. The other part of it is that data lakes play a huge role, because you need to be able to think about structured and unstructured data. In fact, about 80% of the data being generated today is unstructured. You want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS, databases outside, or SaaS products, as well as things like data warehouses and machine learning systems. Really, connecting data is key. And then there's innovation: how can we bring new technologies like AI and machine learning to bear and reimagine all processes? AI is also key to unlocking a lot of the value that's in unstructured data.
If you can figure out what's in an image, or the sentiment of audio, and do that in real time, that lets you personalize and dynamically tailor experiences, all of which is super important to getting an edge in the modern marketplace. So at AWS, when we think about connecting the dots across sources of data and allowing customers to use data lakes, databases, analytics, and machine learning, we want to provide a common catalog and governance, and then use these to help drive new experiences for customers and their apps and their devices. In an ideal world, this creates a closed loop: you create a new experience, you observe how customers interact with it, and that generates more data, which is a data source that feeds back into the system. >>On AWS, thinking about a modern data strategy, at the core is a data lake built on S3, and I'll talk more about that in a second. Then you've got services like Athena, Glue, and Lake Formation for managing that data, cataloging it, and querying it in place. And then you have the ability to use the right tool for the right job. We're big believers in purpose-built services for data, because that's where you can avoid compromising on performance, functionality, or scale. And then, as I mentioned, there's unification and interconnecting all of that data. If you need to move data between these systems, there are well-trodden pathways that allow you to do that, and features built into services that enable it. >>Some of the core ideas that guide the work we do: scalable data lakes are key, and this is really about providing arbitrarily scalable, high-throughput systems. It's about open-format data for future-proofing. Then we talk about purpose-built systems for the best possible functionality, performance, and cost.
And then from a serverless perspective, this has been another big trend for us. We announced a bunch of serverless services at re:Invent, and the goal here is to really take away the need to manage infrastructure from customers, so they can focus on driving differentiated business value. Then there's integrated governance, and machine learning used pervasively, not just as an end product for data scientists, but also built into data warehouses, visualization, and databases. >>So, on scalable data lakes: S3 is really the foundation for this. It's one of our original services at AWS and really the backbone of so much of what we do, with unmatched durability, availability, and scale, a huge portfolio of analytics services, both those that we offer and those that our partners and customers offer, and really arbitrary scale. We've got individual customers in S3 in the exabyte range and many in the hundreds of petabytes, and that's just growing. As I mentioned, we see roughly a 10X increase in data volume every five years, so that's an exponential increase in data volumes. From a purpose-built perspective, it's the right tool for the right job: Redshift for data warehousing, Athena for querying all your data, EMR as our managed Spark and Hadoop, OpenSearch for log analytics and search, and then Kinesis and Amazon MSK for Kafka and streaming. That's been another big trend: real-time data has been exploding, and customers wanting to make sense of that data in real time is another big deal. >>Some examples of how we're able to achieve differentiated performance in purpose-built systems.
So with Redshift, using managed storage and its latest instance types, we see three X better price performance than what's out there, available to all our customers and partners. In EMR, with things like Spark, we're able to deliver two X the performance of open source with a hundred percent compatibility, and almost three X on Presto. With Graviton2, which is our new silicon on AWS, there's about 10 to 12% better price performance and 20% lower costs. And it's all compatible with open source, so you can drop your jobs in and have them run faster and cheaper. That translates to customer benefits and better margins for partners. From a serverless perspective, this is about simplifying operations, reducing total cost of ownership, and freeing customers from the need to think about capacity management. At re:Invent, we announced serverless Redshift, EMR Serverless, and serverless Kinesis and Kafka, and these are all game changers in terms of freeing our customers and partners from having to think about infrastructure and allowing them to focus on data. >>When it comes to serverless options in analytics, we've really got a very full and complete set. Whether that's around data warehousing, big data processing, streaming, cataloging, governance, or visualization, we want all of our customers to have the option to run something serverless, and if they have specialized needs, instances are available as well. So we're really providing a comprehensive deployment model based on the customer's use cases. From a governance perspective, Lake Formation is about easy building and management of data lakes, and this is what enables data sharing and self-service. With it you get very granular access controls, so row-level security and simple data sharing, and you can tag data.
So you can tag a group of analysts and say they only have access to the data that's been tagged with the corresponding tags, and it allows you to very scalably provide different secure views onto the same data without having to make multiple copies, another big win for customers and partners. There's also support for transactions on data lakes. >>So updates and deletes, and time travel. John talked about data as code, and with time travel you can query different versions of data, so that's a big enabler for those types of strategies. And with Glue, you're able to connect data in multiple places, whether that's accessing data on premises, in other SaaS providers or clouds, as well as data that's on AWS, and all of this is serverless and interconnected. Really, it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem, so the APIs are all available for integration as well. Then, from an AI/ML perspective, what we're really trying to do is bring machine learning closer to data. So with our databases, warehouses, lakes, and BI tools, we've infused machine learning throughout, using the state-of-the-art machine learning that we offer through SageMaker. >>So you've got ML in Aurora, and in Neptune for graphs. You can train machine learning models from SQL, directly from Redshift and Athena, and use them for inference. And QuickSight has built-in forecasting and built-in natural language querying, all powered by machine learning, and the same with anomaly detection. The idea here is, how can we help our systems get smarter at surfacing the right insights for our customers, so that they don't always have to rely on smart people asking the right questions? Really, it's about bringing data back together and making it available for innovation. Thank you very much. I appreciate your attention.
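
The time-travel idea mentioned above can be sketched in a few lines. This is a toy, in-memory illustration under assumed names (`VersionedTable` is hypothetical and is not an AWS API): every update or delete commits a new snapshot, and older versions stay queryable, which is what makes data behave a little more like versioned code.

```python
class VersionedTable:
    """Toy table that keeps a full snapshot of its rows for every commit."""

    def __init__(self):
        # Version 0 is the empty table.
        self._snapshots = [{}]

    def _commit(self, rows):
        """Store a new snapshot and return its version number."""
        self._snapshots.append(rows)
        return len(self._snapshots) - 1

    def upsert(self, key, row):
        """Insert or update one row, producing a new version."""
        rows = dict(self._snapshots[-1])  # copy so old versions stay unchanged
        rows[key] = row
        return self._commit(rows)

    def delete(self, key):
        """Delete one row (if present), producing a new version."""
        rows = dict(self._snapshots[-1])
        rows.pop(key, None)
        return self._commit(rows)

    def read(self, version=None):
        """Read the latest version, or 'time travel' to an earlier one."""
        if version is None:
            version = len(self._snapshots) - 1
        return dict(self._snapshots[version])


table = VersionedTable()
v1 = table.upsert("order-1", {"total": 10})
v2 = table.upsert("order-1", {"total": 12})
v3 = table.delete("order-1")
print(table.read(v1))  # the row as first written
print(table.read())    # latest version, after the delete
```

Real table formats avoid copying every row per commit by tracking changed files instead; the full-snapshot copy here is only to keep the sketch short.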
>>Okay. Well done: reinventing the business with AWS analytics. Rahul, that was great. Thanks for walking through that. That was awesome. I have to ask you some questions on the end-to-end view of the data. That seems to be a theme, with serverless in there and ML integration. But then you also mentioned picking the right tool for the job, so you've got all these things moving. Simplify it for me right now. From a business standpoint, how do they modernize? What are the steps that clients are taking with analytics? What's the best practice? What's the high-order bit here? >>The basic hierarchy is this: historically, legacy systems are rigid and inflexible, and they weren't really designed for the scale of modern data or the variety of it. So what customers are finding is that they're moving to the cloud, moving from legacy systems with punitive licensing to more flexible systems. That allows them to think about building a decoupled, scalable, future-proof architecture. You've got the ability to combine data lakes, databases, and data warehouses and connect them using common APIs and common data protection. That sets you up to deal with arbitrary scale and arbitrary types, and it allows you to evolve as the future changes, since it makes it easy to add in a new type of engine as we invent a better one a few years from now. Then, once you've got your data in the cloud and interconnected in this way, you can build complete pictures of what's going on. You can understand all your touch points with customers. You can understand your complete supply chain. And once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities.
So think about modernizing, moving to the cloud, setting up for the future, connecting data end to end, and then figuring out how to use that to your advantage. >>As you mentioned, a modern data strategy gives you the best of both worlds. And you mentioned something briefly that I want to get a little more insight from you on: open formats. One of the themes that's come out of some of the interviews with the companies we're going to be hearing from today is open source, and the role open is playing. How do you see that integrating in? Because again, this is just like software, right? Open source software, open source data. It seems to be a trend. What does open look like to you? How do you see that progressing? >>It's a great question. Open operates on multiple dimensions, John. As you point out, there are open data formats. These are things like JSON and Parquet for analytics. This allows multiple engines to operate on data, and it creates option value for customers. If your data is in an open format, you can use it with multiple technologies, and that's future-proof: you don't have to migrate your data if you're thinking about using a different technology. So that's one piece. Then open source software is also really a big enabler for innovation and for customers. You've got things like Spark and Presto, which are popular, and I know some of the startups that we're talking about as part of the showcase use these technologies. This really allows the world to contribute to innovating on these engines and moving them forward together. We're big believers in that: we've got open source services, we contribute to open source, and we support open source projects, and that's another big part of what we do. And then there are open APIs, things like SQL or Python, which, again, are common ways of interacting with data that are broadly adopted.
These, again, create standardization. They make it easier for customers to interoperate and be flexible. So open is really present all the way through, and it's a big part, I think, of the present and the future. >>Yeah. It's going to be fun to watch and see how that grows. There seems to be a lot of traction there. I want to ask you about the other comment I thought was cool. You had the architectural slides out there. One was data lakes built on S3, and you had Athena, Glue, and Lake Formation around S3, and then the constellation of Kinesis, SageMaker, and other things around it. And you said, pick the right tool for the job. Then you had the other slide with analytics at the center, and you had Redshift and all the other services around it, around serverless. So one was more about the data lake, with Athena, Glue, and Lake Formation. The other one's about serverless. Explain that a little bit more for me, because I'm trying to understand where that fits. I get the data lake piece: Athena, Glue, and Lake Formation enable it, and then you can pick and choose what you need on the serverless side. What does analytics in the center mean? >>The idea there is that we wanted to talk about the fact that if you zoom into the analytics use case, everything that we offer within analytics has a serverless option for our customers. You could look at the bucket of analytics across things like Redshift, EMR, Athena, or Glue and Lake Formation. You have the option to use instances or containers, but also to just not worry about infrastructure and think declaratively about the data that you want. >>Oh, so basically you're saying analytics is going serverless everywhere. Talking about volumes, you mentioned 10X volumes. What other stats can you share in terms of volumes?
What are people seeing on velocity? I've seen that data warehouses can't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. On volume and velocity, do you have any other insights into those numbers? >>Yeah. From a stats perspective, take Redshift, for example: customers are processing, so reading and writing, multiple exabytes of data across Redshift. One of the things we've seen as time has progressed, as data volumes have gone up and data types have exploded, is data warehouses getting more flexible. We've added things like the ability to put semi-structured data and arbitrary nested data into Redshift. We've also seen the seamless integration of data warehouses and data lakes. Redshift was actually one of the first to enable straightforward querying of data that's sitting locally on drives as well as data that's managed in S3, and those trends will continue. I think you'll continue to see this need to query data wherever it lives and to allow lakes, warehouses, and purpose-built stores to interconnect. >>One of the things I liked about your presentation was that it had the theme of modernize, unify, innovate. We've been covering a lot of companies that have been, I won't say stumbling, but getting to the future, some faster than others, and they all kind of get stuck in the same spot. It's the silos: breaking down the silos, getting into the data lakes, and blending in those purpose-built data stores. They get stuck there because they're so used to silos in their teams, and that's holding back the machine learning side of it, because machine learning can't do its job if it doesn't have access to all the data.
And that's where we're seeing machine learning kind of being this new iterative model where the models are coming in faster. And so the silo busting is an issue. So what's your take on this part of the equation? >>So there are a few things in play here. You're absolutely right. I think that transition from siloed data to interconnected data isn't always straightforward, and it operates on a number of levels. You want to have the right technology. So, you know, we enable things like queries that can span multiple stores. You want to have good governance that can connect across multiple ones. Then you need to be able to get data in and out of these things, and Glue plays that role. So there's that interconnection on the technical side, but the other piece is also, you know, you want to think through, organizationally, how do you organize, how do you define who owns data and when they share it? And what are the SLAs for enabling that sharing? And think about some of the processes that need to get put in place, and create the right incentives in your company to enable that data sharing. And then the foundational piece is good guardrails. You know, it can be scary to open data up. And the key to that is to put good governance in place where you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy and compliance and security regulations that you have for that. And once you can assert that level of protection, then you can set that data free. And that's when customers really start to see the benefits of connecting all of it together. >>Right. And then we have a batch of startups here on this episode that are doing a lot of different things. Some have, you know, new lakes that are forming, observability lakes. You have SQL innovation on the front end, data tiering innovation at the data tier side, just a ton of innovation around this new data as code.
How do you see it, as an executive at AWS? You're enabling all this. Where's the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow and get traction because of the relevance of machine learning and AI, and the apps are embedding data in there now as code? Where are the opportunities for these startups and how can they continue to grow? >>Yeah, I mean, the opportunity is amazing, John. You know, we talked a little bit about this at the beginning, but there is no slowdown in sight for the volume of data that we're generating. Pretty much everything that we have, whether it's a watch or a phone or the systems that we interact with, is generating data. And, you know, we talk a lot about the things that'll stay the same over time. And so, you know, the data volumes will continue to go up. Customers are going to want to keep analyzing that data to make sense of it. They're going to want to be able to do it faster and more cheaply than they were yesterday. And then they're going to want to be able to make decisions and innovate in a shorter cycle and run more experiments than they were able to do. And they're always going to want this data to be secure and well protected. And so I think as long as we, and the startups that we work with, can continue to push on making these things better: can I deal with more data? Can I deal with it more cheaply? Can I make it easier to get insight? And can I maintain a super high bar in security? Investments in these areas will just pay off, because the demand side of this equation is just in a great place, given what we're seeing in terms of data and the appetite for it. >>I also love your comment about ML integration being the last leg of the equation here, or the last leg of the journey, but you've got that enablement. The AI piece solves a lot of problems.
People can see benefits from good machine learning, and AI is creating opportunities. And you also mentioned the end-to-end piece with security. So data and security are kind of going hand in hand these days, and not just the governance and the compliance stuff; we're talking about security. So machine learning integration kind of connects all of this. What's it all mean for the customers? >>For customers, that means that with machine learning, and really enabling themselves to use machine learning to make sense of data, they're able to find patterns that can represent new opportunities quicker than ever before. And they're able to do it dynamically. So, you know, in a prior version of the world, we'd have rule-based systems and they would be relatively rigid, and then we'd have to improve them. With machine learning, this can be dynamic and near real time, and you can customize them. So that just represents an opportunity to deepen relationships with customers and create more value and to find more efficiency in how businesses are run. So that piece is there. And, you know, your ideas around data as code really come into play, because machine learning needs to be repeatable and explainable. And that means versioning, keeping track of everything that you've done from a code and data and learning and training perspective. >>And data sets are updating the machine learning. You've got data sets growing; they become code modules that can be reused and interrogated. Security, okay, is a big theme. Data is really important, and security is seen as one of the top use cases. Certainly now in this day and age, we're getting a lot of breaches and hacks coming in, being defended. It brings up the open, brings up the data as code. Security is a good proxy for kind of where this is going. What's your take on that and your reaction to that? >>So on security, we can never invest enough.
And I think one of the things that guides us in AWS is that security, availability, and durability are sort of jobs, you know, one, two, three, and it operates at multiple levels. You need to protect data at rest with encryption, good key management, and good practices. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom. And then you need to keep track of who did what and be able to verify and come back and prove that, you know, only the things that were allowed to happen actually happened. And you can actually then use machine learning on top of all of this apparatus to say, you know, can I detect things that are happening that shouldn't be happening, in near real time, so I can put a stop to them? So I don't think any of us can ever invest enough in securing and protecting our data and our systems, and it is really fundamental for earning customer trust, and it's just good business. So I think it is absolutely crucial. And we think about it all the time and are always looking for ways to raise >>Well, I really appreciate you taking the time to give the keynote. Final word here for the folks watching: a lot of these startups that are presenting, they're doing well business-wise. They're being used by large enterprises, and people are buying their products and using their services. Customers are implementing more and more of the hot startups' products; they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code, this new future of analytics? What's your recommendation? >>So for customers who are out there, I recommend you take a look at what the startups on AWS are building. I think there's tremendous innovation and energy, and there's really great technology being built on top of a rock-solid platform.
And so I encourage customers thinking about it to lean forward, to think about new technology, and to embrace the move to the cloud: modernize, you know, build a single picture of your data, and figure out how to innovate and win. >>Well, thanks for coming on. Appreciate your keynote. Thanks for the insight. And thanks for the conversation. Let's hand it off to the show. Let the show begin. >>Thank you, John. Pleasure, as always.

Published Date : Apr 5 2022



Rahul Pathak, AWS | AWS re:Invent 2021

>>Hey, welcome back everyone. We're live here in theCube in Las Vegas for AWS re:Invent 2021. I'm John Furrier, host of theCube. We're in person this year. It's a hybrid event, online too. Great action going on. I'm with Rahul Pathak, Vice President of AWS Analytics. Rahul, it's great to see you. Thanks for coming on. >>It's great to be here, John. Thanks for having me again. >>So you've got a really awesome job. You've got serverless, you've got analytics. You're in the middle of all the action for AWS. What's the big news? What are you guys announcing? What's going on? >>Yeah, well, it's been an awesome re:Invent for us. We've had a number of serverless analytics launches. So Redshift, our petabyte-scale data warehouse; EMR for open source analytics; and then we've also had Managed Streaming for Kafka go serverless, and then on-demand for Kinesis. And then a couple of other big ones: we've got row- and cell-based security for AWS Lake Formation, so you can get really fine-grained controls over your data lakes, and then ACID transactions. You can actually have inserts, updates, and deletes on data lakes, which is a big step forward. >>So Swami's on stage in the keynote; he's actually finishing up now. But even last night I saw him in the hallway. We were talking about this as much as about AI. Of course, he's got the AI title, but AI is the outcome. It's the application of all the data and this new architecture. He said on stage just now, like, hey, it's not about the old databases from the nineties, right? There are multiple data stores now available. And the unification is the big trend. And he said something interesting: governance can be an advantage, not an inhibitor. This is kind of this new horizontally scalable kind of idea that enables the vertical specialization around machine learning to be effective. It's not a new architecture, but it's now becoming more popular. People are realizing it.
So share your thoughts on this whole, not shift, but the acceleration of horizontally scalable and vertically integrated. >>Yeah, no, I think the way Swami put it is exactly right. What you want is the right tool for the right job. And you want to be able to deliver that to customers so you're not compromising on performance or functionality or scale, but then you want all of these to be interconnected. So they're well integrated; you can stay in your favorite interface and take advantage of other technologies. So you can have things like Redshift integrated with SageMaker, so you get analytics and machine learning. And then Swami's absolutely right: governance is actually an enabler of velocity. Once you've got the right guardrails in place, you can actually set people free because they can innovate. You don't have to be in the way, but you know that your data is protected. It's being used in the way that you expect by the people that you are allowing to use that data. And so it becomes a very powerful way for customers to set data free. And then, because things are elastic and serverless, you can really just match capacity with demand. And so as you see spikes in usage, the system can scale out; as those dwindle, it can scale back down, and it just becomes a very efficient way for customers to operate with data at scale. >>Every year at re:Invent it's kind of like a pinch-me moment. It's like, wow, more really good technology. Oh my God, it's getting easier and easier. As the infrastructure as code becomes more programmable, it's becoming easier: more Lambda, more serverless action. You've got new offerings. How are customers benefiting, for instance, from the three new offerings that you guys announced here? What specifically is the value proposition that you guys are putting out there? Yeah, so the,
Yeah, so the, >>Um, you know, as we've tried to do with AWS over the years, customers get to focus on the things that really differentiate them and differentiate their businesses. So we take away in Redshift serverless, for example, all of the work that's needed to manage clusters, provision them, scale them, optimize them. Uh, and that's all been automated and made invisible to customers, the customers to think about data, what they want to do with it, what insights they can derive from it. And they know they're getting the most efficient infrastructure possible to make that a reality for them with high performance and low costs. So, uh, better results, more ability to focus on what differentiates their business and lower cost structure over time. >>Yeah. I had the essential guys on it's interesting. They had part of the soul cloud. Continuous is their word for what Adam was saying is clouds everywhere. And they're saying it's faster to match what you want to do with the outcomes, but the capabilities and outcomes kind of merging together where it's easy to say, this is what we want to do. And here's the outcome it supports that's right with that. What are some of the key trends on those outcomes that you see with the data analytics that's most popular right now? And kind of where's that, where's that going? >>Yeah. I mean, I think what we've seen is that data's just becoming more and more critical and top of mind for customers and, uh, you know, the pandemic has also accelerated that we found that customers are really looking to data and analytics and machine learning to find new opportunities. How can they, uh, really expand their business, take advantage of what's happening? And then the other part is how can they find efficiencies? And so, um, really everything that we're trying to do is we're trying to connect it to business outcomes for customers. How can you deepen your relationship with your customers? 
How can you create new customer experiences, and how can you do that more efficiently, with more agility, and take advantage of the ability to be flexible in, you know, what is a very unpredictable world, as we've seen? >>I noticed a lot of purpose-built discussion going on in the keynote with Swami as well. How are you creating this next layer of what I call purpose-built, platform-like features? I mean, tools are great. You see a lot of tools in the data market. Tools are tools: if you're a hammer, you want to look for a nail. We see people overbuy too many tools, and ultimately you have a platform, but this seems to be a new trend. This Connect phenomenon was showing me that you've got these platform capabilities that people can build on top of, because there's a huge ecosystem of data tools out there that you guys have as partners that want to snap together. So the trend is things are starting to snap together: less primitive roll-your-own, which you can do, but there are now easier ways. Take me through that. Explain that, unpack that phenomenon, from rolling your own, which has been the way up to now, to here's some prefabricated software, go. >>Yeah. So it's a great observation and you're absolutely right. I mean, I think there are some customers that want to roll their own, and they'll start with instances, they'll install software, they'll write their own code, build their own bespoke systems. And we provide what the customers need to do that. But I think increasingly you're starting to see these higher-level abstractions that take away all of that detail, as Adam put it, and allow customers to compose these. And we think it's important when you do that to be modular, so customers don't have to have these big-bang, all-or-nothing approaches. You can pick what's appropriate, but you're never at a dead end. You can always evolve and scale as you need to.
And then you want to bring these ideas of unified governance and cohesive interfaces across, so that customers find it easy to adopt the next thing. And so you can start off, say, with batch analytics; you can expand into real time. You can bring in machine learning and predictive capabilities. You can add natural language, and it's a big ecosystem of managed services as well as third parties and partners. >>And what's interesting, I want to get your thoughts while I've got you here, because I think this is such an important trend and historic moment in time. Jerry Chen, who's one of the smartest VCs that we know, from Greylock, coined "Castles in the Cloud," which kind of came out of a Cube conversation here on theCube years ago, where we saw the movement that someone's going to build real value on AWS, not just an app. And you see the rise of the Snowflakes and Databricks and other companies. And he was pointing out that you can get a very narrow wedge and get a position with these platforms, build on top of them, and then build value. And I think that's the number one question people ask me. It's like, okay, how do I build value on top of these analytic packages? So if I'm a startup or I'm a big company, I also want to leverage these high-level abstractions and build on top of them. How do you talk about that? How do you explain that? Because what people kind of want to know is, okay, is it enabling me, or do I have to fend for myself later? This comes up a lot. >>That's a great question. And, you know, if you saw Goldman's announcement this week, which is about building their cloud on top of AWS, it's a great example of using our capabilities in terms of infrastructure and analytics and machine learning to really allow them to take what's value-added about Goldman and their position in financial markets, to build something value-added and create a ton of value for Goldman by leveraging the things that we offer.
And to us, that's an ideal outcome because it's a win-win for us and Goldman, but it's also a win for Goldman and their customers. >>That's what we call the supercloud. That's the opportunity. So are there a lot of Goldman-like opportunities out there? Is that just for these unicorns? I mean, that's Goldman Sachs, they're huge. Is this open to everybody? >>Absolutely. I mean, one of the core ideas behind AWS was that we wanted to give any developer access to the same technology that the world's largest corporations had. And that's what you have today. The things that Goldman uses to build that cloud are available to anybody. And you can start for a few pennies and scale up, you know, into the petabytes and beyond. >>When I was talking to Adam Selipsky, when I met with him prior to re:Invent, I noticed that he definitely had an affinity towards data. Obviously he's an Amazonian, but he spent time at Tableau. So as he's running the company, you see that kind of mindset of the data advantage. So I have to ask you, because it's something that I've been talking about for a while and I'm waiting for it to emerge, but I'm not sure it's going to happen yet. What infrastructure as code was for DevOps and then DevSecOps, there's almost like a DataOps developing, where it's data as code or programmable data. If I can connect the dots of what Swami's saying and what you're doing, this is like a new horizontal layer of freely available data with some governance built in. That's right. So data's being baked into everything. So data is an ingredient, not a query to some database; it's got to be baked into the apps. That's data as code. That's right. So it's almost a data DevOps kind of vibe. >>Yeah, no, you're absolutely right. And you know, you've seen it with things like MLOps and so on. It's all a special case of DevOps.
But what you're really trying to do is to get programmatic and systematic about how you deal with data. And it's not just data that you have. It's also publicly available data sets, and it's customers sharing with each other. So we're building the ecosystem around data, and we've got things like our Open Data program, where we've got publicly hosted data sets, or things like AWS Data Exchange, where customers can actually monetize data. So it's not just data as code, but now data as a monetizable asset. So it's a really exciting time to be in the data business. >>Yeah. And I think there's so many too. So I've got to ask you while I've got you here, since you're an expert. Okay. Here's my problem. I have a lot of data. I'm nervous about it. I want to secure it. But if I try to secure it, I'm not making it available. And I want to feed the machine learning. How do I create an architecture where I can make it freely available, yet maintain the control and the comfort that this is going to be secure? So what products do I buy? >>Yeah. So, you know, a great place to start is S3. You know, it's one of the best places for data lakes, for all the reasons that Swami talked about: durability, scale, cost. You can then use Lake Formation to really protect and govern that data, so you can decide who's allowed to see it and what they're allowed to see, and you don't have to create multiple copies. So you can define that, you know, this group of partners can see A, B, and C; this group can see D, E, and F; and the system enforces that. And you have a central point of control where you can monitor what's happening. And if you want to change your mind, you can do that instantly, and all access can be locked down. Then you've got a variety of encryption capabilities with things like KMS. And so you can really lock down your data, yet keep it open to the parties that you want and give them specifically the access that you want to give them.
And then once you've done that, they're free to use that data, according to the rules that you defined, with the analytics tools that we offer to go drive value, create insight, and do something >>That's Lake Formation. And then you've got Athena querying. Yes, we've got all kinds of tooling on top of it. >>That's right. You can have Athena querying your data in S3, Lake Formation protecting it. And then SageMaker is integrated with Athena, so you can pull that data into SageMaker for machine learning, or interrogate that data using natural language with things like QuickSight Q, like we demoed. So just a ton of power without having to really think too deeply about developing expert skill sets in this. >>So the next question I want to ask you, because that first part was a great, great, great description, thank you very much: now, 5G and the edge are here, Outposts. How is the analytics going on that, as edge becomes more pervasive in the architecture? >>Yeah, it's going to be a key part of this ecosystem, and it's really a continuum. So, you know, we find customers are collecting data at the edge. They might be making local ML or inference-type decisions on edge devices or, you know, automobiles, for example. But typically that data at some point will come back into the cloud, into S3, and will be used to do heavy-duty training, and then those models get pushed back out to the edge. And then some of the things that we've done in Athena, for example, with federated query: as long as you have a network path, and you can understand what the data format or the database is, you can actually run a query on that data. So you can run real-time queries on data wherever it lives, whether it's on an edge device, on an Outpost, in a Local Zone, or in your cloud region, and combine all of that together in one place. >>Yeah. And I think not having data copies everywhere is a big deal.
I've got to ask you, now that we're here at re:Invent, what's your take? We're back in person; last year was all virtual. Finally. Not 60,000 people like a couple of years ago, but it's still 27,000 people here, all lining up for the sessions, all having a great time. All good. What's the most important story from your area that people should pay attention to? What's the headline? What's the top news? >>Yeah, so I think first off, it is awesome to be back in person. It's just so fun to see customers and to see, I mean, you; we've been meeting here over the years, and it's great to have so much energy in person. It's been really nice. You know, I think from an analytics perspective, there's just been a ton of innovation. I think the core idea for us is we want to make it easy for customers to use the right tool for the right job to get insight from all of their data as cost-effectively as possible. And I think if customers walk away and think, it's now easier than ever for me to take advantage of everything that AWS has to offer to make sense of all the data that I'm generating and use it to drive business value, then I think we'll have done our jobs right. >>What's the coolest thing that you're seeing here? Is it the serverless innovation? Is it the new abstraction layer with high-level data services? In your mind, what's the coolest thing? Got it. >>It's hard to pick the coolest; that's like picking candies. I mean, I think, you know, the continued innovation in terms of performance and functionality in each of our services is a big deal. I think serverless is a game changer for customers. And then I think really the infusion of machine learning throughout all of these systems. So things like Redshift ML, Athena ML, QuickSight Q, just really enabling new experiences for customers in a way that's easier than it ever has been.
And I think that's a big deal, and I'm really excited to see what customers do with it. >>Yeah. And I think on the performance thing, to me, the coolest thing that I'm seeing is Graviton3 and the Graviton progression with the custom stacks. With all this ease of use, it's just going to be a real performance advantage, and the costs are getting lowered. So I think the EC2 instances around the compute are phenomenal. >>No, absolutely. I mean, I think the hardware and silicon innovation is huge, and it's not just performance. It's also the energy efficiency. It's a big deal for the future. >>Really. We're at an inflection point where these modern applications are being built. And in my history, I'm old, my birthday is today, I'm in my fifties. So I remember back in the eighties, at every major inflection point when there was a shift in how things were developed, from mainframe to client-server, PC, internetworking, you name it, every time the apps changed, the app owners and app developers all went to the best platform for processing. And so I think, you know, that idea of system software and applications being bundled together is a losing formula. I think you've got to have that decoupling at large scale; we're seeing that with cloud. And I think now if I'm an app developer, whether I'm a large ISV in your ecosystem or an APN partner or a startup, I'm going to go where my software runs the best, period, and where I can create value. That's right. I get distribution, I create value, and it runs fast. I mean, it's pretty simple. So I think the ecosystem is going to be where the big action is for the next couple of years. >>Absolutely right. And I mean, the ecosystem's huge, and we're also grateful to have all these partners here. It's a huge deal for us. And I think it really matters for customers. >>What's on your roadmap this year? What have you got going on?
Can you share a little bit of the trajectory without breaking the rules of Amazonian confidentiality? What's the focus for the year? What's next? >>Well, as you know, we're always talking to customers, and I think we're going to make things better, faster, cheaper, and easier to use. I think you've seen some of the things that we're doing with integration now; you'll see more of that. And really the goal is: how can customers get value as quickly as possible for as low a cost as possible? That's how we win. >>Yeah, there in the long term. We always say, every time we see each other, that data is at the center of the value proposition. I've been saying that for 10 years; now it actually is the value proposition, powering AI. And you're seeing, because of it, the rise of superclouds. The superclouds are emerging, and I think you guys are the underpinnings of these emerging superclouds. So it's a huge trend, and the Goldman Sachs thing is a validation. So again, the more data the better. Really cool things happening. >>It's just everywhere, and the diversity of use cases is amazing. I mean, from the Australian swimming team to Formula One to NASDAQ, it's just incredible to see what our customers do. >>Great. Good to see you. Thanks for coming on theCube. >>Pleasure to be here as always, John. Great to see you. Thank you. >>Thanks for sharing. Data is the key to success. Data is the value proposition. You've seen the rise of superclouds because of the data advantage: if you can expose it, protect it, and govern it, it unleashes creativity and opportunities for entrepreneurs and businesses. Of course, you've got to have the scale and the price performance. That's what AWS is doing. This is theCube's coverage. You're watching the leader in worldwide tech coverage, here in person for AWS re:Invent 2021. I'm John Furrier.
Thanks for watching.

Published Date : Dec 1 2021


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Goldman	ORGANIZATION	0.99+
Rahul Pathak	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
Adam	PERSON	0.99+
Jerry chin	PERSON	0.99+
NASDAQ	ORGANIZATION	0.99+
Athena	LOCATION	0.99+
Jeffrey	PERSON	0.99+
2021	DATE	0.99+
60,000 people	QUANTITY	0.99+
27,000 people	QUANTITY	0.99+
10 years	QUANTITY	0.99+
last year	DATE	0.99+
John ferry	PERSON	0.99+
today	DATE	0.99+
three	QUANTITY	0.98+
Kafka	TITLE	0.98+
Swami	PERSON	0.98+
one	QUANTITY	0.98+
first	QUANTITY	0.98+
ADF analytics	ORGANIZATION	0.97+
Amazonia	ORGANIZATION	0.97+
eighties	DATE	0.97+
Pixar	ORGANIZATION	0.97+
fifties	QUANTITY	0.97+
each	QUANTITY	0.97+
three new offerings	QUANTITY	0.96+
Redshift	TITLE	0.96+
first part	QUANTITY	0.96+
this year	DATE	0.96+
last night	DATE	0.95+
Lipski	PERSON	0.94+
couple of years ago	DATE	0.94+
next couple of years	DATE	0.94+
Sage	ORGANIZATION	0.93+
Goldmans	ORGANIZATION	0.92+
pandemic	EVENT	0.92+
this week	DATE	0.92+
Databricks	ORGANIZATION	0.9+
one place	QUANTITY	0.9+
mark	PERSON	0.88+
Supercloud	ORGANIZATION	0.87+
Tableau	ORGANIZATION	0.84+
S3	TITLE	0.84+
5g	QUANTITY	0.8+
Athena ML	ORGANIZATION	0.78+
Athena	ORGANIZATION	0.78+
Raiders	ORGANIZATION	0.77+
years ago	DATE	0.75+
nineties	DATE	0.74+
ML	ORGANIZATION	0.73+
Adams	PERSON	0.73+
two instances	QUANTITY	0.72+
Lambda	TITLE	0.71+
Thena	ORGANIZATION	0.71+
SageMaker	TITLE	0.66+
Vegas	LOCATION	0.66+
Invent	EVENT	0.64+
Vice	PERSON	0.63+
graviton three	OTHER	0.62+
Australia	LOCATION	0.59+
Las	ORGANIZATION	0.59+
Kinesis	TITLE	0.57+
Amazonian	ORGANIZATION	0.56+

Tomer Shiran, Dremio | AWS re:Invent 2021

>>Good morning. Welcome back to theCube's continuing coverage of AWS re:Invent 2021. I'm Lisa Martin. We have two live sets here. We've got over a hundred guests on the program this week across our live sets and remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back one of our Cube alumni, Tomer Shiran, the founder and CPO of Dremio, to the program. Tomer is going to be talking about why 2022 is the year open data architectures surpass the data warehouse. Tomer, welcome back to theCube. >>Thanks for having me. It's great to be here. >>It's great to be here at a live event in person, my goodness, sitting side by side with guests. Before we dig into the data lakehouse versus the data warehouse, which I want to unpack with you, talk to me about what's going on at Dremio. You guys were on the program earlier this summer, but what are some of the things going on right now in the fall of 2021? >>Yeah, for us it's a big year: a lot of product news, a lot of new products, new innovation, and the company's grown a lot. We're probably three times bigger than we were a year ago. So a lot of new folks on the team and many, many new customers. >>That's good, always good to have new customers, especially during the last 22 months, which have obviously been incredibly challenging. But I want to unpack this, the difference between a data lake and a data lakehouse. I love the idea of a lakehouse, by the way. Talk to me about what the differences and similarities are, and how customers are benefiting. >>Sure. Yeah. I think you could think of the lakehouse as kind of the evolution of the lake, right? We've had data lakes for a while.
Now the transition to the cloud has made them a lot more powerful, and a lot of new capabilities coming into the world of data lakes make that whole concept, that whole architecture, much more powerful, to the point that you really are not going to need a data warehouse anymore. Right. And so it gives you the best of both worlds: all the advantages that we had with data lakes, the flexibility to use different processing engines and to have data in your own account in open formats, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your BI workloads and things like that. So the lakehouse brings both of those together and gives you the benefits of both. >>Talk to me, from a customer lens perspective, about some of the key benefits. And how does a customer go from, say, having data warehouses and data lakes to actually evolving to the lakehouse? >>You know, data warehouses have been around forever, right? And there's been some new innovation there as we've moved to the cloud, but fundamentally they are a very closed and very proprietary architecture that gets very expensive quickly. With a data warehouse, you have to take your data and load it into the warehouse, whether that's Teradata or Snowflake or any other database out there. That's what you do: you bring the data into the engine. The data lakehouse is a really different architecture. It's one where you have data as its own tier, stored in open formats, things like Parquet files and Iceberg tables. And you're basically bringing the engines to the data instead of the data to the engine.
And so now all of a sudden you can start to take advantage of all this innovation that's happening on the same set of data without having to copy it and move it around. So whether that's Dremio for high-performance BI workloads and SQL-type analysis, Spark for batch processing and machine learning, or Flink for streaming, there are lots of different technologies that you can use on the same data, and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse. >>Okay. So I can imagine that during the last 22 months of this scattered work-from-anywhere environment, which we're still in, with so much data being generated at the edge and the edge expanding, bringing the engines to the data is probably more timely now than ever. >>Yeah. The growth in data, you see it everywhere, right? That's the reason so many companies like ourselves are doing so well. There's so much new data, so many new use cases, and every company wants to be data-driven. They all want to democratize data within the organization. But you need the platforms to be able to do that. And that's very hard if you have to constantly move data around: if you have to take your data, which maybe is landing in S3, and move subsets of it into a data warehouse, and then from there move subsets of that into BI extracts, Tableau extracts, Power BI imports, and you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization that way. And so it really requires a new architecture.
And that's one of the main things that we've been focused on at Dremio: really taking the lake and the lakehouse and making it not just something that data scientists use for advanced use cases. Even your production BI workloads can now run on the lakehouse when you're using a SQL technology like Dremio. >>That's really critical because, as you said, every company these days is a data company. If they're not, they have to be, or there's a competitor in the rearview mirror that is going to be able to take over what they're doing. So this is really critical, especially considering another thing that we learned in the last 22 months: real-time data access is no longer a nice-to-have. It's essential for any business or organization. >>I think we see it even in our own company, right? The folks that are joining the workforce now learned SQL in school. They don't want a report on their desk, printed out every Monday morning. They want access to the database: how do I connect whatever tool I want, or even type SQL by hand? I want access to the data and I want to just use it. And of course I want the performance to be fast, because otherwise I'll get frustrated and I won't use it, which has been the status quo for a long time. And that's basically what we're solving. >>So the lakehouse, versus a data warehouse, is better able to really facilitate data democratization across an organization. >>Yeah. People don't talk a lot about the story before the story, right? With a data warehouse, the data never starts there. You typically first have your data in something like S3, or perhaps in other databases, and then you have to ETL it all into that warehouse. And that's a lot of work.
And typically only a small subset of the data gets ETL'd into that data warehouse. Then the user wants to query something that's not in the warehouse, and somebody from engineering has to spend a month or two responding to that ticket and wiring up some new ETL to get the data in. So it's a big problem, right? If you can have a system that can query the data directly in S3, and even join it with sources outside of that, things like your Oracle database, your SQL Server database, MongoDB, et cetera, well, now you really have the ability to expose data to your users within the company and make it very self-service. They can query any data at any time and get a fast response time. That's what they need. >>And self-service is key there. Speaking of self-service and things that are new, I know you guys launched Dremio Cloud recently, a new SaaS offering. Talk to me about that. What's going on there? >>Yeah. We launched Dremio Cloud. We spent about two years working on that internally, and really the goal was to simplify how we deliver all of the benefits that we've had in our product: sub-second response times on the lake, a semantic layer, the ability to connect to multiple sources, but take away the pain of having to install and manage software. And we did it in a way that the user doesn't have to think about versions. They don't have to think about upgrades. They don't have to monitor anything. It's basically like using Gmail, right? You log in and you get to use it. You don't have to be very sophisticated, and there's not a lot of administration you have to do. It basically makes it a lot simpler. >>And what's the adoption been like so far? >>It's been great. It's been in limited availability, but we've been onboarding customers every week now.
Many startups, many of the world's largest companies. So that's been really exciting, actually. >>So quite a range of customers. And it sounds like Dremio itself has grown during the pandemic. We've seen an acceleration of startups, of cloud adoption, of migration at a lot of companies. How have your customer conversations changed in the last 22 months, as businesses in every industry scrambled in the beginning to survive and are now realizing that they need to modernize to thrive and to have a competitive advantage? >>I think I've seen a few different trends here. One is certainly that there's been a lot of acceleration of movement to the cloud, right? The way different businesses have been impacted has required them to be more agile, more elastic. They don't necessarily know how much workload they're going to have at any point in time. So having that flexibility in the technology matters. With Dremio Cloud, for example, we scale infinitely: you can have one query a day, or you can have a thousand queries a second, and the system just takes care of it. And that's really important to these companies that are being impacted in various different ways. You had the Pelotons and Zooms of the world, whose business was exploding. And then of course the travel and hospitality industries went to zero all of a sudden, though they've been recovering nicely since then. So that flexibility has been really important to customers. I think the other thing is that they've realized that they have to leverage data, right? Because in parallel with this pandemic there has also been a real boom in technology.
And so every industry is being disrupted by new startups, whether it's the insurance industry or financial services: a lot of insurtech and fintech companies that are trying to take advantage of data. So if you as an enterprise are not doing that, that's a problem. >>It is a problem. It's definitely something that I think every business in every industry needs to be acutely aware of, because from a competitive advantage perspective, there's someone in that rearview mirror who is going to be focused on data and have a really solid, modern data strategy, and who's going to be able to take over if a company is resting on its laurels at all. So here we are at re:Invent; I just came out of Adam Selipsky's keynote. Talk to me about the Dremio-AWS partnership. I know AWS's partner ecosystem is huge, and you're one of the partners. What's going on with the partnership? How long have you guys been partners? What are the advantages for your customers? >>We've been very close partners with AWS for a number of years now, and it spans many different parts of AWS. There's the engineering organization, so a very close relationship with the S3 team and the EC2 team. I was just having dinner last night with Kevin Miller, the GM of S3. So that's one side of things, the engineering integration. We're the first technology to integrate with AWS Lake Formation, which is Amazon's data lake security technology, so we do a lot of work together on upcoming features that Amazon is releasing. And then they've also been really helpful on the go-to-market side of things, on sales and marketing, whether it's blogs on the Amazon blog or their sales teams actually promoting Dremio to their customers to help them be successful.
So it's really been a good partnership. >>Every time I talk to somebody from Amazon, we always talk about their customer-first focus, their customer obsession. It sounds like there's deep alignment from the technical engineering perspective and in sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team. >>Yeah. I think Amazon does have that customer-first focus, and obviously we do as well. We have to, right? As a startup, if a customer has a problem, the whole company will jump on that problem. So we call it customer obsession internally. And I think that's very much what we've seen with AWS as well: the desire to make the customer successful comes before "how does this affect a specific Amazon product?" Because anytime a customer is using Dremio on AWS, they're also consuming many different AWS services, and they're bringing data into AWS. So I think for both of us, it's all about how we solve customer problems and make them successful, with their data in this case. >>Yup. Solving those customer problems is the whole reason that we're all here, right? As we have just a few more minutes here: when we hear terms like "future-proof," I always want to dig in with folks like yourself, chief product officers. What does it actually mean? How do you enable businesses to create these future-proof data architectures that are going to allow them to scale and be really competitive? >>Sure. So I think many companies have experienced what's known as lock-in, right? They invest in some technology; we've seen this with databases and data warehouses, right?
You start using that and you can really never get off, and prices go up, and you find out that you're spending 10 times more than you thought you were going to be spending, especially now with the cloud data warehouses. And at that point it becomes very difficult, right? What do you do? One of the great things about the data lake and the lakehouse architecture is that the data stays stored in the customer's own account. It's in their S3 buckets in open source formats, like Parquet files and Iceberg tables, and they can use many different technologies on that. So today the best technology for SQL and powering your mission-critical BI is Dremio, but tomorrow there might be something else, right? And that company can then take that new technology, point it at the same data, and start using it. They don't have to go through some really crazy migration process. We see that with Teradata and Oracle, the old-school vendors; that's always been a pain. And now, with the newer cloud data warehouses, you see a lot of complaints around that as well. So the lakehouse is fundamentally designed for that. Especially if you choose open source formats like Iceberg tables, as opposed to, say, Delta Lake, you're really future-proofing yourself. >>Got it. As we wrap up here, talk to me about some of the things that attendees can learn and see and touch and feel and smell at the Dremio booth at this re:Invent. >>Yeah. There are a few different things. They can watch a demo or play around with Dremio Cloud, and they can talk to our team about what we're doing with Apache Iceberg.
Iceberg to me is one of the more exciting projects in this space because, you know, it was created by Netflix, Apple, and Salesforce, and AWS just announced support for Iceberg in their products Athena and EMR. So it's really emerging as the standard table format, the way to represent data in open formats in S3. We've been behind Iceberg for a while now, and so that to us is very exciting. We're happy to chat with folks at the booth about that. Nessie is another project that we created, an open source project for really providing a Git experience for your data, where you have version control and branching, and kind of trying to reinvent data engineering and data management. So that's another cool project that we can talk about at the booth. >>So lots of opportunity there for attendees to learn. Thank you, Tomer, for joining me on the program today, talking about the difference between a data warehouse, a data lake, and the lakehouse. You did a great job explaining that, what's going on with Dremio Cloud, and how you guys are deepening that partnership with AWS. We appreciate your time. >>Thank you. Thanks for having me. >>My pleasure. For Tomer Shiran, I'm Lisa Martin. You're watching theCube. Our coverage of AWS re:Invent continues after this.
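
The "bring the engines to the data" idea from the conversation above can be illustrated with a minimal Python sketch. This is not Dremio's implementation or API; it only shows the shape of the idea under stated assumptions. A CSV file in a temporary directory stands in for Parquet files in an S3 bucket, an in-memory SQLite database stands in for an operational source such as Oracle or SQL Server, and every name in it (`orders.csv`, `customers`, `revenue_by_customer`, `row_count`, `run_demo`) is hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_lake_file(directory: Path) -> Path:
    """Write a small CSV file (assumed stand-in for a Parquet file in S3)."""
    path = directory / "orders.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "customer_id", "amount"])
        writer.writerows([[1, 10, 25.0], [2, 11, 40.0], [3, 10, 15.5]])
    return path


def make_operational_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database (assumed stand-in for an external source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Acme"), (11, "Globex")])
    return conn


def revenue_by_customer(lake_file: Path, db: sqlite3.Connection) -> dict[str, float]:
    """Join lake rows with database rows at query time.

    The file is read where it lives; nothing is loaded into the database first.
    """
    names = dict(db.execute("SELECT customer_id, name FROM customers"))
    totals: dict[str, float] = {}
    with lake_file.open(newline="") as f:
        for row in csv.DictReader(f):
            name = names[int(row["customer_id"])]
            totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


def row_count(lake_file: Path) -> int:
    """A second, independent 'engine' that reads the very same file."""
    with lake_file.open(newline="") as f:
        return sum(1 for _ in csv.DictReader(f))


def run_demo() -> tuple[dict[str, float], int]:
    """Run both 'engines' against one copy of the data."""
    with tempfile.TemporaryDirectory() as tmp:
        lake_file = write_lake_file(Path(tmp))
        db = make_operational_db()
        try:
            return revenue_by_customer(lake_file, db), row_count(lake_file)
        finally:
            db.close()


totals, count = run_demo()
print(totals)
print(count)
```

The point of the sketch is that both functions work on the same single copy of the file, and the join with the external source happens at read time rather than after an ETL load. A real lakehouse engine would add what the interview mentions and this sketch leaves out: open table formats, transactions, and performance at scale.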

Published Date : Nov 30 2021


ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Kevin Miller	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Tom	PERSON	0.99+
10 times	QUANTITY	0.99+
10 times	QUANTITY	0.99+
Tomer	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Elizabeth	PERSON	0.99+
two months	QUANTITY	0.99+
Tomer Shiran	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Netflix	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Lipsey	PERSON	0.99+
Dremio	PERSON	0.99+
tomorrow	DATE	0.99+
apple	ORGANIZATION	0.99+
a month	QUANTITY	0.99+
One	QUANTITY	0.99+
fall of 2021	DATE	0.98+
today	DATE	0.98+
Eddie	PERSON	0.98+
one	QUANTITY	0.98+
both worlds	QUANTITY	0.98+
Adam psyllid	PERSON	0.98+
Gmail	TITLE	0.98+
S3	TITLE	0.97+
next decade	DATE	0.97+
SQL	TITLE	0.97+
a year ago	DATE	0.97+
three times	QUANTITY	0.97+
two live sets	QUANTITY	0.97+
2022	DATE	0.97+
this week	DATE	0.96+
iceberg	TITLE	0.96+
Dremio	ORGANIZATION	0.96+
first	QUANTITY	0.96+
about two years	QUANTITY	0.95+
Apache	ORGANIZATION	0.95+
Tableau	TITLE	0.95+
Monday morning	DATE	0.94+
SAS	ORGANIZATION	0.94+
one query	QUANTITY	0.94+
Jemena	ORGANIZATION	0.94+
earlier this summer	DATE	0.93+
second	QUANTITY	0.93+
first focus	QUANTITY	0.92+
last 22 months	DATE	0.91+
Delta	ORGANIZATION	0.9+
zero	QUANTITY	0.9+
2021	DATE	0.89+
last night	DATE	0.87+
a thousand queries	QUANTITY	0.85+
Mongo	ORGANIZATION	0.85+
a day	QUANTITY	0.84+
first technology	QUANTITY	0.82+
pandemic	EVENT	0.81+
a second	QUANTITY	0.8+

Milissa Pavlik, PavCon | AWS Summit DC 2021

>>Welcome back to theCube's coverage here in Washington, D.C., for AWS Public Sector Summit. I'm John Furrier, host of theCube. This is an in-person event, but of course we have remote guests; it's a hybrid event as well. Amazon Web Services is streaming some of the keynotes, and of course all the Cube interviews are free and streaming out there as well on all the Cube channels and all the social coordinates. Our next guest is Milissa Pavlik, President and CEO of PavCon, joining me here to talk about predictive maintenance and bringing that to life for the U.S. Air Force. Milissa, thanks for coming in remotely on our virtual Cube here at the physical event. >>Excellent, thank you. Good morning. >>So people have been face to face for the first time since 2019, with a lot of people remote, calling in, checking things out. Kind of an interesting time we're living in, right? So what's your take on all this? >>Sure. I mean, it's a new way of doing business, right? I will say, for us as a company, we always have been remote, so it's not too much of a change. But it is definitely challenging, especially trying to engage with such a large user community as the United States Air Force, which isn't always used to working remotely. So it's definitely a unique challenge for sure. >>Well, let's get into it. I love this topic. You had a real success story; there's a case study with the U.S. Air Force. What's the relationship? Take us through what you guys are doing together. >>Sure. Absolutely. So we started working with the Air Force about five years ago on this subject of predictive maintenance. Sometimes you might hear me catch myself and say CBM+, condition-based maintenance plus. They're synonymous; they mean the exact same thing, basically. But about five years ago the Air Force was contemplating, how do we get into a space of getting ahead of unscheduled maintenance events?
In the military, their big push always is readiness: how do we improve readiness? So to do that, there was a big ask of, do we have the data to get ahead of failures? So we started on this journey about five years ago, as I said, and frankly we started under the radar. We weren't sure if it was going to work. So we started with two platforms. And of course, when a lot of people hear "predictive maintenance," they immediately think of sensor data. Sensors provide wonderful data, but unfortunately, especially in an entity such as the Air Force, not all assets are sensored. So it also opened up a whole other avenue of how we use the data that they have today to get ahead of failures. So it did start a really great partnership, working not only with the individual Air Force entities, like the Air Force Life Cycle Management Center, but also across all the major commands, the individual units, supply control, logistics, and everyone else. So it's been a really great team effort to bring together all of those operations that typically would be rather segregated. >>Yeah, they're getting a lot of props lately on a lot of their projects across the board. On this one in particular, how did you guys specifically help them modernize and get this maintenance effort off the ground? >>Absolutely. I think quite simply, we put their existing data to work. They already had a ton of data. There wasn't a need to generate more; we're talking about petabytes of information. So how do we use that and focus it on getting ahead of failure? We established basically three key performance parameters right from the beginning. We knew we wanted to increase availability, which was going to directly improve readiness. We needed to make sure we were reducing mission aborts. And we wanted to get ahead of any kind of maintenance costs.
So for us it was really, how do we leverage and embrace machine learning and AI paired with just big data analytics, and how do we take what is, frankly, more of a World War Two era architecture and turn it into something that is in the information age? So our modernization really started with how do we take their existing data and turn it into something that is useful, and then simultaneously how do we educate the workforce, helping them understand what machine learning and AI truly offer. Because I think everyone has their own opinions of what that means, but when you put it into action, you need to make sure that it's something that they can take action on, right? It's not just pushing a dot or moving numbers around; it's really thinking about and considering how their operations are done and then infusing that with the data on the back end. >> It's awesome. You know, the old workflows in the cloud, this is legit. I mean physical assets, all kinds of things, and there's legacy also, but you want the modernization. I was going to ask you about the machine learning and AI component; you brought that up. What specifically are you leveraging there from the AI side or the machine learning side? >> Absolutely. For us, first and foremost, we're talking about responsible AI in this case, because unfortunately a lot of the data in the Air Force is human entry, so it's manual, which basically means it's open to a lot of error in that data. So we're really focused heavily on the data integrity. We're really focused on utilizing different types of machine learning, because I think on the surface the general opinion is there's a lot of data here, so it would open itself naturally to a lot of potential machine learning techniques. But the reality of the situation is this data is not human understandable unless you are a prior maintainer. Frankly, it's a lot of codes; there's not a whole lot of common taxonomy.
So what we've done is we've looked at those supervised and unsupervised models, and we've taken a whole different approach of infusing it with what I would say arguably is the most important key element: domain expertise. You know, someone who actually understands what this data means. So that way we can end up with actionable output, something that the Air Force can actually put into use and see the results coming out of it. And thus far it's been great. Air Mobility Command has come back and said we've been able to reduce their MICAPs, which are parts waiting for maintenance, by 18%. That's huge in this space. >> Yeah, it's interesting about unsupervised and supervised machine learning. That's a big distinction, because you mentioned there's a lot of human error going on. That's a big part. Can you explain a little bit more? Was that to solve the human error part, or was it the mix and match because of the different data sets? Why both machine learning modes? >> So really it was to address both items, frankly. When we started down this path, we weren't sure what we were going to find, right? We went in with some hypotheses, and some of those ended up being true and others were not. So it was a way for us to quickly adjust as we needed to, again, put the data into actionable use and make sure we were doing that responsibly. So for us, because it's human understanding and human error that goes into this, natural language processing is a really big area in this space. So for us, adjusting between and trying different techniques is really how we were able to discover what was going to be the most effective and useful in this particular space, as well as cost effective. Because for us there's also this resistance: you have to resist the urge to want to monitor everything.
In this case we're talking about really focusing on those top drivers. And depending on the type of data that we had, depending on the users that we knew were going to get involved with it, as well as, I would say, the historical information, it really would help us decide on unsupervised versus supervised. As for going the unsupervised route, in some cases they're just still not ready for that because the data is just so manual. Once we get to a point where there is more automation and more automated data collection, unsupervised will no doubt become more valuable. Right now, though, in order to get those actionable outputs, supervised modeling was really what we found to be the most valuable. >> And that makes total sense. You've got a lot of headroom to grow into with unsupervised, which is actually harder, as everyone knows. But I mean, that's really the reality. Congratulations. I've got to ask you, on the AWS side, what part do they play in all this? Obviously the cloud and their relationship with the Air Force as well. What's their role in this particular maintenance solution? >> Sure, absolutely. And I'll just say, I mean, we're really proud to be in the partner network with them. When we started this there was no cloud. Today a lot of the things we hear about in the Air Force, like Cloud One and Platform One, weren't in existence five years ago or so. So when we started down this path, we had to identify very quickly a format and a host location that would allow us not only to handle large amounts of data but to do all of the deep analysis we needed to. AWS GovCloud is where that came in. Plus it also is awesome that they were already approved at IL5 to be able to host that. We, in collaboration with them, host a NIST 800-171 environment. And so it's really allowed us to grow and deliver this impact out to over 6,000 users today on the Air Force side. So for us, AWS
has been a great partnership. They obviously have some really great native services that are inside their cloud, as well as the pairing and easy collaboration among not only licensed products but also all that free and open source that's out there, because again, arguably that's the best community to pull from, because they're constantly evolving and working in the space. But AWS has been a really great partner for us, and of course we have some of our very favorite services I'm happy to talk about, but they've been really great to work with. >> What are the top services? >> So for us, a lot of the top services are EC2, WorkSpaces, and of course S3 and Glacier are right up there, but we really enjoy working across Glue and Athena. When we find a commercial service we're looking for that's not yet available in GovCloud, we pull in our AWS partners and ask, hey, when is it going to get into the GovCloud space? And they move pretty quickly to get those in there. So in recent months it's definitely Athena and Glue. >> Well, congratulations. Great solution. I love this application because it highlights the power of the cloud. What's the future in store for the U.S. Air Force when it comes to predictive maintenance? >> Sure. I mean, I think at this point they are just going to continue adding additional top driver analyses. Through our work for the past couple of years, we've identified a lot of operational and functional opportunities for them. So there are definitely going to be some foundational changes coming along, some enabling new technologies to get that data integrity more automated as well, so that there isn't such a heavy lift on the downstream, where we're talking about data cleansing. But I think as far as predictive maintenance goes, we're definitely going to see more and more improvements across the readiness level, eliminating that unscheduled event that drives a lot of the readiness concerns that are out there.
And we're also hoping to see a lot more improvement and, I'd say, enhancement across the supply chain, because we know that's also an area that they could really get ahead on. As part of our other work, we developed a five-year long-range supply forecast, and it's really been opening some eyes to see how they can better plan, not only on the maintenance side but also supporting maintenance from logistics and supply. >> Great stuff, Melissa. Great to have you on, President and CEO of PavCon. You're also the business owner. How are things going with the business? With the pandemic, it looks like you're going to come out of it stronger. You've got the tailwind with cloud technology. The modernization boom is here in GovCloud; they're celebrating GovCloud's 10-year birthday here at this event. How's business? How are you doing? >> Good. We were, I guess, fortunate. As I mentioned at the very beginning, we were a remote company, so fortunately for us the pandemic did not cause that much of an operational hiccup. And being that a lot of our clients are in the federal space, we were able to continue working, and amazingly we actually grew during the pandemic. We added quite a bit of personnel to the team, and so we're looking forward to doing some more predictive maintenance, not only expanding in the Air Force but across the other services as well. >> You know, the people who were agile and had some cloud action going on were productive, and they came out stronger. Melissa, great to have you on theCUBE. Thank you for coming in remotely and joining our face-to-face event from the interwebs. Thank you so much for coming on theCUBE. >> All right, thank you, John. Have a great rest of your day. >> Okay. I'm John Furrier here at theCUBE at AWS Public Sector Summit, in person and remote, bringing guests on. We've got the new capability of bringing remotes in, and we do in person as well. It's great to see you face to face here at theCUBE and great to be here at the Public Sector Summit. Thanks for watching.

Published Date : Sep 30 2021


ENTITIES

Entity	Category	Confidence
AWS	ORGANIZATION	0.99+
Washington D. C.	LOCATION	0.99+
john	PERSON	0.99+
U. S. Air Force	ORGANIZATION	0.99+
Air Force Lifecycle Management Center	ORGANIZATION	0.99+
robert	PERSON	0.99+
U. S. Air Force	ORGANIZATION	0.99+
Air Force	ORGANIZATION	0.99+
five year	QUANTITY	0.99+
18%	QUANTITY	0.99+
United States Air Force	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
cuba	LOCATION	0.99+
10 years	QUANTITY	0.99+
World War Two	EVENT	0.99+
two platforms	QUANTITY	0.99+
Melissa Pavlik	PERSON	0.99+
amazon	ORGANIZATION	0.99+
today	DATE	0.99+
five years ago	DATE	0.99+
both	QUANTITY	0.98+
both items	QUANTITY	0.98+
Govcloud	ORGANIZATION	0.98+
john fraser	PERSON	0.98+
over 6000 users	QUANTITY	0.98+
first time	QUANTITY	0.97+
2019	DATE	0.97+
Herjavec Milissa Pavlik	PERSON	0.97+
about five years ago	DATE	0.96+
pandemic	EVENT	0.95+
W. S. Public sector	EVENT	0.91+
ec two	ORGANIZATION	0.9+
S three	ORGANIZATION	0.89+
GovCloud	TITLE	0.88+
President	PERSON	0.87+
I. L. Five	LOCATION	0.83+
three key	QUANTITY	0.8+
Glacier	ORGANIZATION	0.79+
801 +71	OTHER	0.75+
past couple years	DATE	0.73+
petabytes	QUANTITY	0.69+
AWS Summit DC 2021	EVENT	0.67+
melissa	LOCATION	0.65+
Ceo POV con	ORGANIZATION	0.62+
W. S.	ORGANIZATION	0.62+
Ceo Path Con	EVENT	0.61+
melissa	PERSON	0.59+
Agile	ORGANIZATION	0.55+
one platform	QUANTITY	0.53+
cloud	ORGANIZATION	0.53+
one	QUANTITY	0.52+
PavCon	EVENT	0.52+
Air	ORGANIZATION	0.5+
Athena	ORGANIZATION	0.43+

Venkat Venkataramani, Rockset & Carl Sjogreen, Seesaw | AWS Startup Showcase

(mid tempo digital music) >> Welcome to today's session of theCUBE's presentation of the AWS Startup Showcase. This is New Breakthroughs in DevOps, Data Analytics, and Cloud Management Tools. The segment is featuring Rockset and we're going to be talking about data analytics. I'm your host, Lisa Martin, and today I'm joined by one of our alumni, Venkat Venkataramani, the co-founder and CEO of Rockset, and Carl Sjogreen, the co-founder and CPO of Seesaw Learning. We're going to be talking about the fast path to real-time analytics at Seesaw. Guys, thanks so much for joining me today. >> Thanks for having us. >> Thank you for having us. >> Carl, let's go ahead and start with you. Give us an overview of Seesaw. >> Yeah, so Seesaw is a platform that brings educators, students, and families together to create engaging and learning experiences. We're really focused on elementary aged students, and have a suite of creative tools and engaging learning activities that help get their learning and ideas out into the world and share that with family members. >> And this is used by over 10 million teachers and students and family members across 75% of the schools in the US and 150 countries. So you've got a great big global presence. >> Yeah, it's really an honor to serve so many teachers and students and families. >> I can imagine even more so now with the remote learning being such a huge focus for millions and millions across the country. Carl, let's go ahead and get the backstory. Let's talk about data. You have a ton of data on how your product is being used across millions of data points. Talk to me about the data goals that you set prior to using Rockset. >> Yeah, so, as you can imagine with that many users interacting with Seesaw, we have all sorts of information about how the product is being used, which schools, which districts, what those usage patterns look like.
And before we started working with Rockset, a lot of our data infrastructure was really custom built and cobbled together a bit over the years. We had a bunch of batch jobs processing data, and we were using some tools, like Athena, to make that data visible to our internal customers. But we had a very sort of disorganized data infrastructure that, as we've grown, we realized was getting in the way of helping our sales and marketing and support and customer success teams really service our customers in the way that we wanted to. >> So operationalizing that data to better serve internal users like sales and marketing, as well as your customers. Give me a picture, Carl, of those key technology challenges that you knew you needed to solve. >> Yeah, well, at the simplest level, just understanding how an individual school or district is using Seesaw, where they're seeing success, where they need help, is a critical question for our customer support teams and frankly for our school and district partners. A lot of what they're asking us for is data about how Seesaw is being used in their school, so that they can help target interventions and understand where there is an opportunity to double down on where they are seeing success. >> Now, before you found Rockset, you did consider a more traditional data warehouse approach, but decided against it. Talk to me about the decision. Why was a traditional data warehouse not the right approach? >> Well, one of the key drivers is that we are heavy users of DynamoDB. That's our main data store and it has been a tremendous aid in our scaling. Last year, with the transition to remote learning, we scaled most of our metrics by 10X and Dynamo didn't skip a beat; it was fantastic in that environment.
But when we started really thinking about how to build a data infrastructure on top of it, using a sort of traditional data warehouse, a traditional ETL pipeline, it was going to require a fair amount of work for us to really build that out on our own on top of Dynamo. And one of the key advantages of Rockset was that it was basically plug and play for our Dynamo instance. We turned Rockset on, connected it to our DynamoDB and were able within hours to start querying that data in ways that we hadn't before. >> Venkat, let's bring you into the conversation. Let's talk about the problems that you're solving for Seesaw and also the complementary relationship that you have with DynamoDB. >> Definitely. I think, Seesaw, big fan of the product. We have two kids in elementary school that are active users, so it's a pleasure to partner with Seesaw here. If you really think about what they're asking for, what Carl's vision was for their data stack, the way we look at it is business observability. They have many customers and they want to make sure that they're doing the right thing and servicing them better. And all of their data is in a very scalable, large scale NoSQL store like DynamoDB. So it makes it very easy for you to build applications, but it's very, very hard to do analytics on it. Rockset comes with all batteries included, including real-time data connectors with Amazon DynamoDB. And so literally you can just point Rockset at any of your Dynamo tables, and even though it's a NoSQL store, Rockset will in real time replicate the data and automatically convert it into fast SQL tables for you to do analytics on. And so within one to two seconds of data getting modified or new data arriving in DynamoDB from your application, it's available for query processing in Rockset with full-featured SQL.
And not just that, I think another aspect that was very important for Seesaw is that they did not just want to do batch analytics. They wanted their analytics to be interactive, because a lot of the time we just say something is wrong. It's good to know that, but oftentimes you have a lot more followup questions. Why is it wrong? When did it go wrong? Is it a particular release that we did? Is it something specific to the school district? Are they trying to use some part of the product more than other parts of the product and struggling with it? Or anything like that. It's really, I think it comes down to Seesaw's and Carl's vision of what that data stack should serve and how we can use that to better serve the customers. And Rockset's indexing technology and whatnot allows you to not only get real time in terms of data freshness, but also the interactivity that comes in ad hoc drilling down and slicing and dicing, the kind of analytics that is just our bread and butter. And so that is really how I see us not only partnering with Seesaw and allowing them to get the business observability they care about, but also complementing transactional databases that are massively scalable and born in the cloud, like DynamoDB. >> Carl, talk to me about that complementary relationship that Venkat just walked us through and how that is really critical to what you're trying to deliver at Seesaw. >> Yeah, well, just to reiterate what Venkat said, I think we have so much data that any question you ask about it immediately leads to five other questions about it. We have a very seasonal business, as one example. Obviously in the summertime when kids aren't in school, we have very different usage patterns than during this time right now, which is our critical back to school season, versus a steady state, maybe in the middle of the school year.
And so really understanding how data is trending over time, how it compares year over year, what might be driving those things, is something that frankly we just haven't had the tools to really dig into. There's a lot about that, that we are still beginning to understand and dig into more. And so this iterative exploration of data is incredibly powerful to expose to our product team, our sales and marketing teams to really understand where Seesaw's working and where we still have work to do with our customers. And that's so critical to us doing a good job for schools and districts. >> And how long have you been using Rockset, Carl? >> It's about six months now, maybe a little bit longer. >> Okay, so during the pandemic. So talk to me a little bit about the last 18 months, where we saw the massive overnight transition to remote learning, and there's still a lot of places that are in that or a hybrid environment. How critical was it to have Rockset to fuel real-time analytics interactivity, particularly in a very challenging last 18 month time period? >> The last 18 months have been hard for everyone, but I think they have hit teachers and schools maybe harder than anyone. They have been struggling with the overnight transition to remote learning, the challenges of returning to the classroom, and hybrid learning; teachers and schools are being asked to stretch in ways they have never been stretched before. And so, our real focus last year was in doing whatever we could to help them manage those transitions. And data around student attendance in a remote learning situation, data around which kids were completing lessons and which kids weren't, was really critical data to provide to our customers. And a lot of our data infrastructure had to be built out to support answering those questions in this really crazy time for schools.
>> I want to talk about the data set, but I'd like to go back to Venkat 'cause what's interesting about this story is Seesaw is a customer of Rockset, and Venkat is a customer of Seesaw. Talk to me, Venkat, about how this has been helpful in the remote learning that your kids have been going through the last year and a half. >> Absolutely. I have two sons, nine and ten years old, and they are in fourth and fifth grade now. And I still remember when I told them that Seesaw is considering using Rockset for the analytics, they were thrilled, they were overjoyed because finally they understood what I do for a living. (chuckling) And so that was really amazing. I think, it was a fantastic dual because for the first time I actually understood what kids do at school. I think every week at the end of the week, we would use Seesaw to just go look at, "Hey, well, let's see what you did last week." And we would see not only what the prompts and what the children were doing in the classroom, but also the comments from the educators, and then they comment back. And then we were like, "Hey, this is not how you speak to an educator." So it was really amazing to actually go through that, and so we are very, very big fans of the product. We really look forward to using it; whether it is remote learning or not, we try to use it as a family, me, my wife and the kids, as much as possible. And it's a very constant topic of conversation, every week when we are working with the kids and seeing how we can help them. >> So from an observability perspective, it sounds like it's giving parents and teachers that visibility that really without it, you don't get. >> That's absolutely correct. I think the product itself is about making connections, giving people more visibility into things that are constantly happening, but you're not in the know. Like, before Seesaw, I used to ask the kids, "How was school today? What happened in the class?" And they'll say, "It was okay."
It would be a very short answer; it wouldn't really have the depth that we are able to get from Seesaw. So, absolutely. And so it's only right that that level of observability is also available for their business teams and the support teams, so that they can also service all the organizations that Seesaw's working with, not only the parents and the educators and the students that are actually using the product. >> Carl, let's talk about that data stack. And then I'm going to open the can on some of those impacts that it's making to your internal folks. We talked about DynamoDB, but give me a visual picture of the data stack. >> Yeah. So, we use DynamoDB as our database of record. We're now in the process of centralizing all of our analytics into Rockset, so that rather than having different batch jobs in different systems querying that data in different ways, we're trying to really set Rockset up as the source of truth for analytics on top of Dynamo. And then on top of Rockset, we're exposing that data to internal customers for those interactive, iterative SQL-style queries, but also bridging that data into the other systems our business users use. So Salesforce, for example, is a big internal tool, and we have that data now piped into Salesforce so that a sales rep can run a report on a prospect to reach out to, or a customer that needs help getting started with Seesaw. And it's all plumbed through the Rockset infrastructure. >> From an outcome standpoint, I mentioned sales and marketing getting that visibility, being able to act on real-time data. How has it impacted sales in the last year and a half? Or six months, rather, since it's now six months you've been using it. >> Well, I don't know if I can draw a direct line between those things, but it's been a very busy year for Seesaw, as schools have transitioned to remote learning.
And our business is really largely driven by teachers discovering our free product, finding it valuable in their classroom, and then asking their school or district leadership to purchase a school-wide subscription. It's a very bottom-up sales motion. And so data on where teachers are starting to use Seesaw is the key input into our sales and marketing discussions with schools and districts. And so understanding that data quickly in real time is a key part of our sales strategy and a key part of how we grow at Seesaw over time. >> And it sounds like Rockset is empowering those users, the sales and marketing folks, to really fine tune their interactions with existing customers and prospective customers. And I imagine you on the product side in terms of tuning the product. What are some of the things, Carl, that you've learned in the last six months that have helped you make better decisions on what you want Seesaw to deliver in the future? >> Well, one of the things that I think has been really interesting is how usage patterns have changed between the classroom and remote learning. We saw per student usage of Seesaw increase dramatically over the past year, and really understanding what that means for how the product needs to evolve to better meet teacher needs, to help organize that information, since there's now a lot more of it, really helped motivate our product roadmap over the last year. We launched a new progress dashboard that helps teachers get an at-a-glance view of what's happening in their classroom. That was really in direct response to the changing usage patterns that we were able to understand with better insights into data. >> And those insights allow you to pivot and iterate on the product. Venkat, I want to just go back to the AWS relationship for a second. You both talked about the complementary nature of Rockset and DynamoDB. Here we are at the AWS Startup Showcase.
Venkat, just give the audience a little overview of the partnership that you guys have with AWS. >> Rockset fully runs on AWS, so we are a customer of AWS. We are also a partner. There are lots of amazing cloud data products that AWS has, including DynamoDB and AWS Kinesis, with which we have built-in integrations. So if you're managing data in AWS, we complement that and we can provide very, very fast interactive real-time analytics on all of your datasets. So the partnership has been wonderful, and we're very excited to be in the Startup Showcase. And so I hope this continues for years to come. >> Let's talk about the synergies between Rockset and Seesaw for a second. I know we talked about the huge value of real-time analytics, especially in today's world, where we've learned many things in the last year and a half, including that real-time analytics is no longer a nice-to-have for a lot of industries, 'cause I think, Carl, as you said, if you can't get access to the data, then there are questions we can't ask. Or we can't iterate on operations; if we wait seconds for every query to load, then there are questions we can't ask. Talk to me, Venkat, about how Rockset is benefiting from what you're learning from Seesaw's usage of the technology. >> Absolutely. I mean, if you go to the first part of the question on why do businesses really go after real time. What is the drive here? You might have heard the phrase, the world is going from batch to real-time. What does it really mean? What's the driving factor there? Our take on it is, I think it's about accelerating growth. Seesaw's product is amazing and it'll continue to grow, it'll continue to be a very, very important product in the world. With or without Rockset, that will be true. The way we look at it, once they have real-time business observability, with that inherent growth that they have, they can reach more people, they can put their product in the hands of more and more people, they can iterate faster.
And at the end of the day, it is really about having this very interesting platform, very interesting architecture to really make a lot more data driven decisions and iterate much more quickly. And so in batch analytics, if you were able to make, let's say, five decisions a quarter, in real-time analytics you can make five decisions a day. So that's how we look at it. So that is really, I think, what underpins why the world is going from batch to real time. And what have we learned from having Seesaw as a customer? I think Seesaw has probably one of the largest DynamoDB installations that we have looked at. I think we're talking about billions and billions of records, even though they have tens of millions of active users. And so I think it has been an incredible partnership working with them closely, and they have had a tremendous amount of input on our product roadmap, and some of that, like role-based access control and other things, has already become a part of the product, thanks to the continuous feedback we get from their team. So we're delighted about this partnership and I am sure there's more input that they have that we cannot wait to incorporate in our roadmap. >> I imagine, Venkat, as well, you as the parent user and your kids, you probably have some input that goes to the Seesaw side. So this seems like a very synergistic relationship. Carl, a couple more questions for you. I'd love to know how in this... Here we are kind of in the back to school timeframe. We've got a lot of students coming back, they're still remote learning. What are some of the things that you're excited about for this next school year that you think Rockset is really going to fuel or power for Seesaw? >> Yeah, well, I think schools are navigating yet another transition now, from a world of remote learning to a world of back to the classroom. But back to the classroom feels very different than it does at any other back to school timeframe.
Many of our users are in first or second grade. We serve early elementary age ranges, and some of those students have never been in a classroom before. They are entering second grade having never been at school. And that's hard. That's a hard transition for teachers and schools to make. And so as a partner to those schools, we want to do everything we can to help them manage that transition, in general and with Seesaw in particular. And the more we can understand how they're using Seesaw, where they're struggling with Seesaw, as part of that transition, the more we can be a good partner to them and help them really get the most value out of Seesaw in this new world that we're living in, which is sort of like normal, and in many ways not. We are still not back to normal as far as schools are concerned. >> I'm sure, though, the partnership that you provide to the teachers and the students can be a game changer as they are still navigating some very uncertain times. Carl, last question for you. I want you to point folks to where they can go to learn more about Seesaw, and how, for all those parents watching, they might be able to use this with their families. >> Yeah, well, seesaw.me is our website, and you can go to seesaw.me and learn more about Seesaw, and if any of this sounds interesting, ask your teacher, if they're not using Seesaw, to give it a look. >> Seesaw.me, excellent. Venkat, same question for you. Where do you want folks to go to learn more about Rockset and its capabilities? >> Rockset.com is our website. There is a free trial for... $300 worth of free trial credits. It's a self service platform, you don't need to talk to anybody, all the pricing and everything is out there. So, if real-time analytics and modernizing your data stack is on your roadmap, go give it a spin. >> Excellent, guys.
Thanks so much for joining me today, talking about real-time analytics, how it's really empowering both the data companies and the users to be able to navigate in challenging waters. Venkat, thank you, Carl, thank you for joining us. >> Thanks everyone. >> Thanks Lisa. >> For my guests, this has been our coverage of the AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics and Cloud Management Tools. I am Lisa Martin. Thanks for watching. (mid tempo music)

Published Date : Sep 22 2021

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Venkat Venkataramani	PERSON	0.99+
Carl	PERSON	0.99+
Carl Sjogreen	PERSON	0.99+
Venkat	PERSON	0.99+
Seesaw	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Rockset	ORGANIZATION	0.99+
$300	QUANTITY	0.99+
nine	QUANTITY	0.99+
US	LOCATION	0.99+
Venkat	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
fourth	QUANTITY	0.99+
two kids	QUANTITY	0.99+
Lisa	PERSON	0.99+
first	QUANTITY	0.99+
Last year	DATE	0.99+
one	QUANTITY	0.99+
two seconds	QUANTITY	0.99+
one example	QUANTITY	0.99+
tens of millions	QUANTITY	0.99+
five decisions	QUANTITY	0.99+
last year	DATE	0.99+
second grade	QUANTITY	0.99+
five other questions	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
last week	DATE	0.99+
six months	QUANTITY	0.99+
Dynamo	ORGANIZATION	0.99+
ten year	QUANTITY	0.99+
150 countries	QUANTITY	0.98+
today	DATE	0.98+
billions	QUANTITY	0.98+
two sons	QUANTITY	0.98+

Breaking Analysis: How JPMC is Implementing a Data Mesh Architecture on the AWS Cloud

>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> A new era of data is upon us, and we're in a state of transition. You know, even our language reflects that. We rarely use the phrase big data anymore, rather we talk about digital transformation or digital business, or data-driven companies. Many have come to the realization that data is not the new oil, because unlike oil, the same data can be used over and over for different purposes. We still use terms like data as an asset. However, that same narrative, when it's put forth by the vendor and practitioner communities, includes further discussions about democratizing and sharing data. Let me ask you this, when was the last time you wanted to share your financial assets with your coworkers or your partners or your customers? Hello everyone, and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we want to share our assessment of the state of the data business. We'll do so by looking at the data mesh concept and how a leading financial institution, JP Morgan Chase, is practically applying these relatively new ideas to transform its data architecture. Let's start by looking at what the data mesh is. As we've previously reported many times, data mesh is a concept and set of principles that was introduced in 2018 by Zhamak Dehghani, who's director of technology at ThoughtWorks, a global consultancy and software development company. And she created this movement because her clients, who were some of the leading firms in the world, had invested heavily in predominantly monolithic data architectures that had failed to deliver desired outcomes and ROI. So her work went deep into trying to understand that problem. 
And her main conclusion that came out of this effort was that the world of data is distributed, and shoving all the data into a single monolithic architecture is an approach that fundamentally limits agility and scale. Now a profound concept of data mesh is the idea that data architectures should be organized around business lines with domain context. That the highly technical and hyper specialized roles of a centralized cross functional team are a key blocker to achieving our data aspirations. This is the first of four high level principles of data mesh. So first again, that the business domain should own the data end-to-end, rather than have it go through a centralized big data technical team. Second, a self-service platform is fundamental to a successful architectural approach where data is discoverable and shareable across an organization and an ecosystem. Third, product thinking is central to the idea of data mesh. In other words, data products will power the next era of data success. And fourth, data products must be built with governance and compliance that is automated and federated. Now there's a lot more to this concept and there are tons of resources on the web to learn more, including an entire community that has formed around data mesh. But this should give you a basic idea. Now, the other point is that, in observing Zhamak Dehghani's work, she has deliberately avoided discussions around specific tooling, which I think has frustrated some folks because we all like to have references that tie to products and tools and companies. So this has been a two-edged sword in that, on the one hand it's good, because data mesh is designed to be tool agnostic and technology agnostic. On the other hand, it's led some folks to take liberties with the term data mesh and claim mission accomplished when their solution, you know, may be more marketing than reality. So let's look at JP Morgan Chase in their data mesh journey. 
That's why I got really excited when I saw that, this past week, a team from JPMC held a meetup to discuss what they called data lake strategy via data mesh architecture. I saw that title, I thought, well, that's a weird title. And I wondered, are they just taking their legacy data lakes and claiming they're now transformed into a data mesh? But in listening to the presentation, which was over an hour long, the answer is a definitive no, not at all in my opinion. A gentleman named Scott Hollerman organized the session that comprised these three speakers here, James Reid, who's a divisional CIO at JPMC, Arup Nanda who is a technologist and architect and Serita Bakst who is an information architect, again, all from JPMC. This was the most detailed and practical discussion that I've seen to date about implementing a data mesh. And this is JP Morgan's approach, and we know they're extremely savvy and technically sound. And they've invested, it has to be billions in the past decade, on data architecture across their massive company. And rather than dwell on the downsides of their big data past, I was really pleased to see how they're evolving their approach and embracing new thinking around data mesh. So today, we're going to share some of the slides that they used and comment on how it dovetails into the concept of data mesh that Zhamak Dehghani has been promoting, at least as we understand it. And dig a bit into some of the tooling that is being used by JP Morgan, particularly around its AWS cloud. So the first point is it's all about business value. JPMC, they're in the money business, and in that world, business value is everything. So Jr Reid, the CIO, showed this slide and talked about their overall goals, which centered on a cloud first strategy to modernize the JPMC platform. I think it's simple and sensible, but there are three factors on which he focused: cut costs, always, sure, you got to do that. 
Number two was about unlocking new opportunities, or accelerating time to value. But I was really happy to see number three, data reuse. That's a fundamental value ingredient in the slide that he's presenting here. And his commentary was all about aligning with the domains and maximizing data reuse, i.e. data is not like oil, and making sure there's appropriate governance around that. Now don't get caught up in the term data lake, I think it's just how JP Morgan communicates internally. It's invested in the data lake concept, so they use water analogies. They use things like data puddles, for example, which are single project data marts, or data ponds, which comprise multiple data puddles. And these can feed into data lakes. And as we'll see, JPMC doesn't strive to have a single version of the truth from a data standpoint that resides in a monolithic data lake, rather it enables the business lines to create and own their own data lakes that comprise fit for purpose data products. And they do have a single truth of metadata. Okay, we'll get to that. But generally speaking, each of the domains will own end-to-end their own data and be responsible for those data products, we'll talk about that more. Now the genesis of this was sort of a cloud first platform. JPMC is leaning into public cloud, which is ironic since in the early days of cloud, all the financial institutions were like, never. Anyway, JPMC is going hard after it, they're adopting agile methods and microservices architectures, and it sees cloud as a fundamental enabler, but it recognizes that on-prem data must be part of the data mesh equation. Here's a slide that starts to get into some of that generic tooling, and then we'll go deeper. And I want to make a couple of points here that tie back to Zhamak Dehghani's original concept. The first is that unlike many data architectures, this puts data as products right in the fat middle of the chart. 
The data products live in the business domains and are at the heart of the architecture. The databases, the Hadoop clusters, the files and APIs on the left-hand side, they serve the data product builders. The specialized roles on the right hand side, the DBAs, the data engineers, the data scientists, the data analysts, we could have put in quality engineers, et cetera, they serve the data products. Because the data products are owned by the business, they inherently have the context that is the middle of this diagram. And you can see at the bottom of the slide, the key principles include domain thinking and end-to-end ownership of the data products. They build it, they own it, they run it, they manage it. At the same time, the goal is to democratize data with self-service as a platform. One of the biggest points of contention of data mesh is governance. And as Serita Bakst said on the meetup, metadata is your friend, and she kind of made a joke, she said, "This sounds kind of geeky, but it's important to have a metadata catalog to understand where data resides and the data lineage in overall change management." So to me, this really passed the data mesh stink test pretty well. Let's look at data as products. CIO Reid said the most difficult thing for JPMC was getting their heads around data product, and they spent a lot of time getting this concept to work. Here's the slide they use to describe their data products as it related to their specific industry. They said a common language and taxonomy is very important, and you can imagine how difficult that was. He said, for example, it took a lot of discussion and debate to define what a transaction was. But you can see at a high level, these three product groups around wholesale, credit risk, party, and trade and position data as products, and each of these can have sub products, like, party will have know your customer, KYC, for example. 
So a key for JPMC was to start at a high level and iterate to get more granular over time. So lots of decisions had to be made around who owns the products and the sub-products. The product owners interestingly had to defend why that product should even exist, what boundaries should be in place and what data sets do and don't belong in the various products. And this was a collaborative discussion, I'm sure there was contention around that between the lines of business. And which sub products should be part of these circles? They didn't say this, but tying it back to data mesh, each of these products, whether in a data lake or a data hub or a data pond or data warehouse, data puddle, each of these is a node in the global data mesh that is discoverable and governed. And supporting this notion, Serita said that, "This should not be infrastructure-bound, logically, any of these data products, whether on-prem or in the cloud can connect via the data mesh." So again, I felt like this really stayed true to the data mesh concept. Well, let's look at some of the key technical considerations that JPM discussed in quite some detail. This chart here shows a diagram of how JP Morgan thinks about the problem, and some of the challenges they had to consider were how to write to various data stores, and whether and how you can move data from one data store to another? How can data be transformed? Where's the data located? Can the data be trusted? How can it be easily accessed? Who has the right to access that data? These are all problems that technology can help solve. And to address these issues, Arup Nanda explained that the heart of this slide is the data ingestor instead of ETL. All data producers and contributors, they send their data to the ingestor and the ingestor then registers the data so it's in the data catalog. It does a data quality check and it tracks the lineage. 
Then, data is sent to the router, which persists the data in the data store based on the best destination as informed by the registration. This is designed to be a flexible system. In other words, the data store for a data product is not fixed, it's determined at the point of inventory, and that allows changes to be easily made in one place. The router simply reads that optimal location and sends it to the appropriate data store. Now you see the schema inferrer there, which is used when there is no clear schema on write. In this case, the data product is not allowed to be consumed until the schema is inferred, and then the data goes into a raw area, and the inferrer determines the schema and then updates the inventory system so that the data can be routed to the proper location and properly tracked. So that's some of the detail of how the sausage factory works in this particular use case, it was very interesting and informative. Now let's take a look at the specific implementation on AWS and dig into some of the tooling. As described in some detail by Arup Nanda, this diagram shows the reference architecture used by this group within JP Morgan, and it shows all the various AWS services and components that support their data mesh approach. So start with the authorization block right there underneath Kinesis. Lake Formation is the single point of entitlement and has a number of buckets including, you can see there, the raw area that we just talked about, a trusted bucket, a refined bucket, et cetera. Depending on the data characteristics at the data catalog registration block where you see the Glue catalog, that determines in which bucket the router puts the data. 
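To make the ingestor and router flow described above a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not JPMC's implementation: every class, function and bucket name (`Catalog`, `Router`, `ingest`, `"raw"`, `"trusted"`) is hypothetical, the quality check and schema inference are deliberately naive placeholders, and in-memory dictionaries stand in for the real catalog and data stores. What it tries to show is the ordering from the talk: register and check on the way in, let the catalog decide the destination, and hold schema-less data in a raw area until a schema has been inferred.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """One registered data product in the (hypothetical) inventory."""
    product: str
    destination: str                 # where the router should persist it
    schema: Optional[dict] = None    # None until a schema is known or inferred
    lineage: list = field(default_factory=list)

    @property
    def consumable(self) -> bool:
        # Per the talk: a product cannot be consumed until it has a schema.
        return self.schema is not None


class Catalog:
    def __init__(self):
        self.entries = {}

    def register(self, product, destination, schema, source):
        entry = CatalogEntry(product, destination, schema, [source])
        self.entries[product] = entry
        return entry


class Router:
    """Persists data wherever the catalog says the product belongs."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.stores = {}             # destination -> {product -> records}

    def route(self, product, records):
        destination = self.catalog.entries[product].destination
        self.stores.setdefault(destination, {}).setdefault(product, []).extend(records)
        return destination


def quality_check(records):
    # Placeholder rule: there is at least one record and each is a non-empty dict.
    return bool(records) and all(isinstance(r, dict) and r for r in records)


def infer_schema(records):
    # Naive inference: field name -> type name of the first value seen.
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, type(value).__name__)
    return schema


def ingest(catalog, router, product, records, source, schema=None, destination="trusted"):
    if not quality_check(records):
        raise ValueError(f"quality check failed for {product}")
    if schema is not None:
        catalog.register(product, destination, schema, source)
        return router.route(product, records)
    # No schema on write: register without a schema and land in the raw area.
    entry = catalog.register(product, "raw", None, source)
    router.route(product, records)
    # The inferrer determines the schema and updates the inventory ...
    entry.schema = infer_schema(records)
    entry.destination = destination
    entry.lineage.append("raw")
    # ... so the data can be routed to its proper location.
    landed = router.stores["raw"].pop(product)
    return router.route(product, landed)


catalog = Catalog()
router = Router(catalog)
print(ingest(catalog, router, "trades", [{"id": 1, "amount": 9.5}], source="trading-desk"))
```

Because the destination lives only in the catalog entry, changing where a product is stored means updating one record rather than touching producers or the router, which is the "changes in one place" property the speaker highlights.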
And you can see the many AWS services in use here, identity, the EMR, the Elastic MapReduce cluster from the legacy Hadoop work done over the years, the Redshift Spectrum and Athena. JPMC uses Athena for single threaded workloads and Redshift Spectrum for nested types so they can be queried independent of each other. Now remember very importantly, in this use case, there is not a single Lake Formation, rather multiple lines of business will be authorized to create their own lakes, and that creates a challenge. So how can that be done in a flexible and automated manner? And that's where the data mesh comes into play. So JPMC came up with this federated Lake Formation accounts idea, and each line of business can create as many data producer or consumer accounts as they desire and roll them up into their master line of business Lake Formation account. And they cross-connect these data products in a federated model. And these all roll up into a master Glue catalog so that any authorized user can find out where a specific data element is located. So this is like a super set catalog that comprises multiple sources and syncs up across the data mesh. So again to me, this was a very well thought out and practical application of data mesh. Yes, it includes some notion of centralized management, but much of that responsibility has been passed down to the lines of business. It does roll up to a master catalog, but that's a metadata management effort that seems compulsory to ensure federated and automated governance. As well at JPMC, the office of the chief data officer is responsible for ensuring governance and compliance throughout the federation. All right, so let's take a look at some of the suspects in this world of data mesh and bring in the ETR data. Now, of course, ETR doesn't have a data mesh category, there's no such thing as a data mesh vendor, you build a data mesh, you don't buy it. 
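Returning briefly to the federated catalog idea described in this passage, the roll-up can be sketched in a few lines of Python. This is a toy model under stated assumptions, not AWS Lake Formation or Glue API code: the classes `LineOfBusinessCatalog` and `MasterCatalog`, the methods `publish`, `sync`, `grant` and `locate`, and the example bucket path are all hypothetical. The point is only the shape: each line of business owns its own catalog, and a master index answers "where does this data element live?" for authorized users.

```python
class LineOfBusinessCatalog:
    """Catalog owned end-to-end by one line of business (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.products = {}           # data product -> location

    def publish(self, product, location):
        self.products[product] = location


class MasterCatalog:
    """Superset catalog that line-of-business catalogs roll up into."""

    def __init__(self):
        self.index = {}              # data product -> (line of business, location)
        self.entitlements = {}       # user -> set of lines of business

    def sync(self, lob_catalog):
        # Copy metadata only; the data itself stays with its owning domain.
        for product, location in lob_catalog.products.items():
            self.index[product] = (lob_catalog.name, location)

    def grant(self, user, line_of_business):
        self.entitlements.setdefault(user, set()).add(line_of_business)

    def locate(self, user, product):
        line_of_business, location = self.index[product]
        if line_of_business not in self.entitlements.get(user, set()):
            raise PermissionError(f"{user} is not authorized for {line_of_business}")
        return line_of_business, location


risk = LineOfBusinessCatalog("credit-risk")
risk.publish("exposures", "s3://example-risk-lake/refined/exposures")

master = MasterCatalog()
master.sync(risk)
master.grant("analyst", "credit-risk")
print(master.locate("analyst", "exposures"))
```

Note that the master catalog here holds metadata and entitlements only, which mirrors the point in the talk that the central piece is a metadata management effort while ownership of the data products stays with the lines of business.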
So, what we did is we used the ETR dataset to select and filter on some of the culprits that we thought might contribute to the data mesh to see how they're performing. This chart depicts a popular view that we often like to share. It's a two dimensional graphic with net score or spending momentum on the vertical axis and market share or pervasiveness in the data set on the horizontal axis. And we filtered the data on sectors such as analytics, data warehouse, and the adjacencies to things that might fit into data mesh. And we think that these pretty well reflect participation in data mesh, though it's certainly not all-encompassing. And it's a subset obviously, of all the vendors who could play in the space. Let's make a few observations. Now as is often the case, Azure and AWS, they're almost literally off the charts with very high spending velocity and large presence in the market. Oracle you can see also stands out because much of the world's data lives inside of Oracle databases. It doesn't have the spending momentum or growth, but the company remains prominent. And you can see Google Cloud doesn't have nearly the presence in the dataset, but its momentum is highly elevated. Remember that red dotted line there, that 40% line, anything over that indicates elevated spending momentum. Let's go to Snowflake. Snowflake is consistently shown to be the gold standard in net score in the ETR dataset. It continues to maintain highly elevated spending velocity in the data. And in many ways, Snowflake with its data marketplace and its data cloud vision and data sharing approach, fits nicely into the data mesh concept. Now, a caution, Snowflake has used the term data mesh in its marketing, but in our view, it lacks clarity, and we feel like they're still trying to figure out how to communicate what that really is. But there is really, we think, a lot of potential there in that vision. 
Databricks is also interesting because the firm has momentum and we expect further elevated levels in the vertical axis in upcoming surveys, especially as it readies for its IPO. The firm has a strong product and managed service, and is really one to watch. Now we included a number of other database companies for obvious reasons like Redis and Mongo, MariaDB, Couchbase and Teradata. SAP as well is in there, but that's not all database, but SAP is prominent so we included them. As is IBM, more of a traditional database player, also with a big presence. Cloudera includes Hortonworks and HPE Ezmeral comprises the MapR business that HPE acquired. So these guys got the big data movement started, between Cloudera, Hortonworks which was born out of Yahoo, which was the early big data, sorry early Hadoop innovator, and MapR which kind of went its own course, and now that's all kind of come together in various forms. And of course, we've got Talend and Informatica in there, they are two data integration companies that are worth noting. We also included some of the AI and ML specialists and data science players in the mix like DataRobot who just did a monster $250 million round. Dataiku, H2O.ai and ThoughtSpot, which is all about democratizing data and injecting AI, and I think fits well into the data mesh concept. And you know we put VMware Cloud in there for reference because it really is the predominant on-prem infrastructure platform. All right, let's wrap with some final thoughts here, first, thanks a lot to the JP Morgan team for sharing this data. I really want to encourage practitioners and technologists, go to watch the YouTube of that meetup, we'll include it in the link of this session. And thank you to Zhamak Dehghani and the entire data mesh community for the outstanding work that you're doing, challenging the established conventions of monolithic data architectures. 
The JPM presentation, it gives you real credibility, it takes Data Mesh well beyond concept, it demonstrates how it can be and is being done. And you know, this is not a perfect world, you're going to start somewhere and there's going to be some failures, the key is to recognize that shoving everything into a monolithic data architecture won't support the massive scale and agility that you're after. It's maybe fine for smaller use cases in smaller firms, but if you're building a global platform in a data business, it's time to rethink data architecture. Now much of this is enabled by the cloud, but cloud first doesn't mean cloud only, doesn't mean you'll leave your on-prem data behind, on the contrary, you have to include non-public cloud data in your Data Mesh vision just as JPMC has done. You've got to get some quick wins, that's crucial so you can gain credibility within the organization and grow. And one of the key takeaways from the JP Morgan team is, there is a place for dogma, like organizing around data products and domains and getting that right. On the other hand, you have to remain flexible because technology is going to come, technology is going to go, so you got to be flexible in that regard. And look, if you're going to embrace the metaphor of water like puddles and ponds and lakes, we suggest maybe a little tongue in cheek, but still we believe in this, that you expand your scope to include data ocean, something John Furrier and I have talked about and laughed about extensively in theCUBE. Data oceans, it's huge. It's the new data lake, go transcend data lake, think oceans. And think about this, just as we're evolving our language, we should be evolving our metrics. Much of the last decade of big data was around just getting the stuff to work, getting it up and running, standing up infrastructure and managing massive, how much data you got? Massive amounts of data. 
And there were many KPIs built around, again, standing up that infrastructure, ingesting data, a lot of technical KPIs. This decade is not just about enabling better insights, it's more than that. Data mesh points us to a new era of data value, and that requires new metrics around monetizing data products, like how long does it take to go from data product conception to monetization? And how does that compare to what it is today? And what is the time to quality if the business owns the data, and the business has the context? The quality that comes out of them, out of the chute, should be at a basic level pretty good, and at a higher mark than out of a big data team with no business context. Automation, AI, and very importantly, organizational restructuring of our data teams will heavily contribute to success in the coming years. So we encourage you, learn, lean in and create your data future. Okay, that's it for now, remember these episodes, they're all available as podcasts wherever you listen, all you got to do is search, breaking analysis podcast, and please subscribe. Check out ETR's website at etr.plus for all the data and all the survey information. We publish a full report every week on wikibon.com and siliconangle.com. And you can get in touch with us, email me david.vellante@siliconangle.com, you can DM me @dvellante, or you can comment on my LinkedIn posts. This is Dave Vellante for theCUBE insights powered by ETR. Have a great week everybody, stay safe, be well, and we'll see you next time. (upbeat music)

Published Date : Jul 12 2021

ENTITIES

Entity	Category	Confidence
JPMC	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
2018	DATE	0.99+
Zhamak Dehghani	PERSON	0.99+
James Reid	PERSON	0.99+
JP Morgan	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Serita Bakst	PERSON	0.99+
IBM	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Scott Hollerman	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
40%	QUANTITY	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
Serita	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Arup Nanda	PERSON	0.99+
each	QUANTITY	0.99+
ThoughtWorks	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
each line	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Redis	ORGANIZATION	0.99+
$250 million	QUANTITY	0.99+
first point	QUANTITY	0.99+
three factors	QUANTITY	0.99+
Second	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
Talend	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
first platform	QUANTITY	0.98+
YouTube	ORGANIZATION	0.98+
fourth	QUANTITY	0.98+
single	QUANTITY	0.98+
One	QUANTITY	0.98+
Third	QUANTITY	0.97+
Couchbase	ORGANIZATION	0.97+
three speakers	QUANTITY	0.97+
two data	QUANTITY	0.97+
first strategy	QUANTITY	0.96+
one	QUANTITY	0.96+
one place	QUANTITY	0.96+
Jr Reid	PERSON	0.96+
single lake	QUANTITY	0.95+
SAP	ORGANIZATION	0.95+
wikibon.com	OTHER	0.95+
siliconangle.com	OTHER	0.94+
Azure	ORGANIZATION	0.93+

Steven Mih, Ahana and Sachin Nayyar, Securonix | AWS Startup Showcase

>> Voiceover: From theCUBE's Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE Conversation. >> Welcome back to theCUBE's coverage of the AWS Startup Showcase. Next Big Thing in AI, Security and Life Sciences featuring Ahana for the AI Track. I'm your host, John Furrier. Today, we're joined by two great guests, Steven Mih, Ahana CEO, and Sachin Nayyar, Securonix CEO. Gentlemen, thanks for coming on theCUBE. We're talking about the Next-Gen technologies on AI, Open Data Lakes, et cetera. Thanks for coming on. >> Thanks for having us, John. >> Thanks, John. >> What a great line up here. >> Sachin: Thanks, Steven. >> Great, great stuff. Sachin, let's get in and talk about your company, Securonix. What do you guys do? Take us through, I know you've got a slide to help us through this, I want to introduce your stuff first then jump in with Steven. >> Absolutely. Thanks again, Steven, and the Ahana team for having us on the show. So Securonix, we started the company in 2010. We are the leader in security analytics and response capability for the cybermarket. So basically, this is a category of solutions called SIEM, Security Incident and Event Management. We are the quadrant leaders in Gartner, we now have about 500 customers today and have been plugging away since 2010. Started the company just really focused on analytics using machine learning and advanced analytics to really find the needle in the haystack, then moved from there to needle in the needle stack using more algorithms, analysis of analysis. And then kind of evolved the company to run on cloud and become sort of the biggest security data lake on cloud and provide all the analytics to help companies with their insider threat, cyber threat, cloud solutions, application threats, emerging internally and externally, and then response and have a great partnership with Ahana as well as with AWS. So looking forward to this session, thank you. >> Awesome. 
I can't wait to hear the news on that Next-Gen SIEM leadership. Steven, Ahana, talk about what's going on with you guys, give us the update, a lot of stuff happening. >> Yeah. Great to be here and thanks for that, Sachin, and we appreciate the partnership as well with both Securonix and AWS. Ahana is the open source company based on PrestoDB, which is a project that came out of Facebook and is widely used, one of the fastest growing projects in data analytics today. And we make a managed service for Presto easily on AWS, all cloud native. And we'll be talking about that more during the show. Really excited to be here. We believe in open source. We believe in all the challenges of having data in the cloud and making it easy to use. So thanks for having us again. >> And looking forward to digging into that managed service and why that's been so successful. Looking forward to that. Let's get into the Securonix Next-Gen SIEM leadership first. Let's share the journey towards what you guys are doing here. As the Open Data Lakes on AWS has been a hot topic, the success of data in the cloud, no doubt is on everyone's mind especially with the edge coming. It's just, I mean, just incredible growth. Take us through Sachin, what do you guys got going on? >> Absolutely. Thanks, John. We are hearing about cyber threats every day. No question about it. So in the past, what was happening is companies, what we have done as enterprise is put all of our eggs in the basket of solutions that were evaluating the network data. With cloud, obviously there is no more network data. Now we have moved into focusing on EDR, right thing to do on endpoint detection. But with that, we also need security analytics across on-premise and cloud. 
And your other solutions like your OT, IOT, your mobile, bringing it all together into a security data lake and then running purpose built analytics on top of that, and then having a response so we can prevent some of these things from happening or detect them in real time versus investigating for hours or weeks and months, which is obviously too late. So with some of the recent events happening around Colonial and others, we all know cybersecurity is on top of everybody's mind. First and foremost, I also want to. >> Steven: (indistinct) slide one and that's all based off on top of the data lake, right? >> Sachin: Yes, absolutely. Absolutely. So before we go into Securonix, I also want to congratulate everything going on with the new cyber initiatives with our government and just really excited to see some of the things that the government is also doing in this space to bring, to have stronger regulation and bring together the government and the private sector. From a Securonix perspective, today, we have one third of the Fortune 500 companies using our technology. In addition, there are hundreds of small and medium sized companies that rely on Securonix for their cyber protection. So what we do is, again, we are running the solution on cloud, and that is very important. It is not just important for hosting, but in the space of cybersecurity, you need to have a solution, which is not, so where we can update the threat models and we can use the intelligence or the Intel that we gather from our customers, partners, and industry experts and roll it out to our customers within seconds and minutes, because the game is real time in cybersecurity. And that you can only do in cloud where you have the complete telemetry and access to these environments. 
When we go on-premise traditionally, what you will see is customers are even thinking about pushing the threat models through their standard Dev test life cycle management, and which is just completely defeating the purpose. So in any event, Securonix on the cloud brings together all the data, then runs purpose-built analytics on it. Helps you find very few, we are today pulling in several million events per second from our customers, and we provide just a very small handful of events and reduce the false positives so that people can focus on them. Their security command center can focus on that and then configure response actions on top of that. So we can take action for known issues and have intelligence in all the layers. So that's kind of what Securonix is focused on. >> Steven, he just brought up, probably the most important story in technology right now. That's ransomware more than, first of all, cybersecurity in general, but ransomware, he mentioned some of the government efforts. Some are saying that the ransomware marketplace is bigger than some governments, nation state governments. There's a business model behind it. It's highly active. It's dominating the scene and it's a real threat. This is the new world we're living in, cloud creates the refactoring capabilities. We're hearing that story here with Securonix. How do Presto and Securonix work together? Because I'm connecting the dots here in real time. I think you're going to go there. So take us through because this is like the most important topic happening. >> Yeah. So as Sachin said, there's all this data that needs to go into the cloud and it's all moving to the cloud. And there's massive amounts of data, hundreds of terabytes, petabytes of data that's moving into the data lakes and that's the S3-based data lakes, which are the easiest, cheapest, commodified place to put all this data. 
But in order to deliver the results that Sachin's company is driving, which is intelligence on when there's a ransomware possibility, you need to have analytics on them. And so Presto is the open source project that is an open source SQL query engine for data lakes and other data sources. It was created by Facebook as part of the Linux Foundation, something called Presto Foundation. And it was built to replace the complicated Hadoop stack in order to then drive analytics at very lightning fast queries on large, large sets of data. And so Presto fits in with this Open Data Lake analytics movement, which has made Presto one of the fastest growing projects out there. >> What is an Open Data Lake? Real quick for the audience who wants to learn what it means. Does it mean it's open source in the Linux Foundation or open meaning it's open to multiple applications? What does that even mean? >> Yeah. Open Data Lake analytics means that, first of all, your data lake has open formats. So it is made up of say something called ORC or Parquet. And these are formats that any engine can be used against. That's really great, instead of having locked in data types. Data lakes can have all different types of data. It can have unstructured, semi-structured data. It's not just the structured data, which is typically in your data warehouses. There's a lot more data going into the Open Data Lake. And then you can, based on what workload you're looking to get benefit from, the insights come from that, and actually slide two covers this pictorially. If you look on the left here on slide two, the Open Data Lake is where all the data is pooling. And Presto is the layer in between that and the insights which are driven by the visualization, reporting, dashboarding, BI tools or applications like in Securonix's case. 
And so analytics are now being driven by every company for not just industries of security, but it's also for every industry out there, retail, e-commerce, you name it. There's healthcare, financials, all are looking at driving more analytics for their SaaSified applications as well as for their own internal analysts, data scientists, and folks that are trying to be more data-driven. >> All right. Let's talk about the relationship now with where Presto fits in with Securonix because I get the open data layer. I see value in that. I get also what we're talking about the cloud and being faster with the datasets. So how do, Sachin, Securonix and Ahana fit in together? >> Yeah. Great question. So I'll tell you, we have two customers. I'll give you an example. We have two Fortune 10 customers. One has moved most of their operations to the cloud and another customer which is in the process, early stage. The data, the amount of data that we are getting from the customer who's moved fully to the cloud is 20 times, 20 times more than the customer who's in the early stages of moving to the cloud. That is because the ability to add this level of telemetry in the cloud, in this case, it happens to be AWS, Office 365, Salesforce and several other hyperscalers across several other cloud technologies. But the level of logging that we are able to get the telemetry is unbelievable. So what it does is it allows us to analyze more, protect the customers better, protect them in real time, but there is a cost and scale factor to that. So like I said, when you are trying to pull in billions of events per day from a customer, what the customers are looking for is all of that data goes in, all of that data gets enriched so that it makes sense to a normal analyst and all of that data is available for search, sometimes 90 days, sometimes 12 months. And then all of that data is available to be brought back into a searchable format for up to seven years. 
So think about the amount of data we are dealing with here and we have to provide a solution for this problem at a price that is affordable to the customer and that a medium-sized company as well as a large organization can afford. So after a lot of our analysis on this and again, Securonix is focused on cyber, bringing in the data, analyzing it, so after a lot of our analysis, we zeroed in on S3 as the core bucket where this data needs to be stored because of the price point, the reliability, and all the other functions available on top of that. And with that, with S3, we've created a great partnership with AWS as well as with Snowflake that is providing this, from a data lake perspective, a bigger data lake, enterprise data lake perspective. So now for us to be able to provide customers the ability to search that data. So data comes in, we are enriching it. We are putting it in S3 in real time. Now, this is where Presto comes in. In our research, Presto came out as the best search engine to sit on top of S3. The engine is supported by companies like Facebook and Uber, and it is open source. So open source, like you asked the question. So for companies like us, we cannot depend on a very small technology company to offer mission critical capabilities because what if that company gets acquired, et cetera. In the case of open source, we are able to adopt it. We know there is a community behind it and it will be kind of available for us to use and we will be able to contribute to it for the long term. Number two, from an open source perspective, we have a strong belief that customers own their own data. Traditionally, like Steven used the word locked in, it's a key term, customers have been locked into proprietary formats in the past and those days are over. You should be, you own the data and you should be able to use it with us and with other systems of choice. So now you get into a data search engine like Presto, which scales independently of the storage. 
And then when we start looking at Presto, we came across Ahana. So for every open source system, you definitely need a sort of a for-profit company that invests in the community and then that takes the community forward. Because without a company like this, the community will die. So we are very excited about the partnership with Presto and Ahana. And Ahana provides us the ability to take Presto and cloudify it, or make the cloud operations work plus be our conduit to the Ahana community. Help us speed up certain items on the roadmap, help our team contribute to the community as well. And then you have to take a solution like Presto, you have to put it in the cloud, you have to make it scale, you have to put it on Kubernetes. Standard thing that you need to do in today's world to offer it as sort of a micro service into our architecture. So in all of those areas, that's where our partnership is with Ahana and Presto and S3 and we think, this is the search solution for the future. And with something like this, very soon, we will be able to offer our customers 12 months of data, searchable at extremely fast speeds at very reasonable price points and you will own your own data. So it has very significant business benefits for our customers with the technology partnership that we have set up here. So very excited about this. >> Sachin, it's very inspiring, a couple things there. One, decentralize on your own data, having a democratized, that piece is killer. Open source, great point. >> Absolutely. >> Company goes out of business, you don't want to lose the source code or get acquired or whatever. That's a key enabler. And then three, a fast managed service that has a commercial backing behind it. So, a great, and by the way, Snowflake wasn't around a couple of years ago. So like, so this is what we're talking about. This is the cloud scale. Steven, take us home with this point because this is what innovation looks like. Could you share why it's working? 
What's some of the things that people could walk away with and learn from as the new architecture for the new NextGen cloud is here, so this is a big part of and share how this works? >> That's right. As you heard from Sachin, every company is becoming data-driven and analytics are central to their business. There's more data and it needs to be analyzed at lower cost without the lock-in and people want that flexibility. And so slide three talks about what Ahana Cloud for Presto does. It's the best Presto out of the box. It's very easy to use for your operations team. So it can be one or two people just managing this and they can get up to speed very quickly in 30 minutes, be up and running. And that jump starts their movement into an Open Data Lake analytics architecture. That architecture is going to be, it is the one that is at Facebook, Uber, Twitter, other large web scale, internet scale companies. And with the amount of data that's occurring, that's now becoming the standard architecture for everyone else in the future. And so just to wrap, we're really excited about making that easy, giving an open source solution because the open source data stack based off of data lake analytics is really happening. >> I got to ask you, you've seen many waves in the industry. Certainly, you've been through the big data waves, Steven. Sachin, you're on the cutting edge and just the cutting edge billions of signals from one client alone is pretty amazing scale and refactoring that value proposition is super important. What's different from 10 years ago when the Hadoop, you mentioned Hadoop earlier, which is RIP, obviously the cloud killed it. We all know that. Everyone kind of knows that. But like, what's different now? I mean, skeptics might say, I don't believe you, but it's just crazy. There's no way it works. S3 costs way too much. Why is this now so much more of an attractive proposition? What do you say to the naysayers out there? 
Steven, we'll start with you and then Sachin, I want you to like weigh in too. >> Yeah. Well, if you think about the Hadoop era and if you look at slide three, it was a very complicated system that was done mainly on-prem. And you'd have to go and set up a big data team and rack and stack a bunch of servers and then try to put all this stuff together and candidly, the results and the outcomes of that were very hard to get unless you had the best possible teams and invested a lot of money in this. What you saw in this slide was that, that right hand side which shows the stack. Now you have a separate compute, which is based off of Intel based instances in the cloud. We run the best in that and they're part of the Presto Foundation. And that's now data lakes. Now the distributed compute engines are the ones that have become very much easier. So the big difference in what I see is it's no longer called big data. It's just called data analytics because it's now become commodified as being easy and the bar is much, much lower, so everyone can get the benefit of this across industries, across organizations. I mean, that's good for the world, reduces the security threats, the ransomware, in the case of Securonix and Sachin here. But every company can benefit from this. >> Sachin, this is really as an example in my mind and you can comment too on if you'd believe or not, but replatform with the cloud, that's a no brainer. People do that. They did it. But the value is refactoring in the cloud. It's thinking differently with the assets you have and making sure you're using the right pieces. I mean, there's no brainer, you know it's good. If it costs more money to stand up something than to like get value out of something that's operating at scale, much easier equation. What's your thoughts on this? Go back 10 years and where we are now, what's different? I mean, replatforming, refactoring, all kinds of things happening. What's your take on all this? >> Agreed, John. 
So we have been in business now for about 10 to 11 years. And when we started my hair was all black. Okay. >> John: You're so silly. >> Okay. So this, everything that has happened here is the transition from Hadoop to cloud. Okay. This is what the result has been. So people can see it for themselves. So when we started off with deep partnerships with the Hadoop providers and again, Hadoop is the foundation, which has now become EMR and everything else that AWS and other companies have picked up. But when you start with some basic premise, first, the racking and stacking of hardware, companies having to project their entire data volume upfront, bringing in the servers and have 50, 100, 500 servers sitting in their data centers. And then when there are spikes in data, or like I said, as you move to the cloud, your data volume will increase between five to 20x and projecting for that. And then think about the agility that it will take you three to six months to bring in new servers and then bring them into the architecture. So big issue. Number two big issue is that the backend of that was built for HDFS. So Hadoop in my mind was built to ingest large amounts of data in batches and then perform some Spark jobs on it, some analytics. But we are talking in security about real time, high velocity, high variety data, which has to be available in real time. It wasn't built for that, to be honest. So what was happening is, again, even if you look at the Hadoop companies today as they have kind of figured, kind of define their next generation, they have moved from HDFS to now kind of a cloud based platform capability and have discarded the traditional HDFS architecture because it just wasn't scaling, wasn't searching fast enough, wasn't searching fast enough for hundreds of analysts at the same time. And then obviously, the servers, et cetera wasn't working. 
Then when we worked with the Hadoop companies, they were always two to three versions behind for the individual services that they had brought together. And again, when you're talking about this kind of a volume, you need to be on the cutting edge always of the technologies underneath that. So even while we were working with them, we had to support our own versions of Kafka, Solr, ZooKeeper, et cetera to really bring it together and provide our customers this capability. So now when we have moved to the cloud with solutions like EMR behind us, AWS has invested in solutions like EMR to make them scalable, to have scale and then scale out, which traditional Hadoop did not provide because they missed the cloud wave. And then on top of that, again, rather than throwing data in that traditional older HDFS format, we are now taking the same format, the Parquet format that it supports, putting it in S3 and now making it available and using all the capabilities like you said, the refactoring of that is critical. That rather than on-prem having servers and redundancies with S3, we get built in redundancy. We get built in life cycle management, high degree of confidence data reliability. And then we get all this innovation from companies like, from groups like Presto, companies like Ahana sitting on top of that S3. And the last item I would say is in the cloud we are now able to offer multiple, have multiple resilient options on our side. So for example, with us, we still have some premium searching going on with solutions like Solr and Elasticsearch, then you have Presto and Ahana providing majority of our searching, but we still have Athena as a backup in case something goes down in the architecture. Our queries will spin back up to Athena, AWS service on Presto and customers will still get served. So all of these options, but it doesn't cost us anything, Athena, if we don't use it, but all of these options are not available on-prem. 
So in my mind, I mean, it's a whole new world we are living in. It is a world where now we have made it possible for companies to even enterprises to even think about having true security data lakes, which are useful and having real-time analytics. From my perspective, I don't even sign up today for a large enterprise that wants to build a data lake on-prem because I know that is not, that is going to be a very difficult project to make it successful. So we've come a long way and there are several details around this that we've kind of endured through the process, but very excited where we are today. >> Well, we'll certainly follow up with theCUBE on all your endeavors. Quickly on Ahana, why them, why their solution? In your words, what would be the advice you'd give me if I'm like, okay, I'm looking at this, why do I want to use it, and what's your experience? >> Right. So the standard SQL query engine for data lake analytics, more and more people have more data, want to have something that's based on open source, based on open formats, gives you that flexibility, pay as you go. You only pay for what you use. And so it proved to be the best option for Securonix to create a self-service system that has all the speed and performance and scalability that they need, which is based off of the innovation from the large companies like Facebook, Uber, Twitter. They've all invested heavily. We contribute to the open source project. It's a vibrant community. We encourage people to join the community and even Securonix, we'll be having engineers that are contributing to the project as well. I think, is that right Sachin? Maybe you could share a little bit about your thoughts on being part of the community. >> Yeah. So also why we chose Ahana, like John said. The first reason is you see Steven is always smiling. Okay. >> That's for sure. >> That is very important. I mean, jokes apart, you need a great partner. You need a great partner. 
You need a partner with a great attitude because this is not a sprint, this is a marathon. So the Ahana founders, Steven, the whole team, they're world-class, they're world-class. The depth that the CTO has, his experience, the depth that Dipti has, who's running the cloud solution. These guys are world-class. They are very involved in the community. We evaluated them from a community perspective. They are very involved. They have the depth of really commercializing an open source solution without making it too commercial. The right balance, where the founding companies like Facebook and Uber, and hopefully Securonix in the future as we contribute more and more will have our say and they act like the right stewards in this journey and then contribute as well. So and then they have chosen the right niche rather than taking portions of the product and making it proprietary. They have put in the effort towards the cloud infrastructure of making that product available easily on the cloud. So I think it's sort of a no-brainer from our side. Once we chose Presto, Ahana was the no-brainer and just the partnership so far has been very exciting and I'm looking forward to great things together. >> Likewise Sachin, thanks so much for that. And we've only found your team, you're world-class as well, and working together and we look forward to working in the community also in the Presto Foundation. So thanks for that. >> Guys, great partnership. Great insight and really, this is a great example of cloud scale, cloud value proposition as it unlocks new benefits. Open source, managed services, refactoring the opportunities to create more value. Steven, Sachin, thank you so much for sharing your story here on open data lakes. Open always wins in my mind. This is theCUBE, we're always open and we're showcasing all the hot startups coming out of the AWS ecosystem for the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (bright music)

Published Date : Jun 24 2021


Dimitri Sirota, BigID | CUBE Conversation, March 2021

(upbeat music) >> Well good to have you with us here as we continue the AWS Startup Showcase and we're joined now by the CEO of BigID, Dimitri Sirota. And Dimitri, good afternoon to you. How are you doing today? >> I'm pretty good, it's Friday, it's sunny, it's warm, I'm doing well. >> Then that's a good start, yeah. Glad to have you with us here. First off, just about BigID and when you look at I would assume these accolades are, they are quite a showcase for you. World Economic Forum Technology Pioneer. Forbes Cloud 100, Business Insider startup to watch. I mean, you are getting a lot of attention, obviously for... >> Yep. >> And well-deserved, but when you see these kinds of recognitions coming your way- >> Yep. >> First off, what does that do to inspire, motivate and fuel this great passion that you have? >> Yeah, look I think all of these recognitions help, I think affirm, I think what we aspire to be, right? Provide the preeminent solution for helping organizations understand their data and in so doing, be able to address problems in privacy and protection and perspective. And I think that these recognitions are part of that as our customers, as our partners like AWS. So they're all part of that admixture. And I think they contribute to a sense that we're doing some pioneering work, right, as the award from the World Economic Forum recognized. So I think it's important. I think it's healthy. It encourages kind of cooperative spirit at the company. And I think it's, you know, it's very encouraging for us to continue and build. >> So let's talk about BigID, a little bit for our viewers who might not be too familiar. You are a fairly new company, raised 200 million so far, five years of operations coming up on five years. >> Yep. >> But talk about your sweet spot in terms of the variety of services they provided in terms of protection and security. >> Yeah, sure. 
So we were founded with really this kind of precept that organizations need to have a better understanding of their data. I think when we got started about five years ago, most organizations had some view of their data, maybe a few of their files, maybe their databases. What changed is the emerging privacy regulations like GDPR and CCPA later forced companies to rethink their approach to data understanding, data knowledge, because part of the kind of the core conception of privacy is that you and me and other individuals have a right to their data, the data actually belongs to us. Similar to when you deposit a check in a bank. That money you deposited is yours. If you ever want to withdraw it, the bank has to give it back to you. And in a similar way, these privacy regulations require organizations to be able to give back your data or delete it or do other things. And as it happens there was no real technology to help companies do that, to help companies look across their vast data estates and pick out all the pieces of information, all the detritus that could belong to Dimitri. So it could be my password, it could be my social security, it could be my click stream, it could be my IP address, my cookie. And so we developed from the ground up a brand new approach to technology that covers the data center and the cloud, and allows organizations to understand their data at a level of detail that never existed before. And still, I would argue, doesn't exist today, separate from BigID. And we describe that as our foundational data discovery in depth, right? We provide this kind of multidimensional view of your data to understand the content and the context of the information. And what that allows organizations to do is better understand the risk, better meet certain regulatory requirements like GDPR and CCPA. But ultimately also get better value from their data. 
And so what was pioneering about us is not only that level of detail that we provided, almost like your iPhone provides you four cameras to look at the world. We provide you kind of four lenses to look at your data. But then on top of that we introduced a platform that allowed you to take action on what you found. And that action could be in the realm of privacy so that you could solve for some of the privacy use cases like data rights or consent or consumer privacy preferences, or data protection, data security, so that you can remediate. You can deal with data lifecycle management. You could deal with encryption, et cetera. Or ultimately what we call a data governance or data perspective, this idea of being able to get value from your data but doing so in a privacy and security preserving way. So that's kind of the conception, we want to help you know your data. And then we want to help you act on your data so that your data is both secure, it's compliant, but ultimately you get value from your data. >> Now we get into this, helping me know my data better because you've talked about data you know and data you don't, right? >> Dimitri: Yeah. >> And you're saying there's a lot more that we don't or a company doesn't know. >> Dimitri: Yeah. >> Than it's aware of. And I find that still kind of striking in this day and age. I mean with kind of the sophistication of tools that we have and different capabilities that I think give us better insight. But I'm still kind of surprised when you're saying there's a lot of data that companies are housing that they're not even aware of right now. >> They're not and candidly they didn't really want to be for a long, long time. I think the more you know sometimes the more you have to fix, right? So there needed to be a catalyzing event like these privacy regulations to essentially kind of unpack, to force a set of actions because the privacy regulation said, no, no, no you need to know whether you want to or not. 
So I think a lot of organizations for years and years outside of a couple of narrow fields like HIPAA, PCI, unless there was a specific regulation, they didn't want to know too much. And as a consequence, there wasn't really technology to keep up with the explosion in data volumes and data platforms. Right? Think about like AWS didn't exist when a lot of these technologies were first built in the early 2000s. And so we had to kind of completely re-think things. And one thing I'll also kind of highlight is the need or necessity is not just driven by some of these emerging privacy regulations, but it's also driven by the shift to the cloud. Because when you have all your data on a server in a data center in New Jersey, you could feel a false sense of security because you have doors to that data center in New Jersey and you have firewalls to that data center in New Jersey. And if anybody asks you where your sensitive data is you could say, it's in New Jersey! But now all of a sudden you move it into the cloud and data becomes the perimeter, right? It's kind of naked and exposed, it's out there. And so I think there's a much greater need and urgency because now data is kind of in the ether, in the air. And so organizations are really kind of looking for additional ability for them to both understand, contextualize and deal with some of the privacy, security and data governance aspects of that data. >> So you're talking about data, obviously AWS comes to mind, right? >> Dimitri: Yeah. >> And the relationship that you have with them, it's been a couple of years in the making, things are going really well for you and ultimately for your customers. What is it about this particular partnership that you have with AWS that you think has allowed you to bring that even more added value at the end of the day to your customer base? 
>> Look, our customers are going to AWS because of its simplicity to kind of provision their applications, their services, the cost is incredibly attractive, the diversity of capabilities that AWS provides our customers. And so we have a lot of larger and midsize and even smaller organizations that are going to AWS. And it's important for us to be where our customers are. And so if our customers are using Redshift, or using S creator, using Dynamo or using Kinesis or using Security Hub, we have to be there, right? So we've kind of followed that pathway because if they're putting data in those places, part of our job is to provide that insight and intelligence to our customers around those data assets, wherever they are. And so we built a set of capabilities and expertise around the broader AWS platform. So that we could argue that we can help you, whether you keep your data in S3, whether you keep it in Dynamo, whether you keep it in EMR, RDS, Aurora, Athena, the list goes on and on. We want to be that expert partner for you to kind of help you know your data and then take action on your data. >> So the question about data security in general, obviously as you know, there are these major stories of tremendous breach that's right. >> Yep. >> Stayed afterwards, in some cases. >> Bad guys. >> Yeah, really bad guys and bad smart guys, unfortunately and persistent to say the least. How do you work with your clients in an environment like that? Where, you know, these threats are never ending, >> Yep. >> They're becoming more and more complex. And the tools that you have are certainly robust but at the end of the day, it's very difficult, if not impossible, to say 100% bulletproof, right? >> Yeah. >> It's if you are absolutely safe with us. But you still try, you give these assurances because of your sophistication that should give people some peace of mind. Again, it's a tough battle you're in. >> Yeah. 
So I think the first rule of fight club is that, to solve a problem, you need to know the problem, right? You can't fix what you can't find, right? So if you're unaware that there's a potential compromise in your data, a potential risk in your data, maybe you have passwords in a certain data store and there's no security around that. You need to know that you have passwords in a certain data store and there's no security around that. Because unless you know that first, there's no ability for you to solve it. So the first part of what we do, that kind of know your data, that K-Y-D, is we help organizations understand what data they have that potentially is at risk or may violate a regulatory requirement like GDPR or CCPA, things of that sort. So that's kind of the first level of value, because you can't solve for something you're unaware of, right? You need to be able to see it and you need to be able to understand it. And so our ability to kind of both understand your data and understand what it is, why it is, whose it is, where it came from, and the risk around it lets you take action on that. Now we don't stop there. We don't stop at just helping you kind of find the problem. We also help you understand if there's additional levels of exposure. Do you have access control around that data, for instance? If that data is open to the world and you just put a bunch of passwords there, or API keys or credentials, that's a problem. So we provide this kind of holistic view into your data and into some of the security controls. And then most importantly, through our application platform, our own apps, we provide ways for you to take action on that. And that action could take many forms. It could be about remediation, where you delegate to a security owner and say, hey, I want you to delete that data. Or I want you to encrypt that data. It could be something more automated, where it just encrypts everything. 
But again, part of the value and virtue of our platform is that we both help you identify the potential risk points, and then we give you, in the form of apps that sit on top of our platform, ways to take action on it, to secure it, to reduce it, to minimize the risk. >> Because these threats are ever evolving, can you give us a little, maybe, inside peek under the tent here, a bit about what you're looking at in terms of products or services down the road here? So if somebody is thinking, okay, what enhanced tools might be at my disposal in the near term or even in the long term to try and mitigate these risks? Can you give us an idea about some things you guys are working on? >> Yeah. So the biggest thing we're working on, and I've already kind of hinted at this, is really the kind of first-in-industry platform in our category, companies that look at data. And by platform I mean something where you can introduce apps. So AWS has a platform. People can introduce additional capabilities on top of AWS. In the data discovery and classification arena, that had never been the case, because the tools were very, very old. So we're introducing these apps, and these apps allow you to take a variety of actions. I've mentioned a few of them: there's retention, you can do encryption, you can do access control, you could do remediation, and you could do breach impact analysis. Each of these apps is kind of an atomic unit of functionality. So it's no different than on your iPhone or your Android phone. You may have an Uber app; when you click on it, all of a sudden your phone looks like an Uber application. You may have an app focused on Salesforce; you click on it, and all of a sudden your phone looks like a Salesforce application. And so what we've done is we've kind of taken this kind of data discovery, classification and intelligence mechanism, that kind of K-Y-D I referenced, and then we built a whole app platform. 
And what we're going to start announcing over the coming months is more and more apps in the field of privacy, in the fields of data security or protection, and even the field of data value, what we call perspective, and we're actually coming out with an announcement shortly on this app marketplace. And there'll be BigID building apps, but you know what, there's going to be a lot of third parties building apps. So companies that do intrusion detection and integrations and all kinds of other things are also building apps on BigID. And that's an exciting part of what you're going to see coming from us in the coming weeks. >> Great. Well, thanks for the sneak peek, and I feel like I just barely scratched the surface of it. Governance, compliance, right? Regulation. You have so many balls in the air, but obviously you're juggling them quite well, and we wish you continued success. Job well done. Thanks, Dimitri. >> Dimitri: Thank you very much for having me. (upbeat music)

Published Date : Mar 19 2021

SUMMARY :

Well good to have you with us here Friday, it's sunny, it's warm, Glad to have you with us here. And I think it's, you know, So let's talk about BigID, a little bit in terms of the variety we want to help you know, your data. that we don't or a company doesn't know. And I find that still kind of striking the more you have to fix, right? that you have with them to kind of help you know your data obviously as you know, there How do you work with your clients And the tools that you It's if you are You need to know that you have passwords is that we both help you identify about some things you guys are working on? and these apps allow you to and we wish you continued Dimitri: Thank you

ENTITIES

Entity	Category	Confidence
Dmitri Sirota	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Dimitri	PERSON	0.99+
Dimitri Sirota	PERSON	0.99+
March 2021	DATE	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
New Jersey	LOCATION	0.99+
100%	QUANTITY	0.99+
five years	QUANTITY	0.99+
Dmitri	PERSON	0.99+
BigID	ORGANIZATION	0.99+
200 million	QUANTITY	0.99+
Each	QUANTITY	0.99+
Friday	DATE	0.99+
both	QUANTITY	0.99+
today	DATE	0.99+
first part	QUANTITY	0.99+
GDPR	TITLE	0.99+
first	QUANTITY	0.98+
first level	QUANTITY	0.98+
Uber	ORGANIZATION	0.98+
CCPA	TITLE	0.96+
Android	TITLE	0.96+
First	QUANTITY	0.95+
BigID	TITLE	0.95+
four cameras	QUANTITY	0.95+
early 2000's	DATE	0.95+
S creator	TITLE	0.92+
Red Sheriff	TITLE	0.92+
HIPAA	TITLE	0.89+
S3	TITLE	0.89+
Kinesis	TITLE	0.87+
five years ago	DATE	0.86+
first rule	QUANTITY	0.85+
about	DATE	0.8+
couple of years	QUANTITY	0.75+
one thing	QUANTITY	0.74+
Aurora	TITLE	0.7+
four lenses	QUANTITY	0.69+
Athena	TITLE	0.67+
RDS	TITLE	0.65+
100	QUANTITY	0.61+
Salesforce	TITLE	0.58+
Dynamo	TITLE	0.56+
CEO	PERSON	0.56+
-D	ORGANIZATION	0.54+
PCI	ORGANIZATION	0.54+
years	QUANTITY	0.5+
dynamo	TITLE	0.48+
EMR	TITLE	0.48+
Forbes cloud	ORGANIZATION	0.46+
CUBE	ORGANIZATION	0.35+

HOLD_CA_Dimitri Sirota, BigID | CUBE Conversation, March 2021

Published Date : Mar 17 2021


William Murphy, BigID | AWS Startup Showcase

(upbeat music) >> Well, good day and thank you for joining us as we continue our series here on theCUBE of the AWS Startup Showcase, featuring today BigID. And with us is Will Murphy, who's the Vice President of Business Development and Alliances at BigID. Will, good day to you, how are you doing today? >> Thanks, John, I'm doing well. I'm glad to be here. >> Yeah, that's great. And a theCUBE alum too, I might add, so it's nice to have you back. First off, let's share the BigID story. You've been around for just a handful of years. Accolades coming from every which direction, so obviously what you're doing, you're doing very well. But for our viewers who might not be too familiar with BigID, just give us a 30,000-foot view of your core competence. >> Yeah, absolutely. So actually we just had our five-year anniversary for BigID, which we're quite excited about. And that five year comes with some pretty big red marks. We've raised over $200 million; we're a unicorn now. But where that comes from and how that came about was that we're dealing with longstanding problems with modern data landscapes, security, governance, privacy initiatives. And starting in 2016 with the authorship of GDPR, the European privacy law, organizations had to treat data differently than they did before. They couldn't afford to just sit on all this data that was collected. For a couple of reasons, right? One of them being that it's expensive. So you're constantly storing data, whether that's on-prem or in the cloud, as we're going to talk about. There's expense to that. You have to pay to secure the data and keep it from being leaked, you have to pay for access control, you have to pay for a lot of different things. And you're not getting any value out of that. And then there's the idea of the customer trust piece, which is, if anything happens to that data, your reputation as a company and the trust you have between your customers and your organization is broken. 
So BigID, what we did is we decided that there was a foundation that needed to be built. The foundation was data discovery. If an organization knows where its data is, whose data it is, what it is, and also who has access to it, they can start to make actionable decisions based on the data and based on this new data intelligence. So, we're trying to help organizations keep up with modern data initiatives. And we're empowering organizations to handle their data: sensitive, personal, regulated. What's actually quite interesting is we allow organizations to define what's sensitive to them, because like people, organizations are all different. And so what's sensitive to one organization might not be to another. It goes beyond the law. And so we're giving organizations that new power and flexibility. >> And this is what I still find striking: obviously, with this exponential growth of data, you've got machine learning just bringing billions of inputs, it seems like, right now. So you have this vast reservoir of data. And yet the companies in large part don't know a lot about the data that they're harvesting and where it is, and so it's not actionable. It's kind of dark data, right? Just out there residing. And so as I understand it, this is your focus, basically, to tell people, hey, here's your landscape, here's how you can better put it to action, why it's valuable, and we're going to help you protect it. And they're not aware of these things, which I still find a little striking in this day and age. >> And it goes even further. So you know, when you start to reveal the truth and what's going on with data, there's a couple of things that some organizations do. And enter the human instincts. Some organizations want to bury their head in the sand, like everything's fine. Which is, as we know and as we've seen in the news frequently, not a sustainable approach. There's the, like, we're overwhelmed. Yeah. We don't even know where to start. 
Then there's the unnatural reaction, which is, okay, we have to centralize and control everything. Which defeats the purpose of having shared drives and collaboration in geographically disparate workforces. And we've seen, particularly over the last year, how important that resiliency within organizations is, to be able to work in different areas. And so it really restricts the value that organizations can get from their data, which is important. And it's important in a ton of ways. And for customers that have allowed their data to be stored and harvested by these organizations, they're not getting value out of it either. It's just risk. And we've got to move data from the liability side of the balance sheet to the asset side of the balance sheet. And that comes first and foremost with knowledge. >> So everybody's going cloud, right? It used to be, you know, everybody's on-prem. And all of a sudden we build a bigger house. And so because you build a bigger house, you need better security, right? Your perimeter's got to grow. And that's where I assume AWS has come in with you. And this is a two-year partnership that you've been engaged in with AWS. So maybe shine a little light on that, about the partnership that you've created with AWS and then how you in turn leverage that for the benefit of your customer base. >> Yeah. So AWS has been a great partner. They are very forward-looking for an organization as large as they are. Very forward-looking in recognizing that they can't do everything that their customers need. And it's better for the ecosystem as a whole to enable small companies like us, and we were very small when we started our relationship with them, to join their partner organization. So we're an advanced partner now. We're part of ISV Accelerate. So it's a slightly more elite partner organization. And we're there because our customers are there. And AWS, like us, has a customer-obsessed culture. 
But organizations are embracing the cloud. And there's fear of the cloud, but there really shouldn't be, in the way that we thought of it maybe five or 10 years ago, in that companies like AWS are spending a lot more money on security than most organizations can. So they have huge security teams, they're building massive infrastructure. And then on top of that, companies themselves can use products like BigID and other products to make themselves more secure from outside threats and from inside threats as well. So we are trying, with them, to approach modern data challenges well. So even within AWS, if you put all the information in, let's say, S3 buckets, it doesn't really tell you anything. It's like, you know, I make this analogy sometimes. I live in Manhattan, and if I were to collect all the keys of everybody that lived in a 10-block radius around me and put them into a dumpster and keep doing that, I would theoretically know where all the keys were. They're in the dumpster. Now, if somebody asked me, I'd like my keys back, I'd have a really hard time giving them that. Because I've got to sort through, you know, 10,000 people's keys. And I don't really know a lot about it. But those keys say a lot, you know? They say, like, are you in an old building? Are you in a new building? Do you have a bike? Do you have a car? Do you have a gym locker? There's all sorts of information. And I think that this analogy holds up for data: the way you store your data is important. But you can gain a lot of theoretically innocuous but valuable information from the data that's there, while not compromising the sensitive data. And AWS has been a fabulous partner in this. They've helped us build an AWS Security Hub integration out of the box. We now work with over 12 different AWS native applications, anything like S3, Redshift, Athena, Kinesis, as well as apps built on AWS, like Snowflake and Databricks, that we connect to. 
And at AWS, the technical teams and the partner teams have been an enormous part of our success there. We're very proud to have joined the marketplace, to be where our customers want to buy enterprise software more and more. And that's another area where we're collaborating in joint accounts now, to bring more value and simplicity to our joint customers. >> So what's your process in terms of your customer and evaluating their needs? 'Cause you just talked about it earlier, that you see different approaches to security. Some people put their head in the sand, right? Some people admit that there's a problem. Some people are fully engaged. So I assume there's also a different level of sophistication in terms of what they already have in place and then what their needs are. So if you were to shine a little light on that, about assessing where they are in terms of their data landscape. And now AWS and its tools, which you just touched on. You know, the multiple tools you have in your service. How does all that come together to develop what would be, I guess, a unique program for a company's specific needs? >> It is. We started talking to the largest enterprise accounts when we were founded, and we still have a real proclivity and expertise in that area. So the issue with the large enterprise accounts, and the uniqueness there, is scale. They have a tremendous amount of data: HR data, financial data, customer data, you name it. Right? Like, we could go dry mouth talking about how insane the amount of data is with these large customers. For AWS, scale wasn't an issue. They can store it. They can analyze it. They can do tons with it. Where we were helping is that we could make that safer. So if you want to perform data analytics, you want to ensure that sensitive data is not being part of that. You want to make sure you're not violating local, national or industry-specific regulations. Financial services is a great example. 
There's dozens of regulations at the federal level in the United States. And each state has their own regulations. This becomes increasingly complex. So AWS handles this by allowing an amazing amount of customization for their customers. They have data centers in the right places. They have experts on vertical-specific issues. BigID handles this similarly in some ways, but we handle it through extensibility. So one of our big things is we have to be able to connect to everywhere our customers have data. So we want to build a foundation of, let's say, first, let's understand the goals. Is the goal compliance with the law? Which it should be for everybody. That should just be, like, we need to comply with the law. Like, that's easy. Yeah. Then there's the next piece: are we dealing with something legacy? Was there a breach? Do we need to understand what happened? Are we trying to be forward-looking and understanding? We want to make sure we can lock down our most sensitive data. Tier our storage, tier our security, tier our analytics efforts, which also is cost-effective. So you don't have to do everything everywhere. Or is the goal a little bit like, we need to get our return on investment faster, and we can't do that without de-risking some of that? So we've taken those lessons from the enterprise, where it's exceedingly difficult to work because of the strict requirements, because the customers expect more. And I think, like AWS, we're bringing it down market. We have a new product coming out. It's exclusive to AWS now, called SmallID, which is a cloud-native, smaller, lighter-weight version of our product for customers in the more commercial space, in the SMB space, where they can start to build a foundation of understanding their data for protection and for security, for privacy. >> Will, and before I let you go here, what I'd like to hear about is practical application. 
You know, somebody that you were able to help and assist, that you evaluated. 'Cause you've talked about the format here. You talked about your process and talked about some future, I guess, challenges and opportunities. But just to give our viewers an idea of maybe the kind of success you've already had, to give them a perspective on that, share a couple of stories, if you wouldn't mind, where there's some work that you guys did, where you rolled up your sleeves and created that additional value for your customers. >> Yeah, absolutely. So I'll give a couple of examples. I'm going to keep everyone anonymized. As a privacy-based company, in many ways, we try to respect-- >> Probably a good idea, right? (Will chuckles) >> But let's talk about different types of sensitive data. So we have customers for whom intellectual property is their biggest concern. So they do care about compliance. They want to comply with all the local and national laws where their company resides and where all their offices are. But they were very concerned about sensitive data sprawl around intellectual property. They have a lot of patents. They have a lot of sensitive data that way. So one of the things we did is we were able to provide custom tags and classifications for their sensitive data based on intellectual property. And they could see across their cloud environment, across their on-premise environment, across shared drives, et cetera, where sensitive data had sprawled. Where it had moved, who had access to it. And they were able to start realigning their storage strategy and their content management strategy, their data governance strategy, based on that. And start to move sensitive data back to certain locations, and lock that down on a higher level. They could create more access control there, but also proliferate and share data that more teams needed access to. 
And so that's an example of a use case that I don't think we imagined necessarily in 2016, when we were focused on privacy, but we've seen that the value can come from it. Yeah. >> So it's a good... Please, yeah, go ahead. >> No, I mean, the other (mumbles). So we've worked with some of the largest AWS customers in the world. Their concern is, how do we even start to scan the terabytes and petabytes of data in any reasonable fashion without it being out of date? If we create this data map, if we create this data inventory, it's going to be out of date on day one. As soon as we say it's complete, we've already added more. >> John: Right. >> That's where our scalability fits in. We were able to do a full scan of their entire AWS environment in months. And then keep up with the new data that was going into their AWS environment. This is huge. This was groundbreaking for them. So our hyper scan capability that we brought out, that we rolled out to AWS first, was a game changer for them. To understand what data they had, where it is, whose it is, et cetera, in a way that they never thought they could keep up with. You know, it brought me back to the beginning of COVID, when the British government was keeping track of all the COVID cases on spreadsheets, and the spreadsheets broke. It was also out of date. As soon as they entered something else, it was already out of date. They couldn't keep up with it. Like, there's better ways to do that. Luckily, I think they've moved on from that manual system. But use automation, with the correct human inputs when necessary. Then let machine learning, let big data, take care of the things that it can. Don't waste human hours, which are precious and expensive, unnecessarily. And make better decisions based on that data. >> Yeah. You raised a great point too, which I hadn't thought about. The fact is, you do your snapshot today and you start evaluating all their needs for today. 
And by the time you're able to get that done, their needs have now exponentially grown. It's like painting the Golden Gate Bridge. Right? You get done and now you've got to paint it again, except it got bigger. We added lanes, but anyway. Hey, Will. Thanks for the time. We certainly appreciate it. Thanks for joining us here on the Startup Showcase. And just remind me that if you ever ask for my keys, keep them out of that dumpster. Okay? (Will chuckles) >> Thanks, John. Glad to be here. >> Pleasure. (soft music)

Published Date : Mar 12 2021

SUMMARY :

of the AWS Startup Showcase I'm glad to be here. so it's nice to have you back. and the trust you have Is that the companies And enter the human instincts. And all of a sudden we but ifs of the way you store that you had different So the issues with the of maybe the kind of I'm going to keep everyone anonymized. So one of the things we So it's a good... of the largest AWS customers in the world. of all the COVID cases And by the time you're (soft music)

ENTITIES

Entity	Category	Confidence
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
Manhattan	LOCATION	0.99+
2016	DATE	0.99+
Will Murphy	PERSON	0.99+
two year	QUANTITY	0.99+
United States	LOCATION	0.99+
BigID	TITLE	0.99+
Will	PERSON	0.99+
William Murphy	PERSON	0.99+
10 block	QUANTITY	0.99+
five	DATE	0.99+
over $200 million	QUANTITY	0.99+
30,000 foot	QUANTITY	0.99+
One	QUANTITY	0.99+
10,000 people	QUANTITY	0.99+
each state	QUANTITY	0.99+
Redshift	TITLE	0.98+
one	QUANTITY	0.98+
GDPR	TITLE	0.98+
Athena	TITLE	0.98+
first	QUANTITY	0.98+
five year	QUANTITY	0.98+
S3	TITLE	0.98+
today	DATE	0.98+
Snowflake	TITLE	0.97+
10 years ago	DATE	0.97+
last year	DATE	0.97+
Kinesis	TITLE	0.95+
both	QUANTITY	0.93+
five-year anniversary	QUANTITY	0.92+
one organization	QUANTITY	0.91+
BigID	ORGANIZATION	0.9+
dozens of regulations	QUANTITY	0.89+
SmallID	TITLE	0.89+
petabytes	QUANTITY	0.84+
British government	ORGANIZATION	0.81+
ISV	ORGANIZATION	0.81+
over 12 different	QUANTITY	0.77+
AWS Startup Showcase	EVENT	0.77+
billions of inputs	QUANTITY	0.77+
day one	QUANTITY	0.75+
couple	QUANTITY	0.71+
couple reasons	QUANTITY	0.7+
handful of years	QUANTITY	0.7+
tons	QUANTITY	0.69+
couple things	QUANTITY	0.67+
Vice President	PERSON	0.65+
them	QUANTITY	0.64+
European	OTHER	0.61+
S3	COMMERCIAL_ITEM	0.58+
theCUBE	ORGANIZATION	0.57+
Databricks	TITLE	0.54+
terabytes	QUANTITY	0.53+
Business Development	ORGANIZATION	0.5+
Showcase	EVENT	0.5+
COVID	OTHER	0.45+
Accelerate	TITLE	0.43+
Alliances	ORGANIZATION	0.43+
BigID	EVENT	0.42+
Tedder	COMMERCIAL_ITEM	0.38+

theCube On Cloud 2021 - Kickoff

>>From around the globe, it's theCube, presenting Cube on Cloud, brought to you by SiliconANGLE. >>Welcome, everybody, to Cube on Cloud. My name is Dave Vellante, and I'll be here throughout the day with my co-host, John Furrier, who is quarantined in an undisclosed location in California. He's all good. Don't worry. Just precautionary. John, how are you doing? >>Hey, great to see you. John. Quarantine. My youngest daughter had COVID, so, contact tracing. I was negative, in quarantine at a friend's location. All good. >>Well, we wish you the best. Yeah, well, right. I mean, you know, what's it like, John? I mean, you're away from your family. You're basically shut in, right? I mean, you go out for a walk, but you're really not in any contact with anybody. >>Correct. Yeah. I mean, basically just isolation, um, pretty much what everyone's been kind of living through, kind of suffering through, but hopefully the vaccines are being distributed. You know, one of the things we talked about at re:Invent, Amazon's cloud conference, was the vaccine, but just the whole workflow around that, it's gonna get better. It's kind of really sucky here in the California area. They haven't done a good job; there's a lot of criticism around how that's rolling out. And, you know, Amazon is now offering to help, now that there's a new regime in the U.S. government. So, you know, something to talk about. But certainly this has been a terrible time, for COVID and everyone, and the deaths involved. But it's essentially pulled back the covers, if you will, on technology, and you're seeing everything. Society. In fact, um, well, that's big tech MIT disinformation campaigns. All these vulnerabilities in cyber, um, accelerated digital transformation. We'll talk about a lot today, but yeah, it's totally changed the world. And I think we're in a new generation. I think this is a real inflection point, Dave. You know, modern society, and the geopolitical impact of this is significant. 
You know, one of the benefits of being quarantined is you'd be hanging out on these Clubhouse apps, uh, late at night, listening to experts talk about what's going on, and it's interesting what's happening with things like water and, you know, the island of Taiwan and China and U.S. sovereignty, data sovereignty, misinformation. So much going on to talk about. And, uh, meanwhile, companies like Marc Andreessen's VC firm are starting a media company. What's going on? Hell freezing over. So... >>We're gonna be talking about a lot of that stuff today. I mean, Cube on Cloud, it's our very first virtual editorial event. What we're trying to do is bring together our community. It's an open forum, and we're running the day on our 365 software platform. So we got a great lineup. We got CEOs, CIOs, data practitioners. We got hardcore technologists coming in, cloud experts, investors. We got some analysts coming in, and we're creating this day-long series. And we've got a number of sessions that we've developed, and we're gonna unpack the future of cloud computing in the coming decade. As John said, we're gonna talk about some of the public policy, the new administration. What does that mean for tech and for big tech in general? John, what can you add to that? >>Well, I think one of the things that we talked about is COVID, this personal impact to me but other people as well. One of the things that people are craving right now is information, factual information, truth, texture, as we call it. But here, this event for us, Dave, is our first inaugural editorial event. Robbo, Kristen, Nicole, the entire Cube team, SiliconANGLE, really trying to put together more of a cadence. We're gonna do more of these events, where we can feature the best people in our community that have great fresh voices. You know, we do interview the big names, Andy Jassy, Michael Dell, the billionaires, the people making things happen. 
But it's often the people under there that are the real newsmakers. Amit Zavery, for instance, at Google, one of the most impressive technical people, he's got a talk. He's gonna present on the democratization of software development, and many more real people making things happen. And I think there's a communal element. We're going to do more of these. Obviously, we have, uh, no events to go to with the Cube. So we have the Cube virtual software that we have been building over years and are now perfecting, and we're gonna introduce that, we're gonna put it to work; we're dogfooding it. We're gonna put that software to work. We're gonna do a lot more virtual events like this: Cube on Cloud, Cube on Startups, Cube on Raising Money, Cube on Healthcare, Cube on Venture Capital. I always think we could do anything. The question is, what's the right story? What are the most important stories? Who's telling them? And increase the aperture of the lens of the industry that we have, and expose that as fast as possible. That's what this software does; you'll see more of it. So it's super exciting. We're gonna add new features like pulling people up on stage, um, kind of bring on the Clubhouse vibe and more of a community interaction, with people meeting each other, and we'll roll those out. But the goal here is to just showcase this cloud story in a way, from people that are living it and providing value. So enjoy. The day is gonna be chock full of presentations. We're gonna have moderated chat in these sessions, so it's an all-day event, so people can come in, drop out, and also everything's on demand immediately after the time slot. But you want to participate, come into the time slot, into the Cube room or breakout session. Whatever you wanna call it, it's a Cube room, and the people are in there chatting and having a watch party. >>So when you're on that home page, when you're watching, there's a hero video there. 
Beneath that, there's a calendar, and you'll see that red line, that red horizontal line, or vertical line rather. It's a linear clock that will show you where we are in the day. If you click on any one of those sessions, that will take you into the chat. We'll take you through those in a moment and share with you some of the guests that we have upcoming, and take you through the day. What I wanted to do, John, is try to set the stage for the conversations that folks are gonna hear today. And to do that, I wanna ask the guys to bring up a graphic. And I want to talk to you, John, about the progression of cloud over time, and maybe go back to the beginning and review the evolution of cloud and then really talk a little bit about where we think it's headed. So, guys, if you bring up that graphic. When AWS announced S3, it was March of 2006. And as you recall, John, you know, nobody really in the vendor and user community paid too much attention to that. And then later that year, in August, it announced EC2, and people really started to think about a new model of computing, but they were largely, you know, kicking tires. And it was kind of bleeding-edge developers that really leaned in. Um, what were you thinking at the time, when you saw, uh, S3, EC2, this retail company coming into the tech world? >>I mean, I thought it was totally crap. I'm like, this is terrible. But then at that time, I was in between kind of startups and I didn't have a lot of seed funding. And then I realized that EC2 was freaking awesome. I'm like, holy shit, this is really great, because I don't need to pay a lot of cash to provision a data center or get a server. Or, you know, at that time, the state-of-the-art startup move was to buy a Supermicro box or some sort of power server. Um, it was well past the whole proprietary thing. 
But you'd have to assemble, probably, a 5 to 8 grand box and go in and put it in a, call it a ghetto rack, which is basically, you know, you put it into some co-hosting location. It's like being with everybody else in the tech ghetto of hosting, still paying monthly fees and then maintaining it and provisioning, and that's just to get started. And then Amazon was just really easy. And from there, it was just awesome. I just knew Amazon would be great. They had a lot of things that they had to fix. You know, custom domains, and the user interface console got better and better, but it was awesome. >>Well, where we really saw the cloud take hold, from my perspective anyway, was the financial crisis in, you know, '07 to '09. It put cloud on the radar of a number of CFOs and, of course, shadow IT departments. They wanted to get stuff done and take IT in OpEx, bite-sized chunks. So there really was this cloud awakening. And we came out of that financial crisis, and we're now in this 10-year-plus boom, you know, notwithstanding, obviously, the economic crisis with COVID. But much of it was powered by the cloud. In the last decade, I would say, it was really about IT transformation. And it's kind of ironic, if you will, because the pandemic hits at the beginning of this decade, and it creates this mandate to go digital. So you've said a lot, John, that it has pulled forward, it's accelerated this industry transformation. Everybody talks about that, and we've highlighted it here in this graphic. It probably would have taken several more years to mature. But overnight you had this forced march to digital. And if you weren't a digital business, you were kind of out of business. And so it's sort of here to stay. How do you see this evolution, and what can we expect in the coming decades? I think it's safe to say the last 10 years were defined by, you know, IT transformation. That's not gonna be the same in the coming years. How do you see it?
>>It's interesting. I think the big tech companies are on it, but I think this past election in the United States shows the power that technology has. And if you look at some of the main trends in the enterprise, specifically around what cloud's accelerating, I call it the second wave of innovation that's coming, where it's different. It's not what people expect. It's edge. Edge computing, for instance, has been talked about a lot. But industrial IoT is really where we've had a lot of problems lately in terms of hacks and malware and just overall vulnerabilities, whether it's supply chain vulnerabilities or actual disinformation, you know, vulnerabilities inside these networks. So I think network effects are gonna be a huge thing. I think the impact that tech will have on society and global society, the geopolitical things, is gonna be another one. And I think modern application development, how applications are written with data, you know, we've always been saying this, from the beginning of theCube: data is an integral part of the development process. And I think more than ever, when you think about cloud and edge and this distributed computing paradigm that cloud is now going next level with, the software and how it's written will be different. You've gotta handle things like, where's the compute component? Is it gonna be at the edge? With all the server chip innovations that Amazon, Apple, and Intel are doing, you're gonna have compute right at the edge, the industrial and kind of human edge. How does that work? What's the latency to that? It really is an edge game. So to me, software has to be written holistically, in a systems kind of way. Now, that's not necessarily new in computer science and in the tech field; it's just gonna be deployed differently. So that's a complete rewrite, in my opinion, of the software applications.
Which is why you're seeing Amazon, Google, and VMware really pushing Kubernetes and these service meshes and microservices, because it's super critical that this technology become smarter, automated, autonomous. And that's a completely different paradigm from the old full-stack developer, you know, kind of model. You know, the full-stack developer is ancient. There's no such thing as a full-stack developer anymore, in my opinion, because it's a half stack, because the cloud takes up the other half. But no one wants to be called a half-stack developer because it doesn't sound as good as full stack. But really, cloud has eliminated the technology complexity of what a full-stack developer used to do. Now you can manage it and do things with it, so, you know, there's some work to be done, but the heavy lifting is taken care of. It's the top of the stack that I think is gonna be a really critical component. >>Yeah, and that sort of automation and machine intelligence layer is really at the top of the stack. This thing becomes ubiquitous, and we now start to build businesses and new processes on top of it. I wanna take a look at the Big Three. Guys, can we bring up the next graphic, which is an estimate of what the revenue looks like for the Big Three? And John, this is IaaS and PaaS spend for the Big Three cloud players. It's an estimate that we're gonna update after earnings season, and I wanna point a couple things out here. First, if you look at the combined revenue production of the Big Three last year, it's almost 80 billion in infrastructure spend. I mean, think about that. Was that incremental spend? No. It really has caused a lot of consolidation in the on-prem data center business for guys like Dell and, you know, EMC, now part of Dell, HP split up, IBM, Oracle, et cetera. They've all felt this sea change, and they had to respond to it. I think the second thing is you can see on this data.
Um, it's true that Azure and GCP seem to be growing faster than AWS. We don't know the exact numbers because AWS is the only company that really provides a clean view of IaaS and PaaS, whereas Microsoft and Google kind of hide the ball in their numbers. I mean, I don't blame them, because they're behind, but they do leave breadcrumbs and clues about growth rates and so forth. And so we have other means of estimating, but it's undeniable that Azure is catching up. I mean, it's still quite distant. The third thing, before I get your input here, John, is nuanced. Despite the fact that Azure and Google are growing faster than AWS, and you can see those growth rates, AWS, I'll call this out, is the only company by our estimates that grew its business sequentially last quarter. Now, in and of itself, that's not significant. But what is significant is that because AWS is so large, at $45 billion last year, even at the slower growth rates it's able to grow more in absolute terms than its competitors, who are basically flat to down sequentially by our estimates. So that's something that I think is important to point out. Everybody focuses on the growth rates, but you've gotta look at the absolute dollars as well. Nonetheless, Microsoft in particular is closing the gap steadily, and we should talk more about the competitive dynamics. But I'd love to get your take on all this, John. >>Well, I mean, the clouds are gonna win right now, big time. For one, the political climate is gonna be favoring Big Tech. But more importantly, what we were just talking about, the COVID impact and accelerating the digital transformation, is gonna create a massive rising tide. It's already happening. It's happening. And again, this shift in programming models is gonna really kind of accelerate and create new growth.
So there's no doubt in my mind that all three are gonna win big in the future. They're just different, you know, in the way they go to market and position themselves. They have to be. Google has to be a little bit different than Amazon because they're smaller, and they also have different capabilities, and they're trying to catch up. So if you're Google or Microsoft, you have to have a competitive strategy to decide: how do I wanna ride the rising tide, if you will? Well, if I'm Microsoft or Google, I'm not going to try to go frontal and try to copy Amazon, because Amazon is just pounding out features and scale, and they're different. They, I would say, took advantage of being the first mover in pure public cloud. They're really awesome at PaaS and IaaS, and they've integrated them. Gartner now reports on integrated IaaS and PaaS components. So Gartner finally got their act together and said, hey, this is really one thing. SaaS is a completely different animal. Now, Microsoft is super smart, because I think they played the right card. They have a huge installed base to convert: keep Office 365 and move SQL Server and all their core jewels into the cloud as fast as possible, cloudified, while filling in the gaps on the product side to be cloud. So, you know, Azure is doing a tremendous job. It's just pedal as fast as you can. Microsoft's strategy really is just go faster, keep pedaling fast, get the feature velocity, and try to make it high quality. Google is a little bit different. They have a little power base in that their network is strong, and they have a lot of other big data capabilities, so they have to use those to their advantage. So there is a competitive strategy game happening with these companies. It's not apples to apples, in my opinion. It never has been, and I think it's funny that people talk about it that way.
>>Well, you're bringing up some great points. Guys, bring up the next graphic, because a lot of things that John just said are really relevant here. What we're showing is survey data from ETR, our data partner, with like 1,400-plus CIOs and IT buyers. On the vertical axis is this thing called Net Score, which is a measure of spending momentum. And the horizontal axis is what's called Market Share. It's a measure of pervasiveness, or, you know, number of mentions in the data set. There's a couple of key points I wanna pick up on relative to what John just said. So you see AWS and Microsoft. They stand alone. I mean, they're the hyperscalers. They're far ahead of the pack, and frankly, they'd have to fall down to lose their lead. They spend a lot on CapEx. They've got the flywheel effects going. They've got both spending velocity and large market shares. But they're taking a different approach. John, you're right, Microsoft is living off of their SaaS estate, their software estate, and they're building that into their cloud. So they've got sort of a captive base of Microsoft customers. So they've got that advantage. Also, as we'll hear from Microsoft today, they're building more abstraction layers. Andy Jassy has said, we don't wanna be in that abstraction layer business. We wanna have access to those, you know, fine-grained primitives at an API level so we can move fast with the market. So those are sort of different philosophies, John. >>Yeah. I mean, you know, people who know me know that I love Amazon. I think their product is superior at many levels, and in its way that has advantages. Again, they have a great SaaS ecosystem. They don't really have their own SaaS play, although they're trying to add some stuff on. I've been kind of critical of Microsoft in the past, but there's one thing I'm not critical of Microsoft on, and people can get this wrong in the marketplace.
Actually, in the journalism world and also among some other analysts: Microsoft has always had large scale. So to say that Microsoft never had scale, and that Amazon owned the monopoly or franchise on scale, is wrong. Microsoft had scale from day one. Their business was always large-scale and global. They've always had infrastructure with MSN and their search and how they distribute browsers in multiple countries. Remember, they had the lock on the operating system and the browser until the government stepped in in 1997. And since 1997, Microsoft has never not invested in infrastructure and scale. So that whole premise that they don't compete well there is wrong. And I think that chart demonstrates that they're in the hyperscale leadership category, hands down. The question that I have is whether they're as good at making that scale integrate, because they have that legacy. This is the classic innovator's dilemma, Clay Christensen, right? So I think they're doing a good job. I think their strategy is sound. They're moving as fast as they can. But, you know, they're not gonna come out and say, we don't have the best cloud. That's not a marketing strategy. They have to kind of hide in this and get better and then double down on where they're winning, which is that clients are converting from their legacy at the speed of Microsoft, and they have a huge client base. So that's why they're scoring so high. That's why they're so good. >>Well, I'm gonna give you a little preview. I talked to JG Chirapurath, who's gonna come on today, and you'll see, I asked him, because the criticism of Microsoft is, you know, they're just good enough. And so I asked him, are you better than good enough? You know, those are fighting words if you're inside of Microsoft. But you'll have to wait to see his answer. Now, guys, if you could bring that graphic back up, I wanted to get into the hybrid zone.
You know, where the field is. >>We've also got some questions coming in on chat, Dave. So we'll get to those. >>Great. Awesome. So just real quick here: you see this hybrid zone. This is where the field is bunched up, with the other companies who have a large on-prem presence and have been forced to initiate some kind of coherent cloud strategy. Included there is multi-cloud, and Google's there, too, because they're far behind and they've got to take a different approach than AWS. But as you can see, there's some real progress here. VMware Cloud on AWS stands out, as does Red Hat OpenShift. You've got VMware Cloud, which is VCF, VMware Cloud Foundation, and even Dell's cloud. And you'd expect HPE with GreenLake to be picking up momentum in future quarters. And you've got IBM and Oracle, and there you go with the innovator's dilemma. But they're at least in the cloud game, and we can talk about that. So, John, you know, to your point, you've gotta have different strategies. You're not going to take out the big two. So you've gotta connect your on-prem to your cloud, your hybrid, multi-cloud, and try to create new opportunities and new value there. >>Yeah, I mean, I think we'll get to the question, but just on that point: Jerry Chen has come on theCube many times. We were trying to get him to come on today with the startup feature, but he's always said it on theCube. He's a VC at Greylock, a great firm. Jerry's a cloud genius. He's been there, and he made a point many, many years ago: it's not winner take all, it's winner take most. And the Big Three, maybe put four or five in there, will take most of the market here. But I think one of the things that people are missing and aren't talking about, Dave, is that there's going to be a second-tier cloud, large-scale model. I don't want to say tier-two cloud, because that sounds like a sub-cloud, but a new category of cloud on cloud, right?
So meaning, if you look at what Snowflake did, I think this is a telltale sign of what's coming. VMware Cloud on AWS has had huge success, mainly because Amazon is essentially enabling them to be successful. So I think there's going to be a wave of more of a channel model of indirect cloud build-out, where companies like theCube, potentially, for media, or others, will build clouds on top of the cloud. So whether it's Google, Microsoft, or Amazon, whoever is the first one to really enable that will do extremely well, because that means you can compete with their scale and create differentiation on top. So what Snowflake did is all on Amazon. Now, they kind of should go to Azure because it's, you know, politically correct to have multiple clouds and distribution, and business models shift. But to get that kind of performance, they just rode on Amazon. So there's nothing wrong with that, because what you're paying is variable. It's CapEx versus OpEx, a nice categorization. So I think that's the wave that we're watching. I think it's super valuable, and I think it will create some surprises in terms of who might come out of the woodwork and be a leader in a category. >>Well, your timing is perfect, John, and we do have some questions in the chat. But before we get to that, I want to bring in Sarbjeet Johal, who's a contributor to our community. Sarbjeet, can you hear us? All right, so we got, uh... >>While we're bringing in Sarbjeet, let's go down through the questions. So the first question, we'll get to the second one in a second. The first question: Ronald asks, can a vendor in 2021 exist without a hybrid cloud story? Well, story and capabilities. Yes, they could live, but they have to have a story. >>Well, and if they don't own a public cloud? No. No, they absolutely cannot. Hey, Sarbjeet. How you doing, man? Good to see you. So, folks, let me bring in Sarbjeet Johal. He's a cloud architect. He's a practitioner. He's worked as a technologist.
And he's a frequent guest on theCube. Good to see you, my friend. Thanks for taking the time with us. >>And good to see you guys too. >>So we were kind of riffing on the competitive landscape. We've got so much to talk about, and there's a number of questions coming in. But Sarbjeet, we wanna talk about, you know, what's happening here in cloud land. Let's get right into it. I mean, what do you guys see? I mean, we had, yesterday, a new regime, a new inauguration. We'll start with you, Sarbjeet: what kind of effect do you think public policy will have on, you know, cloud generally, and specifically the big tech companies, the techlash? Is it gonna be more of the same? Or do you see a big difference coming? >>I think that there will be some changing narrative, I believe, and that is mainly from the regulators' side. A lot has happened in one month, right? So people, I think, are losing faith in high tech in a certain way. I mean, I don't think it matters which camp you belong to, left or right, kind of thing, right? But Parler getting booted out from AWS, I think that was huge. Like, how do you know that a cloud provider will not boot you out? Like, where is that line? Where do you draw the line? What are the rules? I think that discussion has to take place. Another thing which has happened in the last two, three months is the SolarWinds hack, right? So, us not sort of acknowledging that it was Russia and then wishy-washing it. Now, the new administration might have a different sort of posture on that. I think that's huge. I think public-private partnership in the security arena will emerge this year. We have to address that. Yeah, I think it's not changing. The economy will change gradually. You know, we're coming out of the pandemic. The money is still cheap, and debt will not be cheap for long. I think M&A activity really will pick up.
So those are my sort of high-level views. >>Thank you. I wanna come back to M&A, because there's a question in the chat about M&A. But John, how do you see it? Do you think Amazon and Google are on a slippery slope booting Parler off? I mean, how do they adjudicate? Well, what's happening on Parler, anything could happen on Clubhouse. Who knows? I mean, can you use AI to find that stuff? >>Well, I mean, the Amazons are hiding right there, bunkered in right now from that bad, bad situation. Because again, like we said, all three of these cloud players win in the current environment. Okay, who wins with the U.S. the way we are? China, Russia, cloud players. Okay, let's face it, that's the reality. So if I wanted to reset the world stage, you know, what better way than to, you know, change over the United States economy, put people out of work, make people scared, and then reset the entire global landscape and control it all with cash? That's, you know, conspiracy theory. >>So you see the rich get richer. >>Yeah, well, that's kind of what's happening, right? So if you start getting into this idea that you can't actually have an app on a site because of a reason, now, I don't know the particulars of Parler, but apparently there was a reason. But this is dangerous, right? So what that's gonna do, whether it's right or wrong, whatever the political opinion is, it means that they were essentially taken offline by people that weren't voted for, that people didn't vote for. So that's not a democracy, right? So that's a different kind of regime. What it's also going to do is, you also have this groundswell of decentralized thinking, right? So you have a whole wave of crypto and decentralized, um, cyberpunks out there who want to decentralize it.
So all of this stuff in January has created a huge counterculture, and I have predicted this so many times on theCube, Dave: a counterculture is coming. And you already have this kind of counterculture between centralized and decentralized thinking. And so I think Amazon's move is dangerous at a fundamental level. Because if you can't buy domain names and you're completely blackballed by organized players, that's a mafia, in my opinion. And it also fuels the decentralized move, because people say, hey, if that could be done to them, it could be done to me. Just the fact that it could be done will promote a swing in the other direction. >>I mean, independent of, you know, again, as somebody said, your political views, I mean, Parler would say, hey, we're trying to clean this stuff up. Now, maybe they didn't do it fast enough, but you think about how new Parler is. You think about the early days of Twitter and Facebook. So they were sort of at a disadvantage, trying to... >>It was what it was. It was a right-wing stand-up job of standing up something quick. Their security was terrible. If you look at Corey Quinn, it'd be great to have him on, he did a great analysis on this. If you look at the lawsuit, it was just terrible. The security was just half-assed. >>Well, and the experience was horrible. I mean, it was not a great app. But like you said, it was a quick stand-up, you know, for an agenda. But nonetheless, you know, Sarbjeet, to your point earlier, it's like, you know, are they gonna, you know, shut me down if I say something that's, you know, out of line? Or how do I control that? >>Yeah, I remember, like, in 2019, in the closing sort of remarks, I was there. I was saying that these companies are gonna be too big to fail. And also, they're too big for other nations to do business with. In a way, I think MNCs are running the show worldwide.
They're running the governments. We have seen the proof of that in the U.S. this year, late last year and this year. Twitter last night blocked the Chinese embassy in the U.S. from their, you know, platform, and I was like, what's going on? So, like, we used to say that the Chinese tech companies are in bed with the Chinese government, right? Remember that? And now, actually, I think Chinese people can say the same thing about U.S. companies. It's not a good thing. >>Well, let's get some questions from the chat. Thank you. One is on M&A, a subject you mentioned. Who do you see as possible M&A targets? I mean, I could throw a couple out there. You know, some of the CDN players, maybe Akamai. You know, I like HashiCorp. I think they're doing some really interesting things. What do you see? >>I think HashiCorp and anybody who's doing things in the periphery is a candidate for M&A by the big guys, you know, by the hyperscalers and the tier-two hyperscalers, right? The Salesforces of the world and stuff like that. Some companies which I thought would be a target are getting too big because of their valuations. I think HashiCorp is one. And there's a bunch in the networking space. Volterra, if I say it right, was acquired by F5 this week, or last week, actually, for $500 million. I know their founder. So yeah, there's a lot going on on the network side and on anything to do with data. Those are two hot areas in the cloud arena. >>Data, data protection. John, anything you could add here? >>I mean, I think edge is gonna be where the gaps are. And I think M&A activity is gonna be where, again, the bigger, the too big to fail, I would agree with you on that one.
But we're gonna look at white spaces and say a white space for Amazon is like a monster space for a startup, right? So you're gonna have these huge white-space opportunities, and I think it's gonna be an M&A opportunity, big time, for startups to get bought, given the speed. And I think you're gonna see it around databases and around some of these new service meshes and microservices. >>There's a question here, somebody, Don, is asking: why is Google, who has the most pervasive tech infrastructure on the planet, not at the same level as the other two hyperscalers? I'll give you my two cents: it's because it took them a long time to get their heads out of their ads. I wrote a piece around that a while ago. They've been figuring out how to learn the enterprise. I mean, John, you've made this point a number of times, but they just got a late start. >>Yeah, they're adding a lot of people. If you look at who they're hiring at Google Cloud, they're adding a lot of enterprise chops in there. They realized this years ago, and we've talked to many of the top leaders, although Kurian hasn't yet sat down with us. Don't know what he's hiding or waiting for. But they're clearly not geared up to compete, I mean, compete at the level of Amazon. You can see it with some of the things that they're doing. But they have strengths, and they're playing to their strengths. They definitely recognize that they didn't have the enterprise motions and people in the DNA, and that, Dave, takes time. The enterprise is not for the faint of heart. It's unique, with details that are different. You can't just, you know, swing the Google playbook and say, we're gonna own the enterprise because our tech's great. They knew that years ago. So I think you're going to see a good year for Google. I think you'll see a lot of change. They've got great people in there.
On the product marketing side, dev, solution architects, and then the SRE model that they have perfected has been strong. And I think security is an area where they could really add a lot of value. So I've always been a big fan of their huge network and all the intelligence they have that they could bring to bear on security. >>Yeah, I think Google's main problem, actually there are many, but one is that they don't have the boots on the ground as compared to Microsoft especially. Amazon actually had a similar problem, but they had a wide breadth in their product portfolio. I always talk about feature proximity in the cloud context: if you're doing one thing and you wanna do another thing, how do you go get that feature? Do you go to another cloud provider, or is it right there where you are? So I think Amazon has the feature proximity, and they also have, as compared to Google, skills gravity. A larger number of people are trained on AWS. I think Google is trying there. The second problem Google is having is that they're more focused, I believe, on the data science part, and they're sort of skipping the core components of the cloud, if you will, where the workloads need, you know, basic stuff, right? That's your compute, storage, and network. And that has to be well thought through. I think they will do good. >>Well, so later today, Paul Gillin sits down with Amit Zavery of Google, who used to be at Oracle. He's with Google now, and Paul's gonna push him on the numbers. You know, you're a distant third. Does that matter? And of course, you know, just a preview, he's gonna say, well, no, we don't really pay attention to that stuff. But John, you said something earlier. I think Jerry Chen made this comment that, you know, is it winner take all? No, but it's winner take a lot. You know, the number two is going to get a big chunk of the pie.
It appears that the market's big enough for three. But does Google have to really dramatically close the gap and be much, much closer, you know, to the leaders in order to compete in this race? Or can they just kind of continue to bump along, siphon off the ad revenue, put it out there? >>I mean, they definitely can compete. I think, like, Google's in it to win it. They're not caving, right? >>But I wrote recently that I thought they should put even more emphasis on the cloud. I mean, maybe they're already, you know, doubling down, tripling down. I just think that it's a multi-trillion-dollar, you know, future for the industry. And, you know, I think Google, believe it or not, could even do more. Now, maybe there's just so much you could do. >>There's a lot of challenges with these companies, especially Google. They're in Silicon Valley. We have a big social justice warrior mentality. There's a big debate going on in the back channels of the tech scene here, and that is that if you want to be successful in cloud, you have to have a good edge strategy, and that involves surveillance, use of data, and pushing the privacy limits, right? So, you know, Google has people within the company that will protest a contract because AI is being used for war. Yet we have the most unstable geopolitical scene that I've ever witnessed in my lifetime going on right now. So... >>Don't you think that's what happened with Parler? I mean, Rob Hof said, hey, the bar is pretty high to kick somebody off your platform. Parler went over the line. But I would also think that a lot of the employees, whether it's Google or AWS as well, said, hey, why are we supporting, you know, this? And so to your point about social justice, I mean, that's not something... >>Parler was not just social justice. They were trying to overthrow the government. Probably, I think, they were in there to get selfies and be protesters.
But apparently there was evidence, from what I heard in some of these Clubhouse, uh, private chats, there was overwhelming evidence on Parler. >>Yeah, but my point is that the employee backlash was also a factor. That's all I'm saying. >>Well, if you're Google and you have employees who say, we will boycott and walk out if you bid on that JEDI contract, for instance, right? But Microsoft won, so maybe... >>So, I mean, that's... well, I think Tom Poole's making a really good point here, which is that Google is an alternative to AWS. The last Google Cloud Next that we were at, and they've had it all virtual since, I saw a lot of IT practitioners in the audience looking around for an alternative to AWS. So we could talk about mono cloud or multi-cloud, and Andy Jassy has his narrative around that, and it's true that when somebody goes to multiple clouds, they put, you know, most of their eggs in one basket. Nonetheless, I think, you know, Google's got a lot of people interested, particularly on the analytics side, in an alternative, hedging their bets, and in particular use cases. So they should be able to do so. I guess the bottom line here is the market's big enough. You don't have to be the Jack Welch, I've gotta be number one or number two in the market. Is that the conclusion here? >>I think so. But the data gravity and the skills gravity are playing against them. Another problem with Google, which I didn't quite cover earlier, is that they have to boot out AWS wherever they go, right? That is a huge challenge. Most of the Fortune 2000 companies are already using AWS in one way or another, right? So they are the multi-cloud kind of player, the other one, you know. And somebody going purely 100% Google Cloud, those cases are kind of few, if you will. >>I think it's gonna be absolutely multi-cloud.
I think it's gonna be a time where you look at the marketplace and you're gonna think in terms of a disaster recovery model of cloud, or just fault-tolerant capabilities, or, you know, look at Parler, the next Parler. Or what if Amazon wakes up one day and says, hey, I don't like theCUBE's commentary on their virtual events, so shut them down? We should have a failover to Google Cloud. Should Microsoft be an option? And if one of the people in the Microsoft ecosystem wants to buy services from us, we have to kind of co-locate there. So these are all open questions that will become certain pretty quickly, which is, can a company diversify their computing and IT in a way that works? And I think the momentum around Kubernetes you're seeing is a great connective tissue between, you know, having applications work between clouds. Right? It's directionally correct, in my opinion, because if I'm a company, why wouldn't I wanna have choice? >> So let's talk about this. The data is mixed on that. I'll share some data, ETR data, with you. About half the companies will say, yeah, we're spreading the wealth around to multiple clouds. Okay, that's one thing; we'll come back to that. About the other half are saying, yeah, we're predominantly mono cloud; we don't have the resources. But what I think matters going forward is what multi cloud really becomes. And I think, John, you mentioned Snowflake before. I think that's an indicator of what true multi cloud is going to look like. What Snowflake is doing is they're building an abstraction layer across clouds. Ed Walsh would say, I'm standing on the shoulders of giants. So they're basically following points of presence around the globe and building their own cloud. They call it a data cloud, with a global mesh. We'll hear more about that later today, but you sign on to that cloud.
So they're saying, hey, we're gonna build value, because Amazon's not gonna build that abstraction layer across multiple clouds, at least not in the near term. So that's a real opportunity for people. >> I mean, I don't want to sound like I'm dating myself, but, you know, we can date ourselves, David. I remember back in the eighties, when you had the open systems movement, right? Part of the whole revolution was OSI, the Open Systems Interconnection model. At that time, the networking stacks were SNA for IBM and DECnet for DEC; we all know those were proprietary stacks. And then in comes TCP/IP. Now, OSI never really happened on all seven layers, but the bottom layers standardized. Okay, that was huge. So I think if you look at AWS, or some of the comments in the chat, AWS could be the SNA. Depends how you're looking at it, right? And you could say they're open. But in a way, they want more Amazon. So Amazon's not out there saying, we love multi cloud. Why would they promote multi cloud? They are one of the clouds they want. >> That's interesting, John. And then Sarbjeet, as a cloud architect: I mean, it is not trivial to make your data cloud, if you're Snowflake, work on AWS, work on Google, work on Azure, and be seamless. I mean, certainly the marketing says that, but technically, that's not trivial. You know, there are latency issues. So that's gonna take a while to develop. What are your thoughts there? >> I think that multi cloud for the same workload and multi cloud for different workloads are two different things. We usually put multi cloud in one bucket, right? So I think you're right. If you're trying to do multi cloud for the same workload, that's a complex problem to solve architecturally, right? You have to have common APIs and a common control plane, if you will. And we don't have that yet, and we will not have that for at least another couple of years.
So if you want to do that, then you have to go to the lowest common denominator in the technical sort of stack, if you will. And then you're not leveraging the best-of-breed technology on offer from different vendors, right? I believe that's a hard problem to solve. And another thing, and I always say this, I'm always on the dev side, you know, the developer side: I think, to devs, public cloud is a proxy for innovative culture. Right? So there's a catchphrase I came up with today in the shower. I think that is true. And the companies who use the best-of-breed technologies can attract these developers, and developers are the masons of these digital sort of empires. The masonry is happening there, right? They are the masons. They lay the bricks. I think if you don't appeal to developers, if you don't put extensive force behind educating the market, you can't pull it off. >> It's the same game. I was seeing some chat comments from a LinkedIn friend of mine. She said, Microsoft: if you go back and look at the Microsoft early days, to the developer point, they made their bones with developers. They were a software company. >> Don't forget: developers, developers, developers. >> If you were in the developer ecosystem, you were treated as gold. You were part of the family. If you were outside that world, you were competitors, and those were ruthless times back then. But again, that was where it was. Today, look at where the software-defined businesses are, and Sarbjeet's saying it's all about being developer-led in this new way to program, right? So the next-gen cloud is going to look a lot like next-gen developer, and all the different tools and techniques are gonna change. So I think, yes, this kind of developer ecosystem will be harnessed, and that's the power source.
It's just gonna look different. >> So, Justin in the chat has a comment. I just want to answer the question about thoughts on Elastic. I tell you, Elastic has momentum, doing very well in the marketplace. The ELK Stack is a great alternative that people are looking to relative to Splunk, where people complain about the pricing. Of course, Splunk's got the easy button, but it is getting increasingly expensive. The problem with the ELK Stack is, you know, it's open source. It gets complicated. You've got to shard the databases, you've got to manage it. That's what Ed Walsh's company, ChaosSearch, is all about. But Elastic has some real momentum in the marketplace right now. >> Yeah, you know, another thing that's coming up in the chat, understanding what I was saying about open systems, is Kubernetes. I always felt, and it's maybe a bad metaphor, but bear with me, that it was the TCP/IP of this modern era. See, TCP/IP created the disruptor to the SNAs and the network protocols that were proprietary. So what Kubernetes is doing is creating a connective tissue between clouds and letting the open source community fill in the gaps in the middle. It's probably a bad analogy, but that's where the disruption is. And if you look at what's happened since Kubernetes was put out there, it's become kind of a de facto standard, in the sense that everyone's rallying around it. The same exact thing happened with TCP: people were trashing it. "It's terrible," you know. Of course they trashed it, because it was open. So I find that to be very interesting. >> Yeah, that's a good analogy. >> I use the RCA cable analogy, like with the VCRs. When they started, every VCR had its own cable, and it would work only with that sort of brand of TV, and then the RCA cable came, and now you can connect any TV with any VCR, and the VCR industry took off.
There are so many examples out there around standards, and how standards can, you know, fan that fire, if you will, for an industry to go sort of wild. And another trend, guys, I'm seeing is from the consumer side. Let's talk a little bit about the consumer side. The difference between B2B and B2C is blurred, because even the physical products are connected to the end user. Like my door lock, the August door lock: I didn't just get the door lock and forget about it. I value the experience it gives me, or the problems it gives me, on a daily basis. So I'm close to that vendor, right? So the middlemen, the middle people, are getting removed between the producer of the technology or the product and the consumer. Even the sort of big grocery players have their apps now: how you buy stuff and how it's delivered and all that stuff. That experience matters. In that context, I think, to be able to sell to these enterprises, the cloud providers have to have these case studies, or all these sample sort of reference architectures and stuff like that. I think whoever has pushed more that way is doing better. Amazon is Amazon because of that reason. I think they have a lot of sort of use cases on top of them. And they themselves do retail like crazy, right? And other things as well. So I think that's a big trend. >> Great, great points. One of the things: there's a question in there from Yaden, who says, I like the developer-led cloud movement, but what is the criticality of the executive audience when educating the marketplace? This comes up a lot in some of my conversations around automation. So automation has been a big wave: automate this, automate everything. And then everything as a service has become kind of the executive suite.
Kind of like conversation: we need to make everything as a service in our business. You're seeing people move to that cloud model. Okay, so the executives think everything as a service is a business strategy, which it is on some level, but then, when they say, "Take that hill, do it, developers," it's not that easy. And this is where a lot of our Cube conversations over the past few months have been, especially during COVID with Cube Virtual. This has come up a lot, Dave, this idea, and Sarbjeet's been around it: it's easy to say everything as a service, but to implement it, it's really hard. And I think that's where the developer-led connection is, where the executives have to understand that to say it and to do it are two different things. With digital transformation, that's a big part of it. So I think you're gonna see a lot of education this year around what it means to actually do that and how to implement it. >> I'd like to comment on the as-a-service point and, Sarbjeet, get your take on it. I mean, I think you're seeing, for instance, HPE with GreenLake, Dell's come out with Apex, and IBM has its utility model. These companies are basically taking a page out of what I would call a flawed SaaS model. If you look at the SaaS players, whether it's Salesforce or Workday, ServiceNow, SAP, Oracle, these models are not really cloud pricing models. Basically, you've got to commit to a term: one year, two year, three year. We'll give you a discount if you commit to the longer term. But you're locked in, and you probably pay up front, or maybe you pay quarterly. That's not a cloud pricing model. And that's why I say they're flawed. You're seeing companies like Datadog, for example, and Snowflake is another one, beginning to price on a consumption basis. And that is, I think, one of the big changes that we're going to see this decade: that true cloud,
You know, pay-by-the-drink pricing model. And to your point, John, to actually implement that, you're gonna need a whole new layer across your company, and it is quite complicated, not even to mention how you compensate salespeople, et cetera, and the APIs of your product. But that is a big sea change that I see coming. Sarbjeet, your thoughts? >> Yeah, I think, like, they can't see it. Some things for these big tech execs are hidden in plain sight, right? They don't see it. They have blind spots. Look at Amazon. They went to millisecond billing on serverless, right? And then here you are, saying, pay us for the whole year, and if you don't use the cloud, you lose it; or pay by month, per user, and all that stuff. Those are old models. I think these players will be forced to use that kind of pricing, like per minute or per second, or per user. I think the Salesforce model is hybrid. They're struggling in a way. I think they're trying to bring the platform, by doing, you know, acquisition after acquisition, to be a platform for other people to build on top of. But they're having a little trouble there because of their pricing and a little closedness, if you will. And again, I'm going back to developers: if you are not appealing to developers who are writing the latest and greatest code, and if it is not open enough... By the way, open and open source are two different things; we all know that. So if your platform is not open enough, you will have, you know, some problems in closing the deals. >> I want to just bring up a question in the chat from Justin, who asks, can you touch on the vertical clouds? Azure is offering this. Great question. GCP is announcing a retail cloud, and IBM too. Okay, I'm huge on this point, because I've been saying this for years.
Cloud computing is about horizontal scalability and vertical specialization, and that's absolutely clear, and you see all the clouds doing it. The vertical rollouts are where the high-fidelity data is, and with machine learning and AI efforts coming out, that accelerates the benefits. There, you have to have the vertical focus. I think it's super smart that clouds will have some sort of vertical engine, if you will, and build on top of a control plane, whether that's data or whatever. This is clearly the winning formula. If you look at all the successful kinds of AI implementations, the ones that have access to the most data will get the most value. So if you're gonna have a data-driven cloud, you have to have this vertical feeling in terms of the data in the verticals, and so I think that's super important. Again, just generally, as a strategy, I think Google doing a retail cloud is super smart, because their whole pitch is, we're not Amazon. And some people say, we're not Google, depending on where you look. So each of these big players has dominance in its areas, and that scares companies. Some companies will never go to Amazon for that reason, and some people will never go to Google for other reasons. I know people who are in ad tech; they're like, we're not going to Google. So again, it is what it is. But this idea of vertical specialization is relevant and super important. >> I want to point out two sessions that are going on today. Great points; I'm glad you asked that question. One is Alan Nance, who kicks off at 1 p.m. Eastern time in the transformation track. He's gonna talk a lot about the coming power of ecosystems, and we've talked about this a lot: that to compete with Amazon, Google and Azure, you've gotta have some kind of specialization, and vertical specialization is a good one. But of course, you're seeing the Big Three also get into that.
But so he's talking at one o'clock, and then at 3:36 p.m. (you know, these times are strange, but I can explain that later) Hillary Hunter is talking. She's the CTO of IBM's financial cloud, which is another really good example of specifying vertical requirements and serving, you know, an audience. Sarbjeet, I think you have some thoughts on this. >> Actually, I lost my thought. >> I think the other piece of that is data. I mean, to the extent that you could build an ecosystem, coming back to Alan Nance's premise around data... >> Billions of dollars in there. >> There's billions of dollars, and that's the title of the session. But we did the "trillion dollar baby" post with Jassy and said cloud is gonna be a trillion dollars, right? >> And the point of Alan Nance's session is he's thinking from an individual firm. Forget the millions that you're gonna save shifting to the cloud on cost. There's billions in ecosystems and operating models. >> That's absolutely the business value. Now, going back to my half-stack, full-stack developer: it's the business value. I've been talking about this in the Clubhouses a lot this past month, for the entrepreneurs out there: the activity is in the business value. The new intellectual property is the business logic, right? So if you can see innovations in how work streams and workflows are gonna be configured differently, and you now have large-scale cloud specialization with data, you can move quickly and take territory. That's a much different scenario than a decade ago. >> The point I was trying to make earlier, which I now remember, is that having the horizontal sort of features is very important, as compared to having a vertical focus. You know, you're more healthcare focused, you have that sort of needs, if you will, or auto, or financials and stuff like that. What Google is trying to do, I think that's a good thing.
To cook up the reference architectures. But it's a bad thing in a way, in that you drive away some developers. Most of the developers, 80-plus percent of developers, are horizontal. Look into the psyche of a developer: you move from company to company, and only a few developers will say, I will stay only in healthcare, right? Or, I will only stay in auto, or something like that, right? So you have to have these horizontal capabilities, which can be applied anywhere, and then, on top of that... >> I think that's true. Sorry, but I'll take a little bit different take on that. I would say yes, that's true. But remember the old-school application developer? Someone was just called an application developer. All they did was develop applications, right? They picked the framework, they did it, right? So I think we're going to see more of that. It's just that now there are more under-the-covers developers. You've got more software-defined networking and software-defined storage, servers, and cloud, Kubernetes. And that's kind of under the hood. But you've got your, you know, classic application developer. I think you're gonna see a lot of that come back, in a way that's like, I don't care about anything else. And that's the promise of cloud: infrastructure as code. So I think it's both. >> Hey, I worked... I worked at PeopleSoft, and still today, in this context, I say ERPs are the ultimate low-code, no-code sort of thing, right? And the problem is, they couldn't evolve. They couldn't make it lightweight, right? I used to write applications with drag and drop, you know, stuff. Right? But I was miserable as a developer. I didn't want to be in the applications division of PeopleSoft. I wanted to be in the tools division. There were two divisions in most of these big companies, SAP, Oracle, companies like that, right? One is cooking up the tools. One is cooking up the applications.
The basketball was always gonna go to the tooling. >> Hey, guys, I'm sorry. We're almost out of time. I always wanted to tease some of the sessions of the day. First of all, we've got Holger Mueller coming on at lunch for a power half hour. You'll notice, when you go back to the home page, that calendar, that linear clock that we talked about, and that the start times are kind of weird. For instance, one of our guests is coming on at 1:24. And that's because these are prerecorded assets, and rather than having a bunch of dead air, we're just streaming one into the other. So she's gonna talk about people, process and technology. We've got Kathy Southwick, who's a Silicon Valley CIO. Dan Sheehan was the CIO of Dunkin' Brands, and he was actually the COO, so it's a CIO/COO connecting the dots to the business. Daniel Dines is the CEO of UiPath. He's coming on at 2:47 p.m. East Coast time; it's one of the hottest companies, probably the fastest-growing software company in history. We've got a guy from Bain coming on, Dave Humphrey, who invested $750 million in Nutanix. He'll explain why. And then, ironically, Dheeraj Pandey: Stu Miniman, our friend, interviewed him. That's 3:35. One of the sessions I'm most excited about today is Zhamak Dehghani at 4:03 p.m. East Coast time. She's gonna talk about how to fix broken data architectures, really forward-thinking stuff. So that's the transformation track. On the future of cloud track, we start off with the Big Three. Mai-Lan Tomsen Bukovec at one o'clock; she runs the AWS storage business. Then, as I mentioned, JG Chirapurath at 1:30; he runs Azure's analytics business, and he's awesome. Paul Gillin then talks to Amit Zavery at 1:59. And then our friend Stu interviews Simon Crosby. I think that's it. I think we're going on to our next session. All right, so keep it right there. Thanks for watching theCUBE on Cloud.

Published Date : Jan 22 2021

SUMMARY :

cloud brought to you by silicon angle, everybody I was negative in quarantine at a friend's location. I mean, you go out for a walk, but you're really not in any contact with anybody. And I think we're in a new generation. The future of Cloud computing in the coming decade is, John said, we're gonna talk about some of the public policy But the goal here is to just showcase it's Whatever you wanna call it, it's a cube room, and the people in there chatting and having a watch party. that will take you into the chat, we'll take you through those in a moment and share with you some of the guests And then from there you just It was just awesome. And it kind of ironic, if you will, because the pandemic it hits at the beginning of this decade, And if you weren't a digital business, you were kind of out of business. last 10 years defined by you know, I t transformation. And if you look at some of the main trends in the I think the second thing is you can see on this data. Everybody focuses on the growth rates, but it's you gotta look at also the absolute dollars and, So you know, as you're doing trends job, they're just it's just pedal as fast as you can. It's a measure of the pervasiveness or, you know, number of mentions in the data set. And I think that chart demonstrates that there, in there in the hyper scale leadership category, is they're, you know, they're just good enough. So we'll get to those So just just real quick Here you see this hybrid zone, this the field is bunched But I think one of the things that people are missing and aren't talking about Dave is that there's going to be a second Can you hear us? So the first question, Um, we'll still we'll get the student second. Thanks for taking the time with us. I mean, what do you guys see? I think that discussion has to take place. I think m and a activity really will pick up. I mean, can you use a I to find that stuff? 
So if I wanted to reset the world stage, you know what better way than the, and that and it's also fuels the decentralized move because people say, Hey, if that could be done to them, mean, independent of of, you know, again, somebody said your political views. and he did a great analysis on this, because if you look the lawsuit was just terrible. But nonetheless, you know, to start, get to your point earlier. you know, platform last night and I was like, What? you know, some of the cdn players, maybe aka my You know, I like I like Hashi Corp. for many by the big guys, you know, by the hyper scholars and if I say the right that was acquired by at five this week, And I think m and a activity is gonna be where again, the bigger too big to fail would agree with Not at the same level of other to hyper scale is I'll give you network and all the intelligence they have that they could bring to bear on security. The where the workloads needs, you know, basic stuff, right? the gap on be a much, much closer, you know, to the to the leaders in orderto I think that's like Google's in it. I just I think that is a multi trillion dollar, you know, future for the industry. So you know, Google has people within the country that will protest contract because I mean, Rob Hope said, Hey, bar is pretty high to kick somebody off your platform. I think they were in there to get selfies and being protesters. Yeah, but my point is that the employee backlash was also a factor. I think you know, Google's got a lot of people interested in, particularly in the analytic side, is that they have to boot out AWS wherever they go. I think it's gonna be a time where you looked at the marketplace and you're And I think John, you mentioned Snowflake before. 
I remember back in the eighties, when you had open systems movement, I mean, certainly the marketing says that, I think if you don't appeal to developers, if you don't but extensive She said, Microsoft, If you go back and look at the Microsoft So the cloud next Gen Cloud is going to look a lot like next Gen Developer You got a shard, the databases you gotta manage. And if you look at what's happened since Kubernetes was put out there, what it's become the producer off the technology or the product to the consumer. Okay, so the executives think everything is a services business strategy, You know, pay by the drink pricing model and to your point, john toe, actually implement. Yeah, I think like you couldn't see it. I think they're trying to bring the platform by doing, you know, acquisition after acquisition to be a platform the ones that have access to the most data will get the most value. I think you have some thoughts on this. Actually, I lost my thought. I mean, to the extent that you could build an ecosystem coming back to Alan Nancy's premise But we did the trillion dollar baby post with And and the point of Alan Answer session is he's thinking from an individual firm. So if you could see innovations Look at the look into the psyche of a developer like you move from company to company. And that's the promise of cloud infrastructure is code. I say E r P s are the ultimate low code. Daniel Dienes is the CEO of you I path.

ENTITIES

Entity	Category	Confidence
Sergey	PERSON	0.99+
John	PERSON	0.99+
California	LOCATION	0.99+
Andy Jassy	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Justin	PERSON	0.99+
Daniel Dines	PERSON	0.99+
Google	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Ronald	PERSON	0.99+
Jerry Chen	PERSON	0.99+
David	PERSON	0.99+
Ed Walsh	PERSON	0.99+
Michael Dell	PERSON	0.99+
Dave	PERSON	0.99+
Kathy Southwick	PERSON	0.99+
Paul Dillon	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Rob Hope	PERSON	0.99+
Dell	ORGANIZATION	0.99+
1997	DATE	0.99+
Tara	PERSON	0.99+
HP	ORGANIZATION	0.99+
Dan Sheehan	PERSON	0.99+
Simon Crosby	PERSON	0.99+
Alan	PERSON	0.99+

Steve Touw, Immuta | AWS re:Invent 2020

>> From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS and our community partners. >> All right, we're continuing our around-the-clock and around-the-world coverage of the AWS re:Invent 2020 virtual conference. This year, I'm guessing hundreds of thousands of folks are tuning in for coverage. And we have, on the other end of the country, a Cube alum: Steve Touw, co-founder and CTO of Immuta. Steve, welcome back to the show. >> Great. Great to be here. Thanks for having me again. I hope to match your enthusiasm. >> You know what, you're a co-founder; I'm sure you can match the enthusiasm. Plus, we're talking about data governance. You've been on theCUBE before, and you kind of laid the foundation for us last year, talking about challenges around data access and data access control. I want to extend this conversation. I had a conversation with a CDO, a chief data officer, a couple of years ago. He shared how his data analysts, the people that actually take the data and make business decisions, or create outcomes to make business decisions, spent 80% of their time wrangling the data, just doing transformations. How's Immuta helping solve that problem? >> Yeah, great question. So it's actually interesting. We're seeing a division of roles in these organizations, where we have data engineering teams that are actually managing a lot of the prep work that goes into exposing data and releasing data to analysts. And part of their day-to-day job is to ensure that the data they're releasing to the analysts is what they're allowed to see. And so we kind of see this problem of compliance getting in the way of analysts doing their own transformation. It would be great if we didn't have to be limited to just this small data engineering team to release the data. What we believe is one of the real issues behind that is that they are the ones that are trusted.
They're the only ones that can see all the data in the clear. So it needs to be a very small subset of humans, so to speak, that can do this transformation work and release it. And that means that the data analysts downstream are hamstrung to a certain extent, and bottlenecked by requesting that these data engineers do some of this transformation work for them. So I think, because, as you said, that's so critical to being able to analyze data, that bottleneck could be a back breaker for an organization. So we really think that you need to tie transformation with compliance in order to streamline the analytics in your organization. >> So that has me curious. What does that actually look like? Because when I think of a data analyst, they're not always thinking about, well, who should have this data? They're trying to get the answer to the question to provide to the data engineer. What does that functionally look like when you want to see that relationship of collaboration? >> Yeah, so we think the beauty of Immuta, and the beauty of governance solutions done right, is that they should be invisible to the downstream analysts to a certain extent. So the data engineering team will take on some requirements from their legal and compliance teams, such as: you need to mask PII, or you need to hide these kinds of rows from these kinds of analysts, depending on what the user's doing. And we've just seen an explosion of different ways you should slice and dice your data, and who's allowed to see what, not just based on who they are, but what they're doing. And so you can kind of bake all these policies up front on your data in a tool like Immuta, and it will dynamically react, based on who the analyst is and what they're doing, to ensure that the right policies are being enforced. And we can do that in a way that, when the analysts... I mean, what we also see is that just setting your policies on your data
Once up front, that's not the end of the story. A lot of people will pat themselves on the back and say, look, we've got all our data protected appropriately, job done. But that's not really the case, because the analysts will start creating their own data products, and they want to share those with other analysts. And so when you think about this, it becomes a very complex problem of, okay, before someone can share their data with anyone else, we need to understand what they were allowed to see. So being able to control this downstream flow of transformations and feature engineering, to ensure that only the right people are seeing the things that they're allowed to see, while still enabling analytics, is really the challenge that we solve at Immuta: to help the data teams create those initial policies at scale, but also help the analytical teams build derived data products in a way that doesn't introduce data leaks. >> So as I think about the traditional ways in which we do this, we kind of, you know, take a data set, let's say a database, and we set security rules, et cetera, on those data sets. What you're painting seems more dynamic. How is Immuta approaching this problem from an architectural direction? >> Yeah, great question. So I'm sure you've probably heard the term role-based access control. It's been around forever, where you basically aggregate your users into roles, and then you build rules around those roles, and pretty much every legacy RDBMS manages data access this way. What we're seeing now, and I call it the private data era that we're now embarking on, or have been embarking on for the past three years or so, is that consumers are more aware of their data privacy and their needs there, and there are, you know, data regulations coming fast and furious with no end in sight. We believe that this role-based access control paradigm is just broken.
We've got customers with thousands of roles that they're trying to manage to, you know, slice up the data all the different ways that they need to. So instead, we offer an attribute-based access control solution, and also a policy-based access control solution, where instead it's really about how you dynamically enforce policy by separating who the user is from the policy that needs to be enforced, and having that execute at runtime. A good analogy to this is: role-based access control is like writing code without being able to use variables. You're writing the same block of code over and over again with slight changes based on the role, whereas with attribute-based access control you're able to use variables, and basically the policy gets decided at runtime based on who the user is and what they're doing. >> So that dynamic nature kind of lends itself to the public cloud. Are you seeing this applied in the world of AWS? We're here at re:Invent, so are customers using this with AWS? >> So it all comes down to scalability, so the same reasons that you separate storage from compute. You know, you get your storage in one place, and you can ephemerally spin up compute like EMR if you want. You can use Athena against your storage in a serverless way. That kind of freedom to choose whatever compute you want, the same kind of concepts apply with policy enforcement. You want to separate your policy from your platform, and this private data era has, you know, created this need, just like you had to separate your compute from storage in the big data era. And this allows you to have a single pane of glass to enforce policy consistently, no matter what compute you're using or what AWS resources you're using. And so this gives our customers power to not only, you know, build the rules that they need to build, and not have to do it uniquely per service in AWS, but also prove to their legal and compliance teams that they're doing it correctly, because when you do it this way, it really simplifies everything. And you have one place to go to understand how policy is being enforced. And this really gives you the auditing and reporting around the enforcement that you've been doing to put everyone at ease that everything is being done correctly, and so that your data consumers can understand how their data is being protected. And you can actually answer those questions when they come at you.
>> So let's put this idea to the test a little bit. So I have the data engineer who kind of designs the security policy around the data, or implements that policy using Immuta as dictated by the security and chief data officer of the organization. Then I have the analyst, and the analyst is just using the tools at their disposal. Let's say that one analyst wants to use AWS Lambda and another analyst wants to use other types of databases or analysis tools. You're telling me that Immuta allows the flexibility for that analyst to use either tool within AWS? >> That's right, because we enforce policy at the data layer. So if you think about Immuta, it's really three layers: policy authoring, which you touched on, where those requirements get turned into real policies; policy decisioning, so at query time we see who the user is, what they're doing, and what policy has been defined, to dynamically build that policy at runtime; and then enforcement, which is what you're getting at. The enforcement happens at the data layer. For example, we can enforce policies natively in Spark. So no matter how you're connecting to Spark, that policy is going to get enforced appropriately. So we don't really care about what the client is, because the enforcement is happening at the data layer, or the compute layer is a more accurate way to say it.
>> So a practical reality of collaboration, especially around large datasets, is the ability to share data across organizations. How is Immuta helping to make that barrier a little lower, while ensuring security, so that when I'm sharing data with analysts within another firm, they're only seeing the data that they need to see, but we can effectively collaborate on those pieces of content? >> Yeah, I'm glad you asked this. I mean, this is like the, you know, the big finale, right? Like, this is what you get when you have this granularity on your own data ecosystem. It enables you to have that granularity now when you want to share outside of your internal ecosystem. And so I think an important part about this is that when you think about governance, you can't necessarily have one god user, so to speak, that has control over all tables and all policies. You really need segmentation of duty, where different parts of the org own their own data and build their own policies in a way where people can't step on each other. And then this can expand out to third-party data sharing, where you can set different anonymization levels on your data when you're sharing it externally to the organization versus if it's internal users. And then someone else in your org could share their data with you, and then also do that with third parties. So it really enables and frees these organizations to share with each other in ways that weren't possible before. Because it happens at the data layer, these organizations can choose their own compute and still have the same policies being enforced, again going back to that consistency piece. Think of it as almost an authoritative way to share data in your organization. It doesn't have to be ad hoc: oh, I have to share with this group over here, how should I do it? What policies should I enforce? There's a single authoritative way to set policy and share your data.
>> So the first thing that comes to my mind, especially when we give more power to the users, is when the auditors come and they say, you know what, Keith? I understand this is the policy, but prove it. How do we provide auditors with the evidence that, one, we're implementing the policy that we designed, and then, two, we're able to audit that policy? >> Yeah, good question. So I briefly spoke about this a little bit, but when you author and define the policies in Immuta, they're immediately being enforced. So when you write something in our platform, it's not a glorified Wikipedia, right? It's actually turning those policies on and enforcing them at the data layer. And because of that, any query that's coming through Immuta is going to be audited. But I think even more importantly, to be honest, we keep a history of how policy changes are happening over time, too. So you can understand, you know, so-and-so changed the policy on this table versus another table, these people got newly added, these people got dropped from it. So you get this rich history of not only who's touching what data and what data is important, but you're also getting a rich history of, okay, how have we been treating this data from a policy perspective over time? What were my risk levels over the past year with these six tables? And you can answer those kinds of questions as well. >> And then we're in the era of cloud. We expect to be able to consume these services via API, via a pay-as-you-go type of thing. How is your relationship with AWS, and ultimately, as the customer, how do I consume Immuta? >> Yeah, so Immuta can pretty much be deployed anywhere. So obviously we're talking AWS here. We have a SaaS offering where you can spin up an Immuta free trial and just be off and running, building policies and hooking our policy enforcement engine into your compute. That runs in our, you know, infrastructure.
There's also a deployment model where you deploy Immuta into your VPCs so it can run on your infrastructure, behind your firewalls, and we do not require any public internet access at all for that to run. We don't do any kind of phoning home because, obviously, as a privacy company, we take this very seriously internally as well. We also have on-premises deployments, again with zero connectivity, air-gapped environments. So we offer that kind of flexibility to our customers wherever they want Immuta to be deployed. An important thing to remember there, too, is that Immuta does not actually store any data. We just store metadata and policy information. So that also provides the customers some flexibility where, if they want to use our SaaS, they can simply define policy in there, and then the data still lives in their account. We're just kind of pushing policy down into that dynamically. >> So Steve Touw, co-founder and CTO of Immuta, I don't think you had to worry about matching my energy level. I threw some pretty tough questions at you, and you were ready there with all the answers. If you want to see more interesting conversations from around the world with founders and builders, AWS re:Invent is all about builders, and we're talking to the builders throughout this show. Visit us on the web at theCUBE. You can engage with us on Twitter. Talk to you next episode of theCUBE from AWS re:Invent 2020.
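
The "variables" analogy for attribute-based access control, and the idea of deciding policy at query time and enforcing it at the data layer, can be illustrated with a short Python sketch. This is only a toy model under stated assumptions, not Immuta's API: the `User` type, the attribute names (`region`, `authorizations`), the two policy functions, and `run_query` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Row = dict


@dataclass(frozen=True)
class User:
    """Hypothetical user record: policies read attributes, not role names."""
    name: str
    attributes: dict = field(default_factory=dict)


# A policy takes the user and a row, and returns the row as that user may
# see it, or None if the row must be hidden entirely.
Policy = Callable[[User, Row], Optional[Row]]


def restrict_region(user: User, row: Row) -> Optional[Row]:
    """Row-level rule: users only see rows from their own region."""
    return row if row["region"] == user.attributes.get("region") else None


def mask_pii(user: User, row: Row) -> Optional[Row]:
    """Column-level rule: mask the SSN unless the user is authorized for PII."""
    if "pii" in user.attributes.get("authorizations", ()):
        return row
    return {**row, "ssn": "REDACTED"}  # copy, so stored data is never mutated


POLICIES: List[Policy] = [restrict_region, mask_pii]


def run_query(user: User, rows: List[Row],
              policies: List[Policy] = POLICIES) -> List[Row]:
    """Apply every policy at query time, whatever client issued the query."""
    visible = []
    for row in rows:
        result: Optional[Row] = row
        for policy in policies:
            result = policy(user, result)
            if result is None:  # a policy hid this row; stop evaluating
                break
        if result is not None:
            visible.append(result)
    return visible


ROWS = [
    {"name": "Ada", "ssn": "111-11-1111", "region": "EU"},
    {"name": "Bob", "ssn": "222-22-2222", "region": "US"},
]
ANALYST = User("analyst", {"region": "EU"})
AUDITOR = User("auditor", {"region": "US", "authorizations": ("pii",)})

print(run_query(ANALYST, ROWS))
print(run_query(AUDITOR, ROWS))
```

In this sketch, onboarding a new analyst means assigning attributes to the user; the two rules are written once and reused, which is the contrast with maintaining one near-identical rule per role.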

Published Date : Dec 8 2020



Nimrod Vax, BigID | AWS re:Invent 2020 Partner Network Day

>> Announcer: From around the globe, it's theCUBE. With digital coverage of AWS re:Invent 2020. Special coverage sponsored by AWS global partner network. >> Okay, welcome back everyone to theCUBE virtual coverage of re:Invent 2020 virtual. Normally we're in person, this year because of the pandemic we're doing remote interviews and we've got a great coverage here of the APN, Amazon Partner Network experience. I'm your host John Furrier, we are theCUBE virtual. Got a great guest from Tel Aviv remotely calling in and videoing, Nimrod Vax, who is the chief product officer and co-founder of BigID. This is the beautiful thing about remote, you're in Tel Aviv, I'm in Palo Alto, great to see you. We're not in person but thanks for coming on. >> Thank you. Great to see you as well. >> So you guys have had a lot of success at BigID, I've noticed a lot of awards, startup to watch, company to watch, kind of a good market opportunity data, data at scale, identification, as the web evolves beyond web presence identification, authentication is super important. You guys are called BigID. What's the purpose of the company? Why do you exist? What's the value proposition? >> So first of all, best startup to work at based on Glassdoor worldwide, so that's a big achievement too. So look, four years ago we started BigID when we realized that there is a gap in the market between the new demands from organizations in terms of how to protect their personal and sensitive information that they collect about their customers, their employees. The regulations were becoming more strict but the tools that were out there, to the large extent still are there, were not providing to those requirements and organizations have to deal with some of those challenges in manual processes, right? For example, the right to be forgotten. Organizations need to be able to find and delete a person's data if they want to be deleted. That's based on GDPR and later on even CCPA. 
And organizations have no way of doing it because the tools that were available could not tell them whose data it is that they found. The tools were very siloed. They were looking at either unstructured data and file shares or windows and so forth, or they were looking at databases, there was nothing for Big Data, there was nothing for cloud business applications. And so we identified that there is a gap here and we addressed it by building BigID basically to address those challenges. >> That's great, great stuff. And I remember four years ago when I was banging on the table and saying, you know regulation can stunt innovation because you had the confluence of massive platform shifts combined with the business pressure from society. That's not stopping and it's continuing today. You seeing it globally, whether it's fake news in journalism, to privacy concerns where modern applications, this is not going away. You guys have a great market opportunity. What is the product? What is smallID? What do you guys got right now? How do customers maintain the success as the ground continues to shift under them as platforms become more prevalent, more tools, more platforms, more everything? >> So, I'll start with BigID. What is BigID? So BigID really helps organizations better manage and protect the data that they own. And it does that by connecting to everything you have around structured databases and unstructured file shares, big data, cloud storage, business applications and then providing very deep insight into that data. Cataloging all the data, so you know what data you have where and classifying it so you know what type of data you have. Plus you're analyzing the data to find similar and duplicate data and then correlating them to an identity. Very strong, very broad solution fit for IT organization. We have some of the largest organizations out there, the biggest retailers, the biggest financial services organizations, manufacturing and et cetera. 
What we are seeing is that there are, with the adoption of cloud and business success obviously of AWS, that there are a lot of organizations that are not as big, that don't have an IT organization, that have a very well functioning DevOps organization but still have a very big footprint in Amazon and in other kind of cloud services. And they want to get visibility and they want to do it quickly. And the SmallID is really built for that. SmallID is a lightweight version of BigID that is cloud-native built for your AWS environment. And what it means is that you can quickly install it using CloudFormation templates straight from the AWS marketplace. Quickly stand up an environment that can scan, discover your assets in your account automatically and give you immediate visibility into that, your S3 bucket, into your DynamoDB environments, into your EMR clusters, into your Athena databases and immediately building a full catalog of all the data, so you know what files you have where, you know where what tables, what technical metadata, operational metadata, business metadata and also classified data information. So you know where you have sensitive information and you can immediately address that and apply controls to that information. >> So this is data discovery. So the use case is, I'm an Amazon partner, I mean we use theCUBE virtuals on Amazon, but let's just say hypothetically, we're growing like crazy. Got S3 buckets over here secure, encrypted and the rest, all that stuff. Things are happening, we're growing like a weed. Do we just deploy smallIDs and how it works? Is that use cases, SmallID is for AWS and BigID for everything else or? >> You can start small with SmallID, you get the visibility you need, you can leverage the automation of AWS so that you automatically discover those data sources, connect to them and get visibility. And you could grow into BigID using the same deployment inside AWS. 
You don't have to switch migrate and you use the same container cluster that is running inside your account and automatically scale it up and then connect to other systems or benefit from the more advanced capabilities the BigID can offer such as correlation, by connecting to maybe your Salesforce, CRM system and getting the ability to correlate to your customer data and understand also whose data it is that you're storing. Connecting to your on-premise mainframe, with the same deployment connecting to your Google Drive or office 365. But the point is that with the smallID you can really start quickly, small with a very small team and get that visibility very quickly. >> Nimrod, I want to ask you a question. What is the definition of cloud native data discovery? What does that mean to you? >> So cloud native means that it leverages all the benefits of the cloud. Like it gets all of the automation and visibility that you get in a cloud environment versus any traditional on-prem environment. So one thing is that BigID is installed directly from your marketplace. So you could browse, find its solution on the AWS marketplace and purchase it. It gets deployed using CloudFormation templates very easily and very quickly. It runs on a elastic container service so that once it runs you can automatically scale it up and down to increase the scan and the scale capabilities of the solution. It connects automatically behind the scenes into the security hub of AWS. So you get those alerts, the policy alerts fed into your security hub. It has integration also directly into the native logging capabilities of AWS. So your existing Datadog or whatever you're using for monitoring can plug into it automatically. That's what we mean by cloud native. >> And if you're cloud native you got to be positioned to take advantage of the data and machine learning in particular. Can you expand on the role of machine learning in your solution? 
Customers are leaning in heavily this year, you're seeing more uptake on machine learning which is basically AI, AI is machine learning, but it's all tied together. ML is big on all the deployments. Can you share your thoughts? >> Yeah, absolutely. So data discovery is a very tough problem and it has been around for 20 years. And the traditional method of classifying the data or understanding what type of data you have has been looking at the pattern of the data: typically regular expressions or other kinds of pattern-matching techniques. But sometimes in order to know what is personal or what is sensitive it's not enough to look at the pattern of the data. How do you distinguish between a date of birth and any other date? Date of birth is much more sensitive. How do you find country of residency, or how do you even tell a first name from a last name? So for that, you need more advanced, more sophisticated capabilities that go beyond just pattern matching. And BigID has a variety of those techniques, we call that discovery-in-depth. What it means is that, very similar to security-in-depth, where you cannot rely on a single security control to protect your environment, you cannot rely on a single discovery method to truly classify the data. So yes, we have regular expressions, that's the table-stakes basic capability of data classification, but if you want to find data that is more contextual like a first name, last name, even a phone number, and distinguish between a phone number and just a sequence of numbers, you need more contextual NLP-based discovery, named entity recognition. We're using (indistinct) to extract and find data contextually. We also apply deep learning, what's called a CNN, in order to identify and classify document types, which is basically being able to distinguish between a resume and an application form. Finding financial records, finding medical records.
So our advanced NLP classifiers can find that type of data. The more advanced capabilities that go beyond SmallID into BigID also include cluster analysis, which is an unsupervised machine learning method for finding duplicate and similar data, as well as correlation and other techniques that are more contextual and need to use machine learning. >> Yeah, and unsupervised, that's a lot harder than supervised. You need to have that ability to get at what you can't see. You've got to get the blind spots identified and that's really the key observational data you need. This brings up the operational side. You mentioned clustering, and I hear governance and security, and you mentioned GDPR earlier; this is an operational impact. Can you talk about how it impacts specifically on the privacy protection and governance side? Because certainly I get the clustering side of it, operationally just great. Everyone needs to get that. But now on the business model side, this is where people are spending a lot of time scared and worried, actually, about what the hell to do. >> One of the things that we realized very early on when we started with BigID is that everybody needs discovery. You need discovery, and we actually started with privacy. You need discovery in order to map your data and apply the privacy controls. You need discovery for security, like we said, right? Find and identify sensitive data and apply controls. And you also need discovery for data enablement. You want to discover the data, you want to enable it, to govern it, to make it accessible to the other parts of your business. So discovery is really a foundation and starting point, and you get there with SmallID. How do you operationalize that? So BigID has the concept of an application framework. Think about it like an Apple App Store for data discovery, where you can run applications inside your kind of discovery iPhone in order to run specific (indistinct) use cases. So, how do you operationalize privacy use cases?
We have applications for privacy use cases like subject access requests and data rights fulfillment, right? Under the CCPA, you have the right to request your data, what data is being stored about you. BigID can help you find all that data: in the catalog that we build after we scan and find that information, we can find any individual's data. We also have an application in the privacy space for consent governance, right, under the CCPA. You have the right to opt out. If you opt out, your data cannot be sold, cannot be used. How do you enforce that? How do you make sure that if someone opted out, that person's data is not being pumped into Glue, into some other system for analytics, into Redshift or Snowflake? BigID can identify a specific person's data and make sure that it's not being used for analytics and alert if there is a violation. So that's just an example of how you operationalize this knowledge for privacy. And we have more examples also for data enablement and data management. >> There's so much headroom opportunity to build out new functionality, make it programmable. I really appreciate what you guys are doing, totally needed in the industry. I could just see endless opportunities to make this operationally scalable, more programmable, once you kind of get the foundation out there. So congratulations, Nimrod and the whole team. The question I want to ask you, we're here at re:Invent virtual, three weeks we're here covering Cube action, check out theCUBE experience zone, the partner experience. What is the difference between BigID and, say, Amazon Macie? Let's think about that. So how do you compare and contrast? Amazon says, we love partnering, but we promote our ecosystem. You guys sure have a similar thing. What's the difference? >> There's a big difference. Yes, there is some overlap because both SmallID and Macie can classify data in S3 buckets. And Macie does a pretty good job at it, right? I'm not arguing about it.
But smallID is not only about scanning for sensitive data in S3. It also scans anything else you have in your AWS environment, like DynamoDB, like EMR, like Athena. We're also adding Redshift soon, Glue and other rare data sources as well. And it's not only about identifying and alerting on sensitive data, it's about building full catalog (indistinct) It's about giving you almost like a full registry of your data in AWS, where you can look up any type of data and see where it's found across structured, unstructured big data repositories that you're handling inside your AWS environment. So it's broader than just for security. Apart from the fact that they're used for privacy, I would say the biggest value of it is by building that catalog and making it accessible for data enablement, enabling your data across the board for other use cases, for analytics in Redshift, for Glue, for data integrations, for various other purposes. We have also integration into Kinesis to be able to scan and let you know which topics, use what type of data. So it's really a very, very robust full-blown catalog of the data that across the board that is dynamic. And also like you mentioned, accessible to APIs. Very much like the AWS tradition. >> Yeah, great stuff. I got to ask you a question while you're here. You're the co-founder and again congratulations on your success. Also the chief product officer of BigID, what's your advice to your colleagues and potentially new friends out there that are watching here? And let's take it from the entrepreneurial perspective. I have an application and I start growing and maybe I have funding, maybe I take a more pragmatic approach versus raising billions of dollars. But as you grow the pressure for AppSec reviews, having all the table stakes features, how do you advise developers or entrepreneurs or even business people, small medium-sized enterprises to prepare? 
Is there a way, is there a playbook to say, rather than looking back and saying, oh, I didn't do all the things, I've got to go back and retrofit, get BigID. Is there a playbook that you see that will help companies so they don't get killed with AppSec reviews and privacy compliance reviews? Could be a waste of time. What are your thoughts on all this? >> Well, I think that very early on when we started BigID, our perspective was that we knew that we are a security and privacy company. So we had to take that very seriously up front and be prepared. Security cannot be an afterthought. It's something that needs to be built in. And from day one we have taken all of the steps that were needed in order to make sure that what we're building is robust and secure. And that includes, obviously, applying all of the code and CI/CD tools that are available for testing your code, whether it's (indistinct), these types of tools. Doing penetration testing and working with best-in-line pen testing companies and white hat hackers that would look at your code. These are kind of the things that, that's what you get funding for, right? >> Yeah. >> And you need to take advantage of that and use them. And then as soon as we got bigger, we also invested in a very strong CSO that comes from the industry, that has a lot of expertise and a lot of credibility. We also have kind of a CSO group. So, at each step of funding we've used it extensively also to make our security posture a lot more robust and visible. >> Final question for you. When should someone buy BigID? When should they engage? Is it something that people can just download immediately and integrate? Is the go-to-market kind of a new target at the VP level, or is it the... How does someone know when to buy you and download it and use the software? Take us through the use case of how customers engage.
>> Yeah, so customers directly have those requirements when they start hitting and having to comply with regulations around privacy and security. So very early on, especially organizations that deal with consumer information, get to a point where they need to be accountable for the data that they store about their customers and they want to be able to know their data and provide the privacy controls they need to their consumers. For our BigID product this typically is a kind of a medium size and up company, and with an IT organization. For smallID, this is a good fit for companies that are much smaller, that operate mostly out of their, their IT is basically their DevOps teams. And once they have more than 10, 20 data sources in AWS, that's where they start losing count of the data that they have and they need to get more visibility and be able to control what data is being stored there. Because very quickly you start losing count of data information, even for an organization like BigID, which isn't a bigger organization, right? We have 200 employees. We are at the point where it's hard to keep track and keep control of all the data that is being stored in all of the different data sources, right? In AWS, in Google Drive, in some of our other sources, right? And that's the point where you need to start thinking about having that visibility. >> Yeah, like all growth plan, dream big, start small and get big. And I think that's a nice pathway. So small gets you going and you lead right into the BigID. Great stuff. Final, final question for you while I gatchu here. Why the awards? Someone's like, hey, BigID is this cool company, love the founder, love the team, love the value proposition, makes a lot of sense. Why all the awards? >> Look, I think one of the things that was compelling about BigID from the beginning is that we did things differently. Our whole approach for personal data discovery is unique. 
And instead of looking at the data, we started by looking at the identities, the people, and finally looking at their data, learning what their data looks like and then searching for that information. So that was a very different approach from the traditional approach to data discovery. And we continue to innovate and to look at those problems from a different perspective so we can offer our customers an alternative to what was done in the past. It's not to say that we don't do the basic stuff, the regex, the connectivity that is needed. But we always took a slightly different approach to diversify, to offer something slightly different and more comprehensive. And I think that was the thing that really attracted attention to us from the beginning, with the RSA Innovation Sandbox award that we won in 2018, the Gartner Cool Vendor award that we received, and later on also the other awards. And I think that's the unique aspect of BigID. >> You know, you solve big problems, and that's certainly needed. We saw this early on and again I don't think that the problem is going to go away anytime soon. Platforms are emerging, more tools than ever before that converge into platforms, and as the logic changes at the top, all of that is moving underneath. So, congratulations, great insight. >> Thank you very much. >> Thank you. Thank you for coming on theCUBE. Appreciate it, Nimrod. Okay, I'm John Furrier. We are theCUBE virtual here for the partner experience APN virtual. Thanks for watching. (gentle music)
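
The point made in the interview that pattern matching alone cannot tell a date of birth from any other date can be shown with a small Python sketch. It is an assumption-laden illustration, not BigID's classifier: the function name `classify_field`, the hint list `DOB_HINTS`, and the use of the field name as the only source of context are all hypothetical, and a much simpler stand-in for the NLP and machine learning techniques described.

```python
import re

# Pattern-only detection: matches any ISO-style date, sensitive or not.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Hypothetical context hints; a real system would use far richer signals.
DOB_HINTS = ("dob", "birth", "born")


def classify_field(field_name: str, value: str) -> str:
    """Combine a regex match on the value with context from the field name."""
    if not DATE_PATTERN.search(value):
        return "other"
    name = field_name.lower()
    if any(hint in name for hint in DOB_HINTS):
        return "date_of_birth"  # same pattern, but more sensitive in context
    return "date"


for name, value in [
    ("date_of_birth", "1984-03-07"),
    ("order_date", "2020-12-03"),
    ("notes", "call back later"),
]:
    print(name, "->", classify_field(name, value))
```

Both date values match the same regular expression; only the surrounding context separates the sensitive one from the ordinary one, which is why a single discovery method is not enough.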

Published Date : Dec 3 2020

ENTITIES

Entity	Category	Confidence
AWS	ORGANIZATION	0.99+
Nimrod Vax	PERSON	0.99+
Nimrod	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Palo Alto	LOCATION	0.99+
Tel Aviv	LOCATION	0.99+
2018	DATE	0.99+
Glassdoor	ORGANIZATION	0.99+
BigID	TITLE	0.99+
200 employees	QUANTITY	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
BigID	ORGANIZATION	0.99+
Apple	ORGANIZATION	0.99+
SmallID	TITLE	0.99+
GDPR	TITLE	0.99+
four years ago	DATE	0.98+
billions of dollars	QUANTITY	0.98+
Redshift	TITLE	0.98+
CloudFormation	TITLE	0.97+
both	QUANTITY	0.97+
DynamoDB	TITLE	0.97+
single	QUANTITY	0.97+
CNN	ORGANIZATION	0.97+
this year	DATE	0.97+
EMR	TITLE	0.97+
one thing	QUANTITY	0.97+
One	QUANTITY	0.96+
one	QUANTITY	0.96+
each step	QUANTITY	0.95+
Amazon Partner Network	ORGANIZATION	0.95+
three weeks	QUANTITY	0.95+
APN	ORGANIZATION	0.95+
20 years	QUANTITY	0.95+
S3	TITLE	0.94+
Athena	TITLE	0.94+
office 365	TITLE	0.94+
today	DATE	0.93+
first name	QUANTITY	0.92+
smallIDs	TITLE	0.91+
Gartner Cool Vendor	TITLE	0.91+
Kinesis	TITLE	0.91+
20 data sources	QUANTITY	0.9+
RSA Innovation Sandbox	TITLE	0.88+
CCP	TITLE	0.88+
Invent 2020 Partner Network Day	EVENT	0.88+
smallID	TITLE	0.88+
more than 10,	QUANTITY	0.88+
Macy	ORGANIZATION	0.86+

Ed Walsh, ChaosSearch | AWS re:Invent 2020 Partner Network Day

>> Narrator: From around the globe it's theCUBE, with digital coverage of AWS re:Invent 2020. Special coverage sponsored by AWS Global Partner Network. >> Hello and welcome to theCUBE Virtual and our coverage of AWS re:Invent 2020 with special coverage of APN partner experience. We are theCUBE Virtual and I'm your host, Justin Warren. And today I'm joined by Ed Walsh, CEO of ChaosSearch. Ed, welcome to theCUBE. >> Well thank you for having me, I really appreciate it. >> Now, this is not your first time here on theCUBE. You're a regular here and I love to have you back. >> I love the platform, you guys are great. >> So let's start off by just reminding people about what ChaosSearch is and what you do there. >> Sure, the best way to say it is that ChaosSearch helps our clients know better. We don't do that with a special wizard or a widget that you give to your, you know, SecOps teams. What we do is the hard work to give you a data platform to get insights at scale. And we do that also by achieving the promise of data lakes. So what we have is the Chaos data platform, which connects to and indexes data in a customer's S3 or Glacier accounts. So it's inside your data lake, not our data lake, but it renders that data fully searchable and available for analysis using your existing tools today, 'cause what we do is index it and publish open APIs, like the Elasticsearch API, and soon SQL. So, to give you an example: based upon those capabilities, we're an ideal replacement for commonly deployed Elasticsearch or ELK Stack deployments, if you're hitting scale issues. So we talk about scalable log analytics, and more and more people are hitting these scale issues. So let's say you're using Elasticsearch, ELK or Amazon Elasticsearch, and you're hitting scale issues. What I mean by that is, like, you can't keep enough retention.
You want longer retention, or it's getting very expensive to keep that retention, or because of the scale you hit availability problems, where the cluster is hard to keep up and running or is crashing. That's what we mean by the issues at scale. And what we do is simply this: because we're publishing the open API of Elasticsearch, you use all your tools, but we save you about 80% off your monthly bill. We also, and it's an "and" statement, give you unlimited retention, as much as you want to keep on S3 or in Glacier. But we also take care of all the hassles and management and the time to manage these clusters, which end up being built on a database called Lucene. And we take care of that as a managed service. And probably the biggest thing is all of this comes without changing anything your end users are using. So we include Kibana, but imagine it's an Elastic API. So if you're using the API or Kibana, it's just easy to use the exact same tools you use today, but you get the benefits of a true data lake. In fact, we're now running Elasticsearch on top of S3 natively, if that makes sense. >> Right, and natively is pretty cool. And look, 80% savings is a dramatic number, particularly this year. I think there's a lot of people who are looking to save a few quid. So it'd be very nice to be able to save up to 80%. I am curious as to how you're able to achieve that kind of saving though.
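
Since the platform is described as publishing the Elasticsearch API, an existing log search would in principle be sent unchanged to a compatible endpoint. The sketch below only builds a standard Elasticsearch Query DSL request body; it does not contact any service. The field names `message` and `@timestamp` are assumptions (common log-shipping conventions, not something stated in the interview), and the function name is hypothetical.

```python
import json


def build_log_search(message_text, days_back, size=50):
    """Build an Elasticsearch Query DSL body: full-text match on the log
    message, restricted to the last `days_back` days, newest hits first."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"message": message_text}}],
                # Non-scored time filter using Elasticsearch date math.
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{days_back}d"}}}
                ],
            }
        },
    }


body = build_log_search("connection refused", days_back=9)
print(json.dumps(body, indent=2))
```

A body like this is what a client would POST to an index's `_search` endpoint; Kibana issues queries of the same kind on the user's behalf, which is why tools built on that API would not need to change.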
So you can do it on-prem or even in the cloud, so if you do it in Amazon, basically you're spinning up a server and it stays up, it doesn't spin up, spin down. So those clusters are not one server, it's a cluster of those servers. And typically if you have any scale you're actually running multiple clusters for different use cases, because you don't dare put it all on one. So our savings come from the fact that you no longer need those servers to spin up and you don't need to pay for that Lucene layer underneath. You can still use Kibana or the API, but literally it's 80% off the bill that you're paying for your service now, and it's hard dollars. So it's not... And we typically see clients between 70 and 80%. It's up to 80, but it's literally right within a 10% margin that you're saving a lot of money. Saving money is a great thing, but more importantly, now you have one unified data lake. You can go across some of the data or all of the data through the role-based access you can give to different people. Like, we've seen people who say, hey, give that person 40 days of this data. But the SecOps team gets to see across all the different logs, you know, all the machine-generated data they have. And we can give you a couple of examples of that and walk you through how people deploy if you want. >> I'm always keen to hear specific examples of how customers are doing things. And it's nice that you've drawn that comparison there around what cloud is good for and what it isn't. I'll often like to say that AWS is cheap to fail in, but expensive to succeed. So when people are actually succeeding with this and using this broad amount of data, what you're saying there is that with that savings I've actually got access to a lot more data that I can do things with.
So yeah, if you could walk through a couple of examples of what people are doing with this increased amount of data that they have access to in ChaosSearch, what are some of the things that people are now able to unlock with that data? >> Well, it's always good to use customer examples, so we can go through them however you might want: Kleiner, Blackboard, Alert Logic, Armor Security, HubSpot. Maybe I'll start with HubSpot. One of our good clients, they were handling some Cloudflare data; that was one of the clusters they were using a lot to search. They were using it to look at denial of service. And, as we find with everyone at scale, they got limited. So they were down to five days retention. Why? Well, it's not that they meant to, but basically they couldn't cost-effectively handle that at scale. And they were also having scale issues with the environment, how they set up the cluster and sharding. And when there was a denial-of-service attack, that's when the influx of data came. One thing about scale is how fast it comes at you, and another is how much data you have. But as the data was coming at them during a denial of service, that's when the cluster would actually go down, believe it or not, you know, right when you need your log analysis tools. So what we did is, because they're just using Kibana, it was an easy swap. They ran in parallel because we published the open API, but we took them from five days to nine days. They could keep as much as they want, but nine days for denial of service is what they wanted. And then we did save them over $4 million a year in hard dollars on what they're paying in their environment; really it's the savings on the server farm and a little bit on the Elasticsearch stack. But more importantly, they had no outages since. Now here's the thing. Are you talking about the use case? They also had other clusters, and you find everyone does it.
They don't dare put it on one cluster, even though these are not one server, they're multiple servers. So the Cloudflare use case was one; the next use case was a 10 terabyte a day influx, kept for 90 days. So it's about a petabyte. They brought another use case on, which was NetMon, again, network monitoring. And again, they were having the same scale issue, the same retention issue. And what they're able to do is easily roll that on. So that's one data platform. Now they're adding the next one. They have about four different use cases, and it's just different clusters they're able to bring together. But now what they're getting across those use cases is more cost effectiveness, more stability and freedom. We say it saves you a lot of time, cost and complexity: just the time they spend managing that, getting the data in, and the complexities around it. And then the cost is easy to kind of quantify. But more importantly, now particular teams only need access to their own data, but the SecOps team wants to see across all the data. And it's very easy for them to see across all the data, where before it was impossible to do. So now they have multiple large use cases streaming at them. And what I love about that particular case is at one point they were just trying to test our scale. So they started tossing more things at it, right, to see if they could kind of break us. So they spiked us up to 30 terabytes a day, when for Elastic even 10 terabytes a day would make things fall over. Now, if you think of what they just did, what we're doing is literally three steps. Put your data in S3 as fast as you can; don't modify it, just put it there. Once it's there, connect to us: you give us read-only access to those buckets and a place to write the index. All of that stuff is in your S3, it never comes out. And then basically you set it up: do you want to do live, real-time analysis? Or do you want to go after old data?
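
The "about a petabyte" figure follows directly from the numbers quoted: 10 terabytes a day kept for 90 days. A minimal worked sketch, assuming decimal units (1 PB = 1,000 TB) and ignoring compression and index overhead, neither of which is quantified in the interview; the function names are illustrative. The 30 TB/day line is a hypothetical projection of the test spike sustained for the same window, not a figure from the interview.

```python
def retained_terabytes(daily_ingest_tb, retention_days):
    """Raw data held at steady state: daily ingest times days retained."""
    return daily_ingest_tb * retention_days


def terabytes_to_petabytes(terabytes, tb_per_pb=1000):
    """Convert terabytes to petabytes using decimal units by default."""
    return terabytes / tb_per_pb


steady_tb = retained_terabytes(10, 90)   # the quoted use case
spike_tb = retained_terabytes(30, 90)    # hypothetical: spike rate sustained

print(steady_tb, "TB =", terabytes_to_petabytes(steady_tb), "PB")
print(spike_tb, "TB =", terabytes_to_petabytes(spike_tb), "PB")
```

At 10 TB/day the retained volume is 900 TB, or 0.9 PB, which matches the rounded figure in the conversation.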
We do the rest: we ingest, we normalize the schema. And basically we give you RBAC and the refinery to give the right people access. So what they did is they basically threw a whole bunch of stuff at it. They were trying to outrun S3. So, you know, we're on the shoulders of giants. You know, if you think about our platform for clients, what's a better data lake than S3? You're not going to get a better cost curve, right? You're not going to get better parallelism. Or security: it's in your, you know, virtual environment. And also you can keep data in the right location. So Blackboard's a good example. They need to keep that in all the different regions, and because it's personal data, you know, GDPR, they've got to keep data in that location. It's easy, we just put compute in each one of the different areas they are. But the net-net is that the architecture is on the shoulders of giants. If you think you can outrun it by just sheer volume, or you can find a more cost-effective place to keep data long-term, or you think you can out-store it, that you have so much data that S3 and Glacier can't possibly do it, then you've got me, you're at a bigger scale than me, but that's the scale we're talking about. So if you think about when they spiked our throughput, what they really did is they tried to outrun S3. And we didn't hiccup. Now, the next thing is they tossed a bunch of users at us, which were just spinning up, in our data fabric, different ways to do the indexing to keep up with it. And for new use cases they're going after, everyone gets their own worker nodes, which are all expected to fail in place. So again, they did some of that, but really they're like, you guys handled all the influx. And if you think about it, it's the shoulders of giants, being on top of an Amazon platform, which is amazing. You're not going to get a more cost-effective data lake in the world, and it's continuing to fall in price.
And it's a cost curve like no other, but also all that resiliency, all that security and the parallelism you can get out of S3 and Glacier. It's just, bar none, the most scalable environment you can build on. And what we do is a thin layer. It's a data platform that allows you to have your data now fully searchable and queryable using your tools. >> Right, and you mentioned there that, I mean, you're running in AWS, which has broad experience in doing these sorts of things at scale, but on that operational management side of things, as you mentioned, you actually take that off the hands of customers so that you run it on their behalf. What are some of the errors that you see people making in trying to do this themselves, when you've gone into customers and brought them onto the ChaosSearch platform?
So right now it's Elasticsearch; in a very short time we'll publish Presto, or the SQL dialect. You can use the same tools. So we do see people either brute forcing and trying their best with a bunch of physical servers. We do see another group that says, you know, I want to go use Athena use cases, or there's a whole bunch of different startups saying, I do data lakes or data lakehouses. But what they really do is force you to put things into a structure before you get insight. True data lake economics is literally just put it there, and use your tools natively to go after it. And that's where we're unique compared to what we see from our competition. >> Hmm, so with people who have moved into ChaosSearch, let's say pick one, if you can, the most interesting example of what people have started to do with their data. What's new? >> That's good. Well, I'll give you another one. And so Armor Security is a good one. So Armor Security is a security service company. You know, thousands of clients, doing great, I mean a beautiful platform, beautiful business. And they won Rackspace as a partner. So now imagine a thousand clients, but now, you know, massive scale to keep up with. So that would be an example, but it's another example where we were able to come in when they were facing a major upgrade of their environment just to keep up, and what they expose to their customers is how their customers do log analytics. What we were able to do, simply because they didn't go below the API and use the exact same tools on top, is replace that use case in 30 days and save them a tremendous amount of dollars. But now they're able to go back and have unlimited retention. They used to restrict their clients to 14 days. Now they have an opportunity to do a bunch of different things, with possible revenue opportunities and more. It allows them to look at their business differently and frees up their team to do other things.
And now they're, they're putting billing and other things into the same environment with us because one is easy it's scale but also freed up their team. No one has enough team to do things. And then the biggest thing is what people do interesting with our product is actually in their own tools. So, you know, we talk about Kibana when we do SQL again we talk about Looker and Tableau and Power BI, you know, the really interesting thing, and we think we did the hard work on the data layer which you can say is, you know I can about all the ways you consolidate the performance. Now, what becomes really interesting is what they're doing at the visibility level, either Kibana or the API or Tableau or Looker. And the key thing for us is we just say, just use the tools you're used to. Now that might be a boring statement, but to me, a great value proposition is not changing what your end users have to use. And they're doing amazing things. They're doing the exact same things they did before. They're just doing it with more data at bigger scale. And also they're able to see across their different machine learning data compared to being limited going at one thing at a time. And that getting the correlation from a unified data lake is really what we, you know we get very excited about. What's most exciting to our clients is they don't have to tell the users they have to use a different tool, which, you know, we'll decide if that's really interesting in this conversation. But again, I always say we didn't build a new algorithm that you going to give the SecOp team or a new pipeline cool widget that going to help the machine learning team which is another API we'll publish. But basically what we do is a hard work of making the data platform scalable, but more importantly give you the APIs that you're used to. So it's the platform that you don't have to change what your end users are doing, which is a... So we're kind of invisible behind the scenes. 
>> Well, that's certainly a pretty strong proposition there and I'm sure that there's plenty of scope for customers to come and talk to you because no one's creating any less data. So Ed, thanks for coming on theCUBE. It's always great to see you here. >> No, thank you. >> You've been watching theCUBE Virtual and our coverage of AWS re:Invent 2020 with special coverage of APN partner experience. Make sure you check out all our coverage online, either on your desktop, mobile, on your phone, wherever you are. I've been your host, Justin Warren. And I look forward to seeing you again soon. (soft music)

Published Date : Dec 3 2020

ENTITIES

Entity	Category	Confidence
Justin Warren	PERSON	0.99+
Ed Walsh	PERSON	0.99+
$80	QUANTITY	0.99+
40 days	QUANTITY	0.99+
five days	QUANTITY	0.99+
Ed Walsh	PERSON	0.99+
90 days	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
AWS Global Partner Network	ORGANIZATION	0.99+
nine days	QUANTITY	0.99+
80%	QUANTITY	0.99+
10 terabytes	QUANTITY	0.99+
thousands	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
HubSpot	ORGANIZATION	0.99+
Ed	PERSON	0.99+
10%	QUANTITY	0.99+
Elasticsearch	TITLE	0.99+
30 days	QUANTITY	0.99+
Armor Security	ORGANIZATION	0.99+
14 days	QUANTITY	0.99+
thousand clients	QUANTITY	0.99+
Blackboard	ORGANIZATION	0.99+
Kleiner	ORGANIZATION	0.99+
S3	TITLE	0.99+
One	QUANTITY	0.99+
Alert Logic	ORGANIZATION	0.99+
three steps	QUANTITY	0.98+
one	QUANTITY	0.98+
GDPR	TITLE	0.98+
one thing	QUANTITY	0.98+
one data	QUANTITY	0.98+
one server	QUANTITY	0.98+
Elastic	TITLE	0.98+
70	QUANTITY	0.98+
SQL	TITLE	0.98+
about 80%	QUANTITY	0.97+
Kibana	TITLE	0.97+
first time	QUANTITY	0.97+
over $4 million a year	QUANTITY	0.97+
one cluster	QUANTITY	0.97+
first person	QUANTITY	0.97+
CloudFlare	TITLE	0.97+
ChaosSearch	ORGANIZATION	0.97+
this year	DATE	0.97+
Glacier	TITLE	0.97+
up to 80%	QUANTITY	0.97+
Parquet	TITLE	0.96+
each one	QUANTITY	0.95+
Splunk	ORGANIZATION	0.95+
Sumo Logic	ORGANIZATION	0.94+
up to 80	QUANTITY	0.94+
Power BI	TITLE	0.93+
today	DATE	0.93+
Rackspace	ORGANIZATION	0.92+
up to 30 terabytes a day	QUANTITY	0.92+
one point	QUANTITY	0.91+
S3 Glacier	COMMERCIAL_ITEM	0.91+
Elastic API	TITLE	0.89+

Chris Wiborg, Cohesity & Sabina Joseph, AWS | AWS re:Invent 2020

>> Announcer: From around the globe, it's theCUBE with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS and our community partners. >> Hello everyone, this is Dave Vellante and welcome to theCUBES Wall-To-Wall coverage of AWS re:Invent 2020 virtual reinvented our coverage over three weeks over cloud. We're looking into the next decade of innovation. And with me are two great guests, Chris Wiborg is the Vice President of Product Marketing at Cohesity and Sabina Joseph is the General Manager for Americas Technology for Partners AWS. Folks, thanks for coming to theCUBE. Great to see you. >> Great to be here today. Thanks for having us. >> You're very welcome. It's great to see you and Chris, before we get into the partnership, I want to ask kind of what you've seen in the market, with the increased focus on data, digital business, obviously the last nine months, people have really shifted their priorities. How have you seen customers responding? >> Yeah, it's sort of strange to say this at a time. It's really hard for all of us dealing with a global pandemic, but the market has picked up in many ways and perhaps that's not surprising given a lot of folks have started to shift things to more virtual way of working and the data hasn't slowed down. And so with that we've also seen a little bit of a shift and this is part of the reason behind the announcement we're making of trying to accelerate for many organizations projects that had originally been planned to put in a data center to moving more towards the cloud. Part of this as a CapEx to OpEX shift. But I think it also in some cases all is under this umbrella of digital transformation, where they're trying to accelerate new ways of doing things while in some cases, people can't even get into data centers in some cases anymore. And so how can you do that more remotely? How can you go to a model to loot more Self-Help? And all that leads up to part of what we're going to be talking about today. 
So the market has been very busy because again, data growth hasn't slowed down. I think the one thing that I'd add to that is you'd see an uptake in terms of focus and interest in some of the things that we do because of all the ransomware attacks that are out there. That's another piece of it. >> I want to get into the announcement as well, but I mean, you're right, Chris, it's a very hexy, it's tough as it is for the climate. It's a good time to be in tech. It's even better if you're in cloud. So Sabina, I wonder if you'd had... I think you must have a lot of people in the ecosystem really wanting to work with you. >> We do, I think with the proliferation of data. And data across many different silos I think the key is, how do we provide customers more value from this data, that way they can make it optimal for their business. So, yes, we do have a lot of different partners wanting to work. >> Okay, so we're all busy. I feel like we've never worked so hard in our lives, but so Cohesity and AWS, you've announced a strategic collaboration. Tell me more about it. Why did you choose to collaborate together Chris, other than AWS is the number one cloud platform. What were some of the other factors that we should be focused on? >> I think it's the Sabina, please do chime in here as well. I think the big portion of it, Dave has to do with this shared vision that we have around. Really what we believe is the next chapter in data management. And so how do we make it simple for organizations to not only protect and secure and manage their data, but also get more value out of it and derive more value from that data, which is kind of what Sabina was hinting at. And a lot of the reasons that we think this is such a good match, given all the varying services Amazon has, that you can build off, given what Cohesity does. So Sabina, I know you going to start with customers. 
I've interviewed enough Amazon folks to know that's really the starting point, the prism through which you look. So from that prism, from your perspective, what's the collaboration? Why the collaboration? What does it bring for customers? >> So, you know, I'd say the same here. I think there was a lot of alignment, both in terms of culture and working backwards from customers, customer obsession. And really kind of understanding what we can do to bring an intelligent data management solution to enterprise and mid-sized customers and provide simplicity, flexibility, and reduced total cost of ownership. And that's where Cohesity and AWS really shared that vision. I would say over the last couple of years, Cohesity, of course, has been a partner of AWS for quite some time now. And then when we started to talk to each other, we understood that these were some of the things we wanted to not just address, but also provide as an opportunity for customers. So that's why we collaborated in this unique way to bring forward a Data Management as a Service solution for our customers. >> All right, Chris, I really want to dig into this a little bit more because I've talked to a number of CEOs that have said, boy, our business resilience strategy was way too focused on DR, and maybe too much focused on backup. We're now a digital business, because for every business, you're out of business if you're not a digital business overnight. And so this notion of data management and data management as a service, what problems are you really focused on solving there? >> I think two things, Dave, and let's go back to what Cohesity is after solving as a company. And that's the problem of what we call mass data fragmentation, where you have data stored in many different locations, prem, cloud, edge, et cetera, typically in many different pieces of infrastructure. So there's a lot of silos going on there, and it's really hard to get your hands around the entirety of what you have.
And first of all, make sure it is protected. And there's some compliance implications to that and so on. And then also again, how can you not only protect it, but do more with it and get better transparency and more value out of that data that today might be dark, might be opaque, because, a, do you know where it is? And b, even if you do, what more can you do with it? And so that's kind of the first problem we're setting out to solve. And as we look at doing what we're doing with AWS, providing an alternate consumption option is also really important, we think. So some people have the staff and skills to roll their own, to do it themselves, and Cohesity will continue to support those customers, obviously, as we do today. But we also want to provide a new option for those that want to make that shift from CapEx to OpEx, and from managing their environment themselves to having somebody else manage it for them, and really reducing that cost and overhead associated with running your own data center effectively. And so bringing the value of Cohesity to the cloud is the second piece of that, where we want to make sure we carry that bigger vision along where we're not just doing one thing, we're doing multiple things. And so data management in our sense is not just about backup, although that's the first thing you'll see. We're also going to tackle that DR problem you raised as well. If you look closely, a couple of weeks ago we made an announcement around what we're doing with a product we call Site Continuity in the on-prem world; guess what, that's going to come real soon to AWS. And then beyond that, files and objects, test data management and, as we'll get to a little bit later, more when we start leveraging the value and the power of some of the advanced services AWS has brought to the table for things like compliance and so on. >> Great, thank you for that.
And so Sabina, I mean, we run on AWS, we're small, but still we go into the console and there's this buffet of services and we have a lot of options. So, I wonder if you could talk about customer choice, your philosophy around that, why that's important, how you're providing different deployment models. And the example I would use is why is backup as a service? Not enough, why do we need to go beyond that? >> First of all, thank you very much for being our customer. >> Welcome. >> And I think the key behind this solution that Cohesity is building on top of AWS is to really provide one platform and one user interface. Yes, backup as a service is the first service that we will start with and we are starting with, but I think we all realize that customers do many different things, but get data. They do disaster recovery, they have file services, Dev and test, and then the value add services, which we'll talk about in a bit around analytics compliance, machine learning and so on. So those are all the different value, at least we want to provide the date with that data. In addition of course, backup as a service disaster recovery, as a service file services and so on. Well the backup services comprehensive that we are launching with and provide some rich protection across all of this data, but at the end of the day, it's customer's choice whether they want to manage your own data and infrastructure or Cohesity kind of manage this across the infrastructure for them both in a hybrid model and in a cloud model. And we have many customers kind of wanting to look at both options because they had both environments. I don't know Chris, if you want to talk about Dolby a little bit, but I can certainly get into it. I don't know if you want to get a little bit into Dolby and how they're using it. >> Yeah, that's a great example, actually Sabina. So, I think Dave, Sabina is suggesting, one of our early design partners on this was Dolby and they're an existing Cohesity customer. 
Today they're very happy with what we're doing on-prem. And so I asked them, why would you be interested in managing data also in the cloud? And his answer was, well, "Look, for me, it's really all about the self-help option. I do want centrality, but I have a lot of clients in my organization that I want to point to do their own thing and not have to directly manage them. This is going to be the perfect option for them. They can just go sign up, connect and protect to get started. All right, step one." >> I talked to another customer who commented, well, in this sort of hybrid configuration that Sabina suggests, the stuff that they have on-prem today, they'll probably protect on-prem. But workloads like, let's say, Microsoft 365 mailboxes or something like that, that's in the cloud. Why would they backhaul that into their data center? Why not just protect it there in the cloud itself? It just seems to make sense. And then we also have customers we're talking to that are large distributed organizations, where maybe the stuff that's in the branch office, the remote office, they want to back up to the cloud because of WAN backhaul costs and so on. It's easier to do it that way. And then the central stuff is still central. So we're going to give, as Sabina said, customers that choice. You can do cloud only if you want to, you can do prem only with us, or you can do both. And we expect a lot of customers to end up in that third bucket, that sort of hybrid scenario, and we let them choose why they do it and use that combination. The great thing is when you go to Cohesity Helios, that's going to be the control center, if you will, for both things on-prem and also this new DMaaS offering in the cloud. So it's one experience from a manageability standpoint. That's just the only thing I'd add to Sabina's answer about what's great about this and why you want to do more than just one thing.
Well, if you've sort of solved this problem of infrastructure silos in your traditional data center, and now you're bringing in the cloud, why recreate silos and best-of-breed things all over again? Don't you want to consolidate some of that for ease of use and lower cost of ownership as well? And so that's one of the things we think we're going to bring to the table. It's pretty unique versus letting customers pick and choose five or 10 different solutions and trying to merge those together. We think we've got a better way. >> Got it. So then let's come back to some of the comments you were making about added value. So what can customers really do with data, with data management as a service and AWS, that maybe they couldn't do before? >> So the way I look at it, Cohesity and AWS are custodians of this data on behalf of the customer. Ultimately it is their data, but we want to unlock the value from this data versus having it sit in different silos, different locations and so on. So the vision that we have, which we are on the road to right now, in terms of unlocking this data, is to really add additional services, maybe compliance as a service, analytics as a service, machine learning as a service. So let's just kind of walk through these three things. If you think about compliance as a service, that means using Amazon Macie, which uses machine learning to discover, classify and protect sensitive data. And if you think about analytics as a service, that means using AWS Glue to run ETL on this data, Amazon Athena to run SQL queries, and then potentially creating a data warehouse using Amazon Redshift. Then if you really start thinking about other machine learning services, right across the AWS machine learning stack, if you look at it at a high level, customers could use Amazon Textract and Amazon Transcribe to extract value from the metadata, to allow deeper business-specific content that they need for the different solutions they have for end customers.
For example, another logical use case could be Amazon Comprehend Medical, using that to extract medical information from this data. And then finally customers can also use Amazon SageMaker to build advanced machine learning models, to really start deriving even additional value and gain business insights from this data. So those are kind of the things we have in mind, in terms of compliance as a service, machine learning as a service, analytics as a service. And then of course, I want to bring in Chris here to talk a little bit about what they plan to do with their marketplace, the Cohesity Marketplace. >> Yeah, no, I think it's great, Sabina. So we've always had this concept at Cohesity, Dave, of being able to do more with your data. And you've seen that expressed so far in our marketplace, which is still going to be there. We just think plugging in some of the additional services that Sabina mentioned, when you have a center of gravity for your data in the cloud, is going to make that concept even more powerful. And so day one, when we GA, just right now, actually, during re:Invent, you're going to be able to do it yourself. You'll have data backed up into the cloud. For example, you can apply those services if you have the skill to do that. But over time, working in conjunction with Amazon, the goal is to be able to make those services something that you would just go in again to Helios and say, for example, turn on the compliance service. And behind the scenes we're invoking Amazon Macie, doing the right thing with all the data under management by Cohesity already. And so you just get the report back out, if that's what you're aiming to do.
And so we're going to try and make this as simple and easy to use as possible, leveraging the power of all the great things that Amazon does through the APIs that they have, combined with what we do in an engineering effort that we'll be driving with our guidance, to really give great value-add to customers, far beyond the insurance policy you get with backup, being able to do more with that data and add value to your organization. >> And that's okay. So you've announced at re:Invent the GA of Cohesity DataProtect. How should customers think about getting started? >> Well, they can get started today, since we're in GA. Just go to www.cohesity.com, and they have the ability to go ahead there and actually join in on a free trial and get started. And if they decide to convert, then they can go from there. So it's risk-free: go on in, check it out. We welcome feedback as always from our customers, and then stay tuned, because right around the corner, after we're done with this one offer as part of the bigger DMaaS umbrella, you'll see disaster recovery and additional services, really the whole value of the Cohesity platform over time, delivered through AWS. >> As a service, bring it on, guys. Sabina and Chris, thanks so much, really appreciate you coming on, and thank you for watching, everyone. Keep it right there, we're digging deep into AWS and the re:Invent ecosystem. You're watching theCUBE. (upbeat music)

Published Date : Dec 1 2020

SUMMARY :

Announcer: From around the globe, and Sabina Joseph is the General Manager Great to be here today. It's great to see you in some of the things that we do I think you must have a lot of people the proliferation of data. other than AWS is the And a lot of the reasons that we think to talk to each other, And so this notion of data management And that's the problem with what we call And the example I would use First of all, thank you very the date with that data. "This is going to be the And so that's one of the things we think and AWS that maybe they So the vision that we have, of being able to do more with your data. And that's okay. and I have the ability to go ahead there and the re:Invent ecosystem.

ENTITIES

Entity	Category	Confidence
Chris	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Sabina	PERSON	0.99+
Chris Wiborg	PERSON	0.99+
five	QUANTITY	0.99+
Today	DATE	0.99+
Sabina Joseph	PERSON	0.99+
Cohesity	ORGANIZATION	0.99+
first service	QUANTITY	0.99+
both	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
one platform	QUANTITY	0.99+
today	DATE	0.99+
both options	QUANTITY	0.99+
Dolby	ORGANIZATION	0.99+
second piece	QUANTITY	0.98+
www.queasy.com	OTHER	0.98+
Intel	ORGANIZATION	0.98+
Helios	ORGANIZATION	0.98+
both environments	QUANTITY	0.97+
one	QUANTITY	0.97+
Cohesity Helio	ORGANIZATION	0.97+
two great guests	QUANTITY	0.96+
first problem	QUANTITY	0.96+
one offer	QUANTITY	0.95+
First	QUANTITY	0.95+
10 different solutions	QUANTITY	0.95+
over three weeks	QUANTITY	0.94+
Americas Technology	ORGANIZATION	0.92+

SEAGATE AI FINAL

>> Seagate Technology is focused on data, where we have long believed that data is in our DNA. We help maximize humanity's potential by delivering world-class, precision-engineered data solutions developed through sustainable and profitable partnerships. Included in our offerings are hard disk drives. As I'm sure many of you know, a hard drive consists of a slider, also known as a drive head or transducer, attached to a head gimbal assembly, a head stack assembly made up of multiple head gimbal assemblies, and a drive enclosure with one or more platters, or disks, that the head stack assembles into. And while the concept hasn't changed, hard drive technology has progressed well beyond the initial five-megabyte, five-and-a-quarter-inch drives that Seagate first produced in, I think, 1983. We have just announced an 18-terabyte, 3.5-inch drive with nine platters on a single head stack assembly, with dual head stack assemblies this calendar year. The complexity of these drives furthers the need to incorporate edge analytics at operation sites. W. Edwards Deming established the concept of continual improvement in everything that we do, especially in product development and operations. At the end of World War Two, he embarked on a mission with support from the US government to help Japan recover from its wartime losses. He introduced the concepts of continual improvement and statistical process control to the leaders of prominent organizations within Japan. And because of this, he was honored by the Japanese emperor with the Second Order of the Sacred Treasure for his teachings, the only non-Japanese to receive this honor in hundreds of years. Japan's quality control is now world famous, as many of you may know, and based on my own experience in product development, it is clear that he made a major impact on Japan's recovery after the war. At Seagate, the work that we've been doing in adopting new technologies has been our mantra of continual improvement.
As part of this effort, we embarked on the adoption of new technologies in our global operations, which includes establishing machine learning and artificial intelligence at the edge and, in doing so, continuing to adapt our technical capabilities within data science and data engineering. >> So I'm a principal engineer and member of the Operations and Technology Advanced Analytics Group. We are a service organization for those organizations who need to make sense of the data that they have and, in doing so, perhaps introduce a different way to create and analyze new data. Making sense of the data that organizations have is a key aspect of the work that data scientists and engineers do. I'm also a project manager for an initiative adopting artificial intelligence methodologies for Seagate manufacturing, which is the reason why I'm talking to you today. I thought I'd start by first talking about what we do at Seagate and follow that with a brief on artificial intelligence and its role in manufacturing. And I'd like then to discuss how AI and machine learning are being used at Seagate in developing edge analytics, where Docker Enterprise and Kubernetes automate deployment, scaling and management of containerized applications. Finally, I'd like to discuss where we are headed with this initiative and where Mirantis has a major role. In case some of you are not conversant in machine learning, artificial intelligence and the differences, I'll cite some definitions. To cite one source, machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is thus seen as a subset of narrow artificial intelligence, where analytics and decision making take place.
The intent of machine learning is to use basic algorithms to perform different functions, such as classify images by type, classify emails into spam and not spam, and predict weather. The idea, and this is where the concept of narrow artificial intelligence comes in, is to make decisions of a preset type: basically, let a machine learn by itself. The types of machine learning include supervised learning, unsupervised learning and reinforcement learning. In supervised learning, the system learns from previous examples that are provided, such as images of dogs that are labeled by type. In unsupervised learning, the algorithms are left to themselves to find answers. For example, a series of images of dogs can be used to group them into categories by association, such as color, length of coat, length of snout and so on. So in the last slide, I mentioned narrow AI a few times, and to explain, it is common to describe AI in terms of two categories: general, and narrow or weak. Many of us were first exposed to general AI in popular science fiction movies like 2001: A Space Odyssey and Terminator. General AI is AI that can successfully perform any intellectual task that a human can. And if you ask Elon Musk or Stephen Hawking, this is how they view the future with general AI if we're not careful in how it is implemented. Most of us hope that it is more like this: friendly and helpful, like WALL-E. The reality is that machines today are only capable of weak or narrow AI, AI that is focused on a narrow, specific task like understanding speech or finding objects in images. Alexa and Google Home are becoming very popular, and they can be found in many homes. Their narrow task is to recognize human speech and answer limited questions or perform simple tasks like raising the temperature in your home or ordering a pizza, as long as you have already defined the order. Narrow
AI is also very useful for recognizing objects in images and even counting people as they go in and out of stores, as you can see in this example. So artificial intelligence applies machine learning, analytics, inference and other techniques, which can be used to solve actual problems. The two examples here, particle detection and image anomaly detection, have the potential to adopt edge analytics during the manufacturing process. A common problem in clean rooms is spikes in particle count from particle detectors. With this application, we can provide context to particle events by monitoring the area around the machine and detecting when foreign objects like gloves enter areas where they should not. Image anomaly detection historically has been accomplished at Seagate by operators in clean rooms, viewing each image one at a time for anomalies. Creating models of various anomalies through machine learning methodologies can be used to run comparative analyses in a production environment, where outliers can be detected through inference in an automated real-time analytics scenario. So anomaly detection is also frequently used in machine learning to find patterns or unusual events in our data. How do you know what you don't know? It's really what you ask, and the first step in anomaly detection is to use an algorithm to find patterns or relationships in your data. In this case, we're looking at hundreds of variables and finding relationships between them. We can then look at a subset of variables and determine how they are behaving in relation to each other. We use this baseline to define normal behavior and generate a model of it. In this case, we're building a model with three variables. We can then run this model against new data. Observations that do not fit in the model are defined as anomalies, and anomalies can be good or bad. It takes a subject matter expert to determine how to classify the anomalies, and the classification could be scrap or okay to use.
For example, the subject matter expert is assisting the machine to learn the rules. We then update the model with the classified anomalies and start running again, and we can see that there are a few that generate these models. Now, Seagate factories generate hundreds of thousands of images every day. Many of these require a human to look at them and make a decision. This is dull and mistake-prone work that is ideal for artificial intelligence. The initiative that I am project managing is intended to offer a solution that matches the continual increased complexity of the products we manufacture and that minimizes the need for manual inspection. The Edge RX smart manufacturing reference architecture is the initiative both Hamid and I are working on, and I'm sorry to say that Hamid isn't here today. But as you may have guessed, our goal is to introduce early defect detection in every stage of our manufacturing process through machine learning and real-time analytics through inference. And in doing so, we will improve overall product quality, enjoy higher yields with fewer defects and produce higher margins. Because this was entirely new, we established partnerships with HPE, with NVIDIA, and with Docker and Mirantis two years ago to develop the capability that we now have as we deploy Edge RX to our operation sites on four continents. From a hardware sense, HPE and NVIDIA have been able partners in helping us develop an architecture that we have standardized on, and on the software stack side, Docker has been instrumental in helping us manage a very complex project with a steep learning curve for all concerned. To further clarify, in efforts to enable more AI and ML in factories, the objective was to determine an economical edge compute that would access the latest AI and ML technology using a standardized platform across all factories.
This objective included providing an upgrade path that scales while minimizing disruption to existing factory systems and the burden on factory information systems resources. The two parts to the compute solution are shown in the diagram. The gateway device connects to Seagate's existing factory information systems architecture and does inference calculations. The second part is a training device for creating and updating models. All factories will need the gateway device and the compute cluster on site, and to this day it remains to be seen if the training device is needed in other locations. But we do know that one device is capable of supporting multiple factories simultaneously. There are also options for training on cloud-based resources. The stream storing appliance consists of a Kubernetes cluster with GPU and CPU worker nodes, as well as master nodes and Docker Trusted Registries. The GPU nodes are hardware based, using HPE EL4000 Edgelines; the balance are virtual machines. And for machine learning, we've standardized on both the HPE Apollo 6500 and the NVIDIA DGX-1, each with eight NVIDIA V100 GPUs. And, incidentally, the same technology enables augmented and virtual reality. Hardware is only one part of the equation. Our software stack consists of Docker Enterprise and Kubernetes. As I mentioned previously, we've deployed these clusters at all of our operations sites, with specific use cases planned for each site. Mirantis has had a major impact on our ability to develop this capability by offering a stable platform in Universal Control Plane that provides us with the necessary metrics to determine the health of the Kubernetes cluster, and the use of Docker Trusted Registry to maintain a secure repository for containers. And they have been an exceptional partner in our efforts to deploy clusters at multiple sites.
At this point in our deployment efforts, we are on-prem, but we are exploring cloud service options that include Mirantis' next-generation Docker Enterprise offering, which includes StackLight in conjunction with multi-cluster management. And to me, the concept of federation or multi-cluster management is a requirement in our case because of the global nature of our business, where our operation sites are on four continents. So StackLight provides the hook into each cluster that makes multi-cluster management an effective solution. Open source has been a major part of Project Athena, and there has been a debate about using Docker CE versus Docker Enterprise. That decision was actually easy, given the advantages that Docker Enterprise would offer, especially during an early phase of development. Kubernetes was a natural addition to the software stack and has been widely accepted. But we have also been at work to adopt such open source as RabbitMQ messaging, TensorFlow and TensorRT, to name three, GitLab for development, and a number of others, as you see here as well, and most of our programming has been in Python. The results of our efforts so far have been excellent. We are seeing a six-month return on investment from just one of seven clusters, where the hardware and software costs approached $1 million. The performance on this cluster is now over three million images processed per day. Further adoption has been growing, but the biggest challenge we've seen has been handling a steep learning curve: installing and maintaining complex Kubernetes clusters in data centers that are not used to managing the unique aspects of clusters like this. And because of this, we have been considering adopting a control plane in the cloud with Kubernetes as a service supported by Mirantis.
Even without considering Kubernetes as a service, the concept of federation or multi-cluster management has to be on our roadmap, especially considering the global nature of our company. Thank you.
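
The anomaly detection loop described in the talk (build a baseline of normal behavior from a few variables, run new observations against it, and hand the misfits to a subject matter expert) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Seagate's implementation: the three variables, the sample readings, the function names and the three-standard-deviation threshold are all hypothetical, and a simple per-variable z-score stands in for whatever model the team actually uses.

```python
from statistics import mean, stdev


def fit_baseline(rows):
    """Learn a (mean, standard deviation) pair per variable from normal data."""
    columns = list(zip(*rows))
    return [(mean(col), stdev(col)) for col in columns]


def is_anomaly(baseline, row, threshold=3.0):
    """Flag a row if any variable is more than `threshold` std devs from its mean."""
    return any(
        abs(value - mu) > threshold * sigma
        for value, (mu, sigma) in zip(row, baseline)
    )


# Hypothetical "normal" readings for three variables (made-up numbers).
normal = [
    (20.0, 1.00, 5.0),
    (21.0, 1.10, 5.2),
    (19.0, 0.90, 4.8),
    (20.5, 1.05, 5.1),
    (19.5, 0.95, 4.9),
]
baseline = fit_baseline(normal)

# Run the baseline model against new observations.
new_observations = [(20.2, 1.02, 5.0), (35.0, 1.00, 5.0)]
flags = [is_anomaly(baseline, row) for row in new_observations]

# Observations that do not fit are queued for a subject matter expert,
# who would label each one (for example "scrap" or "okay to use").
needs_review = [row for row, flagged in zip(new_observations, flags) if flagged]
print(flags, needs_review)
```

In a fuller version, the expert's labels would be fed back to update the model before the next run, which is the step the talk describes as the expert helping the machine learn the rules.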

Published Date : Sep 15 2020

SUMMARY :

at the end of World War Two, he embarked on a mission with support from the US government to help and the first step in anomaly detection is to use an algorithm to find patterns

ENTITIES

Entity	Category	Confidence
Seagate	ORGANIZATION	0.99+
hundreds of years	QUANTITY	0.99+
two parts	QUANTITY	0.99+
python	TITLE	0.99+
six month	QUANTITY	0.99+
World War Two	EVENT	0.99+
C Gate	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Stephen Hawking	PERSON	0.99+
Sea Gate	ORGANIZATION	0.99+
Japan	LOCATION	0.99+
Lawn Musk	PERSON	0.99+
Terminator	TITLE	0.99+
1983	DATE	0.99+
one part	QUANTITY	0.99+
two examples	QUANTITY	0.99+
A Space Odyssey	TITLE	0.99+
five megabytes	QUANTITY	0.99+
3.5 inch	QUANTITY	0.99+
second part	QUANTITY	0.99+
18 terabytes	QUANTITY	0.99+
first step	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
both	QUANTITY	0.98+
NVIDIA	ORGANIZATION	0.98+
over three million images	QUANTITY	0.98+
first	QUANTITY	0.98+
each site	QUANTITY	0.98+
H B E. Apollo 6500	COMMERCIAL_ITEM	0.98+
each cluster	QUANTITY	0.98+
each image	QUANTITY	0.98+
one source	QUANTITY	0.98+
today	DATE	0.98+
G X one	COMMERCIAL_ITEM	0.98+
Cooper	PERSON	0.98+
second order	QUANTITY	0.98+
Japan	ORGANIZATION	0.98+
Hamid	PERSON	0.97+
Dr Enterprise	ORGANIZATION	0.97+
Cooper Netease	ORGANIZATION	0.97+
each	QUANTITY	0.97+
One	TITLE	0.97+
Theobald	PERSON	0.97+
nine flatters	QUANTITY	0.97+
one devices	QUANTITY	0.96+
Siris	TITLE	0.96+
hundreds of thousands of images	QUANTITY	0.96+
Docker Enterprise	ORGANIZATION	0.95+
Docker	ORGANIZATION	0.95+
seven clusters	QUANTITY	0.95+
two years ago	DATE	0.95+
US government	ORGANIZATION	0.95+
Mirant	ORGANIZATION	0.95+
Operations and Technology Advanced Analytics Group	ORGANIZATION	0.94+
four time losses	QUANTITY	0.94+
Wally	PERSON	0.94+
Japanese	OTHER	0.93+
two categories	QUANTITY	0.93+
H B E l 4000	COMMERCIAL_ITEM	0.9+
H B	ORGANIZATION	0.9+
three variables	QUANTITY	0.9+
General Ai	TITLE	0.87+
G Edward	PERSON	0.87+
Google Home	COMMERCIAL_ITEM	0.87+
$1 million	QUANTITY	0.85+
Miranda	ORGANIZATION	0.85+
Sea Gate	LOCATION	0.85+
Alexa	TITLE	0.85+
500 quarter inch drives	QUANTITY	0.84+
Kubernetes	TITLE	0.83+
single head	QUANTITY	0.83+
eight	QUANTITY	0.83+
Dr	TITLE	0.82+
variables	QUANTITY	0.81+
this calendar year	DATE	0.78+
H P. E.	ORGANIZATION	0.78+
2000	DATE	0.73+
Project Athena	ORGANIZATION	0.72+
Rx Smart	COMMERCIAL_ITEM	0.69+
dual	QUANTITY	0.68+
V 100	COMMERCIAL_ITEM	0.65+
close	QUANTITY	0.65+
four continents	QUANTITY	0.64+
GP	QUANTITY	0.62+

Dave Brown, Amazon | AWS Summit Online 2020

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE conversation. >> Everyone, welcome to theCUBE's special coverage of the AWS Summit San Francisco, North America, all over the world, and most parts of Asia Pacific. Amazon Summit is the hashtag. This is part of theCUBE Virtual Program, where we're going to be covering Amazon Summits throughout the year. I'm John Furrier, host of theCUBE. And of course, we're not at the events. We're here in the Palo Alto Studios, with our COVID-19 quarantine crew. And we've got a great guest here from AWS, Dave Brown, Vice President of EC2, who leads the team on elastic compute. We'll talk about its business, where it's evolving and, most importantly, what it means for the customers in the industry. Dave, thanks for spending the time to come on theCUBE virtual program. >> Hey John, it's really great to be here, thanks for having me. >> So we've got the summit going down. It's a new format because of the shelter in place. They're going virtual or digital, virtualization of events. And I want to have a session with you on EC2, and some of the new things that are going on. And I think the story is important, because certainly around the pandemic, and certainly with the large-scale SaaS business models, which are turning out to have quite the impact from a positive standpoint with people sheltering in place, what is the role of data in all this, okay? And also, there's a lot of pressure financially. We've had the payroll loan programs from the government, and companies are really looking at their bottom lines. So those are two major highlights going on in the world that are directly impacted. And you have some products and news around this, and I want to do a deep dive on that. One is AppFlow, which is a new integration service by AWS, that really talks about taking the scale and value of AWS services, and integrating that with SaaS applications.
And the other is the Migration Acceleration Program for Windows, which has a storied history of database. For many, many years, you guys have been powering most of the Windows workloads, ironic given that you guys are not Microsoft, but you've certainly had success there. Let's start with AppFlow. Okay, this was recently announced on the 22nd of April. This is a new service. Can you take us through why this is important? What is the service? Why now, what was the main driver behind AppFlow? >> Yeah, absolutely. So with the launch of AppFlow, what we're really trying to do is make it easy for organizations and enterprises to really control the flow of their data between the number of different applications that they use on-premise and AWS. And so the problem we started to see was, enterprises just had this data all over the place, and they wanted to do something useful with it. Right, we see many organizations running data lakes, large-scale analytics, big machine learning on AWS, but before you can do all of that, you have to have access to the data. And if that data is sitting in an application, either on-premise or elsewhere in AWS, it's very difficult to get it out of that application and into S3, or Redshift, or one of those services, before you can manipulate it. That was the challenge. And so the journey kind of started a few years ago, when we actually launched a service on the EC2 network side, PrivateLink. And it provided organizations with a very secure way to transfer network data, both between VPCs, and also between VPCs and on-prem networks. And what this highlighted to us is organizations saying, that's great, but I actually don't have the technical ability, or the team, to actually do the work that's required to transform the data from, whether it's Salesforce or SAP, and actually move it over PrivateLink to AWS.
And so we realized, while PrivateLink was useful, we needed another layer of service that actually provided this, and one of the key requirements was that an organization must be able to do this with no code at all. So basically, no developer required. I want to be able to transfer data from Salesforce, my Salesforce database, and put that in Redshift together with some other data, and then perform some function on that. And so that's what AppFlow is all about. We came up with the idea a little bit more than a year ago; that was the first time I sat down and actually reviewed the content for what this was going to be. And the team's been hard at work, and launched on the 22nd of April. And we actually launched with 14 partners as well, that provide what we call connectors, which allow us to access these various services, companies like Salesforce and ServiceNow, Slack, Snowflake, to name a few. >> Well, certainly you guys have a great ecosystem of SaaS partners, and it's well documented in the industry that you guys are not going to be competing directly with a lot of these big SaaS players, although you do have a few services for customers who want end to end, as Jassy continues to pound home in my CUBE interviews. But I think this >> Absolutely. >> is notable, and I want to get your thoughts on this, because this seems to be the key unlocking of the value of SaaS and cloud, because with data traversal, data transfer, there are costs involved, and moving traffic over the internet is unsecure and unreliable. So a couple of questions I wanted to just ask you directly. One is, did AppFlow come out of the AWS PrivateLink piece of it? And two, is it one-directional or bi-directional? How is that working? Because I'm guessing that PrivateLink became successful because no one wants to move data on the internet. They wanted direct connects. Was there something inadequate about that service? Was there more headroom there?
And is it bi-directional for the customer? >> So let me take the second one: it's absolutely bi-directional. You can transfer that data between an on-premise application and AWS, or AWS and the on-premise application. Really, anything that has a connector can support the data flow in both directions. And that comes with transformations, because data in one data source may need to be transformed before it's actually useful in a second data source. And so AppFlow takes care of all that transformation as well, in both directions, and again, with no requirement for any code on behalf of the customer. That really unlocks it for a lot of the more business-focused parts of an organization, who maybe don't have immediate access to developers. They can use it immediately, literally with a few transformations via the console, and it's working for you. You also mentioned the flow of data over the internet, and the need for security of data. It's critically important, and as we look at just what Amazon as a company does, we have very, very strict requirements around the flow of data, and what services we can use internally. And where's any of our data going to be going? And I think it's a good example of how many enterprises are thinking about data today. They don't even want to trust HTTPS and encryption of data on the internet. They'd rather just be in a world where their data never, ever traverses the internet, and they just never have to deal with that. And so the journey all started with PrivateLink there, and PrivateLink was an interesting feature, 'cause it really was changing the way that we asked our customers to think about networking. Nothing like PrivateLink has ever existed in the sort of standard networking that an enterprise would normally have. It's kind of only possible because of what VPC allows you to do, and what the software-defined network on AWS gives you. And so we built PrivateLink, and as I said, customers started to adopt it.
They loved the idea of being able to transfer data, either between VPCs, or between on-premise and the cloud, or between their own VPC and maybe a third-party provider. Snowflake, for example, has been a very big adopter of PrivateLink, and they have many customers using it to get access to Snowflake databases in a very secure way. And so that's where it all started, and in those discussions with customers, we started to see that they wanted us to up-level a little bit. They said, "We can use PrivateLink, it's great, but one of the problems we have is just the flow of data." How do we move data in a very secure, highly available way, with no bottlenecks in the system? And so we thought PrivateLink was a great underlying technology that empowered all of this, but we had to build the system on top of that, which is AppFlow. That says we're going to take care of all the complexity. And then we had to go to the ecosystem, and say to all these providers, "Can you guys build connectors?" 'Cause everybody realized it's super important that data can be shared, so that organizations can really extract the value from that data. And so the 14 of them at launch, and we have many, many more down the road, have come to the party with connectors, and full support of what AppFlow provides. >> Yeah, us DevOps purists are always pounding the fist on the table, now the virtual table: APIs and connectors. This is the model, so people are integrating. And I want to get your thoughts on this. I think you said low code, or no code on the developer simplicity side. Is it no code, or low code? Can you just explain quickly and clarify that point? >> It's no code for getting started, literally, for the kind of basic to medium complexity use case. It's no code, and for a lot of customers we spoke to, that was a bottleneck. Right, they needed something from data.
It might have been the finance organization, or it could have been human resources, somebody else in organization needed that. They don't have a developer that helps them typically. And so we find that they would wait many, many months, or maybe even never get the project done, just because they never ever had access to that data, or to the developer to actually do the work that was required for the transformation. And so it's no code for almost all use cases. Where it literally is, select your data source, select the connector, and then select the transformations. And some basic transformations, renaming of fields, transformation of data in simple ways. That's more than sufficient for the vast majority of use cases. And then obviously through to the destination, with the connector on the other side, to do the final transformation, to the final data source that you want to migrate the data to. >> You know, you have an interesting background, was looking at your history, and you've essentially been a web services kind of guy all your life. From a code standpoint software environment, and now I'll say EC2 is the crown jewel of AWS, and doing more and more with S3. But what's interesting, as you build more of these layers services in there, there's more flexibility. So right now, in most of the customer environments, is a debate around, do I build something monolithic, and or decoupled, okay? And I think there's a world where there's a mutually, not mutually exclusive, I mean, you have a mainframe, you have a big monolithic thing, if it does something. But generally people would agree that a decoupled environment is more flexible, and more agile. So I want to kind of get to the customer use case, 'cause I can really see this being really powerful, AppFlow with Private Link, where you mentioned Snowflake. I mean, Snowflake is built on AWS, they're doing extremely, extremely well, like any other company that builds on AWS. Whether it's theCUBE Cloud, or it's Snowflake. 
As we tap those services, customers, we might have people who want to build on our platform on top of AWS. So I know a bunch of startups that are building within the Snowflake ecosystem, a customer of yours. >> Yeah. >> So they're technically a customer of Amazon, but they're also in the ecosystem of say, Snowflake. >> Yes. >> So this brings up an interesting kind of computer science problem, which is architecturally, how do I think about that? Is this something where AppFlow could help me? Because I certainly want to enable people to build on a platform, that I build if I'm doing that, if I'm not going to be a pure SaaS turnkey application. But if I'm going to bring partners in, and do integration, use the benefits of the goodness of an API or Connector driven architecture, I need that. So explain to me how this helps me, or doesn't help me. Is this something that makes sense to you? Does this question make sense? How do you react to that? >> I think so, I think the question is pretty broad. But I think there's an element in which I can help. So firstly, you talk about sort of decoupled applications, right? And I think that is certainly the way that we've gone at Amazon, and been very, very successful for us. I think we started that journey back in 2003, when we decoupled the monolithic application that was amazon.com. And that's when our service journey started. And a lot of that sort of inspired AWS, and how we built what we built today. And we see a lot of our customers doing that, moving to smaller applications. It just works better, it's easier to debug, there's ownership at a very controlled level. So you can get all your engineering teams to have very clear and crisp ownership. And it just drives innovation, right? 'Cause each little component can innovate without the burden of the rest of the ecosystem. And so that's what we really enjoy. 
I think the other thing that's important when you think about design, is to see how much of the ecosystem you can leverage. And so whether you're building on Snowflake, or you're building directly on top of AWS, or you're building on top of one of our other customers and partners. If you can use something that solves the problem for you, versus building it yourself. Well that just leaves you with more time to actually go and focus on the stuff that you need to be solving, right? The product you need to be building. And so in the case of AppFlow, I think if there's a need for transfer of data, between, for example, Snowflake and some data warehouse, that you as an organisation are trying to build on a Snowflake infrastructure. AppFlow is something you could potentially look at. It's certainly not something that you could just use for, it's very specific and focused to the flow of data between services from a data analytics point of view. It's not really something you could use from an API point of view, or messaging between services. It's more really just facilitating that flow of data, and the transformation of data, to get it into a place that you can do something useful with it. >> And you said-- >> But like any of our services-- (speakers talk over each other) Couldn't be using any layer in the stack. >> Yes, it's a level of integration, right? There's no code to code, depending on how you look at it, cool. Customer use cases, you mentioned, large scale analytics, I thought I heard you say, machine learning, Data Lakes. I mean, basically, anyone who's using data is going to want to tap some sort of data repository, and figure out how to scale data when appropriate. There's also contextual, relevant data that might be specific to say, an industry vertical, or a database. And obviously, AI becomes the application for all this. >> Exactly. >> If I'm a customer, how does AppFlow relate to that? How does that help me, and what's the bottom line? 
>> So I think there's two parts to that journey. And depending on where customers are, and so there's, we do have millions of customers today that are running applications on AWS. Over the last few years, we've seen the emergence of Data Lakes, really just the storage of a large amount of data, typically in S3. But then companies want to extract value out of, and use in certain ways. Obviously, we have many, many tools today, from Redshift, Athena, that allow you to utilize these Data Lakes, and be able to run queries against this information. Things like EMR, and one of our oldest services in the space. And so doing some sort of large scale analytics, and more recently, services like SageMaker, are allowing us to do machine learning. And so being able to run machine learning across an enormous amount of data that we have stored in AWS. And there's some stuff in the IoT, workload use space as well, that's emerging. And many customers are using it. There's obviously many customers today that aren't using it on AWS, potential customers for us, that are looking to do something useful with data. And so the one part of the journey is taking up all of that infrastructure, and we have a lot of services that make it really easy to do machine learning, and do analytics, and that sort of thing. And then the other problem, the other side of the problem, which is what AppFlow is addressing is, how do I get that data to S3, or to Redshift, to actually go and run that machine learning workload? And that's what it's really unlocking for customers. And it's not just the one time transfer of data, the other thing that AppFlow actually supports, is the continuous updating of data. And so if you decide that you want to have that view of your data in S3, for example, and Data Lake, that's kept up to date, within a few minutes, within an hour, you can actually configure AppFlow to do that. 
And so the data source could be Salesforce, it could be Slack, it could be whatever data source you want to blend. And you continuously have that flow of data between those systems. And so when you go to run your machine learning workload, or your analytics, it's all continuously up to date. And you don't have this problem of, let me get the data, right? And when I think about some of the data jobs that I've run, in my time, back in the day as an engineer, on early EC2, a small part of it was actually running the job on the data. A large part of it was how do I actually get that data, and is it up to date? >> Up to date data is critical, I think that's the big feature there is that, this idea of having the data connectors, really makes the data fresh, because we go through the modeling, and you realize why I missed a big patch of data, the machine learnings not effective. >> Exactly. >> I mean, it's only-- >> Exactly, and the other thing is, it's very easy to bring in new data sources, right? You think about how many companies today have an enormous amount of data just stored in silos, and they haven't done anything with it. Often it'll be a conversation somewhere, right? Around the coffee machine, "Hey, we could do this, and we can do this." But they haven't had the developers to help them, and haven't had access to the data, and haven't been able to move the data, and to put it in a useful place. And so, I think what we're seeing here, with AppFlow, really unlocking of that. Because going from that initial conversation, to actually having something running, literally requires no code. Log into the AWS console, configure a few connectors, and it's up and running, and you're ready to go. And you can do the same thing with SageMaker, or any of the other services we have on the other side that make it really simple to run some of these ideas, that just historically have been just too complicated. >> Alright, so take me through that console piece. 
Just walk me through, I'm in, you sold me on this. I just came out of a meeting with my company, and I said, "Hey, you know what? We're blowing up this siloed approach. We want to kind of create this horizontal data model, where we can mix and match connectors based upon our needs." >> Yeah. >> So what do I do? I'm using SageMaker, using some data, I got S3, I got an application. What do I do? I'm connecting what, S3? >> Yeah, well-- >> To the app? >> So the simplest thing is, and the simplest place to find this actually, is on Jeff Barr's blog, that he did for the release, right? Jeff always does a great job in demonstrating how to use our various products. But it literally is going into the standard AWS console, which is the console that we use for all of our services. I think we have 200 of them, so it is getting kind of challenging to find the ball in that console, as we continue to grow. And find AppFlow. AppFlow is a top level service, and so you'll see it in the console. And the first thing you got to do, is you got to configure your source connector. And so it's a connector that, where's the data coming from? And as I said, we had 14 partners, you'll be able to see those connectors there, and see what's supported. And obviously, there's the connectivity. Do you have access to that data, or where is the data running? AppFlow runs within AWS, and so you need to have either VPN, or Direct Connect back to the organization, if the data source is on-premise. If the data source happens to be in AWS, it'll obviously be in a VPC, and you just need to configure some of that connectivity functionality. >> So no code if the connectors are there, but what if I want to build my own connector? >> So building your own connector, that is something that we're working with third parties on right now. I could be corrected, but I'm not 100% sure whether that's available. 
It's certainly something I think we would allow customers to do, is to extend sort of either the existing connectors, or to add additional transformations as well. And so you'd be able to do that. But the transformations that the vast majority of our customers are using are literally just in the console, with the basic transformations. >> It comes bigger apps that people have, and just building those connectors. How does a partner get involved? You got 14 partners now, how do you extend the partner base? Contact an Amazon Partner Manager, or you send an email to someone? How does someone get involved? What are you recommending? >> So there are a couple of ways, right? We have an extensive partner ecosystem that the vast majority of these ISVs are already integrated with. And so, we have the 14 we launched with, we also pre-announced SAP, which is going to be a very critical one for the vast majority of our customers. Having deep integration with SAP data, and being able to bring that seamlessly into AWS. That'll be launching soon. And then there's a long list of other ones, that we're currently working on. And they're currently working on them themselves. And then the other one is going to be, like with most things at Amazon, feedback from customers. And so what we hear from customers, and very often you'll hear from third party partners as well, who'll come and say, "Hey, my customers are asking me to integrate with AppFlow, what do I need to do?" And so, you know, just reaching out to AWS, and letting them know that you'd be interested in integrating, that you're not part of the partner program. The team would be happy to engage, and bring you on board, so-- >> (mumbles) on playbook, get the top use cases nailed down, listen to customers, and figure it out. >> Exactly. >> Great stuff Dave, we really appreciate it. I'm looking forward to digging in AppFlow, and I'll check on Jeff Barr's blog. Sure, it's April 22, was the launch day, probably had up there. 
One of the things that I want to just jump into, now moving into the next topic, is the cost structure. A lot of pressure on costs. This is where I think this Migration Acceleration Program for Windows is interesting. Andy Jassy always likes to boast on stage at re:Invent, about the number of workloads of Windows running on Amazon Web Services. This has been a big part of the customers, I think, for over 10 years, that I can think of him talking about this. What is this about? Are you still seeing uptake on Windows workloads, or, I mean,-- >> Absolutely. >> Azure has got some market share, >> Absolutely. >> but now you, doesn't really kind of square in my mind, what's going on here. Tell us about this migration service. >> Yeah, absolutely, on the migration side. So Windows is absolutely, we still believe AWS is the best place to run a Windows workload. And we have many, many happy Windows customers today. And it's a very big, very large, growing part of our business today, it used to be. I was part of the original team back in 2008, that launched, I think it was Windows 2008, back then on EC2. And I remember sort of working out all the details, of how to do all the virtualization with Windows, obviously back then we'd done Linux. And getting Windows up and running, and working through some of the challenges that Windows had as an operating system in the early days. And it was October 2008 that we actually launched Windows as an operating system. And it's just been, we've had many, many happy Windows customers since then. >> Why is Amazon so peak to run workloads from Windows so effectively? >> Well, I think, sorry what did you say peaked? >> Why is Amazon so well positioned to run the Windows workloads? >> Well, firstly, I mean, I think Windows is really just the operating system, right? And so if you think about that as the very last little bit of your sort of virtualization stack, and then being able to support your applications. 
What you really have to think about is, everything below that, both in terms of the compute, so performance you're going to get, the price performance you're going to get. With our Nitro Hypervisor, and the Nitro System that we developed back in 2018, or launched in 2018. We really are able to provide you with the best price performance, and have the very least overhead from a hypervisor point of view. And then what that means is you're getting more out of your machine, for the price that you pay. And then you think about the rest of the ecosystem, right? Think about all the other services, and all the features, and just the breadth, and the extensiveness of AWS. And that's critically important for all of our Windows customers as well. And so you're going to have things like Active Directory, and these sort of things that are very Windows specific, and we can absolutely support all of those, natively. And in the Windows operating system as well. We have things like various agents that you can run inside the Windows box to do more maintenance and management. And so I think we've done a really good job in bringing Windows into the larger, and broader ecosystem of AWS. And it really is just a case of making sure that Windows runs smoothly. And that's just the last little bit on top of that, and so many customers enterprises run Windows today. When I started out my career, I was developing software in the banking industry, and it was a very much a Windows environment. They were running critical applications. And so we see it's critically important for customers who run Windows today, to be able to bring those Windows workloads to AWS. >> Yeah, and that's certainly-- >> We are seeing a trend. Yeah, sorry, go ahead. >> Well, they're certainly out there from a market share standpoint, but this is a cost driver, you guys are saying, and I want you to just give an example, or just illustrate why it costs less. How is it a cost savings? 
Is it just services, cycle times on EC2? I mean what's the cost savings? I'm a customer like, "Okay, so I'm going to go to Amazon with my workloads." Why is it a cost saving? >> I think there are a few things. The one I was referring to in my previous comment was the price performance, right? And so if I'm running on a system, where the hypervisor is using a significant portion of the physical CPU that I want to use as well. Well there's an overhead to that. And so from a price performance point of view, I look at, if I go and benchmark a CPU, and I look at how much I pay for that per unit of that benchmark, it's better on AWS. Because with our Nitro System, we're able to give you 100% of the core. And so you get the performance then. So that's the first thing is price performance, which is different from just price. But there's a saving there as well. The other one is a large part, and getting into the migration program as well. A large part of what we do with our customers, when they come to AWS, is we take a long look at their license strategy. What licenses do they have? And a key part of bringing Windows workloads to AWS is license optimization. What can we do to help you optimize the licenses that you're using today for Windows, for SQL Server, and really try and find efficiencies in that. And so we're able to secure significant savings for many of our customers by doing that. And we have a number of tools that they use as part of the migration program to do that. And so that helps save there. And then finally, we have a lot of customers doing what we call modernization of their applications. And so they really embrace Cloud, and some of the benefits that you get from Cloud. Especially elasticity, so being able to scale for demand. It's very difficult to do that when you're bound by license for your operating system, because every box you run, you have to have a license for it. 
And so turning auto scaling on, you've got to make sure you have enough licenses for all these Windows boxes you've seen. And so the push the Cloud's bringing, we've seen a lot of customers move Windows applications from Windows to Linux, or even move SQL Server, from SQL Server to SQL Server on Linux, or another database platform. And do a modernization there, that really allows them to benefit from the elasticity that Cloud provides, without having to constantly worry about licenses. >> So final question on this point, migration service implies migration from somewhere else. How do they get involved? What's the onboarding process? Can you give a quick detail on that? >> Absolutely, so we've been helping customers with migrations for years. We've launched a migration program, or Migration Acceleration Program, MAP. We launched it, I think about 2016, 2017 was the first part of that. It was really just a bringing together of the various, the things we'd learned, the tools we built, the best strategies to do a migration. And we said, "How do we help customers looking to migrate to the Cloud?" And so that's what MAP's all about, is just a three phase, we'll help you assess the migration, we'll help you do a lot of planning. And then ultimately, we help you actually do the migration. We partner with a number of external partners, and ISVs, and GSIs, who also work very closely with us to help customers do migrations. And so what we launched in April of this year, with the Windows migration program, is really just more support for Windows workloads, as part of the broader Migration Acceleration Program. And there's benefits to customers, it's a smoother migration, it's a faster migration in almost all cases, we're doing license assessments, and so there's cost reduction in that as well. And ultimately, there's other benefits as well that we offer them, if they partner with us in bringing the workload to AWS. 
And so getting involved is really just reaching out to one of our AWS sales folks, or one of your account managers, if you have an account manager, and talking to them about workloads that you'd like to bring in. And we even go as far as helping you identify which applications are easiest to migrate. And so that you can kind of get going with some of the easier ones, while we help you with some of the more difficult ones. And strategize about removing those roadblocks to bring your services to AWS. >> Takes the blockers away, Dave Brown, Vice President of EC2, the crown jewel of AWS, breaking down AppFlow, and the migration to Windows services. Great insights, appreciate the time. >> Thanks. >> We're here with Dave Brown, VP of EC2, as part of the virtual Cube coverage. Dave, I want to get your thoughts on an industry topic. Given what you've done with EC2, and the success, and with COVID-19, you're seeing that scale problem play out on the world stage for the entire population of the global world. This is now turning non-believers into believers of DevOps, web services, real time. I mean, this is now a moment in history, with the challenges that we have, even when we come out of this, whether it's six months or 12 months, the world won't be the same. And I believe that there's going to be a Cambrian explosion of applications. And an architecture that's going to look a lot like Cloud, Cloud-native. You've been doing this for many, many years, key architect of EC2 with your team. How do you see this playing out? Because a lot of people are going to be squirreling in rooms, when this comes back. They're going to be video conferencing now, but when they have meetings, they're going to look at the window of the future, and they're going to be exposed to what's failed. And saying, "We need to double down on that, we have to fix this." So there's going to be winners and losers coming out of this pandemic, really quickly. 
And I think this is going to be a major opportunity for everyone to rally around this moment, to reset. And I think it's going to look a lot like this decoupled, this distributed computing environment, leveraging all the things that we've talked about in the past. So what's your advice, and how do you see this evolving? >> Yeah, I completely agree. I mean, I think, just the speed at which it happened as well. And the way in which organizations, both internally and externally, had to reinvent themselves very, very quickly, right? We've been very fortunate within Amazon, moving to working from home was relatively simple for the vast majority of us. Obviously, we have a number of our employees that work in data centers, and fulfillment centers that have been on the front lines, and been doing a great job. But for the rest of us, it's been virtual video conferencing, right? All about meetings, and being able to use all of our networking tools securely, either over the VPN, or the no VPN infrastructure that we have. And many organizations had to do that. And so I think there are a number of different things that have impacted us right now. Obviously, virtual desktops has been a significant sort of growth point, right? Folks don't have access to the physical machine anymore, they're now all having to work remote, and so a service like WorkSpaces, which runs on EC2, as well, has been a critical service to support many of our largest customers. Our Client VPN service, which we have within EC2 on the networking side, has also been critical for many large organizations, as they see more of their staff working everyday remotely. It has also been able to support a lot of customers there. Just more broadly, what we've seen with COVID-19, is we've seen some industries really struggle, obviously travel industry, people just aren't traveling anymore. And so there's been immediate impact to some of those industries. 
There've been other industries that support functions like the video conferencing, or entertainment side of the house, that have seen a bit of growth, over the last couple of months. And education has been an interesting one for us as well, where schools have been moving online. And behind the scenes in AWS, and on EC2, we've been working really hard to make sure that our supply chains are not interrupted in any way. The last thing we want to do is have any of our customers not be able to get EC2 capacity, when they desperately need it. And so we've made sure that capacity is fully available, even all the way through the pandemic. And we've even been able to support customers with, I remember one customer who told me the next day, they're going to have more than a hundred thousand students coming online. And they suddenly had to grow their business, by some crazy number. And we were able to support them, and give them the capacity, which is way outside of any sort of demand--. >> I think this is the Cambrian explosion that I was referring to, because a whole new set of new things have emerged. New gaps in businesses have been exposed, new opportunities are emerging. This is about agility. It's real time now. It's actually happening for everybody, not just the folks on the inside of the industry. This is going to create a reinvention. So it's ironic, I've heard the word reinvent mentioned more times now, over the past three months, than I've heard it representing to Amazon. 'Cause that's your annual conference, re:Invent, but people are resetting and reinventing. It's actually a tactic, this is going on. So they're going to need some Clouds. So what do you say to that? >> So, I mean, the first thing is making sure that we can continue to be highly available, continue to have the capacity. The worst scenario is not being able to have the capacity for our customers, right? 
We did see that with some providers, and that honestly on our side is just years and years of experience of being able to manage supply chain. And the second thing is obviously, making sure that we remain available, that we don't have issues. And so, you know, with all of our staff going remote and working from home, all my teams are working from home. Being able to support AWS in this environment, we haven't missed a beat there, which has been really good. We were well set up to be able to absorb this. And then obviously, remaining secure, which was our highest priority. And then innovating with our customers, and being able to, and that's both products that we're going to launch over time. But in many cases, like that education scenario I was talking about, that's being able to find that capacity, in multiple regions around the world, literally on a Sunday night, because they found out literally that afternoon, that Monday morning, all schools were virtual, and they were going to use their platform. And so they've been able to respond to that demand. We've seen a lot more machine learning workloads, we've seen an increase there as well as organizations are running more models, both within the health sciences area, but also in the financial areas. And also in just general business, (mumbles), yes, wherever it might be. Everybody's trying to respond to, what is the impact of this? And better understand it. And so machine learning is helping there, and so we've been able to support all those workloads. And so there's been an explosion. >> I was joking with my son, I said, "This world is interesting." Amazon really wins, that stuff's getting delivered to my house, and I want to play video games and Twitch, and I want to build applications, and write software. Now I could do that all in my home. So you win all around. But all kidding aside, this is an opportunity to define agility, so I want to get your thoughts, because I'm a big fan of Amazon. 
As everyone knows, I'm kind of a pro Amazon person, and as other Clouds kind of try to level up, they're moving in the same direction, which is good for everybody, good competition and all. But S3 and EC2 have been the crown jewels. And building more services around those, and creating these abstraction layers, and new sets of service to make it easier, I know has been a top priority for AWS. So can you share your vision on how you're going to make EC2, and all these services easier for me? So if I'm a coder, I want literally no code, low code, infrastructure as code. I need to make Amazon more programmable and easier. Can you just share your vision on, as we talk about the virtual summits, as we cover the show, what's your take on making Amazon easier to consume and use? >> It's been something we thought a lot about over the years, right? When we started out, we were very simple. The early days of EC2, it wasn't that rich feature set. And it's been an interesting journey for us. We've obviously become a lot more, we've launched a lot of features, which naturally brings some more complexity to the platform. We have launched things like Lightsail over the years. Lightsail is a hosting environment that gives you that EC2 like experience, but it's a lot simpler. And it's also integrated with a number of other services like RDS and ELB as well, basic load balancing functionality. And we've seen some really good growth there. But what we've also learned is customers enjoy the richness of what EC2 provides, and what the full ecosystem provides, and being able to use the pieces that they really need to build their application. From an S3 point of view, from a broader ecosystem point of view. It's providing customers with the features and functionality that they really need to be successful. From the compute side of the house, we've done some things. Obviously, Containers have really taken off. 
And there's a lot of frameworks, whether it's EKS, our Kubernetes service, or our Docker-based ECS, that have made that a lot simpler for developers. And then obviously, in the serverless space, Lambda is a great way of consuming EC2, right? I know it's serverless, but there's still an EC2 instance under the hood. And being able to bring a basic function and run those functions in serverless is, a lot of customers are enjoying that. The other complexity we're going after is on the networking side of the house, I find that a lot of developers out there, they're more than happy to write the code, they're more than happy to bring their application to AWS. But they struggle a little bit more on the networking side, they really do not want to have to worry about whether they have a route to an internet gateway, and if their subnet's defined correctly to actually make the application work. And so, we have services like App Mesh, and the whole service mesh space is developing a lot. To really make that a lot simpler, where you can just bring your application, and call it on an application that just uses service discovery. And so those higher level services are definitely helping. In terms of no code, I think that App Mesh, sorry not App Mesh, AppFlow is one of the examples of already giving organizations something at that level, that says I can do something with no code. I'm sure there's a lot of work happening in other areas. It's not something I'm actively thinking on right now, in my role in leading EC2, but I'm sure as the use cases come from customers, I'm sure you'll see more from us in those areas. They'll likely be more specific, though. 'Cause as soon as you take code out of the picture, you're going to have to get pretty specific in the use case. You already get the depth, the functionality the customers will need. >> Well, it's been super awesome to have your valuable time here on the virtual Cube for covering Amazon Summit, Virtual Digital Event that's going on. 
And we'll be going on throughout the year. Really appreciate the insight. And I think it's right on the money. I think the world is going to have, in six to 12 months, a surge in resetting, reinventing, and growing. So I think a lot of companies who are smart are going to reset, reinvent, and set a new growth trajectory. Because it's a Cloud-native world, it's Cloud computing, this is now a reality, and I think there's proof points now. So the whole world's experiencing it, not just the insiders and the industry, and it's going to be an interesting time. So really appreciate that, they appreciate it. >> Great, >> Them coming on. >> Thank you very much for having me. It's been good. >> I'm John Furrier, here inside theCUBE Virtual, our virtual Cube coverage of AWS Summit 2020. We're going to have ongoing Amazon Summit Virtual Cube. We can't be on the show floor, so we'll be on the virtual show floor, covering and talking to the people behind the stories, and of course, the most important stories on SiliconANGLE and thecube.net. Thanks for watching. (upbeat music)

Published Date : May 13 2020


ENTITIES

Entity	Category	Confidence
Dave Brown	PERSON	0.99+
Dave	PERSON	0.99+
14	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
100%	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
October 2008	DATE	0.99+
Jeff	PERSON	0.99+
Palo Alto	LOCATION	0.99+
2003	DATE	0.99+
2018	DATE	0.99+
Andy Jassy	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
April 22	DATE	0.99+
14 partners	QUANTITY	0.99+
six months	QUANTITY	0.99+
two parts	QUANTITY	0.99+
22nd of April	DATE	0.99+
Windows	TITLE	0.99+
Snowflake	TITLE	0.99+
12 months	QUANTITY	0.99+
AppFlow	TITLE	0.99+
first	QUANTITY	0.99+
SQL Server	TITLE	0.99+
SQL	TITLE	0.99+
Linux	TITLE	0.99+
EC2	TITLE	0.99+

Ajay Vohora & Lester Waters, Io-Tahoe | AWS re:Invent 2019

>> Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel, along with its ecosystem partners. >> Well, welcome back here to Las Vegas. We are live at AWS re:Invent, along with Justin Warren. I'm John Walls. Day one of a jam-packed show. We had great keynotes this morning from Andy Jassy, also representatives from Goldman Sachs and a number of other enterprises on the stage. Right now we're going to talk about data. It's all about data with Io-Tahoe, and a couple of the company's representatives: CEO Ajay Vohora. Ajay, thanks for being with us. >> Thank you, John. >> And Lester Waters is the CSO at Io-Tahoe. Lester, good afternoon to you. Thanks for being with us. >> Thank you for having us. >> Ajay, you brought a football with you there, I see. So you've come prepared for sport. I love it. All right. But this is at your booth, and you're showing here, I assume, and exhibiting, and I know you've got a big offering we're going to talk about a little bit later on. First, tell us about Io-Tahoe a little bit, to inform our viewers right now who might not be too familiar with the company. >> Sure. Well, our background was dealing with enterprise-scale data issues that were really about the complexity, the amount of data and the different types of data. So around 2014, when we were in stealth, kind of working on our technology, a lot of the common technologies around then were Apache-based, so Hadoop. Large enterprises that we were working with, like GE and Comcast, helped us come out of stealth in 2017, and it gave us a great story of solving petabyte-scale data challenges using machine learning. So, that manual overhead, more and more, as we look at AWS services, how do we drive the automation and get the value from data? Automation. >> It's got to be the way forward. All right, so let's jump onto that then. 
On that notion, you've got this exponential growth in data, obviously working off the edge, the internet of things, all these inputs, right? And we have so much more information at our disposal. Some of it's great, some of it's not. How do we know the difference, especially in this world where this exponential increase has happened? Lester, I mean, just tackle that from a company perspective, and identifying, you know, first off, how do we ever figure out what we have that's valuable? Where do we get the value out of that, right? And then, how do we make sense of it? How do we put it into practice? >> Yeah. So I think most enterprises have a problem with data sprawl. A project starts up, we get a block of data, and then all of a sudden a new project comes along, and they take a copy of that data. There's another instance of it. Then there's another instance for another project. >> And suddenly these different data sources become authoritative and become production. So now I have three, four, or five different instances. Oh, and then there's the three or four that got canceled and they're still sitting around. And as an information security professional, my challenge is to know where all of those pieces of data are, so that I can govern it and make sure that the stuff I don't need is gotten rid of, deleted. So, you know, using the Io-Tahoe software, I'm able to catalog all of that. I'm able to garner insights into that data using the nine patent-pending algorithms that we have, to find that, to do intelligent tagging, if you will. So, from my perspective, I'm very interested in making sure that I'm adhering to compliance rules. So the really cool thing about this stuff is that we go and tag data, we look at it and we actually tie it to lines of regulations. So you could go to CCPA: this bit of text here applies to this. 
And that's really helpful for me as an information security professional because I'm not necessarily versed on every line of regulation, but when I can go and look at it handily like that, it makes it easier for me to go, oh, okay, that's great. I know how to treat that in terms of control. So that's the important bit for me. So if you don't know where your data is, you can't control it. You can't monitor it. >> Governance. Yeah. The knowing where stuff is, I'm familiar with a framework that was developed at Telstra back in Australia called the Five Knows, which is about exactly that. Knowing where your data is, what it is, who has access to it. 'Cause actually being able to catalog the data, then, like, knowing what it is that you have, this is a mammoth task. I mean, that was hard enough 12 years ago. But today, with the amount of data that's actively being created every single day, how does your system help CSOs tackle this kind of issue? And maybe, Lester, you can start off, and then you can tell us a bit more yourself. >> Yeah, I mean, I'll start off on that. It's a place to kind of see the feedback from our enterprise customers: as that veracity and volume of data increases, the challenge is definitely there to keep on top of governing that. So continually discovering that new data created: how is it different? How's it adding to the existing data? Using machine learning and the models that we create, whether it's anomaly detection or classifying the data based on certain features in the data, that allows us to tag it and load that in our catalog. So I've discovered it, now we've made it accessible. Now any BI developer or data engineer can search for that data in a catalog and make something from it. So if there were 10 steps in that data mile, we've definitely solved the first four or five, to bring that momentum to getting value from that data. 
So discovering it, cataloging it, tagging the data to make it searchable, and then it's free to pick up for whatever use case is out there, whether it's migration, security, compliance. Security is a big one for you. >> And I would also add too, for the data scientists, you know, knowing all the assets they have available to them in order to drive those business value insights that are so important these days for companies. Because, you know, a lot of companies compete on very thin margins, and having insights into their data and into the way customers can use their data really can make or break a company these days. So that's critical. And as Ajay pointed out, being able to automate that through DataOps, if you will, and drive those insights automatically is great. Like for example, from an information security standpoint, I want to fingerprint my data and I want to feed it into a DLP system, so that, you know, I can really sort of keep an eye out if this data is actually going out. And it really is my data, versus a standard regex kind of matching, which isn't the best technique. >> Yeah. So walk us through that in a bit more detail. So you mentioned tagging a couple of times. So let's go into the details a little bit about what that actually means for customers. My understanding is that you're looking for things like a social security number that could be sitting somewhere in this data. So finding out where all these social security numbers are that I may not be aware of, and it could be being shared with someone who shouldn't have access to that, but it is there. Is that what it is, or are there other kinds of data that you're able to tag beyond that traditional approach? >> Yeah. Well, straight out of the box, you've got your PII, or personally identifiable information, that kind of data that is covered under the CCPA and GDPR. 
So there are those standard, regulatory-driven definitions that social security number, name, address would fall under. Beyond that, then, in a large enterprise, you've got clever data scientists and data engineers who, through the nature of their work, can combine sets of data that could include work patterns, IDs, lots of activity. You bring that together and that suddenly comes under that umbrella of sensitive. So being able to tag and classify data under those regulatory policies, but then also what could be an operational risk to an organization, whether it's a bank, insurance, utility, health care in particular. >> So you work in all those verticals? >> Yeah, across the way, agnostic to any vertical. >> Okay. All right. >> And the nature of being able to do that is having that machine learning set up a baseline around what is sensitive and then honing that to what is particular to that organization. So, you know, lots of people will use services seen here at AWS: S3, Aurora, Postgres or MySQL, Redshift. And also the underlying sources of that data, whether it's a CRM system or IoT, all of those sources have got nuances that make every enterprise data landscape just slightly different. So trying to make a rules-based, one-size-fits-all approach is going to be limiting, and that will increase your manual overhead. So customers like GE and Comcast have moved way beyond throwing people at the problem; that's no longer possible. So being smart about how to approach this, classifying the data, using features in the data, creating that metadata as an asset just as a data warehouse would be, allows you to enable the rest of the organization. >> So, I mean, you've talked about, you know, deriving value and identifying value. How, ultimately, once you catalog, you tag, what does this mean to the bottom line in terms of ROI? How does AWS play into that? 
You know, as a company, what value am I getting out of your abilities with AWS and then having that kind of capability? >> Yeah. We did a great study with Forrester. They calculated the ROI and it's a mixture of things. It's that manual personnel overhead, people who are locked into that pretty unpleasant, low-productivity role of wrangling with data, for want of a better word, to make something of it. They'd much rather be creating the dashboards, the BI or the insights. So moving, you know, dozens of people from the back-office manual wrangling into what's going to make a difference to the chief marketing officer and your CFO, bringing down the cost to serve your customer by getting those operational insights, is how they want to get to working with that data. So that automation, to take out the manual overhead of the upfront task, is allowing that resource to be better deployed onto the more interesting, productive work. So that's one part of the ROI. >> The other is with AWS. What we've found here, engaging with the AWS ecosystem, is just that speed of migration to AWS. We can take months out of that by cataloging what's on premise and saying, our data scientists, our data engineering team want to create products for their own customers using SageMaker, using Redshift, Athena. But what is the exact data that we need to push into the cloud to use those services? Is it the 20 petabytes that we've accumulated over the last 20 years? That's probably not going to be the case. So tiering the on-prem and cloud base of that data is really helpful to a data officer and an information architect to set themselves up to accelerate that migration to AWS. >> So for people who've used this kind of system and they've run through the tagging and seen the power of the platform that you've got there. 
So what are some of the things that they're now able to do once they've got these high-quality tagged data sets? >> So it's not just tagging. We also do fuzzy matching, so we can find relationships in the data, or even relationships within the data in terms of duplicates. So, for example, somebody got married and they're really the same person, you know, but now their surname has changed. We can help companies find those bits of matching. And I think we had one customer where we saved them about a hundred thousand a year in mailing costs because they were sending mail to, you know, a Mrs. who wasn't there anymore, whatever her name was. And, you know, being able to deduplicate that kind of data really helps people save money. >> Yep. And that's kind of the next phase in our journey: moving beyond the tagging and the classification, our roadmap working with AWS is very much machine-learning driven. So what our engineering team is excited about is: what's the next model, what's the next problem we can solve with AI and machine learning to throw at the large-scale data problem? So we'll continually be curating and creating that metadata catalog asset, to allow that to be used as a resource to enable the rest of the data landscape. >> And I think what's interesting about our product is we really have multiple audiences for it. We've got the chief data officer who wants to make sure that we're completely compliant because they don't want that 4% potential fine. You know, so being able to evidence that they're doing due diligence in their data management will go a long way if there is a breach, because zero days do happen. But if you can evidence that you've really had good discipline, then you won't get that fine, or hopefully you won't get a big fine. 
And the second audience is going to be information security professionals who want to secure that perimeter. The third is going to be the data architects who are trying to, you know, manage and create new solutions with that data. And the fourth, of course, is the data scientists trying to drive new business value. >> Alright, well, before we let y'all take off, I want to know about an offering that you've launched this week, apparently to great success, and you're pretty excited about just your space alone here, your presence here. But tell us a little bit about that before you take off. >> Yeah. So we're here also sponsoring the Jam Lounge and everybody's welcome to sign up. A number of our friends are there to competitively take on some challenges: come into the Jam Lounge, use our products, and kind of understand what it means to accelerate that journey onto AWS. >> What can I do if I show up? Give me an idea about it. >> You can take some challenges to discover data and understand what data is there, find relationships, and intuitively, through our UI, start exploring that and joining the dots. What is my data? Knowing your data, and then creating policies to drive that data into use. >> Cool. Good. And maybe pick up a football along the way, so I know. Thanks for being with us. >> Thank you for the time. >> And again, the Jam Lounge, right here at the Sands Expo at AWS re:Invent. We are live, and you're watching this right here on theCUBE.

Published Date : Dec 4 2019


ENTITIES

Entity	Category	Confidence
Comcast	ORGANIZATION	0.99+
GE	ORGANIZATION	0.99+
Justin Warren	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
Australia	LOCATION	0.99+
2017	DATE	0.99+
Joan	PERSON	0.99+
AWS	ORGANIZATION	0.99+
10 steps	QUANTITY	0.99+
three	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
2014	DATE	0.99+
Telstra	ORGANIZATION	0.99+
Jorge J.	PERSON	0.99+
five	QUANTITY	0.99+
Ajay Vohora	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
20 petabytes	QUANTITY	0.99+
four	QUANTITY	0.99+
John Walls	PERSON	0.99+
IO Tahoe	ORGANIZATION	0.99+
4%	QUANTITY	0.99+
Io-Tahoe	PERSON	0.99+
one customer	QUANTITY	0.99+
First	QUANTITY	0.99+
CJ	PERSON	0.99+
Redshift	TITLE	0.99+
third	QUANTITY	0.99+
12 years ago	DATE	0.98+
fourth	QUANTITY	0.98+
today	DATE	0.98+
Lester Waters	PERSON	0.98+
H J	PERSON	0.97+
Aja	PERSON	0.97+
Forester	ORGANIZATION	0.97+
CCPA	TITLE	0.97+
this week	DATE	0.97+
zero days	QUANTITY	0.96+
about a hundred thousand a year	QUANTITY	0.96+
first	QUANTITY	0.95+
second audience	QUANTITY	0.94+
nine	QUANTITY	0.94+
LA Las Vegas	LOCATION	0.94+
Sage	ORGANIZATION	0.92+
Leicester	LOCATION	0.91+
Apache	ORGANIZATION	0.9+
Lester	PERSON	0.9+
SAS Bora	ORGANIZATION	0.88+
first four	QUANTITY	0.87+
one part	QUANTITY	0.87+
one	QUANTITY	0.87+
2019	DATE	0.85+
Hadoop	ORGANIZATION	0.84+
Aurora	TITLE	0.82+
dozens of people	QUANTITY	0.79+
Redshift	ORGANIZATION	0.78+
Postgres	ORGANIZATION	0.76+
20	DATE	0.75+
eight data warehouse	QUANTITY	0.74+
five different	QUANTITY	0.73+
CEO	PERSON	0.7+
single day	QUANTITY	0.69+
China	LOCATION	0.68+
20 last	QUANTITY	0.65+
Athena	LOCATION	0.63+
morning	DATE	0.55+
Invent	EVENT	0.54+
GDPR	TITLE	0.53+
S three	TITLE	0.52+
years	QUANTITY	0.51+
no	OTHER	0.4+
waters	ORGANIZATION	0.39+

Angie Embree, Best Friends Animal Society | AWS Imagine Nonprofit 2019

>> Narrator: From Seattle, Washington, it's theCUBE, covering AWS Imagine Nonprofit. Brought to you by Amazon Web Services. >> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're on the waterfront in Seattle, it's an absolutely gorgeous couple of days here at the AWS Imagine Nonprofit Conference. We went to the AWS Imagine Education Conference, this is really all about nonprofits and we're hearing all kinds of interesting stories about how these people are using AWS to help conquer really big problems. We're going to shift gears a little bit from the two-footed problems to the four-footed problems, and that's animals. Everybody likes animals, but nobody likes animal shelters, and nobody likes the ultimate solution that many animal shelters used to use to take care of problems. But thanks to our next guest, that is not quite the case so much anymore. So we're really happy to have Angie Embree on. She is the CIO of Best Friends Animal Society. Angie, great to see you. >> It's great to see you as well, and thank you for having me. >> Oh absolutely! So before we got on I just heard this crazy, crazy statistic that when your organization started in 1984, approximately 17 million animals were killed in US shelters per year. That number is now down to 700 thousand, that is a giant, giant reduction. And yet you, with big audacious goals, really are looking to get that to zero. So, that's a giant goal. Give us a little bit of background on the organization and how you decided to go after a goal like that, and some of the ways you are actually going to achieve it. >> Well, the organization started in 1984 and it started with a group of friends in Southern Utah who decided that, you know, the killing in America's shelters just had to go. So really the Best Friends founders started the no-kill movement along with a gentleman in San Francisco by the name of Rich Avanzino. 
And as you said, they took, you know, the killing down from 17 million in 1984 to approximately 733 thousand now. The organization started as just the sanctuary. We have the largest no-kill companion animal sanctuary in the country, where we hold about 17 hundred animals every day. And we also have, you know, knowing that we needed to help out the rest of the country, we have built lifesaving centers in Houston, Texas. Or we're working on Houston, Texas, but Los Angeles, California, New York City, Salt Lake City, Atlanta, Georgia, it seems like I've left somebody out but, >> Probably, but that's okay. >> We have lifesaving centers all over the country. So it was really, you know, when they realized what was going on in America's shelters, it was really the idea that we should not be killing animals for space. So, just recently in fact, I will say recently but in the last few years, Julie Castle, our CEO, put kind of, did our moon shot, put that stake in the ground and said we're going to take this country no-kill by the year 2025. >> Right. >> So it's super exciting. >> So it's really interesting because you guys are trying to execute your vision, and it's easy to execute your own vision, but it's a whole different thing when you're trying to execute your vision through this huge infrastructure of shelters that have been around forever. So, I wonder if you can explain kind of what's your relationship with shelters that you don't own. I guess, I think you said before we turned on the cameras they are affiliates, so how does that relationship work? How do you help them achieve your goal, which is no-kill? >> Yeah, so we have over 27 hundred network partners around the country. And what we do is we help to educate them on, you know, we understand their problems, we have creative programs to solve those problems. So we help to educate them on, you know, how they can implement these programs within their shelters. 
We provide them grant funding, we have an annual conference every year where they can come and learn. But they're really our partners and, you know, we know we can't do it alone. It's going to take us, it's going to take them and it's going to take everybody in every community to really step up and help solve the problem. >> Right, and what was the biggest thing that changed in terms of kind of attitude, in terms of the way they operate the shelter? Because I think you said before that a lot of the killing was done to make room. >> Right, killing is done usually for space. >> So what do they do now? Clearly the space demands probably haven't changed, so what are they doing alternatively where before they would put the animal down? >> Well, alternatively we're doing transport programs. So there are areas in the country that actually have a demand for animals. So instead of killing the animals, we put them on some sort of transport vehicle and we take them to the areas that are in demand. We also do what's called a trap-neuter-return program. So one of the biggest problems across the country are community cats, so those, a lot of people call them feral cats but they're community cats and usually have a caretaker. But what we do is we trap those cats, we take them into the shelter, we neuter them and vaccinate them and then return them to their home. That keeps them from making a lot of other little cats. >> Making babies (laughs) >> So yeah, cats are one of the biggest problems in shelters today because of the community cats, they're feral cats and they're not adoptable. So if we can, we don't have to kill them. We can, you know, we can keep them from reproducing as I said and then we can put them back in their habitat where they live a long healthy life, happy life. >> Right, so you said you joined the organization 5 years ago, 5 and 1/2 years ago, and you're the CIO, first ever CIO. 
>> I am (laughs) >> What brought you here, and then now that you're here with kind of a CIO hat, what are some of the new perspectives that you can bring to the organization that they hadn't had before, from kind of a technical perspective? >> Well, what brought me here was, I never expected to be here. If you would have told me I would be the CIO at Best Friends Animal Society, you know, 10 years ago, I would have said you're kidding, because I didn't really realize that there were professional positions in organizations like Best Friends. But I, you know, my journey began the same as a lot of people's did. I was that little kid always bringing home animals and, you know, my mother hated it. You know, it was always something showing up at our doorstep with me, you know. And I just loved animals all my life, and as I went through college and got my degree and started my professional career, then I thought, well, I'm going to of course have animals because I can have as many as I want now, right! (laughs) So I started adopting, and I didn't even realize until I was in my 30s that they were killing in shelters, and I learned that in Houston, Texas when I lived there. I was working for IBM at the time, and one day a lady came on the television and she said, they were doing a news segment, and she said we're a no-kill shelter, and I thought, oh my god, if there are no-kill shelters then there are kill shelters, right? >> There must be the other. >> Yeah so, to make a long story short, then I started not working in animal welfare but doing more to support the movement and donating. Adopting from shelters and fostering animals, and then one day, I had been to Best Friends as a visitor vacationing in this beautiful part of Utah. But I saw the CIO ... >> Position. >> position open and I said I'm going for it. >> Good for you. >> Yeah. 
>> Good for you, so now you're there so what are some of the things you've implemented from kind of a techy, you know kind of data perspective that they didn't have before? >> Well, they didn't have a lot. >> They probably didn't have a lot, besides email and the obvious things. >> Being the first CIO I don't know that I knew what I was walking into at the time because I got to Kanab, and Kanab Utah where the sanctuary is, is the headquarters. And Kanab is very infrastructure challenged. >> (laughs) Infrastructure challenged, I like that. >> There is one ISP in Kanab and there is no redundancy in networks so we really don't have, you know, you come from the city and you think, you take these things for granted and you find out oh my god, what am I going to do? And Kanab is you know the hub of our network, so if Kanab goes down, you know the whole organization is down so one of the first decisions I made was that we were going to the cloud. >> Right, right. >> Because we had to get Kanab out of that position and that was one of our, one of the first major decisions I made and we chose AWS as our partner to do that so that was very very exciting. We knew that they had infrastructure we couldn't dream of providing. >> Right, right. >> And, you know we could really make our whole network more robust, our applications would be available and we could really do some great things. >> You're not worried about the one ISP provider in Kanab because of an accident that knocks a phone pole down. >> Yeah, yeah. >> All right but then you're talking about some new things that you're working on and a new thing you talked about before we turned the cameras on community lifesaving dashboards, what is that all about? >> Okay, so a couple of years ago the community lifesaving dashboard is the culmination of two years of work. From all across the Best Friends organization not just the IT department, in fact it was the brainchild of our Chief Mission Officer Holly Sizemore. 
But it's really, in animal welfare there's never been a national picture of what the problem really is regarding killing animals in shelters. So we did this big. >> Because they're all regional, right? They're all regional shelters, very local. >> They're all local community shelters, yes. And transparency isn't forced, so you know, some states force transparency, they enforce reporting the numbers, but a lot of states don't. >> At the state level. >> Yeah, a lot of states don't, so. You know, when you're killing animals in shelters you really don't want people to know that. >> Yeah, yeah, it's not something you want to advertise. >> Because the American public doesn't believe in it. So anyway, we worked really hard to collect all this data from across the country and we put it all into this dashboard, and it is now a tool where anybody in the public, it's on our website, can look at it and they can see where we're at from a national level. They can see where they're at from a state level, they can drill down into their community and they can drill down to an individual shelter. >> Wow. >> And the idea behind the dashboard is to really, is to get communities behind helping their shelters. Because as I said earlier, it's going to take us all. >> Right. >> And not only Best Friends and our partners, but the public plays a big part of this. >> Right, and so when did that roll out? Do you have any kind of feedback, how's it working? >> It's working wonderfully, we rolled it out at our conference in July. >> So recently, so it's a pretty new initiative. >> Yeah, it's just a few weeks old. >> Okay. >> We rolled it out at our national conference and we were all a bit nervous about it, you know, especially from a technology perspective. >> Right, right. >> We knew that being the first of its kind ever in animal welfare that, you know, it was going to get a lot of publicity both inside and outside the movement. >> (laughs) How you want to say both pro and con. 
>> Yeah, and it's sitting on our website... well, really, pro and con. >> Right, right. >> But it's sitting on our website and we're like, okay, we don't know what kind of traffic we're going to get, you know, what are we going to do about this? So we spent a lot of time with Amazon prior to the launch, you know, having them look at our environment, getting advice, and discussing it with them. >> Not going to bring down that ISP in Utah. >> No, thank god! (laughs) >> (laughs) >> No, it wasn't. Thank god we were in the cloud. So Amazon really helped us prepare, and then on the day of the launch, we knew the time of the launch. So we actually had a war room set up, a virtual war room, and we had Amazon employees participating in our war room. We watched the traffic, and we did get huge spikes in traffic at all times through the day when certain things were happening. And I'm happy to say that from a technology perspective it was a non-event, because we did not crash. We stayed up, we handled all the traffic, we scaled when we needed to, and we did it, you know, virtually at the press of a button. >> Awesome. >> Or the flick of a switch, whatever you want to say. >> That's what you want, right? >> Yeah, exactly. >> You just don't want anyone to know. It's like being a good ref: if nobody's talking about you, you probably did a good job. >> Yeah, exactly, yeah. >> Good. So before I let you go, what are some of your initiatives now, looking forward? You've got this great partner in AWS, you have basically as much horsepower as you need to get done what you need to get done. What are some of the things that you see, you know, kind of next on your roadmap? >> Well, we have a lot. >> Don't give me the whole list. (laughs) >> No, I'm just going to hit on a few key points. I think, you know, we used Amazon initially as our cloud infrastructure, but I think the biggest thing we're looking at is platform as a service.
There is so much capability out there with predictive analytics, machine learning, artificial intelligence, AR/VR, you name it, facial recognition, so we're really investigating those technologies, because we think they could have a huge impact on our movement and really help us achieve lifesaving. >> Right, right. >> And I think that, you know, we're starting, we have our fledgling data science program. We're using the Amazon data lake technology, Athena, Glue. They were just telling me about Lake Formation, and just a few minutes ago I emailed my data guy and said, start looking at Lake Formation. >> Right, right. >> So, I mean, we're really investing in the platform as a service. The other thing I see is that animal welfare is sort of broken from a technology perspective and a data perspective, in that we have no interoperability and, you know, we don't have the data available. So let's say you want to adopt a 5-year-old animal. Well, you go to a shelter, and you can't get 5 years of history on a 5-year-old animal. So it's really about starting to fix the foundation for the movement as a whole, not just Best Friends. So, making sure that, you know, the veterinary data is there, all the data from the pet ecosystem is there. So we're investigating with AWS. They're actually coming to our sanctuary in a couple of months, and we're going to do a workshop to figure out how we do this, how we really fix it so that we have interoperability between every shelter, so that when an animal moves from shelter to rescue or whatever, their data follows them wherever they go. So adopters are fully informed when adopting an animal. >> Because you're in a pretty interesting position, because you're not with any one particular shelter, you kind of cross many, many boundaries. So you're in a good position to be that aggregator of that data. >> Yeah, I don't know that we want to be the aggregator, but we want to lead the movement towards doing that.
Just getting the technology players, the shelter management systems, the other people who play a role in technology for animal welfare, getting them in a room and talking and figuring out this problem is huge. >> Right. >> And with a partner like Amazon we feel it can be solved. >> Right. Well, Angie, thank you for taking a few minutes and sharing your story, I really, really enjoyed hearing it. >> All right, thank you so much. >> All right, she's Angie, I'm Jeff, you're watching theCube. We're at AWS Imagine in Seattle. Thanks for watching, we'll see you next time. (upbeat music)

Published Date : Aug 13 2019

Search Results for Athena: