Sanjay Poonen, CEO & President, Cohesity | VMware Explore 2022

>>Good afternoon, everyone. And welcome back to the VMware Explorer. 2022 live from San Francisco. Lisa Martin, here with Dave. Valante good to be sitting next to you, sir. >>Yeah. Yeah. The big set >>And we're very excited to be welcoming buck. One of our esteemed alumni Sanja poin joins us, the CEO and president of cohesive. Nice to see >>You. Thank you, Lisa. Thank you, Dave. It's great to meet with you all the time and the new sort of setting here, but first >>Time, first time we've been in west, is that right? We've been in north. We've been in south. We've been in Las Vegas, right. But west, >>I mean, it's also good to be back with live shows with absolutely, you know, after sort of the two or three or hiatus. And it was a hard time for the whole world, but I'm kind of driving a little bit of adrenaline just being here with people. So >>You've also got some adrenaline, sorry, Dave. Yeah, you're good because you are new in the role at cohesive. You wrote a great blog that you are identified. The four reasons I came to cohesive. Tell the audience, just give 'em a little bit of a teaser about that. >>Yeah, I think you should all read it. You can Google and, and Google find that article. I talked about the people Mohi is a fantastic founder. You know, he was the, you know, the architect of the Google file system. And you know, one of the senior Google executives was on my board. Bill Corrin said one of the smartest engineers. He was the true father of hyperconverge infrastructure. A lot of the code of Nutanix. He wrote, I consider him really the father of that technology, which brought computer storage. And when he took that same idea of bringing compute to secondary storage, which is really what made the scale out architect unique. And we were at your super cloud event talking about that, Dave. Yeah. Right. So it's a people I really got to respect his smarts, his integrity and the genius, what he is done. I think the customer base, I called a couple of customers. One of them, a fortune 100 customer. I, I can't tell you who it was, but a very important customer. I've known him. He said, I haven't seen tech like this since VMware, 20 years ago, Amazon 10 years ago and now Ko. So that's special league. We're winning very much in the enterprise and that type of segment, the partners, you know, we have HPE, Cisco as investors. Amazon's an investors. So, you know, and then finally the opportunity, I think this whole area of data management and data security now with threats, like ransomware big opportunity. >>Okay. So when you were number two at VMware, you would come on and say, we'd love all our partners and of course, okay. So you know, a little bit about how to work with, with VMware. So, so when you now think about the partnership between cohesive and VMware, what are the things that you're gonna stress to your constituents on the VMware side to convince them that Hey, partnering with cohesive is gonna gonna drive more value for customers, you know, put your thumb on the scale a little bit. You know, you gotta, you gotta unfair advantage somewhat, but you should use it. So what's the narrative gonna be like? >>Yeah, I think listen with VMware and Amazon, that probably their top two partners, Dave, you know, like one of the first calls I made was to Raghu and he knew about this decision before. That's the level of trust I have in him. I even called Michael Dell, you know, before I made the decision, there's a little bit of overlap with Dell, but it's really small compared to the overlap, the potential with Dell hardware that we could compliment. And then I called four CEOs. I was, as I was making this decision, Andy Jassey at Amazon, he was formerly AWS CEO sat Nadela at Microsoft Thomas cor at Google and Arvin Christian, IBM to say, I'm thinking about this making decision. They are many of the mentors and friends to me. So I believe in an ecosystem. And you know, even Chuck Robbins, who the CEO of Cisco is an investor, I texted him and said, Hey, finally, we can be friends. >>It was harder to us to be friends with Cisco, given the overlap of NSX. So I have a big tent towards everybody in our ecosystem with VMware. I think the simple answer is there's no overlap okay. With, with the kind of the primary storage capabilities with VSAN. And by the same thing with Nutanix, we will be friends and, and extend that to be the best data protection solution. But given also what we could do with security, I think this is gonna go a lot further. And then it's all about meet the field. We have common partners. I think, you know, sort of the narrative I talked about in that blog is just like snowflake was replacing Terada and ServiceNow replace remedy and CrowdStrike, replacing Symantec, we're replacing legacy vendors. We are viewed as the modern solution cloud optimized for private and public cloud. We can help you and make VMware and vs a and VCF very relevant to that part of the data management and data security continuum, which I think could end VMware. And by the way, the same thing into the public cloud. So most of the places where we're being successful is clearly withs, but increasingly there's this discussion also about playing into the cloud. So I think both with VMware and Amazon, and of course the other partners in the hyperscaler service, storage, networking place and security, we have some big plans. >>How, how much do you see this? How do you see this multi-cloud narrative that we're hearing here from, from VMware evolving? How much of an opportunity is it? How are customers, you know, we heard about cloud chaos yesterday at the keynote, are customers, do they, do they admit that there's cloud chaos? Some probably do some probably don't how much of an opportunity is that for cohesive, >>It's tremendous opportunity. And I think that's why you need a Switzerland type player in this space to be successful. And you know, and you can't explicitly rule out the fact that the big guys get into this space, but I think it's, if you're gonna back up office 365 or what they call now, Microsoft 365 into AWS or Google workspace into Azure or Salesforce into one of those clouds, you need a Switzerland player. It's gonna be hard. And in many cases, if you're gonna back up data or you protect that data into AWS banks need a second copy of that either on premise or Azure. So it's very hard, even if they have their own native data protection for them to be dual cloud. So I think a multi-cloud story and the fact that there's at least three big vendors of cloud in, in the us, you know, one in China, if include Alibaba creates a Switzerland opportunity for us, that could be fairly big. >>And I think, you know, what we have to do is make sure while we'll be optimized, our preferred cloud is AWS. Our control plane runs there. We can't take an all in AWS stack with the control plane and the data planes at AWS to Walmart. So what I've explained to both Microsoft and AWS is that data plane will need to be multi-cloud. So I can go to an, a Walmart and say, I can back up your data into Azure if you choose to, but the control plane's still gonna be an AWS, same thing with Google. Maybe they have another account. That's very Google centric. So that's how we're gonna believe the, the control plane will be in AWS. We'll optimize it there, but the data plane will be multicloud. >>Yeah. And that's what Mo had explained at Supercloud. You know, and I talked to him, he really helped me hone in on the deployment models. Yes. Where, where, where the cohesive deployment model is instantiating that technology stack into each cloud region and each cloud, which gives you latency advantages and other advantages >>And single code based same platform. >>And then bringing it, tying it together with a unified, you know, interface. That was he, he was, he was key. In fact, I, I wrote about it recently and, and gave him and the other 29 >>Quite a bit in that session, he went deep with you. I >>Mean, with Mohi, when you get a guy who developed a Google file system, you know, who can technically say, okay, this is technically correct or no, Dave, your way off be. So I that's why I had to >>Go. I, I thought you did a great job in that interview because you probed him pretty deep. And I'm glad we could do that together with him next time. Well, maybe do that together here too, but it was really helpful. He's the, he's the, he's the key reason I'm here. >>So you say data management is ripe for disrupt disruption. Talk about that. You talked about this Switzerland effect. That sounds to me like a massive differentiator for cohesive. Why is data management right for disruption and why is cohesive the right partner to do it? >>Yeah, I think, listen, everyone in this sort of data protection backup from years ago have been saying the S Switzerland argument 18 years ago, I was a at Veras an executive there. We used the Switzerland argument, but what's changed is the cloud. And what's changed as a threat vector in security. That's, what's changed. And in that the proposition of a, a Switzerland player has just become more magnified because you didn't have a sales force or Workday service now then, but now you do, you didn't have multi-cloud. You had hardware vendors, you know, Dell, HPE sun at the time. IBM, it's now Lenovo. So that heterogeneity of, of on-premise service, storage, networking, HyperCloud, and, and the apps world has gotten more and more diverse. And I think you really need scale out architectures. Every one of the legacy players were not built with scale out architectures. >>If you take that fundamental notion of bringing compute to storage, you could almost paralyze. Imagine you could paralyze backup recovery and bring so much scale and speed that, and that's what Mo invented. So he took that idea of how he had invented and built Nutanix and applied that to secondary storage. So now everything gets faster and cheaper at scale. And that's a disruptive technology ally. What snowflake did to ator? I mean, the advantage of snowflake is when you took that same concept data, warehousing is not a new concept it's existed from since Ralph Kimball and bill Inman and the people who are fathers of data warehousing, they took that to Webscale. And in that came a disruptive force toter data, right on snowflake. And then of course now data bricks and big query, similar things. So we're doing the same thing. We just have to showcase the customers, which we do. And when large customers see that they're replacing the legacy solutions, I have a lot of respect for legacy solutions, but at some point in time of a solution was invented in 1995 or 2000, 2005. It's right. For change. >>So you use snowflake as an example, Frank SL doesn't like when I say playbook, cuz I says, Dave, I'm a situational CEO, no playbook, but there are patterns here. And one of the things he did is to your point go after, you know, Terra data with a better data warehouse, simplify scale, et cetera. And now he's, he's a constructing a Tam expansion strategy, same way he did at ServiceNow. And I see you guys following a similar pattern. Okay. You get your foot in the door. Let's face it. I mean, a lot of this started with, you know, just straight back. Okay, great. Now it's extending into data management now extending to multi-cloud that's like concentric circles in a Tam expansion strategy. How, how do you, as, as a CEO, that's part of your job is Tam expansion. >>So yeah, I think the way to think about the Tam is, I mean, people say it's 20, 30 billion, but let me tell you how you can piece it apart in size, Dave and Lisa number one, I estimate there's probably about 10 to 20 exabytes of data managed by these legacy players of on-prem stores that they back up to. Okay. So you add them all up in the market shares that they respectively are. And by the way, at the peak, the biggest of these companies got to 2 billion and then shrunk. That was Verto when I was there in 2004, 2 billion, every one of them is small and they stopped growing. You look at the IDC charts. Many of them are shrinking. We are the fastest growing in the last two years, but I estimate there's about 20 exabytes of data that collectively among the legacy players, that's either gonna stay on prem or move to the cloud. Okay. So the opportunity as they replace one of those legacy tools with us is first off to manage that 20 X by cheaper, faster with the Webscale glass offer the cloud guys, we could tip that into the cloud. Okay. >>But you can't stop there. >>Okay. No, we are not doing just backup recovery. We have a platform that can do files. We can do test dev analytics and now security. Okay. That data is potentially at a risk, not so much in the past, but for ransomware, right? How do we classify that? How do we govern that data? How do we run potential? You know, the same way you did antivirus some kind of XDR algorithms on the data to potentially not just catch the recovery process, which is after fact, but maybe the predictive act of before to know, Hey, there's somebody loitering around this data. So if I'm basically managing in the exabytes of data and I can proactively tell you what, this is, one CIO described this very simply to me a few weeks ago that I, and she said, I have 3000 applications, okay. I wanna be prepared for a black Swan event, except it's not a nine 11 planes getting the, the buildings. >>It is an extortion event. And I want to know when that happens, which of my 3000 apps I recover within one hour within one day within one week, no later than one month. Okay. And I don't wanna pay the bad guys at penny. That's what we do. So that's security discussions. We didn't have that discussion in 2004 when I was at another company, because we were talking about flood floods and earthquakes as a disaster recovery. Now you have a lot more security opportunity to be able to describe that. And that's a boardroom discussion. She needs to have that >>Digital risk. O O okay, go ahead please. I >>Was just gonna say, ransomware attack happens every what? One, every 11, 9, 11 seconds. >>And the dollar amount are going up, you know, dollar are going up. Yep. >>And, and when you pay the ransom, you don't always get your data back. So you that's not. >>And listen, there's always an ethical component. Should you do it or not do it? If you, if you don't do it and you're threatened, they may have left an Easter egg there. Listen, I, I feel very fortunate that I've been doing a lot in security, right? I mean, I built the business at, at, at VMware. We got it to over a billion I'm on the board of sneak. I've been doing security and then at SAP ran. So I know a lot about security. So what we do in security and the ecosystem that supports us in security, we will have a very carefully crafted stay tuned. Next three weeks months, you'll see us really rolling out a very kind of disciplined aspect, but we're not gonna pivot this company and become a cyber security company. Some others in our space have done that. I think that's not who we are. We are a data management and a data security company. We're not just a pure security company. We're doing both. And we do it well, intelligently, thoughtfully security is gonna be built into our platform, not voted on. Okay. And there'll be certain security things that we do organically. There's gonna be a lot that we do through partnerships, this >>Security market that's coming to you. You don't have to go claim that you're now a security vendor, right? The market very naturally saying, wow, a comprehensive security strategy has to incorporate a data protection strategy and a recovery, you know, and the things that we've talking about Mount ransomware, I want to ask you, you I've been around a long time, longer than you actually Sanjay. So, but you you've, you've seen a lot. You look, >>Thank you. That's all good. Oh, >>Shucks. So the market, I've never seen a market like this, right? I okay. After the.com crash, we said, and I know you can't talk about IPO. That's not what I'm talking about, but everything was bad after that. Right. 2008, 2000, everything was bad. I've never seen a market. That's half full, half empty, you know, snowflake beats and raises the stock, goes through the roof. Dev if it, if the area announced today, Mongo, DB, beat and Ray, that things getting crushed and, and after market never seen anything like this. It's so fed, driven and, and hard to protect. And, and of course, I know it's a marathon, you know, it's not a sprint, but have you ever seen anything like this? >>Listen, I walk worked through 18 quarters as COO of VMware. You've seen where I've seen public quarters there and you know, was very fortunate. Thanks to the team. I don't think I missed my numbers in 18 quarters except maybe once close. But we, it was, it's tough. Being a public company of the company is tough. I did that also at SAP. So the journey from 10 to 20 billion at SAP, the journey from six to 12 at VMware, that I was able to be fortunate. It's humbling because you, you really, you know, we used to have this, we do the earnings call and then we kind of ask ourselves, what, what do you think the stock price was gonna be a day and a half later? And we'd all take bets as to where this, I think you just basically, as a, as a sea level executive, you try to build a culture of beaten, raise, beaten, raise, beaten, raise, and you wanna set expectations in a way that you're not setting them up for failure. >>And you know, it's you, there's, Dave's a wonderful CEO as is Frank Salman. So it's hard for me to dissect. And sometimes the market are fickle on some small piece of it. But I think also the, when I, I encourage people say, take the long term view. When you take the long term view, you're not bothered about the ups and downs. If you're building a great company over the length of time, now it will be very clear over the arc of many, many quarters that you're business is trouble. If you're starting to see a decay in growth. And like, for example, when you start to see a growth, start to decay significantly by five, 10 percentage points, okay, there's something macro going on at this company. And that's what you won't avoid. But these, you know, ups and downs, my view is like, if you've got both Mongo D and snowflake are fantastic companies, they're CEOs of people I respect. They've actually kind of an, a, you know, advisor to us as a company, you knows moat very well. So we respect him, respect Frank, and you, there have been other quarters where Frank's, you know, the Snowflake's had a down result after that. So you build a long term and they are on the right side of history, snowflake, and both of them in terms of being a modern cloud relevant in the case of MongoDB, open source, two data technology, that's, you know, winning, I, I, we would like to be like them one day >>As, as the new CEO of cohesive, what are you most ask? What are you most anxious about and what are you most excited about? >>I think, listen, you know, you know, everything starts with the employee. You, I always believe I wrote my first memo to all employees. There was an article in Harvard business review called service profit chains that had a seminal impact on my leadership, which is when they studied companies who had been consistently profitable over a long period of time. They found that not just did those companies serve their customers well, but behind happy engaged customers were happy, engaged employees. So I always believe you start with the employee and you ensure that they're engaged, not just recruiting new employees. You know, I put on a tweet today, we're hiring reps and engineers. That's okay. But retaining. So I wanna start with ensuring that everybody, sometimes we have to make some unfortunate decisions with employees. We've, we've got a part company with, but if we can keep the best and brightest retained first, then of course, you know, recruiting machine, I'm trying to recruit the best and brightest to this company, people all over the place. >>I want to get them here. It's been, so I mean, heartwarming to come Tom world and just see people from all walks, kind of giving me hugs. I feel incredibly blessed. And then, you know, after employees, it's customers and partners, I feel like the tech is in really good hands. I don't have to worry about that. Cuz Mo it's in charge. He's got this thing. I can go to bed knowing that he's gonna keep innovating the future. Maybe in some of the companies I've worried about the tech innovation piece, but most doing a great job there. I can kind of leave that in his cap of hands, but employees, customers, partners, that's kind of what I'm focused on. None of them are for me, like a keep up at night, but there are are opportunities, right? And sometimes there's somebody you're trying to salvage to make sure or somebody you're trying to convince to join. >>But you know, customers, I love pursuing customers. I love the win. I hate to lose. So fortune 1000 global, 2000 companies, small companies, big companies, I wanna win every one of them. And it's not, it's not like, I mean, I know all these CEOs in my competitors. I texted him the day I joined and said, listen, I'll compete, honorably, whatever have you, but it's like Kobe and LeBron Kobe's passed away now. So maybe it's Steph Curry. LeBron, whoever your favorite athlete is you put your best on the court and you win. And that's how I am. That's nothing I've known no other gear than to put my best on the court and win, but do it honorably. It should not be the one that you're doing it. Unethically. You're doing it personally. You're not calling people's names. You're competing honorably. And when you win the team celebrates, it's not a victory for me. It's a victory for the team. >>I always think I'm glad that you brought up the employee experience and we're almost out of time, but I always think the employee experience and the customer experience are inextricably linked. This employees have to be empowered. They have to have the data that they need to do their job so that they can deliver to the customer. You can't do one without the other. >>That's so true. I mean, I, it's my belief. And I've talked also on this show and others about servant leadership. You know, one of my favorite poems is Brenda Naor. I went to bed in life. I dreamt that life was joy. I woke up and realized life was service. I acted in service was joy. So when you have a leadership model, which is it's about, I mean, there's lots of layers between me and the individual contributor, but I really care about that sales rep and the engineer. That's the leaf level of the organization. What can I get obstacle outta their way? I love skipping levels of going right. That sales rep let's go and crack this deal. You know? So you have that mindset. Yeah. I mean, you, you empower, you invert the pyramid and you realize the power is at the leaf level of an organization. >>So that's what I'm trying to do. It's a little easier to do it with 2000 people than I dunno, either 20, 20, 2000 people or 35,000 reported me at VMware. And I mean a similar number at SAP, which was even bigger, but you can shape this. Now we are, we're not a startup anymore. We're a midsize company. We'll see. Maybe along the way, there's an IP on the path. We'll wait for that. When it comes, it's a milestone. It's not the destination. So we do that and we are, we, I told people we are gonna build this green company. Cohesive is gonna be a great company like VMware one day, like Amazon. And there's always a day of early beginnings, but we have to work harder. This is kind of like the, you know, eight year old version of your kid, as opposed to the 18 year old version of the kid. And you gotta work a little harder. So I love it. Yeah. >>Good luck. Awesome. Thank you. Best of luck. Congratulations. On the role, it sounds like there's a tremendous amount of adrenaline, a momentum carrying you forward Sanjay. We always appreciate having you. Thank >>You for having in your show. >>Thank you. Our pleasure, Lisa. Thank you for Sanja poin and Dave ante. I'm Lisa Martin. You're watching the cube live from VMware Explorer, 2022, stick around our next guest. Join us momentarily.

Published Date : Sep 1 2022

SUMMARY :

Valante good to be sitting next to you, sir. And we're very excited to be welcoming buck. It's great to meet with you all the time and the new sort of setting here, We've been in north. I mean, it's also good to be back with live shows with absolutely, you know, after sort of the two or three or hiatus. You wrote a great blog that you are identified. And you know, one of the senior Google executives was on my board. So you know, a little bit about how to work with, with VMware. And you know, even Chuck Robbins, who the CEO of I think, you know, sort of the narrative I talked about in that blog is And I think that's why you need a Switzerland type player in this space to And I think, you know, what we have to do is make sure while we'll be optimized, our preferred cloud is AWS. stack into each cloud region and each cloud, which gives you latency advantages and other advantages And then bringing it, tying it together with a unified, you know, interface. Quite a bit in that session, he went deep with you. Mean, with Mohi, when you get a guy who developed a Google file system, you know, who can technically Go. I, I thought you did a great job in that interview because you probed him pretty deep. So you say data management is ripe for disrupt disruption. And I think you really need scale out architectures. the advantage of snowflake is when you took that same concept data, warehousing is not a new concept it's existed from since And I see you guys following a similar pattern. So yeah, I think the way to think about the Tam is, I mean, people say it's 20, 30 billion, but let me tell you how you can piece it apart You know, the same way you did antivirus some kind of XDR And I want to know when that happens, which of my 3000 apps I I Was just gonna say, ransomware attack happens every what? And the dollar amount are going up, you know, dollar are going up. And, and when you pay the ransom, you don't always get your data back. I mean, I built the business at, at, at VMware. protection strategy and a recovery, you know, and the things that we've talking about Mount ransomware, Thank you. And, and of course, I know it's a marathon, you know, it's not a sprint, I think you just basically, as a, as a sea level executive, you try to build a culture of And you know, it's you, there's, Dave's a wonderful CEO as is Frank Salman. I think, listen, you know, you know, everything starts with the employee. And then, you know, And when you win the team celebrates, I always think I'm glad that you brought up the employee experience and we're almost out of time, but I always think the employee experience and the customer So when you have a leadership model, which is it's about, I mean, This is kind of like the, you know, eight year old version of your kid, as opposed to the 18 year old version of a momentum carrying you forward Sanjay. Thank you.

ENTITIES

Entity	Category	Confidence
Sanjay	PERSON	0.99+
Chuck Robbins	PERSON	0.99+
Andy Jassey	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dave	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Lisa Martin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
1995	DATE	0.99+
Dell	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
China	LOCATION	0.99+
2004	DATE	0.99+
Bill Corrin	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Frank Salman	PERSON	0.99+
Lenovo	ORGANIZATION	0.99+
Sanjay Poonen	PERSON	0.99+
2005	DATE	0.99+
Google	ORGANIZATION	0.99+
Arvin Christian	PERSON	0.99+
Lisa	PERSON	0.99+
Steph Curry	PERSON	0.99+
2000	DATE	0.99+
20	QUANTITY	0.99+
VMware	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
San Francisco	LOCATION	0.99+
2 billion	QUANTITY	0.99+
3000 apps	QUANTITY	0.99+
today	DATE	0.99+
Sanja poin	PERSON	0.99+
Nutanix	ORGANIZATION	0.99+
35,000	QUANTITY	0.99+
LeBron	PERSON	0.99+
Veras	ORGANIZATION	0.99+
five	QUANTITY	0.99+
Mongo	ORGANIZATION	0.99+
Walmart	ORGANIZATION	0.99+
Frank	PERSON	0.99+
eight year	QUANTITY	0.99+
Mohi	PERSON	0.99+
both	QUANTITY	0.99+
10	QUANTITY	0.99+
Kobe	PERSON	0.99+
Switzerland	LOCATION	0.99+
2008	DATE	0.99+
DB	ORGANIZATION	0.99+
six	QUANTITY	0.99+
Nadela	PERSON	0.99+
3000 applications	QUANTITY	0.99+
Symantec	ORGANIZATION	0.99+
Ralph Kimball	PERSON	0.99+
2000 people	QUANTITY	0.99+
yesterday	DATE	0.99+
Supercloud	ORGANIZATION	0.99+

Analyst Power Panel: Future of Database Platforms

(upbeat music) >> Once a staid and boring business dominated by IBM, Oracle, and at the time newcomer Microsoft, along with a handful of wannabes, the database business has exploded in the past decade and has become a staple of financial excellence, customer experience, analytic advantage, competitive strategy, growth initiatives, visualizations, not to mention compliance, security, privacy and dozens of other important use cases and initiatives. And on the vendor's side of the house, we've seen the rapid ascendancy of cloud databases. Most notably from Snowflake, whose massive raises leading up to its IPO in late 2020 sparked a spate of interest and VC investment in the separation of compute and storage and all that elastic resource stuff in the cloud. The company joined AWS, Azure and Google to popularize cloud databases, which have become a linchpin of competitive strategies for technology suppliers. And if I get you to put your data in my database and in my cloud, and I keep innovating, I'm going to build a moat and achieve a hugely attractive lifetime customer value in a really amazing marginal economics dynamic that is going to fund my future. And I'll be able to sell other adjacent services, not just compute and storage, but machine learning and inference and training and all kinds of stuff, dozens of lucrative cloud offerings. Meanwhile, the database leader, Oracle has invested massive amounts of money to maintain its lead. It's building on its position as the king of mission critical workloads and making typical Oracle like claims against the competition. Most were recently just yesterday with another announcement around MySQL HeatWave. An extension of MySQL that is compatible with on-premises MySQLs and is setting new standards in price performance. We're seeing a dramatic divergence in strategies across the database spectrum. On the far left, we see Amazon with more than a dozen database offerings each with its own API and primitives. AWS is taking a right tool for the right job approach, often building on open source platforms and creating services that it offers to customers to solve very specific problems for developers. And on the other side of the line, we see Oracle, which is taking the Swiss Army Knife approach, converging database functionality, enabling analytic and transactional workloads to run in the same data store, eliminating the need to ETL, at the same time adding capabilities into its platform like automation and machine learning. Welcome to this database Power Panel. My name is Dave Vellante, and I'm so excited to bring together some of the most respected industry analyst in the community. Today we're going to assess what's happening in the market. We're going to dig into the competitive landscape and explore the future of database and database platforms and decode what it means to customers. Let me take a moment to welcome our guest analyst today. Matt Kimball is a vice president and principal analysts at Moor Insights and Strategy, Matt. He knows products, he knows industry, he's got real world IT expertise, and he's got all the angles 25 plus years of experience in all kinds of great background. Matt, welcome. Thanks very much for coming on theCUBE. Holgar Mueller, friend of theCUBE, vice president and principal analyst at Constellation Research in depth knowledge on applications, application development, knows developers. He's worked at SAP and Oracle. And then Bob Evans is Chief Content Officer and co-founder of the Acceleration Economy, founder and principle of Cloud Wars. Covers all kinds of industry topics and great insights. He's got awesome videos, these three minute hits. If you haven't seen 'em, checking them out, knows cloud companies, his Cloud Wars minutes are fantastic. And then of course, Marc Staimer is the founder of Dragon Slayer Research. A frequent contributor and guest analyst at Wikibon. He's got a wide ranging knowledge across IT products, knows technology really well, can go deep. And then of course, Ron Westfall, Senior Analyst and Director Research Director at Futurum Research, great all around product trends knowledge. Can take, you know, technical dives and really understands competitive angles, knows Redshift, Snowflake, and many others. Gents, thanks so much for taking the time to join us in theCube today. It's great to have you on, good to see you. >> Good to be here, thanks for having us. >> Thanks, Dave. >> All right, let's start with an around the horn and briefly, if each of you would describe, you know, anything I missed in your areas of expertise and then you answer the following question, how would you describe the state of the database, state of platform market today? Matt Kimball, please start. >> Oh, I hate going first, but that it's okay. How would I describe the world today? I would just in one sentence, I would say, I'm glad I'm not in IT anymore, right? So, you know, it is a complex and dangerous world out there. And I don't envy IT folks I'd have to support, you know, these modernization and transformation efforts that are going on within the enterprise. It used to be, you mentioned it, Dave, you would argue about IBM versus Oracle versus this newcomer in the database space called Microsoft. And don't forget Sybase back in the day, but you know, now it's not just, which SQL vendor am I going to go with? It's all of these different, divergent data types that have to be taken, they have to be merged together, synthesized. And somehow I have to do that cleanly and use this to drive strategic decisions for my business. That is not easy. So, you know, you have to look at it from the perspective of the business user. It's great for them because as a DevOps person, or as an analyst, I have so much flexibility and I have this thing called the cloud now where I can go get services immediately. As an IT person or a DBA, I am calling up prevention hotlines 24 hours a day, because I don't know how I'm going to be able to support the business. And as an Oracle or as an Oracle or a Microsoft or some of the cloud providers and cloud databases out there, I'm licking my chops because, you know, my market is expanding and expanding every day. >> Great, thank you for that, Matt. Holgar, how do you see the world these days? You always have a good perspective on things, share with us. >> Well, I think it's the best time to be in IT, I'm not sure what Matt is talking about. (laughing) It's easier than ever, right? The direction is going to cloud. Kubernetes has won, Google has the best AI for now, right? So things are easier than ever before. You made commitments for five plus years on hardware, networking and so on premise, and I got gray hair about worrying it was the wrong decision. No, just kidding. But you kind of both sides, just to be controversial, make it interesting, right. So yeah, no, I think the interesting thing specifically with databases, right? We have this big suite versus best of breed, right? Obviously innovation, like you mentioned with Snowflake and others happening in the cloud, the cloud vendors server, where to save of their databases. And then we have one of the few survivors of the old guard as Evans likes to call them is Oracle who's doing well, both their traditional database. And now, which is really interesting, remarkable from that because Oracle it was always the power of one, have one database, add more to it, make it what I call the universal database. And now this new HeatWave offering is coming and MySQL open source side. So they're getting the second (indistinct) right? So it's interesting that older players, traditional players who still are in the market are diversifying their offerings. Something we don't see so much from the traditional tools from Oracle on the Microsoft side or the IBM side these days. >> Great, thank you Holgar. Bob Evans, you've covered this business for a while. You've worked at, you know, a number of different outlets and companies and you cover the competition, how do you see things? >> Dave, you know, the other angle to look at this from is from the customer side, right? You got now CEOs who are any sort of business across all sorts of industries, and they understand that their future success is going to be dependent on their ability to become a digital company, to understand data, to use it the right way. So as you outline Dave, I think in your intro there, it is a fantastic time to be in the database business. And I think we've got a lot of new buyers and influencers coming in. They don't know all this history about IBM and Microsoft and Oracle and you know, whoever else. So I think they're going to take a long, hard look, Dave, at some of these results and who is able to help these companies not serve up the best technology, but who's going to be able to help their business move into the digital future. So it's a fascinating time now from every perspective. >> Great points, Bob. I mean, digital transformation has gone from buzzword to imperative. Mr. Staimer, how do you see things? >> I see things a little bit differently than my peers here in that I see the database market being segmented. There's all the different kinds of databases that people are looking at for different kinds of data, and then there is databases in the cloud. And so database as cloud service, I view very differently than databases because the traditional way of implementing a database is changing and it's changing rapidly. So one of the premises that you stated earlier on was that you viewed Oracle as a database company. I don't view Oracle as a database company anymore. I view Oracle as a cloud company that happens to have a significant expertise and specialty in databases, and they still sell database software in the traditional way, but ultimately they're a cloud company. So database cloud services from my point of view is a very distinct market from databases. >> Okay, well, you gave us some good meat on the bone to talk about that. Last but not least-- >> Dave did Marc, just say Oracle's a cloud company? >> Yeah. (laughing) Take away the database, it would be interesting to have that discussion, but let's let Ron jump in here. Ron, give us your take. >> That's a great segue. I think it's truly the era of the cloud database, that's something that's rising. And the key trends that come with it include for example, elastic scaling. That is the ability to scale on demand, to right size workloads according to customer requirements. And also I think it's going to increase the prioritization for high availability. That is the player who can provide the highest availability is going to have, I think, a great deal of success in this emerging market. And also I anticipate that there will be more consolidation across platforms in order to enable cost savings for customers, and that's something that's always going to be important. And I think we'll see more of that over the horizon. And then finally security, security will be more important than ever. We've seen a spike (indistinct), we certainly have seen geopolitical originated cybersecurity concerns. And as a result, I see database security becoming all the more important. >> Great, thank you. Okay, let me share some data with you guys. I'm going to throw this at you and see what you think. We have this awesome data partner called Enterprise Technology Research, ETR. They do these quarterly surveys and each period with dozens of industry segments, they track clients spending, customer spending. And this is the database, data warehouse sector okay so it's taxonomy, so it's not perfect, but it's a big kind of chunk. They essentially ask customers within a category and buy a specific vendor, you're spending more or less on the platform? And then they subtract the lesses from the mores and they derive a metric called net score. It's like NPS, it's a measure of spending velocity. It's more complicated and granular than that, but that's the basis and that's the vertical axis. The horizontal axis is what they call market share, it's not like IDC market share, it's just pervasiveness in the data set. And so there are a couple of things that stand out here and that we can use as reference point. The first is the momentum of Snowflake. They've been off the charts for many, many, for over two years now, anything above that dotted red line, that 40%, is considered by ETR to be highly elevated and Snowflake's even way above that. And I think it's probably not sustainable. We're going to see in the next April survey, next month from those guys, when it comes out. And then you see AWS and Microsoft, they're really pervasive on the horizontal axis and highly elevated, Google falls behind them. And then you got a number of well funded players. You got Cockroach Labs, Mongo, Redis, MariaDB, which of course is a fork on MySQL started almost as protest at Oracle when they acquired Sun and they got MySQL and you can see the number of others. Now Oracle who's the leading database player, despite what Marc Staimer says, we know, (laughs) and they're a cloud player (laughing) who happens to be a leading database player. They dominate in the mission critical space, we know that they're the king of that sector, but you can see here that they're kind of legacy, right? They've been around a long time, they get a big install base. So they don't have the spending momentum on the vertical axis. Now remember this is, just really this doesn't capture spending levels, so that understates Oracle but nonetheless. So it's not a complete picture like SAP for instance is not in here, no Hana. I think people are actually buying it, but it doesn't show up here, (laughs) but it does give an indication of momentum and presence. So Bob Evans, I'm going to start with you. You've commented on many of these companies, you know, what does this data tell you? >> Yeah, you know, Dave, I think all these compilations of things like that are interesting, and that folks at ETR do some good work, but I think as you said, it's a snapshot sort of a two-dimensional thing of a rapidly changing, three dimensional world. You know, the incidents at which some of these companies are mentioned versus the volume that happens. I think it's, you know, with Oracle and I'm not going to declare my religious affiliation, either as cloud company or database company, you know, they're all of those things and more, and I think some of our old language of how we classify companies is just not relevant anymore. But I want to ask too something in here, the autonomous database from Oracle, nobody else has done that. So either Oracle is crazy, they've tried out a technology that nobody other than them is interested in, or they're onto something that nobody else can match. So to me, Dave, within Oracle, trying to identify how they're doing there, I would watch autonomous database growth too, because right, it's either going to be a big plan and it breaks through, or it's going to be caught behind. And the Snowflake phenomenon as you mentioned, that is a rare, rare bird who comes up and can grow 100% at a billion dollar revenue level like that. So now they've had a chance to come in, scare the crap out of everybody, rock the market with something totally new, the data cloud. Will the bigger companies be able to catch up and offer a compelling alternative, or is Snowflake going to continue to be this outlier. It's a fascinating time. >> Really, interesting points there. Holgar, I want to ask you, I mean, I've talked to certainly I'm sure you guys have too, the founders of Snowflake that came out of Oracle and they actually, they don't apologize. They say, "Hey, we not going to do all that complicated stuff that Oracle does, we were trying to keep it real simple." But at the same time, you know, they don't do sophisticated workload management. They don't do complex joints. They're kind of relying on the ecosystems. So when you look at the data like this and the various momentums, and we talked about the diverging strategies, what does this say to you? >> Well, it is a great point. And I think Snowflake is an example how the cloud can turbo charge a well understood concept in this case, the data warehouse, right? You move that and you find steroids and you see like for some players who've been big in data warehouse, like Sentara Data, as an example, here in San Diego, what could have been for them right in that part. The interesting thing, the problem though is the cloud hides a lot of complexity too, which you can scale really well as you attract lots of customers to go there. And you don't have to build things like what Bob said, right? One of the fascinating things, right, nobody's answering Oracle on the autonomous database. I don't think is that they cannot, they just have different priorities or the database is not such a priority. I would dare to say that it's for IBM and Microsoft right now at the moment. And the cloud vendors, you just hide that right through scripts and through scale because you support thousands of customers and you can deal with a little more complexity, right? It's not against them. Whereas if you have to run it yourself, very different story, right? You want to have the autonomous parts, you want to have the powerful tools to do things. >> Thank you. And so Matt, I want to go to you, you've set up front, you know, it's just complicated if you're in IT, it's a complicated situation and you've been on the customer side. And if you're a buyer, it's obviously, it's like Holgar said, "Cloud's supposed to make this stuff easier, but the simpler it gets the more complicated gets." So where do you place your bets? Or I guess more importantly, how do you decide where to place your bets? >> Yeah, it's a good question. And to what Bob and Holgar said, you know, the around autonomous database, I think, you know, part of, as I, you know, play kind of armchair psychologist, if you will, corporate psychologists, I look at what Oracle is doing and, you know, databases where they've made their mark and it's kind of, that's their strong position, right? So it makes sense if you're making an entry into this cloud and you really want to kind of build momentum, you go with what you're good at, right? So that's kind of the strength of Oracle. Let's put a lot of focus on that. They do a lot more than database, don't get me wrong, but you know, I'm going to short my strength and then kind of pivot from there. With regards to, you know, what IT looks at and what I would look at you know as an IT director or somebody who is, you know, trying to consume services from these different cloud providers. First and foremost, I go with what I know, right? Let's not forget IT is a conservative group. And when we look at, you know, all the different permutations of database types out there, SQL, NoSQL, all the different types of NoSQL, those are largely being deployed by business users that are looking for agility or businesses that are looking for agility. You know, the reason why MongoDB is so popular is because of DevOps, right? It's a great platform to develop on and that's where it kind of gained its traction. But as an IT person, I want to go with what I know, where my muscle memory is, and that's my first position. And so as I evaluate different cloud service providers and cloud databases, I look for, you know, what I know and what I've invested in and where my muscle memory is. Is there enough there and do I have enough belief that that company or that service is going to be able to take me to, you know, where I see my organization in five years from a data management perspective, from a business perspective, are they going to be there? And if they are, then I'm a little bit more willing to make that investment, but it is, you know, if I'm kind of going in this blind or if I'm cloud native, you know, that's where the Snowflakes of the world become very attractive to me. >> Thank you. So Marc, I asked Andy Jackson in theCube one time, you have all these, you know, data stores and different APIs and primitives and you know, very granular, what's the strategy there? And he said, "Hey, that allows us as the market changes, it allows us to be more flexible. If we start building abstractions layers, it's harder for us." I think also it was not a good time to market advantage, but let me ask you, I described earlier on that spectrum from AWS to Oracle. We just saw yesterday, Oracle announced, I think the third major enhancement in like 15 months to MySQL HeatWave, what do you make of that announcement? How do you think it impacts the competitive landscape, particularly as it relates to, you know, converging transaction and analytics, eliminating ELT, I know you have some thoughts on this. >> So let me back up for a second and defend my cloud statement about Oracle for a moment. (laughing) AWS did a great job in developing the cloud market in general and everything in the cloud market. I mean, I give them lots of kudos on that. And a lot of what they did is they took open source software and they rent it to people who use their cloud. So I give 'em lots of credit, they dominate the market. Oracle was late to the cloud market. In fact, they actually poo-pooed it initially, if you look at some of Larry Ellison's statements, they said, "Oh, it's never going to take off." And then they did 180 turn, and they said, "Oh, we're going to embrace the cloud." And they really have, but when you're late to a market, you've got to be compelling. And this ties into the announcement yesterday, but let's deal with this compelling. To be compelling from a user point of view, you got to be twice as fast, offer twice as much functionality, at half the cost. That's generally what compelling is that you're going to capture market share from the leaders who established the market. It's very difficult to capture market share in a new market for yourself. And you're right. I mean, Bob was correct on this and Holgar and Matt in which you look at Oracle, and they did a great job of leveraging their database to move into this market, give 'em lots of kudos for that too. But yesterday they announced, as you said, the third innovation release and the pace is just amazing of what they're doing on these releases on HeatWave that ties together initially MySQL with an integrated builtin analytics engine, so a data warehouse built in. And then they added automation with autopilot, and now they've added machine learning to it, and it's all in the same service. It's not something you can buy and put on your premise unless you buy their cloud customers stuff. But generally it's a cloud offering, so it's compellingly better as far as the integration. You don't buy multiple services, you buy one and it's lower cost than any of the other services, but more importantly, it's faster, which again, give 'em credit for, they have more integration of a product. They can tie things together in a way that nobody else does. There's no additional services, ETL services like Glue and AWS. So from that perspective, they're getting better performance, fewer services, lower cost. Hmm, they're aiming at the compelling side again. So from a customer point of view it's compelling. Matt, you wanted to say something there. >> Yeah, I want to kind of, on what you just said there Marc, and this is something I've found really interesting, you know. The traditional way that you look at software and, you know, purchasing software and IT is, you look at either best of breed solutions and you have to work on the backend to integrate them all and make them all work well. And generally, you know, the big hit against the, you know, we have one integrated offering is that, you lose capability or you lose depth of features, right. And to what you were saying, you know, that's the thing I found interesting about what Oracle is doing is they're building in depth as they kind of, you know, build that service. It's not like you're losing a lot of capabilities, because you're going to one integrated service versus having to use A versus B versus C, and I love that idea. >> You're right. Yeah, not only you're not losing, but you're gaining functionality that you can't get by integrating a lot of these. I mean, I can take Snowflake and integrate it in with machine learning, but I also have to integrate in with a transactional database. So I've got to have connectors between all of this, which means I'm adding time. And what it comes down to at the end of the day is expertise, effort, time, and cost. And so what I see the difference from the Oracle announcements is they're aiming at reducing all of that by increasing performance as well. Correct me if I'm wrong on that but that's what I saw at the announcement yesterday. >> You know, Marc, one thing though Marc, it's funny you say that because I started out saying, you know, I'm glad I'm not 19 anymore. And the reason is because of exactly what you said, it's almost like there's a pseudo level of witchcraft that's required to support the modern data environment right in the enterprise. And I need simpler faster, better. That's what I need, you know, I am no longer wearing pocket protectors. I have turned from, you know, break, fix kind of person, to you know, business consultant. And I need that point and click simplicity, but I can't sacrifice, you know, a depth of features of functionality on the backend as I play that consultancy role. >> So, Ron, I want to bring in Ron, you know, it's funny. So Matt, you mentioned Mongo, I often and say, if Oracle mentions you, you're on the map. We saw them yesterday Ron, (laughing) they hammered RedShifts auto ML, they took swipes at Snowflake, a little bit of BigQuery. What were your thoughts on that? Do you agree with what these guys are saying in terms of HeatWaves capabilities? >> Yes, Dave, I think that's an excellent question. And fundamentally I do agree. And the question is why, and I think it's important to know that all of the Oracle data is backed by the fact that they're using benchmarks. For example, all of the ML and all of the TPC benchmarks, including all the scripts, all the configs and all the detail are posted on GitHub. So anybody can look at these results and they're fully transparent and replicate themselves. If you don't agree with this data, then by all means challenge it. And we have not really seen that in all of the new updates in HeatWave over the last 15 months. And as a result, when it comes to these, you know, fundamentals in looking at the competitive landscape, which I think gives validity to outcomes such as Oracle being able to deliver 4.8 times better price performance than Redshift. As well as for example, 14.4 better price performance than Snowflake, and also 12.9 better price performance than BigQuery. And so that is, you know, looking at the quantitative side of things. But again, I think, you know, to Marc's point and to Matt's point, there are also qualitative aspects that clearly differentiate the Oracle proposition, from my perspective. For example now the MySQL HeatWave ML capabilities are native, they're built in, and they also support things such as completion criteria. And as a result, that enables them to show that hey, when you're using Redshift ML for example, you're having to also use their SageMaker tool and it's running on a meter. And so, you know, nobody really wants to be running on a meter when, you know, executing these incredibly complex tasks. And likewise, when it comes to Snowflake, they have to use a third party capability. They don't have the built in, it's not native. So the user, to the point that he's having to spend more time and it increases complexity to use auto ML capabilities across the Snowflake platform. And also, I think it also applies to other important features such as data sampling, for example, with the HeatWave ML, it's intelligent sampling that's being implemented. Whereas in contrast, we're seeing Redshift using random sampling. And again, Snowflake, you're having to use a third party library in order to achieve the same capabilities. So I think the differentiation is crystal clear. I think it definitely is refreshing. It's showing that this is where true value can be assigned. And if you don't agree with it, by all means challenge the data. >> Yeah, I want to come to the benchmarks in a minute. By the way, you know, the gentleman who's the Oracle's architect, he did a great job on the call yesterday explaining what you have to do. I thought that was quite impressive. But Bob, I know you follow the financials pretty closely and on the earnings call earlier this month, Ellison said that, "We're going to see HeatWave on AWS." And the skeptic in me said, oh, they must not be getting people to come to OCI. And then they, you remember this chart they showed yesterday that showed the growth of HeatWave on OCI. But of course there was no data on there, it was just sort of, you know, lines up and to the right. So what do you guys think of that? (Marc laughs) Does it signal Bob, desperation by Oracle that they can't get traction on OCI, or is it just really a smart tame expansion move? What do you think? >> Yeah, Dave, that's a great question. You know, along the way there, and you know, just inside of that was something that said Ellison said on earnings call that spoke to a different sort of philosophy or mindset, almost Marc, where he said, "We're going to make this multicloud," right? With a lot of their other cloud stuff, if you wanted to use any of Oracle's cloud software, you had to use Oracle's infrastructure, OCI, there was no other way out of it. But this one, but I thought it was a classic Ellison line. He said, "Well, we're making this available on AWS. We're making this available, you know, on Snowflake because we're going after those users. And once they see what can be done here." So he's looking at it, I guess you could say, it's a concession to customers because they want multi-cloud. The other way to look at it, it's a hunting expedition and it's one of those uniquely I think Oracle ways. He said up front, right, he doesn't say, "Well, there's a big market, there's a lot for everybody, we just want on our slice." Said, "No, we are going after Amazon, we're going after Redshift, we're going after Aurora. We're going after these users of Snowflake and so on." And I think it's really fairly refreshing these days to hear somebody say that, because now if I'm a buyer, I can look at that and say, you know, to Marc's point, "Do they measure up, do they crack that threshold ceiling? Or is this just going to be more pain than a few dollars savings is worth?" But you look at those numbers that Ron pointed out and that we all saw in that chart. I've never seen Dave, anything like that. In a substantive market, a new player coming in here, and being able to establish differences that are four, seven, eight, 10, 12 times better than competition. And as new buyers look at that, they're going to say, "What the hell are we doing paying, you know, five times more to get a poor result? What's going on here?" So I think this is going to rattle people and force a harder, closer look at what these alternatives are. >> I wonder if the guy, thank you. Let's just skip ahead of the benchmarks guys, bring up the next slide, let's skip ahead a little bit here, which talks to the benchmarks and the benchmarking if we can. You know, David Floyer, the sort of semiretired, you know, Wikibon analyst said, "Dave, this is going to force Amazon and others, Snowflake," he said, "To rethink actually how they architect databases." And this is kind of a compilation of some of the data that they shared. They went after Redshift mostly, (laughs) but also, you know, as I say, Snowflake, BigQuery. And, like I said, you can always tell which companies are doing well, 'cause Oracle will come after you, but they're on the radar here. (laughing) Holgar should we take this stuff seriously? I mean, or is it, you know, a grain salt? What are your thoughts here? >> I think you have to take it seriously. I mean, that's a great question, great point on that. Because like Ron said, "If there's a flaw in a benchmark, we know this database traditionally, right?" If anybody came up that, everybody will be, "Oh, you put the wrong benchmark, it wasn't audited right, let us do it again," and so on. We don't see this happening, right? So kudos to Oracle to be aggressive, differentiated, and seem to having impeccable benchmarks. But what we really see, I think in my view is that the classic and we can talk about this in 100 years, right? Is the suite versus best of breed, right? And the key question of the suite, because the suite's always slower, right? No matter at which level of the stack, you have the suite, then the best of breed that will come up with something new, use a cloud, put the data warehouse on steroids and so on. The important thing is that you have to assess as a buyer what is the speed of my suite vendor. And that's what you guys mentioned before as well, right? Marc said that and so on, "Like, this is a third release in one year of the HeatWave team, right?" So everybody in the database open source Marc, and there's so many MySQL spinoffs to certain point is put on shine on the speed of (indistinct) team, putting out fundamental changes. And the beauty of that is right, is so inherent to the Oracle value proposition. Larry's vision of building the IBM of the 21st century, right from the Silicon, from the chip all the way across the seven stacks to the click of the user. And that what makes the database what Rob was saying, "Tied to the OCI infrastructure," because designed for that, it runs uniquely better for that, that's why we see the cross connect to Microsoft. HeatWave so it's different, right? Because HeatWave runs on cheap hardware, right? Which is the breadth and butter 886 scale of any cloud provider, right? So Oracle probably needs it to scale OCI in a different category, not the expensive side, but also allow us to do what we said before, the multicloud capability, which ultimately CIOs really want, because data gravity is real, you want to operate where that is. If you have a fast, innovative offering, which gives you more functionality and the R and D speed is really impressive for the space, puts away bad results, then it's a good bet to look at. >> Yeah, so you're saying, that we versus best of breed. I just want to sort of play back then Marc a comment. That suite versus best of breed, there's always been that trade off. If I understand you Holgar you're saying that somehow Oracle has magically cut through that trade off and they're giving you the best of both. >> It's the developing velocity, right? The provision of important features, which matter to buyers of the suite vendor, eclipses the best of breed vendor, then the best of breed vendor is in the hell of a potential job. >> Yeah, go ahead Marc. >> Yeah and I want to add on what Holgar just said there. I mean the worst job in the data center is data movement, moving the data sucks. I don't care who you are, nobody likes it. You never get any kudos for doing it well, and you always get the ah craps, when things go wrong. So it's in- >> In the data center Marc all the time across data centers, across cloud. That's where the bleeding comes. >> It's right, you get beat up all the time. So nobody likes to move data, ever. So what you're looking at with what they announce with HeatWave and what I love about HeatWave is it doesn't matter when you started with it, you get all the additional features they announce it's part of the service, all the time. But they don't have to move any of the data. You want to analyze the data that's in your transactional, MySQL database, it's there. You want to do machine learning models, it's there, there's no data movement. The data movement is the key thing, and they just eliminate that, in so many ways. And the other thing I wanted to talk about is on the benchmarks. As great as those benchmarks are, they're really conservative 'cause they're underestimating the cost of that data movement. The ETLs, the other services, everything's left out. It's just comparing HeatWave, MySQL cloud service with HeatWave versus Redshift, not Redshift and Aurora and Glue, Redshift and Redshift ML and SageMaker, it's just Redshift. >> Yeah, so what you're saying is what Oracle's doing is saying, "Okay, we're going to run MySQL HeatWave benchmarks on analytics against Redshift, and then we're going to run 'em in transaction against Aurora." >> Right. >> But if you really had to look at what you would have to do with the ETL, you'd have to buy two different data stores and all the infrastructure around that, and that goes away so. >> Due to the nature of the competition, they're running narrow best of breed benchmarks. There is no suite level benchmark (Dave laughs) because they created something new. >> Well that's you're the earlier point they're beating best of breed with a suite. So that's, I guess to Floyer's earlier point, "That's going to shake things up." But I want to come back to Bob Evans, 'cause I want to tap your Cloud Wars mojo before we wrap. And line up the horses, you got AWS, you got Microsoft, Google and Oracle. Now they all own their own cloud. Snowflake, Mongo, Couchbase, Redis, Cockroach by the way they're all doing very well. They run in the cloud as do many others. I think you guys all saw the Andreessen, you know, commentary from Sarah Wang and company, to talk about the cost of goods sold impact of cloud. So owning your own cloud has to be an advantage because other guys like Snowflake have to pay cloud vendors and negotiate down versus having the whole enchilada, Safra Catz's dream. Bob, how do you think this is going to impact the market long term? >> Well, Dave, that's a great question about, you know, how this is all going to play out. If I could mention three things, one, Frank Slootman has done a fantastic job with Snowflake. Really good company before he got there, but since he's been there, the growth mindset, the discipline, the rigor and the phenomenon of what Snowflake has done has forced all these bigger companies to really accelerate what they're doing. And again, it's an example of how this intense competition makes all the different cloud vendors better and it provides enormous value to customers. Second thing I wanted to mention here was look at the Adam Selipsky effect at AWS, took over in the middle of May, and in Q2, Q3, Q4, AWS's growth rate accelerated. And in each of those three quotas, they grew faster than Microsoft's cloud, which has not happened in two or three years, so they're closing the gap on Microsoft. The third thing, Dave, in this, you know, incredibly intense competitive nature here, look at Larry Ellison, right? He's got his, you know, the product that for the last two or three years, he said, "It's going to help determine the future of the company, autonomous database." You would think he's the last person in the world who's going to bring in, you know, in some ways another database to think about there, but he has put, you know, his whole effort and energy behind this. The investments Oracle's made, he's riding this horse really hard. So it's not just a technology achievement, but it's also an investment priority for Oracle going forward. And I think it's going to form a lot of how they position themselves to this new breed of buyer with a new type of need and expectations from IT. So I just think the next two or three years are going to be fantastic for people who are lucky enough to get to do the sorts of things that we do. >> You know, it's a great point you made about AWS. Back in 2018 Q3, they were doing about 7.4 billion a quarter and they were growing in the mid forties. They dropped down to like 29% Q4, 2020, I'm looking at the data now. They popped back up last quarter, last reported quarter to 40%, that is 17.8 billion, so they more doubled and they accelerated their growth rate. (laughs) So maybe that pretends, people are concerned about Snowflake right now decelerating growth. You know, maybe that's going to be different. By the way, I think Snowflake has a different strategy, the whole data cloud thing, data sharing. They're not trying to necessarily take Oracle head on, which is going to make this next 10 years, really interesting. All right, we got to go, last question. 30 seconds or less, what can we expect from the future of data platforms? Matt, please start. >> I have to go first again? You're killing me, Dave. (laughing) In the next few years, I think you're going to see the major players continue to meet customers where they are, right. Every organization, every environment is, you know, kind of, we use these words bespoke in Snowflake, pardon the pun, but Snowflakes, right. But you know, they're all opinionated and unique and what's great as an IT person is, you know, there is a service for me regardless of where I am on my journey, in my data management journey. I think you're going to continue to see with regards specifically to Oracle, I think you're going to see the company continue along this path of being all things to all people, if you will, or all organizations without sacrificing, you know, kind of richness of features and sacrificing who they are, right. Look, they are the data kings, right? I mean, they've been a database leader for an awful long time. I don't see that going away any time soon and I love the innovative spirit they've brought in with HeatWave. >> All right, great thank you. Okay, 30 seconds, Holgar go. >> Yeah, I mean, the interesting thing that we see is really that trend to autonomous as Oracle calls or self-driving software, right? So the database will have to do more things than just store the data and support the DVA. It will have to show it can wide insights, the whole upside, it will be able to show to one machine learning. We haven't really talked about that. How in just exciting what kind of use case we can get of machine learning running real time on data as it changes, right? So, which is part of the E5 announcement, right? So we'll see more of that self-driving nature in the database space. And because you said we can promote it, right. Check out my report about HeatWave latest release where I post in oracle.com. >> Great, thank you for that. And Bob Evans, please. You're great at quick hits, hit us. >> Dave, thanks. I really enjoyed getting to hear everybody's opinion here today and I think what's going to happen too. I think there's a new generation of buyers, a new set of CXO influencers in here. And I think what Oracle's done with this, MySQL HeatWave, those benchmarks that Ron talked about so eloquently here that is going to become something that forces other companies, not just try to get incrementally better. I think we're going to see a massive new wave of innovation to try to play catch up. So I really take my hat off to Oracle's achievement from going to, push everybody to be better. >> Excellent. Marc Staimer, what do you say? >> Sure, I'm going to leverage off of something Matt said earlier, "Those companies that are going to develop faster, cheaper, simpler products that are going to solve customer problems, IT problems are the ones that are going to succeed, or the ones who are going to grow. The one who are just focused on the technology are going to fall by the wayside." So those who can solve more problems, do it more elegantly and do it for less money are going to do great. So Oracle's going down that path today, Snowflake's going down that path. They're trying to do more integration with third party, but as a result, aiming at that simpler, faster, cheaper mentality is where you're going to continue to see this market go. >> Amen brother Marc. >> Thank you, Ron Westfall, we'll give you the last word, bring us home. >> Well, thank you. And I'm loving it. I see a wave of innovation across the entire cloud database ecosystem and Oracle is fueling it. We are seeing it, with the native integration of auto ML capabilities, elastic scaling, lower entry price points, et cetera. And this is just going to be great news for buyers, but also developers and increased use of open APIs. And so I think that is really the key takeaways. Just we're going to see a lot of great innovation on the horizon here. >> Guys, fantastic insights, one of the best power panel as I've ever done. Love to have you back. Thanks so much for coming on today. >> Great job, Dave, thank you. >> All right, and thank you for watching. This is Dave Vellante for theCube and we'll see you next time. (soft music)

Published Date : Mar 31 2022

SUMMARY :

and co-founder of the and then you answer And don't forget Sybase back in the day, the world these days? and others happening in the cloud, and you cover the competition, and Oracle and you know, whoever else. Mr. Staimer, how do you see things? in that I see the database some good meat on the bone Take away the database, That is the ability to scale on demand, and they got MySQL and you I think it's, you know, and the various momentums, and Microsoft right now at the moment. So where do you place your bets? And to what Bob and Holgar said, you know, and you know, very granular, and everything in the cloud market. And to what you were saying, you know, functionality that you can't get to you know, business consultant. you know, it's funny. and all of the TPC benchmarks, By the way, you know, and you know, just inside of that was of some of the data that they shared. the stack, you have the suite, and they're giving you the best of both. of the suite vendor, and you always get the ah In the data center Marc all the time And the other thing I wanted to talk about and then we're going to run 'em and all the infrastructure around that, Due to the nature of the competition, I think you guys all saw the Andreessen, And I think it's going to form I'm looking at the data now. and I love the innovative All right, great thank you. and support the DVA. Great, thank you for that. And I think what Oracle's done Marc Staimer, what do you say? or the ones who are going to grow. we'll give you the last And this is just going to Love to have you back. and we'll see you next time.

ENTITIES

Entity	Category	Confidence
David Floyer	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Ron Westfall	PERSON	0.99+
Dave	PERSON	0.99+
Marc Staimer	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Marc	PERSON	0.99+
Ellison	PERSON	0.99+
Bob Evans	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Matt	PERSON	0.99+
Holgar Mueller	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Frank Slootman	PERSON	0.99+
Ron	PERSON	0.99+
Staimer	PERSON	0.99+
Andy Jackson	PERSON	0.99+
Bob	PERSON	0.99+
Matt Kimball	PERSON	0.99+
Google	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Sarah Wang	PERSON	0.99+
San Diego	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Rob	PERSON	0.99+

Clemence W. Chee & Christoph Sawade, HelloFresh

(upbeat music) >> Hello everyone. We're here at theCUBE startup showcase made possible by AWS. Thanks so much for joining us today. You know, when Zhamak Dehghani was formulating her ideas around data mesh, she wasn't the only one thinking about decentralized data architectures. HelloFresh was going into hyper-growth mode and realized that in order to support its scale, it needed to rethink how it thought about data. Like many companies that started in the early part of the last decade, HelloFresh relied on a monolithic data architecture and the internal team it had concerns about its ability to support continued innovation at high velocity. The company's data team began to think about the future and work backwards from a target architecture, which possessed many principles of so-called data mesh, even though they didn't use that term specifically. The company is a strong example of an early but practical pioneer of data mesh. Now, there are many practitioners and stakeholders involved in evolving the company's data architecture many of whom are listed here on this slide. Two are highlighted in red and joining us today. We're really excited to welcome you to theCUBE, Clemence Chee, who is the global senior director for data at HelloFresh, and Christoph Sawade, who's the global senior director of data also of course at HelloFresh. Folks, welcome. Thanks so much for making some time today and sharing your story. >> Thank you very much. >> Thanks, Dave. >> All right, let's start with HelloFresh. You guys are number one in the world in your field. You deliver hundreds of millions of meals each year to many, many millions of people around the globe. You're scaling. Christoph, tell us a little bit more about your company and its vision. >> Yeah. Should I start or Clemence? Maybe take over the first piece because Clemence has actually been longer a director at HelloFresh. >> Yeah go ahead Clemence. >> I mean, yes, about approximately six years ago I joined and HelloFresh, and I didn't think about the startup I was joining would eventually IPO. And just two years later, HelloFresh went public. And approximately three years and 10 months after HelloFresh was listed on the German stock exchange which was just last week, HelloFresh was included in the DAX Germany's leading stock market index and that, to mind a great, great milestone, and I'm really looking forward and I'm very excited for the future for HelloFresh and also our data. The vision that we have is to become the world's leading food solution group. And there are a lot of attractive opportunities. So recently we did launch and expand in Norway. This was in July. And earlier this year, we launched the US brand, Green Chef, in the UK as well. We're committed to launch continuously different geographies in the next coming years and have a strong path ahead of us. With the acquisition of ready to eat companies like factor in the US and the plant acquisition of Youfoodz in Australia, we are diversifying our offer, now reaching even more and more untapped customer segments and increase our total address for the market. So by offering customers and growing range of different alternatives to shop food and to consume meals, we are charging towards this vision and this goal to become the world's leading integrated food solutions group. >> Love it. You guys are on a rocket ship. You're really transforming the industry. And as you expand your TAM, it brings us to sort of the data as a core part of that strategy. So maybe you guys could talk a little bit about your journey as a company, specifically as it relates to your data journey. I mean, you began as a startup, you had a basic architecture and like everyone, you've made extensive use of spreadsheets, you built a Hadoop based system that started to grow. And when the company IPO'd, you really started to explode. So maybe describe that journey from a data perspective. >> Yes, Dave. So HelloFresh by 2015, approximately had evolved what amount, a classical centralized data management set up. So we grew very organically over the years, and there were a lot of very smart people around the globe, really building the company and building our infrastructure. This also means that there were a small number of internal and external sources, data sources, and a centralized BI team with a number of people producing different reports, different dashboards and, and products for our executives, for example, or for different operations teams to see a company's performance and knowledge was transferred just by our talking to each other face-to-face conversations. And the people in the data warehouse team were considered as the data wizard or as the ETL wizard. Very classical challenges. And it was ETL, who reserved, indicated the kind of like a style of knowledge of data management, right? So our central data warehouse team then was responsible for different type of verticals in different domains, different geographies. And all this setup gave us in the beginning, the flexibility to grow fast as a company in 2015. >> Christoph, anything to add to that? >> Yes, not explicitly to that one, but as, as Clemence said, right, this was kind of the setup that actually worked for us quite a while. And then in 2017, when HelloFresh went public, the company also grew rapidly. And just to give you an idea how that looked like as well, the tech departments have actually increased from about 40 people to almost 300 engineers. And in the same way as the business units, as there Clemence has described, also grew sustainably. So we continue to launch HelloFresh in new countries, launched new brands like Every Plate, and also acquired other brands like we have Factor. And that grows also from a data perspective, the number of data requests that the central (mumbles), we're getting become more and more and more, and also more and more complex. So that for the team meant that they had a fairly high mental load. So they had to achieve a very, or basically get a very deep understanding about the business and also suffered a lot from this context, switching back and forth. Essentially, they had to prioritize across our product requests from our physical product, digital product, from a physical, from, sorry, from the marketing perspective, and also from the central reporting teams. And in a nutshell, this was very hard for these people, and that altered situations that let's say the solution that we have built. We can not really optimal. So in a, in a, in a, in a nutshell, the central function became a bottleneck and slow down of all the innovation of the company. >> It's a classic case. Isn't it? I mean, Clemence, you see, you see the central team becomes a bottleneck, and so the lines of business, the marketing team, sales teams say "Okay, we're going to take things into our own hands." And then of course IT and the technical team is called in later to clean up the mess. Maybe, maybe I'm overstating it, but, but that's a common situation. Isn't it? >> Yeah this is what exactly happened. Right. So we had a bottleneck, we had those central teams, there was always a bit of tension. Analytics teams then started in those business domains like marketing, supply chain, finance, HR, and so on started really to build their own data solutions. At some point you have to get the ball rolling, right? And then continue the trajectory, which means then that the data pipelines didn't meet the engineering standards. And there was an increased need for maintenance and support from central teams. Hence over time, the knowledge about those pipelines and how to maintain a particular infrastructure, for example, left the company, such that most of those data assets and data sets that turned into a huge debt with decreasing data quality, also decreasing lack of trust, decreasing transparency. And this was an increasing challenge where a majority of time was spent in meeting rooms to align on, on data quality for example. >> Yeah. And the point you were making Christoph about context switching, and this is, this is a point that Zhamak makes quite often as we've, we've, we've contextualized our operational systems like our sales systems, our marketing systems, but not our, our data systems. So you're asking the data team, okay, be an expert in sales, be an expert in marketing, be an expert in logistics, be an expert in supply chain and it's start, stop, start, stop. It's a paper cut environment, and it's just not as productive. But, but, and the flip side of that is when you think about a centralized organization, you think, hey, this is going to be a very efficient way across functional team to support the organization, but it's not necessarily the highest velocity, most effective organizational structure. >> Yeah. So, so I agree with that piece, that's up to a certain scale. A centralized function has a lot of advantages, right? So it's a tool for everyone, which would go to a destined kind of expert team. However, if you see that you actually would like to accelerate that in specific as the type of growth. But you want to actually have autonomy on certain teams and move the teams, or let's say the data to the experts in these teams. And this, as you have mentioned, right, that increases mental load. And you can either internally start splitting your team into different kinds of sub teams focusing on different areas, however, that is then again, just adding another piece where actually collaboration needs to happen because the external seized, so why not bridging that gap immediately and actually move these teams end to end into the, into the function themselves. So maybe just to continue what Clemence was saying, and this is actually where our, so, Clemence and my journey started to become one joint journey. So Clemence was coming actually from one of these teams who builds their own solutions. I was basically heading the platform team called data warehouse team these days. And in 2019, where (mumbles) become more and more serious, I would say, so more and more people have recognized that this model does not really scale, in 2019, basically the leadership of the company came together and identified data as a key strategic asset. And what we mean by that, that if he leveraged it in a, in a, an appropriate way, it gives us a unique, competitive advantage, which could help us to, to support and actually fully automate our decision making process across the entire value chain. So once we, what we're trying to do now, or what we would be aiming for is that HelloFresh is able to build data products that have a purpose. We're moving away from the idea that it's just a bi-product. We have a purpose why we would like to collect this data. There's a clear business need behind that. And because it's so important to, for the company as a business, we also want to provide them as a trustworthy asset to the rest of the organization. We'd say, this is the best customer experience, but at least in a way that users can easily discover, understand and securely access, high quality data. >> Yeah. So, and, and, and Clemence, when you see Zhamak's writing, you see, you know, she has the four pillars and the principles. As practitioners, you look at that say, okay, hey, that's pretty good thinking. And then now we have to apply it. And that's where the devil meets the details. So it's the for, the decentralized data ownership, data as a product, which we'll talk about a little bit, self-serve, which you guys have spent a lot of time on, and Clemence your wheelhouse, which is, which is governance and a federated governance model. And it's almost like if you, if you achieve the first two, then you have to solve for the second two, it almost creates a new challenges, but maybe you could talk about that a little bit as to how it relates to HelloFresh. >> Yes. So Chris has mentioned that we identified kind of a challenge beforehand and said, how can we actually decentralized and actually empower the different colleagues of ours? And this was more a, we realized that it was more an organizational or a cultural change. And this is something that someone also mentioned. I think ThoughtWorks mentioned one of the white papers, it's more of an organizational or a cultural impact. And we kicked off a phased reorganization, or different phases we're currently on, in the middle of still, but we kicked off different phases of organizational restructuring or reorganization trying to lock this data at scale. And the idea was really moving away from ever growing complex matrix organizations or matrix setups and split between two different things. One is the value creation. So basically when people ask the question, what can we actually do? What should we do? This is value creation and the how, which is capability building, and both are equal in authority. This actually then creates a high urge in collaboration and this collaboration breaks up the different silos that were built. And of course, this also includes different needs of staffing for teams staffing with more, let's say data scientists or data engineers, data professionals into those business domains, enhance, or some more capability building. >> Okay, go ahead. Sorry. >> So back to Zhamak Dehghani. So we, the idea also then crossed over when she published her papers in May, 2019. And we thought, well, the four pillars that she described were around decentralized data ownership, product, data as a product mindset, we have a self-service infrastructure. And as you mentioned, federated computational governance. And this suited very much with our thinking at that point of time to reorganize the different teams and this then that to not only organizational restructure, but also in completely new approach of how we need to manage data, through data. >> Got it. Okay. So your businesses is exploding. The data team was having to become domain experts to many areas, constantly context switching as we said, people started to take things into their own hands. So again, we said classic story, but, but you didn't let it get out of control and that's important. And so we, we actually have a picture of kind of where you're going today and it's evolved into this, Pat, if you could bring up the picture with the, the elephant, here we go. So I will talk a little bit about the architecture. It doesn't show it here, the spreadsheet era, but Christoph, maybe you could talk about that. It does show the Hadoop monolith, which exists today. I think that's in a managed hosting service, but, but you, you preserve that piece of it. But if I understand it correctly, everything is evolving to the cloud. I think you're running a lot of this or all of it in AWS. You've got, everybody's got their own data sources. You've got a data hub, which I think is enabled by a master catalog for discovery and all this underlying technical infrastructure that is, is really not the focus of this conversation today. But the key here, if I understand correctly is these domains are autonomous and that not only this required technical thinking, but really supportive organizational mindset, which we're going to talk about today. But, but Christoph, maybe you could address, you know, at a high level, some of the architectural evolution that you guys went through. >> Yeah, sure. Yeah. Maybe it's also a good summary about the entire history. So as you have mentioned, right, we started in the very beginning, it's a monolith on the operational plan, right? Actually it wasn't just one model it was two, one for the backend and one for the front end. And our analytical plan was essentially a couple of spreadsheets. And I think there's nothing wrong with spreadsheets, but it allows you to store information, it allows you to transform data, it allows you to share this information, it allows you to visualize this data, but all kind of, it's not actually separating concern, right? Every single one tool. And this means that it's obviously not scalable, right? You reach the point where this kind of management's set up in, or data management is in one tool, reached elements. So what we have started is we created our data lake, as we have seen here on our dupe. And just in the very beginning actually reflected very much our operation upon this. On top of that, we used Impala as a data warehouse, but there was not really a distinction between what is our data warehouse and what is our data lakes as the Impala was used as kind of both as a kind of engine to create a warehouse and data lake constructed itself. And this organic growth actually led to a situation. As I think it's clear now that we had the centralized model as, for all the domains that were really lose Kimball, the modeling standards and there's new uniformity we used to actually build, in-house, a base of building materialized use, of use that we have used for the presentation there. There was a lot of duplication of effort. And in the end, essentially the amendments and feedback tool, which helped us to, to improve of what we, have built during the end in a natural, as you said, the lack of trust. And this basically was a starting point for us to understand, okay, how can we move away? And there are a lot of different things that we can discuss of apart from this organizational structure that we have set up here, we have three or four pillars from Zhamak. However, there's also the next, extra question around, how do we implement product, right? What are the implications on that level and I think that is, that's something that we are, that we are currently still in progress. >> Got it. Okay. So I wonder if we could talk about, switch gears a little bit, and talk about the organizational and cultural challenges that you faced. What were those conversations like? And let's, let's dig into that a little bit. I want to get into governance as well. >> The conversations on the cultural change. I mean, yes, we went through a hyper growth through the last year, and obviously there were a lot of new joiners, a lot of different, very, very smart people joining the company, which then results that collaborations got a bit more difficult. Of course, the time zone changes. You have different, different artifacts that you had recreated in documentation that were flying around. So we were, we had to build the company from scratch, right? Of course, this then resulted always this tension, which I described before. But the most important part here is that data has always been a very important factor at HelloFresh, and we collected more of this data and continued to improve, use data to improve the different key areas of our business. Even when organizational struggles like the central (mumbles) struggles, data somehow always helped us to grow through this kind of change, right? In the end, those decentralized teams in our local geographies started with solutions that serve the business, which was very, very important. Otherwise, we wouldn't be at the place where we are today, but they did violate best practices and standards. And I always use the sports analogy, Dave. So like any sport, there are different rules and regulations that need to be followed. These routes are defined by, I'll call it, the sports association. And this is what you can think about other data governance and then our compliance team. Now we add the players to it who need to follow those rules and abide by them. This is what we then call data management. Now we have the different players, the professionals they also need to be trained and understand the strategy and the rules before they can play. And this is what I then called data literacy. So we realized that we need to focus on helping our teams to develop those capabilities and teach the standards for how work is being done to truly drive functional excellence in the different domains. And one of our ambition of our data literacy program for example, is to really empower every employee at HelloFresh, everyone, to make the right data-informed decisions by providing data education that scales (mumbles), and that can be different things. Different things like including data capabilities with, in the learning path for example, right? So help them to create and deploy data products, connecting data, producers, and data consumers, and create a common sense and more understanding of each other's dependencies, which is important. For example, SIS, SLO, state of contracts, et cetera, people get more of a sense of ownership and responsibility. Of course, we have to define what it means. What does ownership means? What does responsibility mean? But we are teaching this to our colleagues via individual learning patterns and help them upscale to use also their shared infrastructure, and those self-service data applications. And of all to summarize, we are still in this progress of learning. We're still learning as well. So learning never stops at Hello Fresh, but we are really trying this to make it as much fun as possible. And in the end, we all know user behavior is changed through positive experience. So instead of having massive training programs over endless courses of workshops, leaving our new joiners and colleagues confused and overwhelmed, we're applying gamification, right? So split different levels of certification where our colleagues, can access, have had access points. They can earn badges along the way, which then simplifies the process of learning and engagement of the users. And this is what we see in surveys, for example, where our employees value this gamification approach a lot and are even competing to collect those learning pet badges, to become the number one on the leaderboard. >> I love the gamification. I mean, we've seen it work so well in so many different industries, not the least of which is crypto. So you've identified some of the process gaps that you, you saw, you just gloss over them. Sometimes I say, pave the cow path. You didn't try to force. In other words, a new architecture into the legacy processes, you really had to rethink your approach to data management. So what did that entail? >> To rethink the way of data management, 100%. So if I take the example of revolution, industrial revolution or classical supply chain revolution, but just imagine that you have been riding a horse, for example, your whole life, and suddenly you can operate a car or you suddenly receive just a complete new way of transporting assets from A to B. So we needed to establish a new set of cross-functional business processes to run faster, drive faster, more robustly, and deliver data products which can be trusted and used by downstream processes and systems. Hence we had a subset of new standards and new procedures that would fall into the internal data governance and compliance sector. With internal, I'm always referring to the data operations around new things like data catalog, how to identify ownership, how to change ownership, how to certify data assets, everything around classical is software development, which we now apply to data. This, this is some old and new thinking, right? Deployment, versioning, QA, all the different things, ingestion policies, the deletion procedures, all the things that software development has been doing, we do it now with data as well. And it's simple terms, it's a whole redesign of the supply chain of our data with new procedures and new processes in asset creation, asset management and asset consumption. >> So data's become kind of the new development kit, if you will. I want to shift gears and talk about the notion of data product, and we have a slide that, that we pulled from your deck. And I'd like to unpack it a little bit. I'll just, if you can bring that up, I'll, I'll read it. A data product is a product whose primary objective is to leverage on data to solve customer problems, where customers are both internal and external. so pretty straightforward. I know you've, you've gone much deeper in your thinking and into your organization, but how do you think about that and how do you determine for instance, who owns what, how did you get everybody to agree? >> I can take that one. Maybe let me start as a data product. So I think that's an ongoing debate, right? And I think the debate itself is the important piece here, right? You mentioned the debate, you've clarified what we actually mean by that, a product, and what is actually the mindset. So I think just from a definition perspective, right? I think we find the common denominator that we say, okay, that our product is something which is important for the company that comes with value. What do you mean by that? Okay. It's a solution to a customer problem that delivers ideally maximum value to the business. And yes, leverage is the power of data. And we have a couple of examples, and I'll hit refresh here, the historical and classical ones around dashboards, for example, to monitor our error rates, but also more sophisticated based for example, to incorporate machine learning algorithms in our recipe recommendation. However, I think the important aspects of a data product is A: there is an owner, right? There's someone accountable for making sure that the product that you're providing is actually served and has maintained. And there are, there's someone who's making sure that this actually keeps the value of what we are promising. Combined with the idea of the proper documentation, like a product description, right? The people understand how to use it. What is this about? And related to that piece is the idea of, there's a purpose, right? We need to understand or ask ourselves, okay, why does a thing exist? Does it provide the value that we think it does? Then it leads in to a good understanding of what the life cycle of the data product and product life cycle. What do we mean? Okay. From the beginning, from the creation, you need to have a good understanding. You need to collect feedback. We need to learn about that, you need to rework, and actually finally, also to think about, okay, when is it time to decommission that piece So overall I think the core of this data product is product thinking 101, right? That we start, the point is, the starting point needs to be the problem and not the solution. And this is essentially what we have seen, what was missing, what brought us to this kind of data spaghetti that we have built there in Rush, essentially, we built it. Certain data assets develop in isolation and continuously patch the solution just to fulfill these ad hoc requests that we got and actually really understanding what the stakeholder needs. And the interesting piece as a results in duplication of (mumbled) And this is not just frustrating and probably not the most efficient way, how the company should work. But also if I build the same data assets, but slightly different assumption across the company and multiple teams that leads to data inconsistency. And imagine the following scenario. You, as a management, for management perspective, you're asking basically a specific question and you get essentially from a couple of different teams, different kinds of graphs, different kinds of data and numbers. And in the end, you do not know which ones to trust. So there's actually much (mumbles) but good. You do not know what actually is it noise for times of observing or is it just actually, is there actually a signal that I'm looking for? And the same as if I'm running an AB test, right? I have a new feature, I would like to understand what is the business impact of this feature? I run that with a specific source and an unfortunate scenario. Your production system is actually running on a different source. You see different numbers. What you have seen in the AB test is actually not what you see then in production, typical thing. Then as you asking some analytics team to actually do a deep dive, to understand where the discrepancies are coming from, worst case scenario again, there's a different kind of source. So in the end, it's a pretty frustrating scenario. And it's actually a waste of time of people that have to identify the root cause of this type of divergence. So in a nutshell, the highest degree of consistency is actually achieved if people are just reusing data assets. And also in the end, the meetup talk they've given, right? We start trying to establish this approach by AB testing. So we have a team, but just providing, or is kind of owning their target metric associated business teams, and they're providing that as a product also to other services, including the AB testing team. The AB testing team can use this information to find an interface say, okay, I'm drawing information for the metadata of an experiment. And in the end, after the assignment, after this data collection phase, they can easily add a graph to a dashboard just grouped by the AB testing barrier. And we have seen that also in other companies. So it's not just a nice dream that we have, right? I have actually looked at other companies maybe looked on search and we established a complete KPI pipeline that was computing all these information and this information both hosted by the team and those that (mumbles) AB testing, deep dives and, and regular reporting again. So just one last second, the, the important piece, Now, why I'm coming back to that is that it requires that we are treating this data as a product, right? If we want to have multiple people using the thing that I am owning and building, we have to provide this as a trust (mumbles) asset and in a way that it's easy for people to discover and to actually work with. >> Yeah. And coming back to that. So this is, to me this is why I get so excited about data mesh, because I really do think it's the right direction for organizations. When people hear data product, they think, "Well, what does that mean?" But then when you start to sort of define it as you did, it's using data to add value that could be cutting costs, that could be generating revenue, it could be actually directly creating a product that you monetize. So it's sort of in the eyes of the beholder, but I think the other point that we've made, is you made it earlier on too, and again, context. So when you have a centralized data team and you have all these P&L managers, a lot of times they'll question the data 'cause they don't own it. They're like, "Well, wait a minute." If it doesn't agree with their agenda, they'll attack the data. But if they own the data, then they're responsible for defending that. And that is a mindset change that's really important. And I'm curious is how you got to that ownership. Was it a top-down or was somebody providing leadership? Was it more organic bottom up? Was it a sort of a combination? How do you decide who owned what? In other words, you know, did you get, how did you get the business to take ownership of the data and what does owning the data actually mean? >> That's a very good question, Dave. I think that one of the pieces where I think we have a lot of learning and basically if you ask me how we could stop the filling, I think that would be the first piece that we need to start. Really think about how that should be approached. If it's staff has ownership, right? That means somehow that the team has the responsibility to host themselves the data assets to minimum acceptable standards. That's minimum dependencies up and down stream. The interesting piece has to be looking backwards. What was happening is that under that definition, this extra process that we have to go through is not actually transferring ownership from a central team to the other teams, but actually in most cases to establish ownership. I make this difference because saying we have to transfer ownership actually would erroneously suggest that the dataset was owned before, but this platform team, yes, they had the capability to make the change, but actually the analytics team, but always once we had the business understand the use cases and what no one actually bought, it's actually expensive, expected. So we had to go through this very lengthy process and establishing ownership, how we have done that as in the beginning, very naively started, here's a document, here are all the data assets, what is probably the nearest neighbor who can actually take care of that. And then we, we moved it over. But the problem here is that all these things is kind of technical debt, right? It's not really properly documented, pretty unstable. It was built in a very inconsistent way over years. And these people that built this thing have already left the company. So this is actually not a nice thing that you want to see and people build up a certain resistance, even if they have actually bought into this idea of domain ownership. So if you ask me these learnings, what needs to happen is first, the company needs to really understand what our core business concept that we have the need to have this mapping from this other core business concept that we have. These are the domain teams who are owning this concept, and then actually linked that to the, the assets and integrate that better, but suppose understanding how we can evolve, actually the data assets and new data builds things new and the, in this piece and the domain, but also how can we address reduction of technical depth and stabilizing what we have already. >> Thank you for that Christoph. So I want to turn a direction here and talk Clemence about governance. And I know that's an area that's passionate, you're passionate about. I pulled this slide from your deck, which I kind of messed up a little bit, sorry for that. But, but, but by the way, we're going to publish a link to the full video that you guys did. So we'll share that with folks, but it's one of the most challenging aspects of data mesh. If you're going to decentralize, you, you quickly realize this could be the wild west, as we talked about all over again. So how are you approaching governance? There's a lot of items on this slide that are, you know, underscore the complexity, whether it's privacy compliance, et cetera. So, so how did you approach this? >> It's yeah, it's about connecting those dots, right? So the aim of the data governance program is to promote the autonomy of every team while still ensuring that everybody has the right interoperability. So when we want to move from the wild west, riding horses to a civilized way of transport, I can take the example of modern street traffic. Like when all participants can maneuver independently, and as long as they follow the same rules and standards, everybody can remain compatible with each other and understand and learn from each other so we can avoid car crashes. So when I go from country to country, I do understand what the street infrastructure means. How do I drive my car? I can also read the traffic lights and the different signals. So likewise, as a business in HelloFresh we do operate autonomously and consequently need to follow those external and internal rules and standards set forth by the tradition in which we operate. So in order to prevent a, a car crash, we need to at least ensure compliance with regulations, to account for societies and our customers' increasing concern with data protection and privacy. So teaching and advocating this imaging, evangelizing this to everyone in the company was a key community or communication strategy. And of course, I mean, I mentioned data privacy, external factors, the same goes for internal regulations and processes to help our colleagues to adapt for this very new environment. So when I mentioned before, the new way of thinking, the new way of dealing and managing data, this of course implies that we need new processes and regulations for our colleagues as well. In a nutshell, then this means that data governance provides a framework for managing our people, the processes and technology and culture around our data traffic. And that governance must come together in order to have this effective program providing at least a common denominator is especially critical for shared data sets, which we have across our different geographies managed, and shared applications on shared infrastructure and applications. And as then consumed by centralized processes, for example, master data, everything, and all the metrics and KPIs, which are also used for a central steering. It's a big change, right? And our ultimate goal is to have this non-invasive federated, automated and computational governance. And for that, we can't just talk about it. We actually have to go deep and use case by use case and QC by PUC and generate learnings and learnings with the different teams. And this would be a classical approach of identifying the target structure, the target status, match it with the current status, by identifying together with the business teams, with the different domains and have a risk assessment, for example, to increase transparency because a lot of teams, they might not even know what kind of situation they might be. And this is where this training and this piece of data literacy comes into place, where we go in and trade based on the findings, based on the most valuable use case. And based on that, help our teams to do this change, to increase their capability. I just told a little bit more, I wouldn't say hand-holding, but a lot of guidance. >> Can I kind of kind of chime in quickly and (mumbled) below me, I mean, there's a lot of governance piece, but I think that is important. And if you're talking about documentation, for example, yes, we can go from team to team and tell these people, hey, you have to document your data assets and data catalog, or you have to establish a data contract and so on and forth. But if we would like to build data products at scale, following actual governance, we need to think about automation, right? We need to think about a lot of things that we can learn from engineering before, and just starts as simple things. Like if we would like to build up trust in our data products, right? And actually want to apply the same rigor and the best practices that we know from engineering. There are things that we can do. And we should probably think about what we can copy. And one example might be so the level of service level agreements, so that level objectives. So the level of indicators, right, that represent on a, on an engineering level, right? Are we providing services? They're representing the promises we make to our customer and to our consumers. These are the internal objectives that help us to keep those promises. And actually these audits of, of how we are tracking ourselves, how we are doing. And this is just one example of where I think the federated governance, governance comes into play, right? In an ideal world, you should not just talk about data as a product, but also data product that's code. That'd be say, okay, as most, as much as possible, right? Give the engineers the tool that they are familiar with, and actually not ask the product managers, for example, to document the data assets in the data catalog, but make it part of the configuration has as, as a, as a CDCI continuous delivery pipeline, as we typically see in other engineering, tasks through it and services maybe say, okay, there is configuration, we can think about PII, we can think about data quality monitoring, we can think about the ingestion data catalog and so on and forth. But I think ideally in a data product goals become a sort of templates that can be deployed and are actually rejected or verified at build time before we actually make them and deploy them to production. >> Yeah so it's like DevOps for data product. So, so I'm envisioning almost a three-phase approach to governance. And you're kind of, it sounds like you're in the early phase of it, call it phase zero, where there's learning, there's literacy, there's training education, there's kind of self-governance. And then there's some kind of oversight, some, a lot of manual stuff going on, and then you, you're trying to process builders at this phase and then you codify it and then you can automate it. Is that fair? >> Yeah. I would rather think, think about automation as early as possible in a way, and yes, it needs to be separate rules, but then actually start actually use case by use case. Is there anything that small piece that we can already automate? If just possible roll that out at the next extended step-by-step. >> Is there a role though, that adjudicates that? Is there a central, you know, chief state officer who's responsible for making sure people are complying or is it, how do you handle it? >> I mean, from a, from a, from a platform perspective, yes. This applies in to, to implement certain pieces, that we are saying are important and actually would like to implement, however, that is actually working very closely with the governance department, So it's Clemence's piece to understand that defy the policies that needs to be implemented. >> So good. So Clemence essentially, it's, it's, it's your responsibility to make sure that the policy is being followed. And then as you were saying, Christoph, you want to compress the time to automation as fast as possible. Is that, is that-- >> Yeah, so it's a really, it's a, what needs to be really clear is that it's always a split effort, right? So you can't just do one or the other thing, but there is some that really goes hand in hand because for the right information, for the right engineering tooling, we need to have the transparency first. I mean, code needs to be coded. So we kind of need to operate on the same level with the right understanding. So there's actually two things that are important, which is one it's policies and guidelines, but not only that, because more importantly or equally important is to align with the end-user and tech teams and engineering and really bridge between business value business teams and the engineering teams. >> Got it. So just a couple more questions, because we got to wrap up, I want to talk a little bit about the business outcome. I know it's hard to quantify and I'll talk about that in a moment, but, but major learnings, we've got some of the challenges that, that you cited. I'll just put them up here. We don't have to go detailed into this, but I just wanted to share with some folks, but my question, I mean, this is the advice for your peers question. If you had to do it differently, if you had a do over or a Mulligan, as we like to say for you, golfers, what, what would you do differently? >> I mean, I, can we start with, from, from the transformational challenge that understanding that it's also high load of cultural exchange. I think this is, this is important that a particular communication strategy needs to be put into place and people really need to be supported, right? So it's not that we go in and say, well, we have to change into, towards data mash, but naturally it's the human nature, nature, nature, we are kind of resistant to change, right? And (mumbles) uncomfortable. So we need to take that away by training and by communicating. Chris, you might want to add something to that. >> Definitely. I think the point that I've also made before, right? We need to acknowledge that data mesh it's an architectural scale, right? If you're looking for something which is necessary by huge companies who are vulnerable, that are product at scale. I mean, Dave, you mentioned that right, there are a lot of advantages to have a centralized team, but at some point it may make sense to actually decentralize here. And at this point, right, if you think about data mesh, you have to recognize that you're not building something on a green field. And I think there's a big learning, which is also reflected on the slide is, don't underestimate your baggage. It's typically is you come to a point where the old model doesn't work anymore. And as had a fresh write, we lost the trust in our data. And actually we have seen certain risks of slowing down our innovation. So we triggered that, this was triggering the need to actually change something. So at this transition applies that you took, we have a lot of technical depth accumulated over years. And I think what we have learned is that potentially we have, de-centralized some assets too early. This is not actually taking into account the maturity of the team. We are actually investigating too. And now we'll be actually in the face of correcting pieces of that one, right? But I think if you, if you, if you start from scratch, you have to understand, okay, is all my teams actually ready for taking on this new, this new capability? And you have to make sure that this is decentralization. You build up these capabilities and the teams, and as Clemence has mentioned, right? Make sure that you take the, the people on your journey. I think these are the pieces that also here it comes with this knowledge gap, right? That we need to think about hiring literacy, the technical depth I just talked about. And I think the, the last piece that I would add now, which is not here on the slide deck is also from our perspective, we started on the analytical layer because it was kind of where things are exploding, right? This is the bit where people feel the pain. But I think a lot of the efforts that we have started to actually modernize the current stage and data products, towards data mesh, we've understood that it always comes down basically to a proper shape of our operational plan. And I think what needs to happen is I think we got through a lot of pains, but the learning here is this needs to really be an, a commitment from the company. It needs to have an end to end. >> I think that point, that last point you made is so critical because I, I, I hear a lot from the vendor community about how they're going to make analytics better. And that's not, that's not unimportant, but, but true data product thinking and decentralized data organizations really have to operationalize in order to scale it. So these decisions around data architecture and organization, they're fundamental and lasting, it's not necessarily about an individual project ROI. They're going to be projects, sub projects, you know, within this architecture. But the architectural decision itself is organizational it's cultural and, and what's the best approach to support your business at scale. It really speaks to, to, to what you are, who you are as a company, how you operate and getting that right, as we've seen in the success of data-driven companies is, yields tremendous results. So I'll, I'll, I'll ask each of you to give, give us your final thoughts and then we'll wrap. Maybe. >> Just can I quickly, maybe just jumping on this piece, what you have mentioned, right, the target architecture. If you talk about these pieces, right, people often have this picture of (mumbled). Okay. There are different kinds of stages. We have (incomprehensible speech), we have actually a gesture layer, we have a storage layer, transformation layer, presentation data, and then we are basically putting a lot of technology on top of that. That's kind of our target architecture. However, I think what we really need to make sure is that we have these different kinds of views, right? We need to understand what are actually the capabilities that we need to know, what new goals, how does it look and feel from the different kinds of personas and experience view. And then finally that should actually go to the, to the target architecture from a technical perspective. Maybe just to give an outlook what we are planning to do, how we want to move that forward. Yes. Actually based on our strategy in the, in the sense of we would like to increase the maturity as a whole across the entire company. And this is kind of a framework around the business strategy and it's breaking down into four pillars as well. People meaning the data culture, data literacy, data organizational structure and so on. If you're talking about governance, as Clemence had actually mentioned that right, compliance, governance, data management, and so on, you're talking about technology. And I think we could talk for hours for that one it's around data platform, data science platform. And then finally also about enablements through data. Meaning we need to understand data quality, data accessibility and applied science and data monetization. >> Great. Thank you, Christoph. Clemence why don't you bring us home. Give us your final thoughts. >> Okay. I can just agree with Christoph that important is to understand what kind of maturity people have, but I understand we're at the maturity level, where a company, where people, our organization is, and really understand what does kind of, it's just kind of a change applies to that, those four pillars, for example, what needs to be tackled first. And this is not very clear from the very first beginning (mumbles). It's kind of like green field, you come up with must wins to come up with things that you really want to do out of theory and out of different white papers. Only if you really start conducting the first initiatives, you do understand that you are going to have to put those thoughts together. And where do I miss out on one of those four different pillars, people process technology and governance, but, and then that can often the integration like doing step by step, small steps, by small steps, not pulling the ocean where you're capable, really to identify the gaps and see where either you can fill the gaps or where you have to increase maturity first and train people or increase your tech stack. >> You know, HelloFresh is an excellent example of a company that is innovating. It was not born in Silicon Valley, which I love. It's a global company. And, and I got to ask you guys, it seems like it's just an amazing place to work. Are you guys hiring? >> Yes, definitely. We do. As, as mentioned right as well as one of these aspects distributing and actually hiring as an entire company, specifically for data. I think there are a lot of open roles, so yes, please visit or our page from data engineering, data, product management, and Clemence has a lot of roles that you can speak to about. But yes. >> Guys, thanks so much for sharing with theCUBE audience, you're, you're pioneers, and we look forward to collaborations in the future to track progress, and really want to thank you for your time. >> Thank you very much. >> Thank you very much Dave. >> And thank you for watching theCUBE's startup showcase made possible by AWS. This is Dave Volante. We'll see you next time. (cheerful music)

Published Date : Sep 15 2021

SUMMARY :

and the internal team it had the world in your field. Maybe take over the first and the plant acquisition And as you expand your TAM, the flexibility to grow So that for the team meant and so the lines of business, and so on started really to and the flip side of that say the data to the experts So it's the for, And the idea was really moving away Okay, go ahead. And as you mentioned, federated computational governance. is really not the focus of And in the end, and talk about the organizational And in the end, we all know user behavior not the least of which is crypto. So if I take the example of revolution, of the new development kit, And also in the end, So it's sort of in the the company needs to really but it's one of the most So the aim of the data governance and actually not ask the the early phase of it, that we can already automate? that defy the policies that the time to automation on the same level with the about the business outcome. So it's not that we go in and say, well, efforts that we have started to I hear a lot from the vendor in the sense of we would like Clemence why don't you bring us home. fill the gaps or where you And, and I got to ask you guys, that you can speak to about. collaborations in the future to track And thank you for watching

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Christoph	PERSON	0.99+
Chris	PERSON	0.99+
Christoph Sawade	PERSON	0.99+
2015	DATE	0.99+
Zhamak Dehghani	PERSON	0.99+
Youfoodz	ORGANIZATION	0.99+
Dave Volante	PERSON	0.99+
Clemence Chee	PERSON	0.99+
2019	DATE	0.99+
Norway	LOCATION	0.99+
2017	DATE	0.99+
AWS	ORGANIZATION	0.99+
May, 2019	DATE	0.99+
UK	LOCATION	0.99+
HelloFresh	ORGANIZATION	0.99+
Clemence	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Australia	LOCATION	0.99+
100%	QUANTITY	0.99+
US	LOCATION	0.99+
July	DATE	0.99+
two	QUANTITY	0.99+
Clemence W. Chee	PERSON	0.99+
Two	QUANTITY	0.99+
TAM	ORGANIZATION	0.99+
one	QUANTITY	0.99+
three	QUANTITY	0.99+
Hello Fresh	ORGANIZATION	0.99+
first piece	QUANTITY	0.99+
one tool	QUANTITY	0.99+
last year	DATE	0.99+
last week	DATE	0.99+
two things	QUANTITY	0.99+
Zhamak	PERSON	0.99+
first	QUANTITY	0.99+
two years later	DATE	0.99+
Pat	PERSON	0.99+
second two	QUANTITY	0.99+
one last second	QUANTITY	0.99+
Green Chef	ORGANIZATION	0.99+
One	QUANTITY	0.98+
first two	QUANTITY	0.98+
one example	QUANTITY	0.98+
both	QUANTITY	0.98+
one model	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.97+
four pillars	QUANTITY	0.97+
Every Plate	ORGANIZATION	0.97+
today	DATE	0.97+
each	QUANTITY	0.97+
earlier this year	DATE	0.97+

CB Bohn, Principal Data Engineer, Microfocus | The Convergence of File and Object

>> Announcer: From around the globe it's theCUBE. Presenting the Convergence of File and Object brought to you by Pure Storage. >> Okay now we're going to get the customer perspective on object and we'll talk about the convergence of file and object, but really focusing on the object pieces this is a content program that's being made possible by Pure Storage and it's co-created with theCUBE. Christopher CB Bohn is here. He's a lead architect for MicroFocus the enterprise data warehouse and principal data engineer at MicroFocus. CB welcome good to see you. >> Thanks Dave good to be here. >> So tell us more about your role at Microfocus it's a pan Microfocus role because we know the company is a multi-national software firm it acquired the software assets of HP of course including Vertica tell us where you fit. >> Yeah so Microfocus is you know, it's like I can says it's wide, worldwide company that it sells a lot of software products all over the place to governments and so forth. And it also grows often by acquiring other companies. So there is there the problem of integrating new companies and their data. And so what's happened over the years is that they've had a number of different discreet data systems so you've got this data spread all over the place and they've never been able to get a full complete introspection on the entire business because of that. So my role was come in, design a central data repository and an enterprise data warehouse, that all reporting could be generated against. And so that's what we're doing and we selected Vertica as the EDW system and Pure Storage FlashBlade as the communal repository. >> Okay so you obviously had experience with with Vertica in your previous role, so it's not like you were starting from scratch, but paint a picture of what life was like before you embarked on this sort of consolidated approach to your data warehouse. Was it just dispared data all over the place? A lot of M and A going on, where did the data live? >> CB: So >> Right so again the data is all over the place including under people's desks and just dedicated you know their own private SQL servers, It, a lot of data in a Microfocus is one on SQL server, which has pros and cons. Cause that's a great transactional database but it's not really good for analytics in my opinion. So but a lot of stuff was running on that, they had one Vertica instance that was doing some select reporting. Wasn't a very powerful system and it was what they call Vertica enterprise mode where it had dedicated nodes which had the compute and storage in the same locus on each server okay. So Vertica Eon mode is a whole new world because it separates compute from storage. Okay and at first was implemented in AWS so that you could spin up you know different numbers of compute nodes and they all share the same communal storage. But there has been a demand for that kind of capability, but in an on-prem situation. Okay so Pure storage was the first vendor to come along and have an S3 emulation that was actually workable. And so Vertica worked with Pure Storage to make that all happen and that's what we're using. >> Yeah I know back when back from where we used to do face-to-face, we would be at you know Pure Accelerate, Vertica was always there it stopped by the booth, see what they're doing so tight integration there. And you mentioned Eon mode and the ability to scale, storage and compute independently. And so and I think Vertica is the only one I know they were the first, I'm not sure anybody else does that both for cloud and on-prem, but so how are you using Eon mode, are you both in AWS and on-prem are you exclusively cloud? Maybe you could describe that a little bit. >> Right so there's a number of internal rules at Microfocus that you know there's, it's not AWS is not approved for their business processes. At least not all of them, they really wanted to be on-prem and all the transactional systems are on-prem. And so we wanted to have the analytics OLAP stuff close to the OLTP stuff right? So that's why they called there, co-located very close to each other. And so we could, what's nice about this situation is that these S3 objects, it's an S3 object store on the Pure Flash Blade. We could copy those over if we needed it to AWS and we could spin up a version of Vertica there, and keep going. It's like a tertiary GR strategy cause we actually have a, we're setting up a second, Flash Blade Vertica system geo located elsewhere for backup and we can get into it if you want to talk about how the latest version of the Pure software for the Flash Blade allows synchronization across network boundaries of those Flash Blade which is really nice because if, you know there's a giant sinkhole opens up under our Koll of facility and we lose that thing then we just have to switch to DNS. And we were back in business of the DR. And then the third one was to go, we could copy those objects over to AWS and be up and running there. So we're feeling pretty confident about being able to weather whatever comes along. >> Yeah I'm actually very interested in that conversation but before we go there. you mentioned you want, you're going to have the old lab close to the OLTP, was that for latency reasons, data movement reasons, security, all of the above. >> Yeah it's really all of the above because you know we are operating under the same sub-net. So to gain access to that data, you know you'd have to be within that VPN environment. We didn't want to going out over the public internet. Okay so and just for latency reasons also, you know we have a lot of data and we're continually doing ETL processes into Vertica from our production data, transactional databases. >> Right so they got to be approximate. So I'm interested in so you're using the Pure Flash Blade as an object store, most people think, oh object simple but slow. Not the case for you is that right? >> Not the case at all >> Why is that. >> This thing had hoop It's ripping, well you have to understand about Vertica and the way it stores data. It stores data in what they call storage containers. And those are immutable, okay on disc whether it's on AWS or if you had a enterprise mode Vertica, if you do an update or delete it actually has to go and retrieve that object container from disc and it destroys it and rebuilds it, okay which is why you don't, you want to avoid updates and deletes with vertica because the way it gets its speed is by sorting and ordering and encoding the data on disk. So it can read it really fast. But if you do an operation where you're deleting or updating a record in the middle of that, then you've got to rebuild that entire thing. So that actually matches up really well with S3 object storage because it's kind of the same way, it gets destroyed and rebuilt too okay. So that matches up very well with Vertica and we were able to design the system so that it's a panda only. Now we have some reports that we're running in SQL server. Okay which we're taking seven days. So we moved that to Vertica from SQL server and we rewrote the queries, which were had, which had been written in TC SQL with a bunch of loops and so forth and we were to get, this is amazing it went from seven days to two seconds, to generate this report. Which has tremendous value to the company because it would have to have this long cycle of seven days to get a new introspection in what they call the knowledge base. And now all of a sudden it's almost on demand two seconds to generate it. That's great and that's because of the way the data is stored. And the S3 you asked about, oh you know it, it's slow, well not in that context. Because what happens really with Vertica Eon mode is that it can, they have, when you set up your compute nodes, they have local storage also which is called the depot. It's kind of a cache okay. So the data will be drawn from the Flash Blade and cached locally. And that was, it was thought when they designed that, oh you know it's that'll cut down on the latency. Okay but it turns out that if you have your compute nodes close meaning minimal hops to the Flash Blade that you can actually tell Vertica, you know don't even bother caching that stuff just read it directly on the fly from the from the Flash Blade and the performance is still really good. It depends on your situation. But I know for example a major telecom company that uses the same topologies we're talking about here they did the same thing. They just dropped the cache cause the Flash Blade was able to deliver the data fast enough. >> So that's, you're talking about that's speed of light issues and just the overhead of switching infrastructure is that, it's eliminated and so as a result you can go directly to the storage array? >> That's correct yeah, it's like, it's fast enough that it's almost as if it's local to the compute node. But every situation is different depending on your needs. If you've got like a few tables that are heavily used, then yeah put them in the cache because that'll be probably a little bit faster. But if you're have a lot of ad hoc queries that are going on, you know you may exceed the storage of the local cache and then you're better off having it just read directly from the, from the Flash Blade. >> Got it so it's >> Okay. >> It's an append only approach. So you're not >> Right >> Overwriting on a record, so but then what you have automatically re index and that's the intelligence of the system. how does that work? >> Oh this is where we did a little bit of magic. There's not really anything like magic but I'll tell you what it is I mean. ( Dave laughing) Vertica does not have indexes. They don't exist. Instead I told you earlier that it gets a speed by sorting and encoding the data on disk and ordering it right. So when you've got an append-only situation, the natural question is well if I have a unique record, with let's say ID one, two, three, what happens if I append a new version of that, what happens? Well the way Vertica operates is that there's a thing called a projection which is actually like a materialized columnar data store. And you can have a, what they call a top-K projection, which says only put in this projection the records that meet a certain condition. So there's a field that we like to call a discriminator field which is like okay usually it's the latest update timestamp. So let's say we have record one, two, three and it had yesterday's date and that's the latest version. Now a new version comes in. When the data at load time vertical looks at that and then it looks in the projection and says does this exist already? If it doesn't then it adds it. If it does then that one now goes into that projection okay. And so what you end up having is a projection that is the latest snapshot of the data, which would be like, oh that's the reality of what the table is today okay. But inherent in that is that you now have a table that has all the change history of those records, which is awesome. >> Yeah. >> Because, you often want to go back and revisit, you know what it will happen to you. >> But that materialized view is the most current and the system knows that at least can (murmuring). >> Right so we then create views that draw off from that projection so that our users don't have to worry about any of that. They just get oh and say select from this view and they're getting the latest greatest snapshot of what the reality of the data is right now. But if they want to go back and say, well how did this data look two days ago? That's an easy query for them to do also. So they get the best of both worlds. >> So could you just plug any flash array into your system and achieve the same results or is there anything really unique about Pure? >> Yeah well they're the only ones that have got I think really dialed in the S3 object form because I don't think AWS actually publishes every last detail of that S3 spec. Okay so it had, there's a certain amount of reverse engineering they had to do I think. But they got it right. When we've, a couple maybe a year and a half ago or so there they were like at 99%, but now they worked with Vertica people to make sure that that object format was true to what it should be. So that it works just as if Vertica doesn't care, if it is on AWS or if it's on Pure Flash Blade because Pure did a really good job of dialing in that format and so Vertica doesn't care. It just knows S3, doesn't know what it doesn't care where it's going it just works. >> So the essentially vendor R and D abstracted that complexity so you didn't have to rewrite the application is that right? >> Right, so you know when Vertica ships it's software, you don't get a specific version for Pure or AWS, it's all in one package, and then when you configure it, it knows oh okay well, I'm just pointed at the, you know this port, on the Pure storage Flash Blade, and it just works. >> CB what's your data team look like? How is it evolving? You know a lot of customers I talked to they complain that they struggled to get value out of the data and they don't have the expertise, what does your team look like? How is it, is it changing or did the pandemic change things at all? I wonder if you could bring us up to date on that? >> Yeah but in some ways Microfocus has an advantage in that it's such a widely dispersed across the world company you know it's headquartered in the UK, but I deal with people I'm in the Bay Area, we have people in Mexico, Romania, India. >> Okay enough >> All over the place yeah all over the place. So when this started, it was actually a bigger project it got scaled back, it was almost to the point where it was going to be cut. Okay, but then we said, well let's try to do almost a skunkworks type of thing with reduced staff. And so we're just like a hand. You could count the number of key people on this on one hand. But we got it all together, and it's been a traumatic transformation for the company. Now there's, it's one approval and admiration from the highest echelons of this company that, hey this is really providing value. And the company is starting to get views into their business that they didn't have before. >> That's awesome, I mean, I've watched Microfocus for years. So to me they've always had a, their part of their DNA is private equity I mean they're sharp investors, they do great M and A >> CB: Yeah >> They know how to drive value and they're doing modern M and A, you know, we've seen what they what wait, what they did with SUSE, obviously driving value out of Vertica, they've got a really, some sharp financial people there. So that's they must have loved the the Skunkworks, fast ROI you know, small denominator, big numerator. (laughing) >> Well I think that in this case, smaller is better when you're doing development. You know it's a two-minute cooks type of thing and if you've got people who know what they're doing, you know I've got a lot of experience with Vertica, I've been on the advisory board for Vertica for a long time. >> Right And you know I was able to learn from people who had already, we're like the second or third company to do a Pure Flash Blade Vertica installation, but some of the best companies after they've already done it we are members of the advisory board also. So I learned from the best, and we were able to get this thing up and running quickly and we've got you know, a lot of other, you know handful of other key people who know how to write SQL and so forth to get this up and running quickly. >> Yeah so I mean, look it Pure is a fit I mean I sound like a fan boy, but Pure is all about simplicity, so is object. So that means you don't have to ra, you know worry about wrangling storage and worrying about LANs and all that other nonsense and file names but >> I have burned by hardware in the past you know, where oh okay they built into a price and so they cheap out on stuff like fans or other things in these components fail and the whole thing goes down, but this hardware is super good quality. And so I'm happy with the quality of that we're getting. >> So CB last question. What's next for you? Where do you want to take this initiative? >> Well we are in the process now of, we're when, so I designed a system to combine the best of the Kimball approach to data warehousing and the inland approach okay. And what we do is we bring over all the data we've got and we put it into a pristine staging layer. Okay like I said it's a, because it's append-only, it's essentially a log of all the transactions that are happening in this company, just as they appear okay. And then from the Kimball side of things we're designing the data marts now. So that's what the end users actually interact with. So we're taking the, we're examining the transactional systems to say, how are these business objects created? What's the logic there and we're recreating those logical models in Vertica. So we've done a handful of them so far, and it's working out really well. So going forward we've got a lot of work to do, to create just about every object that the company needs. >> CB you're an awesome guest really always a pleasure talking to you and >> Thank you. >> congratulations and good luck going forward stay safe. >> Thank you, you too Dave. >> All right thank you. And thank you for watching the Convergence of File and Object. This is Dave Vellante for theCUBE. (soft music)

Published Date : Apr 28 2021

SUMMARY :

brought to you by Pure Storage. but really focusing on the object pieces it acquired the software assets of HP all over the place to Okay so you obviously so that you could spin up you know and the ability to scale, and we can get into it if you want to talk security, all of the above. Yeah it's really all of the above Not the case for you is that right? And the S3 you asked about, storage of the local cache So you're not and that's the intelligence of the system. and that's the latest version. you know what it will happen to you. and the system knows that at least the data is right now. in the S3 object form and then when you configure it, I'm in the Bay Area, And the company is starting to get So to me they've always had loved the the Skunkworks, I've been on the advisory a lot of other, you know So that means you don't have to by hardware in the past you know, Where do you want to take this initiative? object that the company needs. congratulations and good And thank you for watching

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Mexico	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
MicroFocus	ORGANIZATION	0.99+
Vertica	ORGANIZATION	0.99+
UK	LOCATION	0.99+
seven days	QUANTITY	0.99+
Romania	LOCATION	0.99+
99%	QUANTITY	0.99+
HP	ORGANIZATION	0.99+
Microfocus	ORGANIZATION	0.99+
two-minute	QUANTITY	0.99+
second	QUANTITY	0.99+
two seconds	QUANTITY	0.99+
India	LOCATION	0.99+
Kimball	ORGANIZATION	0.99+
Pure Storage	ORGANIZATION	0.99+
each server	QUANTITY	0.99+
CB Bohn	PERSON	0.99+
yesterday	DATE	0.99+
two days ago	DATE	0.99+
first	QUANTITY	0.99+
Christopher CB Bohn	PERSON	0.98+
SQL	TITLE	0.98+
Vertica	TITLE	0.98+
a year and a half ago	DATE	0.98+
both worlds	QUANTITY	0.98+
Pure Flash Blade	COMMERCIAL_ITEM	0.98+
both	QUANTITY	0.98+
vertica	TITLE	0.98+
Bay Area	LOCATION	0.97+
one	QUANTITY	0.97+
Flash Blade	COMMERCIAL_ITEM	0.97+
third one	QUANTITY	0.96+
CB	PERSON	0.96+
one package	QUANTITY	0.96+
today	DATE	0.95+
Pure storage Flash Blade	COMMERCIAL_ITEM	0.95+
first vendor	QUANTITY	0.95+
pandemic	EVENT	0.94+
S3	TITLE	0.94+
marts	DATE	0.92+
Skunkworks	ORGANIZATION	0.91+
SUSE	ORGANIZATION	0.89+
three	QUANTITY	0.87+
S3	COMMERCIAL_ITEM	0.87+
third company	QUANTITY	0.84+
Pure Flash Blade Vertica	COMMERCIAL_ITEM	0.83+

Pure Storage Convergence of File and Object FULL SHOW V1

we're running what i would call a little mini series and we're exploring the convergence of file and object storage what are the key trends why would you want to converge file an object what are the use cases and architectural considerations and importantly what are the business drivers of uffo so-called unified fast file and object in this program you'll hear from matt burr who is the gm of pure's flashblade business and then we'll bring in the perspectives of a solutions architect garrett belsner who's from cdw and then the analyst angle with scott sinclair of the enterprise strategy group esg he'll share some cool data on our power panel and then we'll wrap with a really interesting technical conversation with chris bond cb bond who is a lead data architect at microfocus and he's got a really cool use case to share with us so sit back and enjoy the program from around the globe it's thecube presenting the convergence of file and object brought to you by pure storage we're back with the convergence of file and object a special program made possible by pure storage and co-created with the cube so in this series we're exploring that convergence between file and object storage we're digging into the trends the architectures and some of the use cases for unified fast file and object storage uffo with me is matt burr who's the vice president and general manager of flashblade at pure storage hello matt how you doing i'm doing great morning dave how are you good thank you hey let's start with a little 101 you know kind of the basics what is unified fast file and object yeah so look i mean i think you got to start with first principles talking about the rise of unstructured data so um when we think about unstructured data you sort of think about the projections 80 of data by 2025 is going to be unstructured data whether that's machine generated data or um you know ai and ml type workloads uh you start to sort of see this um i don't want to say it's a boom uh but it's sort of a renaissance for unstructured data if you will we move away from you know what we've traditionally thought of as general purpose nas and and file shares to you know really things that focus on uh fast object taking advantage of s3 cloud native applications that need to integrate with applications on site um you know ai workloads ml workloads tend to look to share data across you know multiple data sets and you really need to have a platform that can deliver both highly performant and scalable fast file and object from one system so talk a little bit more about some of the drivers that you know bring forth that need to unify file an object yeah i mean look you know there's a there's there's a real challenge um in managing you know bespoke uh bespoke infrastructure or architectures around general purpose nas and daz etc so um if you think about how a an architect sort of looks at an application they might say well okay i need to have um you know fast daz storage proximal to the application um but that's going to require a tremendous amount of dams which is a tremendous amount of drives right hard drives are you know historically pretty pretty pretty unwieldy to manage because you're replacing them relatively consistently at multi-petabyte scale um so you start to look at things like the complexity of daz you start to look at the complexity of general purpose nas and you start to just look at quite frankly something that a lot of people don't really want to talk about anymore but actual data center space right like consolidation matters the ability to take you know something that's the size of a microwave like a modern flash blade or a modern um you know uffo device uh replaces something that might be you know the size of three or four or five refrigerators so matt what why is is now the right time for this i mean for years nobody really paid much attention to object s3 already obviously changed you know that course most of the world's data is still stored in file formats and you get there with nfs or smb why is now the time to think about unifying object and file well because we're moving to things like a contactless society um you know the the things that we're going to do are going to just require a tremendous amount more compute power network um and quite frankly storage throughput and you know i can give you two sort of real primary examples here right you know warehouses are being you know taken over by robots if you will um it's not a war it's a it's a it's sort of a friendly advancement in you know how do i how do i store a box in a warehouse and you know we have we have a customer who focuses on large sort of big box distribution warehousing and you know a box that carried a an object two weeks ago might have a different box size two weeks later well that robot needs to know where the space is in the data center in order to put it but also needs to be able to process hey i don't want to put the thing that i'm going to access the most in the back of the warehouse i'm going to put that thing in the front of the warehouse all of those types of data you know sort of real time you can think of the robot as almost an edge device is processing in real time unstructured data in its object right so it's sort of the emergence of these new types of workloads and i give you the opposite example the other end of the spectrum is ransomware right you know today you know we'll talk to customers and they'll say quite commonly hey if you know anybody can sell me a backup device i need something that can restore quickly um if you had the ability to restore something in 270 terabytes an hour or 250 terabytes an hour uh that's much faster when you're dealing with a ransomware attack you want to get your data back quickly you know so i want to add i was going to ask you about that later but since you brought it up what is the right i guess call it architecture for for for ransomware i mean how and explain like how unified object and file which appointment i get the fast recovery but how how would you recommend a customer uh go about architecting a ransomware proof you know system yeah well you know with with flashblade and and with flasharray there's an actual feature called called safe mode and that safe mode actually protects uh the snapshots and and the data from uh sort of being a part of the of the ransomware event and so if you're in a type of ransomware situation like this you're able to leverage safe mode and you say okay what happens in a ransomware attack is you can't get access to your data and so you know the bad guy the perpetrator is basically saying hey i'm not going to give you access to your data until you pay me you know x in bitcoin or whatever it might be right um with with safe mode those snapshots are actually protected outside of the ransomware blast zone and you can bring back those snapshots because what's your alternative if you're not doing something like that your alternative is either to pay and unlock your data or you have to start retouring restoring excuse me from tape or slow disk that could take you days or weeks to get your data back so leveraging safe mode um you know in either the flash for the flash blade product uh is a great way to go about architecting against ransomware i got to put my my i'm thinking like a customer now so safe mode so that's an immutable mode right can't change the data um is it can can an administrator go in and change that mode can you turn it off do i still need an air gap for example what would you recommend there yeah so there there are still um uh you know sort of our back or roll back role-based access control policies uh around who can access that safe mode and who can right okay so uh anyway subject for a different day i want to i want to actually bring up uh if you don't object a topic that i think used to be really front and center and it now be is becoming front and center again i mean wikibon just produced a research note forecasting the future of flash and hard drives and those of you who follow us know we've done this for quite some time and you can if you could bring up the chart here you you could and we see this happening again it was originally we forecast the the the death of of quote-unquote high spin speed disc drives which is kind of an oxymoron but you can see on here on this chart this hard disk had a magnificent journey but they peaked in volume in manufacturing volume in 2010 and the reason why that is is so important is that volumes now are steadily dropping you can see that and we use wright's law to explain why this is a problem and wright's law essentially says that as you your cumulative manufacturing volume doubles your cost to manufacture decline by a constant percentage now i won't go too much detail on that but suffice it to say that flash volumes are growing very rapidly hdd volumes aren't and so flash because of consumer volumes can take advantage of wright's law and that constant reduction and that's what's really important for the next generation which is always more expensive to build uh and so this kind of marks the beginning of the end matt what do you think what what's the future hold for spinning disc in your view uh well i can give you the answer on two levels on a personal level uh it's why i come to work every day uh you know the the eradication or or extinction of an inefficient thing um you know i like to say that uh inefficiency is the bane of my existence uh and i think hard drives are largely inefficient and i'm willing to accept the sort of long-standing argument that um you know we've seen this transition in block right and we're starting to see it repeat itself in in unstructured data and i'm going to accept the argument that cost is a vector here and it most certainly is right hdds have been considerably cheaper uh than than than flash storage um you know even to this day uh you know up up to this point right but we're starting to approach the point where you sort of reach a a 3x sort of um you know differentiator between the cost of an hdd and an std and you know that really is that point in time when uh you begin to pick up a lot of volume and velocity and so you know that tends to map directly to you know what you're seeing here which is you know a a slow decline uh which i think is going to become even more rapid kind of probably starting around next year um where you start to see sds excuse me ssds uh you know really replacing hdds uh at a much more rapid clip particularly on the unstructured data side and it's largely around cost the the workloads that we talked about robots and warehouses or you know other types of advanced machine learning and artificial intelligence type applications and workflows you know they require a degree of performance that a hard drive just can't deliver we are we are seeing sort of the um creative innovative uh disruption of an entire industry right before our eyes it's a fun thing to live through yeah and and we would agree i mean it doesn't the premise there is that it doesn't have to be less expensive we think it will be by you know the second half or early second half of this decade but even if it's a we think around a 3x delta the value of of ssd relative to spinning disk is going to overwhelm just like with your laptop you know it got to the point where you said why would i ever have a spinning disc in my laptop we see the same thing happening here um and and so and we're talking about you know raw capacity you know put in compression and d-dupe and everything else that you really can't do with spinning discs because of the performance issues you can do with flash okay let's come back to uffo can we dig into the challenges specifically that that this solves for customers give me give us some examples yeah so you know i mean if we if we think about the examples um you know the the robotic one um i think is is is the one that i think is the marker for you know kind of of of the the modern side of of of what we see here um but what we're you know what we're what we're seeing from a trend perspective which you know not everybody's deploying robots right um you know there's there's many companies that are you know that aren't going to be in either the robotic business uh or or even thinking about you know sort of future type oriented type things but what they are doing is green field applications are being built on object um generally not on not on file and and not on block and so you know the rise of of object as sort of the the sort of let's call it the the next great protocol for um you know for uh for for modern workloads right this is this is that that modern application coming to the forefront and that could be anything from you know financial institutions you know right down through um you we've even see it and seen it in oil and gas uh we're also seeing it across across healthcare uh so you know as as as companies take the opportunity as industries to take this opportunity to modernize you know they're modernizing not on things that are are leveraging you know um you know sort of archaic disk technology they're they're they're really focusing on on object but they still have file workflows that they need to that they need to be able to support and so having the ability to be able to deliver those things from one device in a capacity orientation or a performance orientation uh while at the same time dramatically simplifying uh the overall administration of your environment both physically and non-physically is a key driver so the great thing about object is it's simple it's a kind of a get put metaphor um it's it scales out you know because it's got metadata associated with the data uh and and it's cheap uh the drawback is you don't necessarily associate it with high performance and and and as well most applications don't you know speak in that language they speak in the language of file you know or as you mentioned block so i i see real opportunities here if i have some some data that's not necessarily frequently accessed you know every day but yet i want to then whether end of quarter or whatever it is i want to i want to or machine learning i want to apply some ai to that data i want to bring it in and then apply a file format uh because for performance reasons is that right maybe you could unpack that a little bit yeah so um you know we see i mean i think you described it well right um but i don't think object necessarily has to be slow um and nor does it have to be um you know because when you think about you brought up a good point with metadata right being able to scale to a billions of objects being able to scale to billions of objects excuse me is of value right um and i think people do traditionally associate object with slow but it's not necessarily slow anymore right we we did a sort of unofficial survey of of of our of our customers and our employee base and when people described object they thought of it as like law firms and storing a word doc if you will um and that that's just you know i think that there's a lack of understanding or a misnomer around what modern what modern object has become and perform an object particularly at scale when we're talking about billions of objects you know that's the next frontier right um is it at pace performance wise with you know the other protocols no uh but it's making leaps and grounds so you talked a little bit more about some of the verticals that you see i mean i think when i think of financial services i think transaction processing but of course they have a lot of tons of unstructured data are there any patterns you're seeing by by vertical market um we're you know we're not that's the interesting thing um and you know um as a as a as a as a company with a with a block heritage or a block dna those patterns were pretty easy to spot right there were a certain number of databases that you really needed to support oracle sql some postgres work et cetera then kind of the modern databases around cassandra and things like that you knew that there were going to be vmware environments you know you could you could sort of see the trends and where things were going unstructured data is such a broader horizontal thing right so you know inside of oil and gas for example you have you know um you have specific applications and bespoke infrastructures for those applications um you know inside of media entertainment you know the same thing the the trend that we're seeing the commonality that we're seeing is the modernization of you know object as a starting point for all the all the net new workloads within within those industry verticals right that's the most common request we see is what's your object roadmap what's your you know what's your what's your object strategy you know where do you think where do you think object is going so um there isn't any um you know sort of uh there's no there's no path uh it's really just kind of a wide open field in front of us with common requests across all industries so the amazing thing about pure just as a kind of a little you know quasi you know armchair historian the industry is pure was really the only company in many many years to be able to achieve escape velocity break through a billion dollars i mean three part couldn't do it isilon couldn't do it compellent couldn't do it i could go on but pure was able to achieve that as an independent company and so you become a leader you look at the gartner magic quadrant you're a leader in there i mean if you've made it this far you've got to have some chops and so of course it's very competitive there are a number of other storage suppliers that have announced products that unify object and file so i'm interested in how pure differentiates why pure um it's a great question um and it's one that uh you know having been a long time puritan uh you know i take pride in answering um and it's actually a really simple answer um it's it's business model innovation and technology right the the technology that goes behind how we do what we do right and i don't mean the product right innovation is product but having a better support model for example um or having on the business model side you know evergreen storage right where we sort of look at your relationship to us as a subscription right um you know we're going to sort of take the thing that that you've had and we're going to modernize that thing in place over time such that you're not rebuying that same you know terabyte or you know petabyte of storage that you've that you that you've paid for over time so um you know sort of three legs of the stool uh that that have made you know pure clearly differentiated i think the market has has recognized that um you're right it's it's hard to break through to a billion dollars um but i look forward to the day that you know we we have two billion dollar products and i think with uh you know that rise in in unstructured data growing to 80 by 2025 and you know the massive transition that you know you guys have noted in in in your hdd slide i think it's a huge opportunity for us on you know the other unstructured data side of the house you know the other thing i'd add matt i've talked to cause about this is is it's simplicity first i've asked them why don't you do this why don't you do it and the answer is always the same is that adds complexity and we we put simplicity for the customer ahead of everything else and i think that served you very very well what about the economics of of unified file an object i mean if you bring in additional value presumably there's a there there's a cost to that but there's got to be also a business case behind it what kind of impact have you seen uh with customers yeah i mean look i'll i'll i'll go back to something i mentioned earlier which is just the reclamation of floor space and power and cooling right um you know there's a you know there's people people people want to search for kind of the the sexier element if you will when it comes to looking at how we how you derive value from something but the reality is if you're reducing your power consumption by you know by by a material percentage power bills matter in big in big data centers um you know customers typically are are facing you know a paradigm of well i i want to go to the cloud but you know the clouds are not being more expensive than i thought it was going to be or you know i figured out what i can use in the cloud i thought it was going to be everything but it's not going to be everything so hybrid's where we're landing but i want to be out of the data center business and i don't want to have a team of 20 storage people to match you know to administer my storage um you know so there's sort of this this very tangible value around you know hey if i could manage um you know multiple petabytes with one full-time engineer uh because the system uh to yoran kaz's point was radically simpler to administer didn't require someone to be running around swapping drives all the time would that be a value the answer is yes 100 of the time right and then you start to look at okay all right well on the uffo side from a product perspective hey if i have to manage a you know bespoke environment for this application if i have to manage a bespoke environment for this application and a bespoke environment for this application and this book environment for this application i'm managing four different things and can i actually share data across those four different things there's ways to share data but most customers it just gets too complex how do you even know what your what your gold.master copy is of data if you have it in four different places or you try to have it in four different places and it's four different siloed infrastructures so when you get to the sort of the side of you know how do we how do you measure value in uffo it's actually being able to have all of that data concentrated in one place so that you can share it from application to application got it i'm interested we use a couple minutes left i'm interested in the the update on flashblade you know generally but also i have a specific question i mean look getting file right is hard enough uh you just announced smb support for flashblade i'm interested in you know how that fits in i think it's kind of obvious with file and object converging but give us the update on on flashblade and maybe you could address that specific question yeah so um look i mean we're we're um you know tremendously excited about the growth of flashblade uh you know we we we found workloads we never expected to find um you know the rapid restore workload was one that was actually brought to us from from from a customer actually and has become you know one of our one of our top two three four you know workloads so um you know we're really happy with the trend we've seen in it um and you know mapping back to you know thinking about hdds and ssds you know we're well on a path to building a billion dollar business here so you know we're very excited about that um but to your point you know you don't just snap your fingers and get there right um you know we've learned that doing file and object uh is is harder than block um because there's more things that you have to go do for one you're basically focused on three protocols s b nfs and s3 not necessarily in that order um but to your point about smb uh you know we we are uh on the path through to releasing um you know smb uh full full native smb support in in the system that will allow us to uh service customers we have a limitation with some customers today where they'll have an s b portion of their nfs workflow um and we do great on the nfs side um but you know we didn't we didn't have the ability to plug into the s p component of their workflow so that's going to open up a lot of opportunity for us um on on that front um and you know we continue to you know invest significantly across the board in in areas like security which is you know become more than just a hot button you know today security's always been there but it feels like it's blazing hot today um and so you know going through the next couple years we'll be looking at uh you know developing some some um you know pretty material security elements of the product as well so uh well on a path to a billion dollars is the net on that and uh you know we're we're fortunate to have have smb here and we're looking forward to introducing that to to those customers that have you know nfs workloads today with an s p component yeah nice tailwind good tam expansion strategy matt thanks so much really appreciate you coming on the program we appreciate you having us and uh thanks much dave good to see you [Music] okay we're back with the convergence of file and object in a power panel this is a special content program made possible by pure storage and co-created with the cube now in this series what we're doing is we're exploring the coming together of file and object storage trying to understand the trends that are driving this convergence the architectural considerations that users should be aware of and which use cases make the most sense for so-called unified fast file in object storage and with me are three great guests to unpack these issues garrett belsner is the data center solutions architect he's with cdw scott sinclair is a senior analyst at enterprise strategy group he's got deep experience on enterprise storage and brings that independent analyst perspective and matt burr is back with us gentlemen welcome to the program thank you hey scott let me let me start with you uh and get your perspective on what's going on the market with with object the cloud a huge amount of unstructured data out there that lives in files give us your independent view of the trends that you're seeing out there well dave you know where to start i mean surprise surprise date is growing um but one of the big things that we've seen is we've been talking about data growth for what decades now but what's really fascinating is or changed is because of the digital economy digital business digital transformation whatever you call it now people are not just storing data they actually have to use it and so we see this in trends like analytics and artificial intelligence and what that does is it's just increasing the demand for not only consolidation of massive amounts of storage that we've seen for a while but also the demand for incredibly low latency access to that storage and i think that's one of the things that we're seeing that's driving this need for convergence as you put it of having multiple protocols consolidated onto one platform but also the need for high performance access to that data thank you for that a great setup i got like i wrote down three topics that we're going to unpack as a result of that so garrett let me let me go to you maybe you can give us the perspective of what you see with customers is is this is this like a push where customers are saying hey listen i need to converge my file and object or is it more a story where they're saying garrett i have this problem and then you see unified file and object as a solution yeah i think i think for us it's you know taking that consultative approach with our customers and really kind of hearing pain around some of the pipelines the way that they're going to market with data today and kind of what are the problems that they're seeing we're also seeing a lot of the change driven by the software vendors as well so really being able to support a disaggregated design where you're not having to upgrade and maintain everything as a single block has really been a place where we've seen a lot of customers pivot to where they have more flexibility as they need to maintain larger volumes of data and higher performance data having the ability to do that separate from compute and cache and those other layers are is really critical so matt i wonder if if you could you know follow up on that so so gary was talking about this disaggregated design so i like it you know distributed cloud etc but then we're talking about bringing things together in in one place right so square that circle how does this fit in with this hyper-distributed cloud edge that's getting built out yeah you know i mean i i could give you the easy answer on that but i could also pass it back to garrett in the sense that you know garrett maybe it's important to talk about um elastic and splunk and some of the things that you're seeing in in that world and and how that i think the answer to dave's question i think you can give you can give a pretty qualified answer relative what your customers are seeing oh that'd be great please yeah absolutely no no problem at all so you know i think with um splunk kind of moving from its traditional design and classic design whatever you want you want to call it up into smart store um that was kind of one of the first that we saw kind of make that move towards kind of separating object out and i think you know a lot of that comes from their own move to the cloud and updating their code to basically take advantage of object object in the cloud uh but we're starting to see you know with like vertica eon for example um elastic other folks taking that same type of approach where in the past we were building out many 2u servers we were jamming them full of uh you know ssds and nvme drives that was great but it doesn't really scale and it kind of gets into that same problem that we see with you know hyper convergence a little bit where it's you know you're all you're always adding something maybe that you didn't want to add um so i think it you know again being driven by software is really kind of where we're seeing the world open up there but that whole idea of just having that as a hub and a central place where you can then leverage that out to other applications whether that's out to the edge for machine learning or ai applications to take advantage of it i think that's where that convergence really comes back in but i think like scott mentioned earlier it's really folks are now doing things with the data where before i think they were really storing it trying to figure out what are we going to actually do with it when we need to do something with it so this is making it possible yeah and dave if i could just sort of tack on to the end of garrett's answer there you know in particular vertica with neon mode the ability to leverage sharded subclusters give you um you know sort of an advantage in terms of being able to isolate performance hot spots you an advantage to that is being able to do that on a flashblade for example so um sharded subclusters allow you to sort of say i'm you know i'm going to give prioritization to you know this particular element of my application and my data set but i can still share those share that data across those across those subclusters so um you know as you see you know vertica advance with eon mode or you see splunk advance with with smart store you know these are all sort of advancements that are you know it's a chicken in the egg thing um they need faster storage they need you know sort of a consolidated data storage data set um and and that's what sort of allows these things to drive forward yeah so vertica eon mode for those who don't know it's the ability to separate compute and storage and scale independently i think i think vertica if they're if they're not the only one they're one of the only ones i think they might even be the only one that does that in the cloud and on-prem and that sort of plays into this distributed you know nature of this hyper-distributed cloud i sometimes call it and and i'm interested in the in the data pipeline and i wonder scott if we could talk a little bit about that maybe we're unified object and file i mean i'm envisioning this this distributed mesh and then you know uffo is sort of a node on that that i i can tap when i need it but but scott what are you seeing as the state of infrastructure as it relates to the data pipeline and the trends there yeah absolutely dave so when i think data pipeline i immediately gravitate to analytics or or machine learning initiatives right and so one of the big things we see and this is it's an interesting trend it seems you know we continue to see increased investment in ai increased interest and people think and as companies get started they think okay well what does that mean well i got to go hire a data scientist okay well that data scientist probably needs some infrastructure and what they end what often happens in these environments is where it ends up being a bespoke environment or a one-off environment and then over time organizations run into challenges and one of the big challenges is the data science team or people whose jobs are outside of it spend way too much time trying to get the infrastructure to to keep up with their demands and predominantly around data performance so one of the one of the ways organizations that especially have artificial intelligence workloads in production and we found this in our research have started mitigating that is by deploying flash all across the data pipeline we have we have data on this sorry interrupt but yeah if you could bring up that that chart that would be great um so take us through this uh uh scott and share with us what we're looking at here yeah absolutely so so dave i'm glad you brought this up so we did this study um i want to say late last year uh one of the things we looked at was across artificial intelligence environments now one thing that you're not seeing on this slide is we went through and we asked all around the data pipeline and we saw flash everywhere but i thought this was really telling because this is around data lakes and when when or many people think about the idea of a data lake they think about it as a repository it's a place where you keep maybe cold data and what we see here is especially within production environments a pervasive use of flash storage so i think that 69 of organizations are saying their data lake is mostly flash or all flash and i think we have zero percent that don't have any flash in that environment so organizations are finding out that they that flash is an essential technology to allow them to harness the value of their data so garrett and then matt i wonder if you could chime in as well we talk about digital transformation and i sometimes call it you know the coveted forced march to digital transformation and and i'm curious as to your perspective on things like machine learning and the adoption and scott you may have a perspective on this as well you know we had to pivot we had to get laptops we had to secure the end points you know and vdi those became super high priorities what happened to you know injecting ai into my applications and and machine learning did that go in the back burner was that accelerated along with the need to digitally transform garrett i wonder if you could share with us what you saw with with customers last year yeah i mean i think we definitely saw an acceleration um i think folks are in in my market are still kind of figuring out how they inject that into more of a widely distributed business use case but again this data hub and allowing folks to now take advantage of this data that they've had in these data lakes for a long time i agree with scott i mean many of the data lakes that we have were somewhat flash accelerated but they were typically really made up of you know large capacity slower spinning near-line drive accelerated with some flash but i'm really starting to see folks now look at some of those older hadoop implementations and really leveraging new ways to look at how they consume data and many of those redesigned customers are coming to us wanting to look at all flash solutions so we're definitely seeing it we're seeing an acceleration towards folks trying to figure out how to actually use it in more of a business sense now or before i feel it goes a little bit more skunk works kind of people dealing with uh you know in a much smaller situation maybe in the executive offices trying to do some testing and things scott you're nodding away anything you can add in here yeah so first off it's great to get that confirmation that the stuff we're seeing in our research garrett's seeing you know out in the field and in the real world um but you know as it relates to really the past year it's been really fascinating so one of the things we study at esg is i.t buying intentions what are things what are initiatives that companies plan to invest in and at the beginning of 2020 we saw a heavy interest in machine learning initiatives then you transition to the middle of 2020 in the midst of covid some organizations continued on that path but a lot of them had the pivot right how do we get laptops to everyone how do we continue business in this new world well now as we enter into 2021 and hopefully we're coming out of this uh you know the pandemic era um we're getting into a world where organizations are pivoting back towards these strategic investments around how do i maximize the usage of data and actually accelerating those because they've seen the importance of of digital business initiatives over the past year yeah matt i mean when we exited 2019 we saw a narrowing of experimentation and our premise was you know that that organizations are going to start now operationalizing all their digital transformation experiments and and then we had a you know 10 month petri dish on on digital so what do you what are you seeing in this regard a 10 month petri dish is an interesting way to interesting way to describe it um you know we saw another there's another there's another candidate for pivot in there around ransomware as well right um you know security entered into the mix which took people's attention away from some of this as well i mean look i'd like to bring this up just a level or two um because what we're actually talking about here is progress right and and progress isn't is an inevitability um you know whether it's whether whether you believe that it's by 2025 or you or you think it's 2035 or 2050 it doesn't matter we're on a forced march to the eradication of disk and that is happening in many ways uh you know in many ways um due to some of the things that garrett was referring to and what scott was referring to in terms of what are customers demands for how they're going to actually leverage the data that they have and that brings me to kind of my final point on this which is we see customers in three phases there's the first phase where they say hey i have this large data store and i know there's value in there i don't know how to get to it or i have this large data store and i've started a project to get value out of it and we failed those could be customers that um you know marched down the hadoop path early on and they they got some value out of it um but they realized that you know hdfs wasn't going to be a modern protocol going forward for any number of reasons you know the first being hey if i have gold.master how do i know that i have gold.4 is consistent with my gold.master so data consistency matters and then you have the sort of third group that says i have these large data sets i know how to extract value from them and i'm already on to the verticas the elastics you know the splunks etc um i think those folks are the folks that that ladder group are the folks that kept their their their projects going because they were already extracting value from them the first two groups we we're seeing sort of saying the second half of this year is when we're going to begin really being picking up on these on these types of initiatives again well thank you matt by the way for for hitting the escape key because i think value from data really is what this is all about and there are some real blockers there that i kind of want to talk about you mentioned hdfs i mean we were very excited of course in the early days of hadoop many of the concepts were profound but at the end of the day it was too complicated we've got these hyper-specialized roles that are that are you know serving the business but it still takes too long it's it's too hard to get value from data and one of the blockers is infrastructure that the complexity of that infrastructure really needs to be abstracted taking up a level we're starting to see this in in cloud where you're seeing some of those abstraction layers being built from some of the cloud vendors but more importantly a lot of the vendors like pew are saying hey we can do that heavy lifting for you uh and we you know we have expertise in engineering to do cloud native so i'm wondering what you guys see uh maybe garrett you could start us off and other students as some of the blockers uh to getting value from data and and how we're going to address those in the coming decade yeah i mean i i think part of it we're solving here obviously with with pure bringing uh you know flash to a market that traditionally was utilizing uh much slower media um you know the other thing that i that i see that's very nice with flashblade for example is the ability to kind of do things you know once you get it set up a blade at a time i mean a lot of the things that we see from just kind of more of a you know simplistic approach to this like a lot of these teams don't have big budgets and being able to kind of break them down into almost a blade type chunk i think has really kind of allowed folks to get more projects and and things off the ground because they don't have to buy a full expensive system to run these projects so that's helped a lot i think the wider use cases have helped a lot so matt mentioned ransomware you know using safe mode as a place to help with ransomware has been a really big growth spot for us we've got a lot of customers very interested and excited about that and the other thing that i would say is bringing devops into data is another thing that we're seeing so kind of that push towards data ops and really kind of using automation and infrastructure as code as a way to now kind of drive things through the system the way that we've seen with automation through devops is really an area we're seeing a ton of growth with from a services perspective guys any other thoughts on that i mean we're i'll tee it up there we are seeing some bleeding edge which is somewhat counterintuitive especially from a cost standpoint organizational changes at some some companies uh think of some of the the the internet companies that do uh music uh for instance and adding podcasts etc and those are different data products we're seeing them actually reorganize their data architectures to make them more distributed uh and actually put the domain heads the business heads in charge of the the data and the data pipeline and that is maybe less efficient but but it's again some of these bleeding edge what else are you guys seeing out there that might be yes some harbingers of the next decade uh i'll go first um you know i think specific to um the the construct that you threw out dave one of the things that we're seeing is um you know the the application owner maybe it's the devops person but it's you know maybe it's it's it's the application owner through the devops person they're they're becoming more technical in their understanding of how infrastructure um interfaces with their with their application i think um you know what what we're seeing on the flashblade side is we're having a lot more conversations with application people than um just i.t people it doesn't mean that the it people aren't there the it people are still there for sure they have to deliver the service etc um but you know the days of of i.t you know building up a catalog of services and a business owner subscribing to one of those services you know picking you know whatever sort of fits their need um i don't think that constru i think that's the construct that changes going forward the application owner is becoming much more prescriptive about what they want the infrastructure to fit how they want the infrastructure to fit into their application and that's a big change and and for for um you know certainly folks like like garrett and cdw um you know they do a good job with this being able to sort of get to the application owner and bring those two sides together there's a tremendous amount of value there for us it's been a little bit of a retooling we've traditionally sold to the i.t side of the house and um you know we've had to teach ourselves how to go talk the language of of applications so um you know i think you pointed out a good a good a good construct there and and you know that that application owner taking playing a much bigger role in what they're expecting uh from the performance of it infrastructure i think is is is a key is a key change interesting i mean that definitely is a trend that's put you guys closer to the business where the the infrastructure team is is serving the business as opposed to sometimes i talk to data experts and they're frustrated uh especially data owners or or data product builders who are frustrated that they feel like they have to beg beg the the data pipeline team to get you know new data sources or get data out how about the edge um you know maybe scott you can kick us off i mean we're seeing you know the emergence of edge use cases ai inferencing at the edge a lot of data at the edge what are you seeing there and and how does this unified object i'll bring us back to that and file fit wow dave how much time do we have um two minutes first of all scott why don't you why don't you just tell everybody what the edge is yeah you got it figured out all right how much time do you have matt at the end of the day and that that's that's a great question right is if you take a step back and i think it comes back today of something you mentioned it's about extracting value from data and what that means is when you extract value from data what it does is as matt pointed out the the influencers or the users of data the application owners they have more power because they're driving revenue now and so what that means is from an i.t standpoint it's not just hey here are the services you get use them or lose them or you know don't throw a fit it is no i have to i have to adapt i have to follow what my application owners mean now when you bring that back to the edge what it means is is that data is not localized to the data center i mean we just went through a nearly 12-month period where the entire workforce for most of the companies in this country had went distributed and business continued so if business is distributed data is distributed and that means that means in the data center that means at the edge that means that the cloud that means in all other places in tons of places and what it also means is you have to be able to extract and utilize data anywhere it may be and i think that's something that we're going to continue to and continue to see and i think it comes back to you know if you think about key characteristics we've talked about things like performance and scale for years but we need to start rethinking it because on one hand we need to get performance everywhere but also in terms of scale and this ties back to some of the other initiatives and getting value from data it's something i call that the massive success problem one of the things we see especially with with workloads like machine learning is businesses find success with them and as soon as they do they say well i need about 20 of these projects now all of a sudden that overburdens it organizations especially across across core and edge and cloud environments and so when you look at environments ability to meet performance and scale demands wherever it needs to be is something that's really important you know so dave i'd like to um just sort of tie together sort of two things that um i think that i heard from scott and garrett that i think are important and it's around this concept of scale um you know some of us are old enough to remember the day when kind of a 10 terabyte blast radius was too big of a blast radius for people to take on or a terabyte of storage was considered to be um you know an exemplary budget environment right um now we sort of think as terabytes kind of like we used to think of as gigabytes in some ways um petabyte like you don't have to explain anybody what a petabyte is anymore um and you know what's on the horizon and it's not far are our exabyte type data set workloads um and you start to think about what could be in that exabyte of data we've talked about how you extract that value we've talked about sort of um how you start but if the scale is big not everybody's going to start at a petabyte or an exabyte to garrett's point the ability to start small and grow into these products or excuse me these projects i think a is a really um fundamental concept here because you're not going to just go by i'm going to kick off a five petabyte project whether you do that on disk or flash it's going to be expensive right but if you could start at a couple hundred terabytes not just as a proof of concept but as something that you know you could get predictable value out of that then you could say hey this either scales linearly or non-linearly in a way that i can then go map my investments to how i can go dig deeper into this that's how all of these things are gonna that's how these successful projects are going to start because the people that are starting with these very large you know sort of um expansive you know greenfield projects at multi-petabyte scale it's gonna be hard to realize near-term value excellent we gotta wrap but but garrett i wonder if you could close when you look forward you talk to customers do you see this unification of of file and object is it is this an evolutionary trend is it something that is that that is that is that is going to be a lever that customers use how do you see it evolving over the next two three years and beyond yeah i mean i think from our perspective i mean just from what we're seeing from the numbers within the market the amount of growth that's happening with unstructured data is really just starting to finally really kind of hit this data deluge or whatever you want to call it that we've been talking about for so many years it really does seem to now be becoming true as we start to see things scale out and really folks settle into okay i'm going to use the cloud to to start and maybe train my models but now i'm going to get it back on prem because of latency or security or whatever the the um decision points are there this is something that is not going to slow down and i think you know folks like pure having the ability to have the tools that they give us um to use and bring to market with our customers are really key and critical for us so i see it as a huge growth area and a big focus for us moving forward guys great job unpacking a topic that you know it's covered a little bit but i think we we covered some ground that is uh that is new and so thank you so much for those insights and that data really appreciate your time thanks steve thanks yeah thanks dave okay and thank you for watching the convergence of file and object keep it right there right back after this short break innovation impact influence welcome to the cube disruptors developers and practitioners learn from the voices of leaders who share their personal insights from the hottest digital events around the globe enjoy the best this community has to offer on the cube your global leader in high-tech digital coverage [Music] okay now we're going to get the customer perspective on object and we'll talk about the convergence of file and object but really focusing on the object piece this is a content program that's being made possible by pure storage and it's co-created with the cube christopher cb bond is here he's a lead architect for microfocus the enterprise data warehouse and principal data engineer at microfocus cb welcome good to see you thanks dave good to be here so tell us more about your role at microfocus it's a pan microfocus role of course we know the company is a multinational software firm and acquired the software assets of hp of course including vertica tell us where you fit yeah so microfocus is uh you know it's like i said wide worldwide uh company that uh sells a lot of software products all over the place to governments and so forth and um it also grows often by acquiring other companies so there is the problem of of integrating new companies and their data and so what's happened over the years is that they've had a a number of different discrete data systems so you've got this data spread all over the place and they've never been able to get a full complete introspection on the entire business because of that so my role was come in design a central data repository an enterprise data warehouse that all reporting could be generated against and so that's what we're doing and we selected vertica as the edw system and pure storage flashblade as the communal repository okay so you obviously had experience with with vertica in your in your previous role so it's not like you were starting from scratch but but paint a picture of what life was like before you embarked on this sort of consolidated a approach to your your data warehouse what was it just disparate data all over the place a lot of m a going on where did the data live right so again the data was all over the place including under people's desks in just dedicated you know their their own private uh sql servers it a lot of data in in um microfocus is run on sql server which has pros and cons because that's a great uh transactional database but it's not really good for analytics in my opinion so uh but a lot of stuff was running on that they had one vertica instance that was doing some select uh reporting wasn't a very uh powerful system and it was what they call vertica enterprise mode where had dedicated nodes which um had the compute and storage um in the same locus on each uh server okay so vertica eon mode is a whole new world because it separates compute from storage you mentioned eon mode uh and the ability to to to scale storage and compute independently we wanted to have the uh analytics olap stuff close to the oltp stuff right so that's why they're co-located very close to each other and so uh we could what's nice about this situation is that these s3 objects it's an s3 object store on the pure flash plate we could copy those over if we needed to uh aws and we could spin up um a version of vertica there and keep going it's it's like a tertiary dr strategy because we actually have a we're setting up a second flashblade vertica system geo-located elsewhere for backup and we can get into it if you want to talk about how the latest version of the pure software for the flashblade allows synchronization across network boundaries of those flash plays which is really nice because if uh you know there's a giant sinkhole opens up under our colo facility and we lose that thing then we just have to switch the dns and we were back in business off the dr and then if that one was to go we could copy those objects over to aws and be up and running there so we're feeling pretty confident about being able to weather whatever comes along so you're using the the pure flash blade as an object store um most people think oh object simple but slow uh not the case for you is that right not the case at all it's ripping um well you have to understand about vertica and the way it stores data it stores data in what they call storage containers and those are immutable okay on disk whether it's on aws or if you had a enterprise mode vertica if you do an update or delete it actually has to go and retrieve that object container from disk and it destroys it and rebuilds it okay which is why you don't you want to avoid updates and deletes with vertica because the way it gets its speed is by sorting and ordering and encoding the data on disk so it can read it really fast but if you do an operation where you're deleting or updating a record in the middle of that then you've got to rebuild that entire thing so that actually matches up really well with s3 object storage because it's kind of the same way uh it gets destroyed and rebuilt too okay so that matches up very well with vertica and we were able to design this system so that it's append only now we had some reports that were running in sql server okay uh which were taking seven days so we moved that to uh to vertica from sql server and uh we rewrote the queries which were which had been written in t sql with a bunch of loops and so forth and we were to get this is amazing it went from seven days to two seconds to generate this report which has tremendous value uh to the company because it would have to have this long cycle of seven days to get a new introspection in what they call their knowledge base and now all of a sudden it's almost on demand two seconds to generate it that's great and that's because of the way the data is stored and uh the s3 you asked about oh you know is it slow well not in that context because what happens really with vertica eon mode is that it can they have um when you set up your compute nodes they have local storage also which is called the depot it's kind of a cache okay so the data will be drawn from the flash and cached locally uh and that was it was thought when they designed that oh you know it's that'll cut down on the latency okay but it turns out that if you have your compute nodes close meaning minimal hops to the flashblade that you can actually uh tell vertica you know don't even bother caching that stuff just read it directly on the fly from the from the flashblade and the performance is still really good it depends on your situation but i know for example a major telecom company that uh uses the same topology as we're talking about here they did the same thing they just they just dropped the cache because the flash player was able to to deliver the the data fast enough so that's you're talking about that that's speed of light issues and just the overhead of of of switching infrastructure is that that gets eliminated and so as a result you can go directly to the storage array that's correct yeah it's it's like it's fast enough that it's it's almost as if it's local to the compute node uh but every situation is different depending on your uh your knees if you've got like a few tables that are heavily used uh then yeah put them um put them in the cash because that'll be probably a little bit faster but if you have a lot of ad hoc queries that are going on you know you may exceed the storage of the local cache and then you're better off having it uh just read directly from the uh from the flash blade got it look it pure's a fit i mean i sound like a fanboy but pure is all about simplicity so is object so that means you don't have to you know worry about wrangling storage and worrying about luns and all that other you know nonsense and and file i've been burned by hardware in the past you know where oh okay they're building to a price and so they cheap out on stuff like fans or other things and these these components fail and the whole thing goes down but this hardware is super super good quality and uh so i'm i'm happy with the quality that we're getting so cb last question what's next for you where do you want to take this uh this this initiative well we are in the process now of we um when so i i designed this system to combine the best of the kimball approach to data warehousing and the inland approach okay and what we do is we bring over all the data we've got and we put it into a pristine staging layer okay like i said it's uh because it's append only it's essentially a log of all the transactions that are happening in this company just they appear okay and then from the the kimball side of things we're designing the data marts now so that that's what the end users actually interact with and so we're we're taking uh the we're examining the transactional systems to say how are these business objects created what's what's the logic there and we're recreating those logical models in uh in vertica so we've done a handful of them so far and it's working out really well so going forward we've got a lot of work to do to uh create just about every object that that the company needs cb you're an awesome guest to really always a pleasure talking to you and uh thank you congratulations and and good luck going forward stay safe thank you [Music] okay let's summarize the convergence of file and object first i want to thank our guests matt burr scott sinclair garrett belsener and c.b bohn i'm your host dave vellante and please allow me to briefly share some of the key takeaways from today's program so first as scott sinclair of esg stated surprise surprise data's growing and matt burr he helped us understand the growth of unstructured data i mean estimates indicate that the vast majority of data will be considered unstructured by mid-decade 80 or so and obviously unstructured data is growing very very rapidly now of course your definition of unstructured data and that may vary across across a wide spectrum i mean there's video there's audio there's documents there's spreadsheets there's chat i mean these are generally considered unstructured data but of course they all have some type of structure to them you know perhaps it's not as strict as a relational database but there's certainly metadata and certain structure to these types of use cases that i just mentioned now the key to what pure is promoting is this idea of unified fast file and object uffo look object is great it's inexpensive it's simple but historically it's been less performant so good for archiving or cheap and deep types of examples organizations often use file for higher performance workloads and let's face it most of the world's data lives in file formats what pure is doing is bringing together file and object by for example supporting multiple protocols ie nfs smb and s3 s3 of course has really given new life to object over the past decade now the key here is to essentially enable customers to have the best of both worlds not having to trade off performance for object simplicity and a key discussion point that we've had on the program has been the impact of flash on the long slow death of spinning disk look hard disk drives they had a great run but hdd volumes they peaked in 2010 and flash as you well know has seen tremendous volume growth thanks to the consumption of flash in mobile devices and then of course its application into the enterprise and that's volume is just going to keep growing and growing and growing the price declines of flash are coming down faster than those of hdd so it's the writing's on the wall it's just a matter of time so flash is riding down that cost curve very very aggressively and hdd has essentially become you know a managed decline business now by bringing flash to object as part of the flashblade portfolio and allowing for multiple protocols pure hopes to eliminate the dissonance between file and object and simplify the choice in other words let the workload decide if you have data in a file format no problem pure can still bring the benefits of simplicity of object at scale to the table so again let the workload inform what the right strategy is not the technical infrastructure now pure course is not alone there are others supporting this multi-protocol strategy and so we asked matt burr why pure or what's so special about you and not surprisingly in addition to the product innovation he went right to pure's business model advantages i mean for example with its evergreen support model which was very disruptive in the marketplace you know frankly pure's entire business disrupted the traditional disk array model which was fundamentally was flawed pure forced the industry to respond and when it achieved escape velocity velocity and pure went public the entire industry had to react and a big part of the pure value prop in addition to this business model innovation that we just discussed is simplicity pure's keep its simple approach coincided perfectly with the ascendancy of cloud where technology organizations needed cloud-like simplicity for certain workloads that were never going to move into the cloud they're going to stay on-prem now i'm going to come back to this but allow me to bring in another concept that garrett and cb really highlighted and that is the complexity of the data pipeline and what do you mean what do i mean by that and why is this important so scott sinclair articulated he implied that the big challenge is organizations their data full but insights are scarce scarce a lot of data not as much insights it takes time too much time to get to those insights so we heard from our guests that the complexity of the data pipeline was a barrier to getting to faster insights now cb bonds shared how he streamlined his data architecture using vertica's eon mode which allowed him to scale compute independently of storage so that brought critical flexibility and improved economics at scale and flashblade of course was the back-end storage for his data warehouse efforts now the reason i think this is so important is that organizations are struggling to get insights from data and the complexity associated with the data pipeline and data life cycles let's face it it's overwhelming organizations and there the answer to this problem is a much longer and different discussion than unifying object and file that's you know i can spend all day talking about that but let's focus narrowly on the part of the issue that is related to file and object so the situation here is that technology has not been serving the business the way it should rather the formula is twisted in the world of data and big data and data architectures the data team is mired in complex technical issues that impact the time to insights now part of the answer is to abstract the underlying infrastructure complexity and create a layer with which the business can interact that accelerates instead of impedes innovation and unifying file and object is a simple example of this where the business team is not blocked by infrastructure nuance like does this data reside in a file or object format can i get to it quickly and inexpensively in a logical way or is the infrastructure in a stovepipe and blocking me so if you think about the prevailing sentiment of how the cloud is evolving to incorporate on premises workloads that are hybrid and configurations that are working across clouds and now out to the edge this idea of an abstraction layer that essentially hides the underlying infrastructure is a trend we're going to see evolve this decade now is uffo the be all end-all answer to solving all of our data pipeline challenges no no of course not but by bringing the simplicity and economics of object together with the ubiquity and performance of file uffo makes it a lot easier it simplifies life organizations that are evolving into digital businesses which by the way is every business so we see this as an evolutionary trend that further simplifies the underlying technology infrastructure and does a better job supporting the data flows for organizations so they don't have to spend so much time worrying about the technology details that add a little value to the business okay so thanks for watching the convergence of file and object and thanks to pure storage for making this program possible this is dave vellante for the cube we'll see you next time [Music] you

Published Date : Feb 24 2021

SUMMARY :

on the nfs side um but you know we

ENTITIES

Entity	Category	Confidence
garrett belsner	PERSON	0.99+
matt burr	PERSON	0.99+
2010	DATE	0.99+
2050	DATE	0.99+
270 terabytes	QUANTITY	0.99+
seven days	QUANTITY	0.99+
2021	DATE	0.99+
scott sinclair	PERSON	0.99+
2035	DATE	0.99+
2019	DATE	0.99+
four	QUANTITY	0.99+
three	QUANTITY	0.99+
two seconds	QUANTITY	0.99+
2025	DATE	0.99+
matt burr	PERSON	0.99+
first phase	QUANTITY	0.99+
dave	PERSON	0.99+
dave vellante	PERSON	0.99+
scott sinclair	PERSON	0.99+
five	QUANTITY	0.99+
250 terabytes	QUANTITY	0.99+
10 terabyte	QUANTITY	0.99+
zero percent	QUANTITY	0.99+
100	QUANTITY	0.99+
steve	PERSON	0.99+
gary	PERSON	0.99+
two billion dollar	QUANTITY	0.99+
garrett	PERSON	0.99+
two minutes	QUANTITY	0.99+
two weeks later	DATE	0.99+
three topics	QUANTITY	0.99+
two sides	QUANTITY	0.99+
two weeks ago	DATE	0.99+
billion dollars	QUANTITY	0.99+
mid-decade 80	DATE	0.99+
today	DATE	0.99+
cdw	PERSON	0.98+
three phases	QUANTITY	0.98+
80	QUANTITY	0.98+
billions of objects	QUANTITY	0.98+
10 month	QUANTITY	0.98+
one device	QUANTITY	0.98+
an hour	QUANTITY	0.98+
one platform	QUANTITY	0.98+
scott	ORGANIZATION	0.97+
last year	DATE	0.97+
five petabyte	QUANTITY	0.97+
scott	PERSON	0.97+
cassandra	PERSON	0.97+
one	QUANTITY	0.97+
single block	QUANTITY	0.97+
one system	QUANTITY	0.97+
next decade	DATE	0.96+
tons of places	QUANTITY	0.96+
both worlds	QUANTITY	0.96+
vertica	TITLE	0.96+
matt	PERSON	0.96+
both	QUANTITY	0.96+
69 of organizations	QUANTITY	0.96+
billion dollars	QUANTITY	0.95+
pandemic	EVENT	0.95+
first	QUANTITY	0.95+
three great guests	QUANTITY	0.95+
next year	DATE	0.95+

Keynote Analysis | IBM Data and AI Forum

>>Live from Miami, Florida. It's the cube covering IBM's data and AI forum brought to you by IBM. >>Welcome everybody to the port of Miami. My name is Dave Vellante and you're watching the cube, the leader in live tech coverage. We go out to the events, we extract the signal from the noise and we're here at the IBM data and AI form. The hashtag is data AI forum. This is IBM's. It's formerly known as the, uh, IBM analytics university. It's a combination of learning peer network and really the focus is on AI and data. And there are about 1700 people here up from, Oh, about half of that last year, uh, when it was the IBM, uh, analytics university, about 600 customers, a few hundred partners. There's press here, there's, there's analysts, and of course the cube is covering this event. We'll be here for one day, 128 hands-on sessions or ER or sessions, 35 hands on labs. As I say, a lot of learning, a lot of technical discussions, a lot of best practices. >>What's happening here. For decades, our industry has marched to the cadence of Moore's law. The idea that you could double the processor performance every 18 months, doubling the number of transistors, you know, within, uh, the footprint that's no longer what's driving innovation in the it and technology industry today. It's a combination of data with machine intelligence applied to that data and cloud. So data we've been collecting data, we've always talked about all this data that we've collected and over the past 10 years with the advent of lower costs, warehousing technologies in file stores like Hadoop, um, with activity going on at the edge with new databases and lower cost data stores that can handle unstructured data as well as structured data. We've amassed this huge amount of, of data that's growing at a, at a nonlinear rate. It's, you know, this, the curve is steepening is exponential. >>So there's all this data and then applying machine intelligence or artificial intelligence with machine learning to that data is the sort of blending of a new cocktail. And then the third piece of that third leg of that stool is the cloud. Why is the cloud important? Well, it's important for several reasons. One is that's where a lot of the data lives too. It's where agility lives. So cloud, cloud, native of dev ops, and being able to spin up infrastructure as code really started in the cloud and it's sort of seeping to to on prem, slowly and hybrid and multi-cloud, ACC architectures. But cloud gives you not only that data access, not only the agility, but also scale, global scale. So you can test things out very cheaply. You can experiment very cheaply with cloud and data and AI. And then once your POC is set and you know it's going to give you business value and the business outcomes you want, you can then scale it globally. >>And that's really what what cloud brings. So this forum here today where the big keynotes, uh, Rob Thomas kicked it off. He uh, uh, actually take that back. A gentleman named Ray Zahab, he's an adventure and ultra marathon or kicked it off. This Jude one time ran 4,500 miles in 111 days with two ultra marathon or colleagues. Um, they had no days off. They traveled through six countries, they traversed Africa, the continent, and he took two showers in a 111 days. And his whole mission is really talking about the power of human beings, uh, and, and the will of humans to really rise above any challenge would with no limits. So that was the sort of theme that, that was set for. This, the, the tone that was set for this conference that Rob Thomas came in and invoked the metaphor of superheroes and superpowers of course, AI and data being two of those three superpowers that I talked about in addition to cloud. >>So Rob talked about, uh, eliminating the good to find the great, he talked about some of the experiences with Disney's ward. Uh, ward Kimball and Stanley, uh, ward Kimball went to, uh, uh, Walt Disney with this amazing animation. And Walter said, I love it. It was so funny. It was so beautiful, was so amazing. Your work 283 days on this. I'm cutting it out. So Rob talked about cutting out the good to find, uh, the great, um, also talking about AI is penetrated only about four to 10% within organizations. Why is that? Why is it so low? He said there are three things that are blockers. They're there. One is data and he specifically is referring to data quality. The second is trust and the third is skillsets. So he then talked about, you know, of course dovetailed a bunch of IBM products and capabilities, uh, into, you know, those, those blockers, those challenges. >>He talked about two in particular, IBM cloud pack for data, which is this way to sort of virtualize data across different clouds and on prem and hybrid and and basically being able to pull different data stores in, virtualize it, combine join data and be able to act on it and apply a machine learning and AI to it. And then auto AI a way to basically machine intelligence for artificial intelligence. In other words, AI for AI. What's an example? How do I choose the right algorithm and that's the best fit for the use case that I'm using. Let machines do that. They've got experience and they can have models that are trained to actually get the best fit. So we talked about that, talked about a customer, a panel, a Miami Dade County, a Wunderman Thompson, and the standard bank of South Africa. These are incumbents that are using a machine intelligence and AI to actually try to super supercharge their business. We heard a use case with the Royal bank of Scotland, uh, basically applying AI and driving their net promoter score. So we'll talk some more about that. Um, and we're going to be here all day today, uh, interviewing executives, uh, from, uh, from IBM, talking about, you know, what customers are doing with a, uh, getting the feedback from the analysts. So this is what we do. Keep it right there, buddy. We're in Miami all day long. This is Dave Olanta. You're watching the cube. We'll be right back right after this short break..

Published Date : Oct 22 2019

SUMMARY :

IBM's data and AI forum brought to you by IBM. It's a combination of learning peer network and really the focus is doubling the number of transistors, you know, within, uh, the footprint that's in the cloud and it's sort of seeping to to on prem, slowly and hybrid and multi-cloud, really talking about the power of human beings, uh, and, and the will of humans So Rob talked about cutting out the good to find, and that's the best fit for the use case that I'm using.

ENTITIES

Entity	Category	Confidence
Ray Zahab	PERSON	0.99+
Miami	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Rob Thomas	PERSON	0.99+
Dave Olanta	PERSON	0.99+
4,500 miles	QUANTITY	0.99+
35 hands	QUANTITY	0.99+
Stanley	PERSON	0.99+
two	QUANTITY	0.99+
six countries	QUANTITY	0.99+
128 hands	QUANTITY	0.99+
111 days	QUANTITY	0.99+
Walter	PERSON	0.99+
Rob	PERSON	0.99+
Africa	LOCATION	0.99+
Jude	PERSON	0.99+
one day	QUANTITY	0.99+
283 days	QUANTITY	0.99+
third piece	QUANTITY	0.99+
Miami, Florida	LOCATION	0.99+
Wunderman Thompson	ORGANIZATION	0.99+
Royal bank of Scotland	ORGANIZATION	0.99+
One	QUANTITY	0.99+
third	QUANTITY	0.99+
today	DATE	0.99+
second	QUANTITY	0.98+
last year	DATE	0.98+
about 600 customers	QUANTITY	0.98+
third leg	QUANTITY	0.98+
South Africa	LOCATION	0.97+
one time	QUANTITY	0.97+
three things	QUANTITY	0.96+
IBM Data	ORGANIZATION	0.96+
about 1700 people	QUANTITY	0.96+
three superpowers	QUANTITY	0.96+
two ultra marathon	QUANTITY	0.95+
Kimball	PERSON	0.95+
two showers	QUANTITY	0.94+
10%	QUANTITY	0.94+
about four	QUANTITY	0.88+
IBM analytics university	ORGANIZATION	0.86+
Miami Dade County	LOCATION	0.8+
18 months	QUANTITY	0.78+
hundred partners	QUANTITY	0.76+
decades	QUANTITY	0.74+
university	ORGANIZATION	0.73+
ward	PERSON	0.69+
Disney	ORGANIZATION	0.69+
Hadoop	TITLE	0.67+
Moore	PERSON	0.6+
years	DATE	0.59+
Walt	PERSON	0.58+
Disney	PERSON	0.5+
10	QUANTITY	0.46+
half	QUANTITY	0.4+
past	DATE	0.39+

Aaron T. Myers Cloudera Software Engineer Talking Cloudera & Hadooop

>>so erin you're a technique for a Cloudera, you're a whiz kid from Brown, you have, how many Brown people are engineers here at Cloudera >>as of monday, we have five full timers and two interns at the moment and we're trying to hire more all the time. >>Mhm. So how many interns? >>Uh two interns from Brown this this summer? A few more from other schools? Cool, >>I'm john furry with silicon angle dot com. Silicon angle dot tv. We're here in the cloud era office in my little mini studio hasn't been built out yet, It was studio, we had to break it down for a doctor, ralph kimball, not richard Kimble from uh I called him on twitter but coupon um but uh the data warehouse guru was in here um and you guys are attracting a lot of talent erin so tell us a little bit about, you know, how Claudia is making it happen and what's the big deal here, people smart here, it's mature, it's not the first time around this company, this company has some some senior execs and there's been a lot, a lot of people uh in the market who have been talking about uh you know, a lot of first time entrepreneurs doing their startups and I've been hearing for some folks in in the, in the trenches that there's been a frustration and start ups out there, that there's a lot of first time entrepreneurs and everyone wants to be the next twitter and there's some kind of companies that are straddling failure out there? And and I was having that conversation with someone just today and I said, they said, what's it like Cloudera and I said, uh, this is not the first time crew here in Cloudera. So, uh, share with the folks out there, what you're seeing for Cloudera and the management team. >>Sure. Well, one of the most attractive parts about working Cloudera for me, one of the reasons I, I really came here was have been incredibly experienced management team, Mike Charles, they've all there at the top of this Oregon, they have all done this before they founded startups, Growing startups, old startups and uh, especially in contrast with my, the place where I worked previously. Uh, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would. >>And I mean, Mike Olson is veteran. I mean he's been, he's an adviser to start ups. I know he's been in some investors. Amer was obviously PhD candidates bolted out the startup, sold it to yahoo, worked at, yahoo, came back finish his PhD at stanford under Mendel over there in the PhD program over this, we banged in a speech. He came back entrepreneur residents, Excel partners. Now it does Cloudera. Um, when did you join the company and just take us through who you are and when you join Cloudera, I want your background. >>Sure. So I, I joined a little over a year ago is about 30 people at the time. Uh, I came from a small start up of the music online music store in new york city um uh, which doesn't really exist all that much anymore. Um but you know, I I sort of followed my other colleagues from Brown who worked here um was really sold by the management team and also by the tremendous market opportunity that that Hadoop has right now. Uh Cloudera was very much the first commercial player there um which is really a unique experience and I think you've covered this pretty well before. I think we all around here believe that uh the markets only growing. Um and we're going to see the market and the big data market in general get bigger and bigger in the next few years. >>So, so obviously computer science is all the rage and and I'm particularly proud of hangout, we've had conversations in the hallway while you're tweeting about this and that. Um, but you know, silicon angles home is here, we've had, I've had a chance to watch you and the other guys here grow from, you know, from your other office was a san mateo or san Bruno somewhere in there. Like >>uh it was originally in burlingame, then we relocate the headquarters Palo Alto and now we have a satellite up in san Francisco. >>So you guys bolted out. You know, you have a full on blow in san Francisco office. So um there was a big busting at the seams here in Palo Alto people commuting down uh even building their burning man. Uh >>Oh yeah sure >>skits here and they're constructing their their homes here, but burning man, so we're doing that in san Francisco, what's the vibe like in san Francisco, tell us what's going on >>in san Francisco, san Francisco is great. It's, I'm I live in san Francisco as do a lot of us. About half the engineering team works up there now. Um you know we're running out of space there certainly. Um and you're already, oh yeah, oh yeah, we're hiring as fast as we absolutely can. Um so definitely not space to build the burning man huts there like like there is down, down in Palo Alto but it's great up there. >>What are you working on right now for project insurance? The computer science is one of the hot topics we've been covering on silicon angle, taking more of a social angle, social media has uh you know, moves from this pr kind of, you know, check in facebook fan page to hype to kind of a real deal social marketplace where you know data, social data, gestural data, mobile data geo data data is the center of the value proposition. So you live that every day. So talk about your view on the computer science landscape around data and why it's such a big deal. >>Oh sure. Uh I think data is sort of one of those uh fundamental uh things that can be uh mind for value across every industry, there's there's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing etcetera. And that's sort of the the unique value proposition of, you know, stuff like Hadoop. Um truly we we see interest from every sector that exists, which is great as for what the project that I'm specifically working on right now, I primarily work on H. D. F. S, which is the Hadoop distributed file system underlies pretty much all the other um projects in the Hadoop ecosystem. Uh and I'm particularly working with uh other colleagues at Cloudera and at other companies, yahoo and facebook on high availability for H. D. F. S, which has been um in some deployments is a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern than in others. Um but when you start talking about running H base, which needs to be up all the time serving live traffic than having highly available H DFS is uh necessity and we're looking forward to delivering that >>talk about the criticism that H. D. F. S has been having. Um Well, I wouldn't say criticism. I mean, it's been a great, great product that produced the HDs, a core parts of how do you guys been contributing to the standard of Apache, that's no secret to the folks out there, that cloud area leads that effort. Um but there's new companies out there kind of trying a new approach and they're saying they're doing it better, what are they saying in terms and what's really happening? So, you know, there's some argument like, oh, we can do it better. And what's the what, why are they doing it, that was just to make money do a new venture, or is that, what's your opinion on that? Yeah, >>sure. I mean, I think it's natural to to want to go after uh parts of the core Hadoop system and say, you know, Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part, couldn't couldn't we get some some really easy gains. Um and you know, sometimes that will be true. I have confidence that that that just will not simply not be true in in the very near future. One of the great benefits about Apache, Hadoop being open source is that we have a huge worldwide network of developers working at some of the best engineering organizations in the world who are all collaborating on this stuff. Um and, you know, I firmly believe that the collaborative open source process produces the best software and that's that's what Hadoop is at its very core. >>What about the arguments are saying that, oh, I need to commercialize it differently for my installed base bolt on a little proprietary extensions? Um That's legitimate argument. TMC might take that approach or um you know, map are I was trying to trying to rewrite uh H. T. F. >>S. To me, is >>it legitimate? I mean is there fighting going on in the standards? Maybe that's a political question you might want to answer. But give me a shot. >>I mean the Hadoop uh isn't there's no open standard for Hadoop. You can't say like this is uh this is like do compatible or anything like that. But you know what you can say is like this is Apache Hadoop. Uh And so in that sense there's no there's no fighting to be had there. Um Yeah, >>so yeah. Who um struggling as a company. But you know, there's a strong head Duke D. N. A. At yahoo, certainly, I talked with the the founder of the startup. Horton works just announced today that they have a new board member. He's the guy who's the Ceo of Horton works and now on bluster, I'm sorry, cluster announced they have um rob from benchmark on the board. Uh He's the Ceo of Horton works and and one of my not criticisms but points about Horton was this guy's an engineer, never run a company before. He's no Mike Olson. Okay, so you know, Michaelson has a long experience. So this guy comes into running and he's obviously in in open source, is that good for Yahoo and open sources. He they say they're going to continue to invest in Hadoop? They clearly are are still using a lot of Hadoop certainly. Um how is that changing Apache, is that causing more um consolidation, is that causing more energy? What's your view on the whole Horton works? Think >>um you know, yahoo is uh has been and will continue to be a huge contributor. Hadoop, they uh I can't say for sure, but I feel pretty confident that they have more data under management under Hadoop than anyone else in the world and there's no question in my mind that they'll continue to invest huge amounts of both key way effort and engineering effort and uh all of the things that Hadoop needs to to advance. Um I'm sure that Horton works will continue to work very closely with with yahoo. Um And you know, we're excited to see um more and more contributors to to Hadoop um both from Horton works and from yahoo proper. >>Cool, Well, I just want to clarify for the folks out there who don't understand what this whole yahoo thing is, It was not a spin out, these were key Hadoop core guys who left the company to form a startup of which yahoo financed with benchmark capital. So, yahoo is clearly and told me and reaffirm that with me that they are clearly investing more in Hadoop internally as well. So there's more people inside, yahoo that work on Hadoop than they are in the entire Horton's work company. So that's very clear. So just to clear that up out there. Um erin. so you're you're a young gun, right? You're a young whiz like Todd madam on here, explain to the folks out there um a little bit older maybe guys in their thirties or C IOS a lot of people are doing, you know, they're kicking the tires on big data, they're hearing about real time analytics, they're hearing about benefits have never heard before. Uh Dave a lot and I on the cube talk about, you know, the transformations that are going on, you're seeing AMC getting into big data, everyone's transforming at the enterprise level and service provider. What explains the folks why Hadoop is so important. Why is that? Do if not the fastest or one of the fastest growing projects in Apache ever? Sure. Even faster than the web server project, which is one of the better, >>better bigger ones. >>Why is the dupes and explain to them what it is? Well, you know, >>it's been it's pretty well covered that there's been an explosion of data that more data is produced every every year over and over. We talk about exabytes which is a quantity of data that is so large that pretty much no one can really theoretically comprehend it. Um and more and more uh organizations want to store and process and learn from, you know, get insights from that data um in addition to just the explosion of data um you know that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is truly that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store, what you want to discard, store it all and figure out later what is the most useful bits we call that sort of schema on read. Um as opposed to, you know, figuring out the schema a priority. Um and that is a very powerful shift in dynamics of data storage in general. And I think that's very attractive to all sorts of organizations. >>Your, I'll see a Brown graduate and you have some interns from Brown to Brown um, Premier computer science program almost as good as when I went to school at Northeastern University. >>Um >>you know, the unsung heroes of computer science only kidding Brown's great program, but you know, cutting edge computer science areas known as obviously leading in a lot of the computer science areas do in general is known that you gotta be pretty savvy to be either masters level PhD to kind of play in this area? Not a lot of adoption, what I call the grassroots developers. What's your vision and how do you see the computer science, younger generation, even younger than you kind of growing up into this because those tools aren't yet developed. You still got to be, you're pretty strong from a computer science perspective and also explained to the folks who aren't necessarily at the browns of the world or getting into computer science, what about, what is that this revolution about and where is it going? What are some of the things you see happening around the corner that that might not be obvious. >>Sure there's a few questions there. Um part of it is how do people coming out of college get into this thing, It's not uh taught all that much in school, How do how do you sort of make the leap from uh the standard computer science curriculum into this sort of thing? And um you know, part of it is that really we're seeing more and more schools offering distributed computing classes or they have grids available um to to do this stuff there there is some research coming out of Brown actually and lots of other schools about Hadoop proper in the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting. Google uh actually has classes that they teach, I believe in conjunction with the University of Washington um where they teach undergraduates and your master's level, graduate students about mass produced and distributed computing and they actually use Hadoop to do it because it is the architecture of Hadoop is modeled after um >>uh >>google's internal infrastructure. Um So you know that that's that's one way we're seeing more and more people who are just coming out of college who have distributed systems uh knowledge like this? Um Another question? the other part of the question you asked is how does um how does the ordinary developer get into this stuff? And the answer is we're working hard, you know, we and others in the hindu community are working hard on making it, making her do just much easier to consume. We released, you cover this fair bit, the ECM Express project that lets you install Hadoop with just minimal effort as close to 11 click as possible. Um and there's lots of um sort of layers built on top of Hadoop to make it more easily consumed by developers Hive uh sort of sequel like interface on top of mass produce. And Pig has its own DSL for programming against mass produce. Um so you don't have to write heart, you don't have to write straight map produced code, anything like that. Uh and it's getting easier for operators every day. >>Well, I mean, evolution was, I mean, you guys actually working on that cloud era. Um what about what about some of the abstractions? You're seeing those big the Rage is, you know, look back a year ago VM World coming up and uh little plugs looking angle dot tv will be broadcasting live and at VM World. Um you know, he has been on the Q XV m where um Spring Source was a big announcement that they made. Um, Haruka brought by Salesforce Cloud Software frameworks are big, what does that look like and how does it relate to do and the ecosystem around Hadoop where, you know, the rage is the software frameworks and networks kind of collide and you got the you got the kind of the intersection of, you know, software frameworks and networks obviously, you know, in the big players, we talk about E M C. And these guys, it's clear that they realize that software is going to be their key differentiator. So it's got to get to a framework stand, what is Hadoop and Apache talking about this kind of uh, evolution for for Hadoop. >>Sure. Well, you know, I think we're seeing very much the commoditization of hardware. Um, you just can't buy bigger and bigger computers anymore. They just don't exist. So you're going to need something that can take a lot of little computers and make it look like one big computer. And that's what Hadoop is especially good at. Um we talk about scaling out instead of scaling up, you can just buy more relatively inexpensive computers. Uh and that's great. And sort of the beauty of Hadoop, um, is that it will grow linearly as your data set as your um, your your scale, your traffic, whatever grows. Um and you don't have to have this exponential price increase of buying bigger and bigger computers, You can just buy more. Um and that that's sort of the beauty of it is a software framework that if you write against it. Um you don't have to think about the scaling anymore. It will do that for you. >>Okay. The question for you, it's gonna kind of a weird question but try to tackle it. You're at a party having a few cocktails, having a few beers with your buddies and your buddies who works at a big enterprise says man we've got all this legacy structured data systems, I need to implement some big data strategy, all this stuff. What do I do? >>Sure, sure. Um Not the question I thought you were going to ask me that you >>were a g rated program here. >>Okay. I thought you were gonna ask me, how do I explain what I do to you know people that we'll get to that next. Okay. Um Yeah, I mean I would say that the first thing to do is to implement a start, start small, implement a proof of concept, get a subset of the data that you would like to analyze, put it, put Hadoop on a few machines, four or five, something like that and start writing some hive queries, start writing some some pig scripts and I think you'll you know pretty quickly and easily see the value that you can get out of it and you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size. >>Okay. So now the question that I want to ask is that you're at a party and I want to say, what do you >>do? You usually tell people in my hedge fund manager? No but seriously um I I tell people I work on distributed supercomputers. Software for distributed supercomputers and that people have some idea what distributed means and supercomputers and they figure that out. >>So final question for I know you gotta go get back to programming uh some code here. Um what's the future of Hadoop in the sense of from a developer standpoint? I was having a conversation with a developer who's a big data jockey and talking about Miss kelly gets anything and get his hands on G. O. Data, text data because the data data junkie and he says I just don't know what to build. Um What are some of the enabling apps that you may see out there and or you have just conceiving just brainstorming out there, what's possible with with data, can you envision the next five years, what are you gonna see evolve and what some of the coolest things you've seen that might that are happening right now. >>Sure. Sure. I mean I think you're going to see uh just the front ends to these things getting just easier and easier and easier to interact with and at some point you won't even know that you're interacting with a Hadoop cluster that will be the engine underneath the hood but you know, you'll you'll be uh from your perspective you'll be driving a Ferrari and by that I mean you know, standard B. I tool, standard sequel query language. Um we'll all be implemented on top of this stuff and you know from that perspective you could implement, you know, really anything you want. Um We're seeing a lot of great work coming out of just identifying trends amongst masses of data that you know, if you tried to analyze it with any other tool, you'd either have to distill it down so far that you would you would question your results or that you could only run the very simplest sort of queries over um and not really get those like powerful deep insights, those sort of correlative insights um that we're seeing people do. So I think you'll see, you'll continue to see uh great recommendations systems coming out of this stuff. You'll see um root cause analysis, you'll see great work coming out of the advertising industry um to you know to really say which ad was responsible for this purchase. Was it really the last ad they clicked on or was it the ad they saw five weeks ago they put the thought in mind that sort of correlative analysis is being empowered by big data systems like a dupe. >>Well I'm bullish on big data, I think people I think it's gonna be even bigger than I think you're gonna have some kids come out of college and say I could use big data to create a differentiation and build an airline based on one differentiation. These are cool new ways and, and uh, data we've never seen before. So Aaron, uh, thanks for coming >>on the issue >>um, your inside Palo Alto Studio and we're going to.

Published Date : Sep 28 2011

SUMMARY :

the market who have been talking about uh you know, a lot of first time entrepreneurs doing their startups and I've been Uh, the amount of experience take us through who you are and when you join Cloudera, I want your background. Um but you know, I I sort of followed my other colleagues you know, from your other office was a san mateo or san Bruno somewhere in there. So you guys bolted out. Um you know we're running out of space there certainly. on silicon angle, taking more of a social angle, social media has uh you know, Um but when you start talking about running H base, which needs to be up all the time serving live traffic So, you know, there's some argument like, oh, we can do it better. Um and you know, sometimes that will be true. TMC might take that approach or um you know, map are I was trying to trying to rewrite Maybe that's a political question you might want to answer. But you know what you can say is like this is Apache Hadoop. so you know, Michaelson has a long experience. Um And you know, we're excited to see um more and more contributors to Uh Dave a lot and I on the cube talk about, you know, per terabyte to store data that you don't have to think up front about what Your, I'll see a Brown graduate and you have some interns from Brown to Brown What are some of the things you see happening around the corner that And um you know, part of it is that really we're seeing more and more schools offering And the answer is we're working hard, you know, we and others in the hindu community are working do and the ecosystem around Hadoop where, you know, the rage is the software frameworks and Um and that that's sort of the beauty of it is a software framework I need to implement some big data strategy, all this stuff. Um Not the question I thought you were going to ask me that you the value that you can get out of it and you can do so with the knowledge that when you do and that people have some idea what distributed means and supercomputers and they figure that out. apps that you may see out there and or you have just conceiving just brainstorming out out of just identifying trends amongst masses of data that you know, if you tried Well I'm bullish on big data, I think people I think it's gonna be even bigger than I think you're gonna have some kids come out of college

ENTITIES

Entity	Category	Confidence
Mike Olson	PERSON	0.99+
yahoo	ORGANIZATION	0.99+
Mike Charles	PERSON	0.99+
san Francisco	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Yahoo	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Aaron T. Myers	PERSON	0.99+
University of Washington	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
facebook	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
richard Kimble	PERSON	0.99+
Michaelson	PERSON	0.99+
two interns	QUANTITY	0.99+
Oregon	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Todd	PERSON	0.99+
Claudia	PERSON	0.99+
AMC	ORGANIZATION	0.99+
five weeks ago	DATE	0.99+
Northeastern University	ORGANIZATION	0.99+
monday	DATE	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
Dave	PERSON	0.99+
TMC	ORGANIZATION	0.99+
ralph kimball	PERSON	0.99+
burlingame	LOCATION	0.99+
Ferrari	ORGANIZATION	0.98+
today	DATE	0.98+
five	QUANTITY	0.98+
Brown	ORGANIZATION	0.98+
thirties	QUANTITY	0.98+
one	QUANTITY	0.98+
Horton	ORGANIZATION	0.98+
Apache	ORGANIZATION	0.98+
Hadoop	ORGANIZATION	0.98+
erin	PERSON	0.98+
google	ORGANIZATION	0.97+
One	QUANTITY	0.97+
twitter	ORGANIZATION	0.97+
Brown	PERSON	0.97+
a year ago	DATE	0.97+
Salesforce	ORGANIZATION	0.97+
john furry	PERSON	0.96+
one big computer	QUANTITY	0.95+
new york city	LOCATION	0.95+
Mendel	PERSON	0.94+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Kimball: