Breaking Analysis: Enterprise Technology Predictions 2023

(upbeat music beginning) >> From the Cube Studios in Palo Alto and Boston, bringing you data-driven insights from the Cube and ETR, this is "Breaking Analysis" with Dave Vellante. >> Making predictions about the future of enterprise tech is more challenging if you strive to lay down forecasts that are measurable. In other words, if you make a prediction, you should be able to look back a year later and say, with some degree of certainty, whether the prediction came true or not, with evidence to back that up. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this breaking analysis, we aim to do just that, with predictions about the macro IT spending environment, cost optimization, security, lots to talk about there, generative AI, cloud, and of course supercloud, blockchain adoption, data platforms, including commentary on Databricks, Snowflake, and other key players, automation, events, and we may even have some bonus predictions around quantum computing, and perhaps some other areas. To make all this happen, we welcome back, for the third year in a row, my colleague and friend Eric Bradley from ETR. Eric, thanks for all you do for the community, and thanks for being part of this program again. >> I wouldn't miss it for the world. I always enjoy this one. Dave, good to see you. >> Yeah, so let me bring up this next slide and show you, actually come back to me if you would. I got to show the audience this. These are the inbounds that we got from PR firms starting in October around predictions. They know we do prediction posts. And so they'll send literally thousands and thousands of predictions from hundreds of experts in the industry, technologists, consultants, et cetera. And if you bring up the slide I can show you sort of the pattern that developed here. 40% of these thousands of predictions were from cyber. You had AI and data. If you combine those, it's still not close to cyber. Cost optimization was a big thing. 
Of course, cloud, some on DevOps, and software. Digital transformation got, you know, some lip service, and SaaS. And then there was other, it's kind of around 2%. So quite remarkable, when you think about the focus on cyber, Eric. >> Yeah, there's two reasons why I think it makes sense, though. One, the cybersecurity companies have a lot of cash, so therefore the PR firms might be working a little bit harder for them than some of their other clients. (laughs) And then secondly, as you know, for multiple years now, when we do our macro survey, we ask, "What's your number one spending priority?" And again, it's security. It just isn't going anywhere. It just stays at the top. So I'm actually not that surprised by that little pie chart there, but I was shocked that SaaS was only 5%. You know, going back 10 years ago, that would've been the only thing anyone was talking about. >> Yeah. So true. All right, let's get into it. First prediction, we always start with kind of tech spending. Number one is tech spending increases between four and 5%. ETR has currently got it at 4.6% coming into 2023. This has been a consistently downward trend all year. We started, you know, much, much higher as we've been reporting. Bottom line is the Fed is still in control. They're going to ease up on tightening, is the expectation, they're going to shoot for a soft landing. But you know, my feeling is this slingshot economy is going to continue, and it's going to continue to confound, whether it's supply chains or spending. The interesting thing about the ETR data, Eric, and I want you to comment on this, the largest companies are the most aggressive to cut. They're laying off. Smaller firms are spending faster. They're actually growing at a much larger, faster rate, as are companies in EMEA. And that's a surprise. That's outpacing the US and APAC. Chime in on this, Eric. >> Yeah, I was surprised on all of that. 
First on the higher level spending, we are definitely seeing it coming down, but the interesting thing here is headlines are making it worse. The huge research shop recently said 0% growth. We're coming in at 4.6%. And just so everyone knows, this is not us guessing, we asked 1,525 IT decision-makers what their budget growth will be, and they came in at 4.6%. Now there's a huge disparity, as you mentioned. The Fortune 500, global 2000, barely at 2% growth, but small, it's at 7%. So we're at a situation right now where the smaller companies are still playing a little bit of catch up on digital transformation, and they're spending money. The largest companies that have the most to lose from a recession are being more trepidatious, obviously. So they're playing a "Wait and see." And I hope we don't talk ourselves into a recession. Certainly the headlines and some of their research shops are helping it along. But another interesting comment here is, you know, energy and utilities used to be called an orphan and widow stock group, right? They are spending more than anyone, more than financials insurance, more than retail consumer. So right now it's being driven by mid, small, and energy and utilities. They're all spending like gangbusters, like nothing's happening. And it's the rest of everyone else that's being very cautious. >> Yeah, so very unpredictable right now. All right, let's go to number two. Cost optimization remains a major theme in 2023. We've been reporting on this. You've, we've shown a chart here. What's the primary method that your organization plans to use? You asked this question of those individuals that cited that they were going to reduce their spend and- >> Mhm. >> consolidating redundant vendors, you know, still leads the way, you know, far behind, cloud optimization is second, but it, but cloud continues to outpace legacy on-prem spending, no doubt. 
Somebody, it was, the guy's name was Alexander Feiglstorfer from Storyblok, sent in a prediction, said "All in one becomes extinct." Now, generally I would say I disagree with that because, you know, as we know over the years, suites tend to win out over, you know, individual, you know, point products. But I think what's going to happen is all in one is going to remain the norm for these larger companies that are cutting back. They want to consolidate redundant vendors, and the smaller companies are going to stick with that best of breed and be more aggressive and try to compete more effectively. What's your take on that? >> Yeah, I'm seeing much more consolidation in vendors, but also consolidation in functionality. We're seeing people building out new functionality, whether it's, we're going to talk about this later, so I don't want to steal too much of our thunder right now, but data and security also, we're seeing a functionality creep. So I think there's further consolidation happening here. I think niche solutions are going to be less likely, and platform solutions are going to be more likely in a spending environment where you want to reduce your vendors. You want to have one bill to pay, not 10. Another thing on this slide, real quick if I can before I move on, is we had a bunch of people write in and some of the answer options that aren't on this graph but did get cited a lot, unfortunately, is the obvious reduction in staff, hiring freezes, and delaying hardware, were three of the top write-ins. And another one was offshore outsourcing. So in addition to what we're seeing here, there were a lot of write-in options, and I just thought it would be important to state that, but essentially the cost optimization is by and far the highest one, and it's growing. So it's actually increased in our citations over the last year. >> And yeah, specifically consolidating redundant vendors. 
And so I actually thank you for bringing that other up, 'cause I had asked you, Eric, is there any evidence that repatriation is going on and we don't see it in the numbers, we don't see it even in the other, there was, I think very little or no mention of cloud repatriation, even though it might be happening in a smattering. >> Not a single mention, not one single mention. I went through it for you. Yep. Not one write-in. >> All right, let's move on. Number three, security leads M&A in 2023. Now you might say, "Oh, well that's a layup," but let me set this up Eric, because I didn't really do a great job with the slide. I hid the, what you've done, because you basically took, this is from the emerging technology survey with 1,181 responses from November. And what we did is we took Palo Alto and looked at the overlap in Palo Alto Networks accounts with these vendors that were showing on this chart. And Eric, I'm going to ask you to explain why we put a circle around OneTrust, but let me just set it up, and then have you comment on the slide and give us more detail. We're seeing private company valuations are off, you know, 10 to 40%. We saw Snyk do a down round, but pretty good, actually, only down 12%. We've seen much higher down rounds. Palo Alto Networks, we think, is going to get busy again. They're an acquisitive company, they've been sort of quiet lately, and we think CrowdStrike, Cisco, Microsoft, Zscaler, we're predicting all of those will make some acquisitions and we're thinking that the targets are somewhere in this mess of security taxonomy. Other thing we're predicting: AI meets cyber big time in 2023. We're probably going to see some acquisitions of those companies that are leaning into AI. We've seen some of that with Palo Alto. 
And then, you know, your comment to me, Eric, was "The RSA conference is going to be insane, hopping mad, crazy this April," (Eric laughing) but give us your take on this data, and why the red circle around OneTrust? Take us back to that slide if you would, Alex. >> Sure. There's a few things here. First, let me explain what we're looking at. So because we separate the public companies and the private companies into two separate surveys, this allows us the ability to cross-reference that data. So what we're doing here is in our public survey, the TSIS, everyone who cited some spending with Palo Alto, meaning they're a Palo Alto customer, we then cross-reference that with the private tech companies. Who else are they spending with? So what you're seeing here is an overlap. These companies that we have circled are doing the best in Palo Alto's accounts. Now, Palo Alto went and bought Twistlock a few years ago, which this data slide predicted, to be quite honest. And so I don't know if they necessarily are going to go after Snyk. Snyk, sorry. They already have something in that space. What they do need, however, is more on the authentication space. So I'm looking at OneTrust, with a 45% overlap in their overall net sentiment. That is a company that's already existing in their accounts and could be very synergistic to them. BeyondTrust as well, authentication identity. This is something that Palo needs to do to move more down that zero trust path. Now why did I pick Palo first? Because usually they're very acquisitive. They've been a little quiet lately. Secondly, if you look at the backdrop in the markets, the IPO freeze isn't going to last forever. Sooner or later, the IPO markets are going to open up, and some of these private companies are going to tap into public equity. In the meantime, however, cash funding on the private side is drying up. 
If they need another round, they're not going to get it, and they're certainly not going to get it at the valuations they were getting. So we're seeing valuations maybe come down where they're a touch more attractive, and Palo knows this isn't going to last forever. Cisco knows that, CrowdStrike, Zscaler, all these companies that are trying to make a push to become that vendor that you're consolidating around, they have a chance now, they have a window where they need to go make some acquisitions. And that's why I believe leading up to RSA, we're going to see some movement. I think it's going to be a really exciting time in security right now. >> Awesome. Thank you. Great explanation. All right, let's go on the next one. Number four is, it relates to security. Let's stay there. Zero trust moves from hype to reality in 2023. Now again, you might say, "Oh yeah, that's a layup." A lot of these inbounds that we got are very, you know, kind of self-serving, but we always try to put some meat on the bone. So first thing we do is we pull out some commentary from, Eric, your roundtable, your insights roundtable. And we have a CISO from a global hospitality firm says, "For me that's the highest priority." He's talking about zero trust because it's the best ROI, it's the most forward-looking, and it enables a lot of the business transformation activities that we want to do. CISOs tell me that they actually can drive forward transformation projects that have zero trust, because they can accelerate them, because they don't have to go through the hurdle of, you know, making sure that it's secure. Second comment, zero trust closes that last mile where once you're authenticated, they open up the resource to you in a zero trust way. That's a CISO of a, and a managing director of a cyber risk services enterprise. Your thoughts on this? >> I can be here all day, so I'm going to try to be quick on this one. This is not a fluff piece on this one. 
There's a couple of other reasons this is happening. One, the board finally gets it. Zero trust at first was just a marketing hype term. Now the board understands it, and that's why CISOs are able to push through it. And what they finally did was redefine what it means. Zero trust simply means moving away from hardware security, moving towards software-defined security, with authentication as its base. The board finally gets that, and now they understand that this is necessary and it's being moved forward. The other reason it's happening now is hybrid work is here to stay. We weren't really sure at first, large companies were still trying to push people back to the office, and it's going to happen. The pendulum will swing back, but hybrid work's not going anywhere. Based on our own data, we're seeing that 69% of companies expect remote and hybrid to be permanent, with only 30% permanent in office. Zero trust works for a hybrid environment. So all of that is the reason why this is happening right now. And going back to our previous prediction, this is why we're picking Palo, this is why we're picking Zscaler to make these acquisitions. Palo Alto needs to be better on the authentication side, and so does Zscaler. They're both fantastic on zero trust network access, but they need the authentication, software-defined aspect, and that's why we think this is going to happen. One last thing, in that CISO round table, I also had somebody say, "Listen, Zscaler is incredible. They're doing incredibly well pervading the enterprise, but their pricing's getting a little high," and they actually think Palo Alto is well-suited to start taking some of that share, if Palo can make one move. >> Yeah, Palo Alto's consolidation story is very strong. Here's my question and challenge. Do you and me, so I'm always hardcore about, okay, you've got to have evidence. I want to look back at these things a year from now and say, "Did we get it right? Yes or no?" 
If we got it wrong, we'll tell you we got it wrong. So how are we going to measure this? I'd say a couple things, and you can chime in. One is just the number of vendors talking about it. That's, but the marketing always leads the reality. So the second part of that is we got to get evidence from the buying community. Can you help us with that? >> (laughs) Luckily, that's what I do. I have a data company that asks thousands of IT decision-makers what they're adopting and what they're increasing spend on, as well as what they're decreasing spend on and what they're replacing. So I have snapshots in time over the last 11 years where I can go ahead and compare and contrast whether this adoption is happening or not. So come back to me in 12 months and I'll let you know. >> Now, you know I will. Okay, let's bring up the next one. Number five, generative AI hits where the Metaverse missed. Of course everybody's talking about ChatGPT, we just wrote last week in a breaking analysis with John Furrier and Sarbjeet Johal our take on that. We think 2023 does mark a pivot point as natural language processing really infiltrates enterprise tech, just as Amazon turned the data center into an API. We think going forward, you're going to be interacting with technology through natural language, through English commands or other, you know, foreign language commands, and investors are lining up, all the VCs are getting excited about creating something competitive to ChatGPT, according to (indistinct) a hundred million dollars gets you a seat at the table, gets you into the game. (laughing) That's before you have to start doing promotion. But he thinks that's what it takes to actually create a clone or something equivalent. We've seen stuff from, you know, the head of Facebook's, you know, AI saying, "Oh, it's really not that sophisticated, ChatGPT, it's kind of like IBM Watson, it's great engineering, but you know, we've got more advanced technology." 
We know Google's working on some really interesting stuff. But here's the thing. ETR just launched this survey for the February survey. It's in the field now. We circled OpenAI in this category. They weren't even in the survey, Eric, last quarter. So 52% of the ETR survey respondents indicated a positive sentiment toward OpenAI. I added up all the sort of different bars, we could double click on that. And then I got this inbound from Scott Stephenson of Deepgram. He said "AI is recession-proof." I don't know if that's the case, but it's a good quote. So bring this back up and take us through this. Explain this chart for us, if you would. >> First of all, I like Scott's quote better than the Facebook one. I think that's some sour grapes. Meta just spent an insane amount of money on the Metaverse and that's a dud. Microsoft just spent money on OpenAI and it is hot, undoubtedly hot. We've only been in the field with our current ETS survey for a week. So my caveat is it's preliminary data, but I don't care if it's preliminary data. (laughing) We're getting a sneak peek here at what is the number one net sentiment and mindshare leader in the entire machine-learning AI sector within a week. It's beating Data- >> 600. 600 in. >> It's beating Databricks. And we all know Databricks is a huge established enterprise company, not only in machine-learning AI, but it's in the top 10 in the entire survey. We have over 400 vendors in this survey. It's number eight overall, already. In a week. This is not hype. This is real. And I could go on the NLP stuff for a while. Not only here are we seeing it in OpenAI and machine-learning and AI, but we're seeing NLP in security. It's huge in email security. It's completely transforming that area. It's one of the reasons I thought Palo might take Abnormal out. They're doing such a great job with NLP in this email side, and also in the data prep tools. NLP is going to take out data prep tools. If we have time, I'll discuss that later. 
But yeah, this is, to me this is a no-brainer, and we're already seeing it in the data. >> Yeah, John Furrier called, you know, the ChatGPT introduction. He said it reminded him of the Netscape moment, when we all first saw Netscape Navigator and went, "Wow, it really could be transformative." All right, number six, the cloud expands to supercloud as edge computing accelerates and Cloudflare is a big winner in 2023. We've reported obviously on cloud, multi-cloud, supercloud and Cloudflare, basically saying what multi-cloud should have been. We pulled this quote from Atif Khan, who is the founder and CTO of Alkira, thanks, one of the inbounds, thank you. "In 2023, highly distributed IT environments will become more the norm as organizations increasingly deploy hybrid cloud, multi-cloud and edge settings..." Eric, from one of your round tables, "If my sources from edge computing are coming from the cloud, that means I have my workloads running in the cloud. There is no one better than Cloudflare." That's a senior director of IT architecture at a huge financial firm. And then your analysis shows Cloudflare really growing in pervasion, that sort of market presence in the dataset, dramatically, to near 20%, leading, I think you had told me that they're even ahead of Google Cloud in terms of momentum right now. >> That was probably the biggest shock to me in our January 2023 TSIS, which covers the public companies in the cloud computing sector. Cloudflare has now overtaken GCP in overall spending, and I was shocked by that. It's already extremely pervasive in networking, of course, for the edge networking side, and also in security. 
This is the number one leader in SASE, web application firewall, DDoS, bot protection. By your definition of supercloud, which we just did a couple of weeks ago, and I really enjoyed that by the way Dave, I think Cloudflare is the one that fits your definition best, because it's bringing all of these aspects together, and most importantly, it's cloud agnostic. It does not need to rely on Azure or AWS to do this. It has its own cloud. So I just think it's, when we look at your definition of supercloud, Cloudflare is the poster child. >> You know, what's interesting about that too, is a lot of people are poo-pooing Cloudflare, "Ah, it's, you know, really kind of not that sophisticated." "You don't have as many tools," but to your point, you can have those tools in the cloud, Cloudflare's doing serverless on steroids, trying to keep things really simple, doing a phenomenal job at, you know, various locations around the world. And they're definitely one to watch. Somebody put them on my radar (laughing) a while ago and said, "Dave, you got to do a breaking analysis on Cloudflare." And so I want to thank that person. I can't really name them, 'cause they work inside of a giant hyperscaler. But- (Eric laughing) (Dave chuckling) >> Real quickly, if I can from a competitive perspective too, who else is there? They've already taken share from Akamai, and Fastly is their really only other direct comp, and they're not there. And these guys are in pole position and they're the only game in town right now. I just, I don't see it slowing down. >> I thought one of your comments from your roundtable I was reading, one of the folks said, you know, Cloudflare, if my workloads are in the cloud, they are, you know, dominant, they said not as strong with on-prem. And so Akamai is doing better there. I'm like, "Okay, where would you want to be?" (laughing) >> Yeah, which one of those two would you rather be? >> Right? Anyway, all right, let's move on. 
Number seven, blockchain continues to look for a home in the enterprise, but devs will slowly begin to adopt in 2023. You know, blockchains have got a lot of buzz, obviously crypto is, you know, the killer app for blockchain. Senior IT architect in financial services from your, one of your insight roundtables said, quote, "For enterprises to adopt a new technology, there have to be proven turnkey solutions. My experience in talking with my peers are, blockchain is still an open-source component where you have to build around it." Now I want to thank Ravi Mayuram, who's the CTO of Couchbase, who sent in, you know, one of the predictions, he said, "DevOps will adopt blockchain, specifically Ethereum." And he referenced actually in his email to me, Solidity, which is the programming language for Ethereum, "will be in every DevOps pro's playbook, mirroring the boom in machine-learning. Newer programming languages like Solidity will enter the toolkits of devs." His point there, you know, Solidity for those of you who don't know, you know, Bitcoin is not programmable. Solidity, you know, came out and that was their whole shtick, and they've been improving that, and so forth. But it, Eric, it's true, it really hasn't found its home despite, you know, the potential for smart contracts. IBM's pushing it, VMware has had announcements, and others, really hasn't found its way in the enterprise yet. >> Yeah, and I got to be honest, I don't think it's going to, either. So when we did our top trends series, this was basically chosen as an anti-prediction, I would guess, that it just continues to not gain hold. And the reason why was that first comment, right? It's very much a niche solution that requires a ton of custom work around it. You can't just plug and play it. And at the end of the day, let's be very real what this technology is, it's a database ledger, and we already have database ledgers in the enterprise. So why is this a priority to move to a different database ledger? 
It's going to be very niche cases. I like the CTO comment from Couchbase about it being adopted by DevOps. I agree with that, but it has to be a DevOps in a very specific use case, and a very sophisticated use case in financial services, most likely. And that's not across the entire enterprise. So I just think it's still going to struggle to get its foothold for a little bit longer, if ever. >> Great, thanks. Okay, let's move on. Number eight, AWS, Databricks, Google, Snowflake lead the data charge with Microsoft keeping it simple. So let's unpack this a little bit. This is the shared accounts peer position for, I pulled data platforms in for analytics, machine-learning and AI and database. So I could grab all these accounts or these vendors and see how they compare in those three sectors. Analytics, machine-learning and database. Snowflake and Databricks, you know, they're on a crash course, as you and I have talked about. They're battling to be the single source of truth in analytics. There's going to be a big focus. They've already started. It's going to be accelerated in 2023 on open formats. Iceberg, Python, you know, they're all the rage. We heard about Iceberg at Snowflake Summit, last summer or last June. Not a lot of people had heard of it, but of course the Databricks crowd, who knows it well. A lot of other open source tooling. There's a company called dbt Labs, which you're going to talk about in a minute. George Gilbert put them on our radar. We just had Tristan Handy, the CEO of dbt Labs, on at supercloud last week. They are a new disruptor in data, they're essentially making, they're API-ifying, if you will, KPIs inside the data warehouse and dramatically simplifying that whole data pipeline. So really, you know, the ETL guys should be shaking in their boots with them. Coming back to the slide. Google really remains focused on BigQuery adoption. 
Customers have complained to me that they would like to use Snowflake with Google's AI tools, but they're being forced to go to BigQuery. I got to ask Google about that. AWS continues to stitch together its bespoke data stores, it's gone down that "Right tool for the right job" path. David Floyer two years ago said, "AWS absolutely is going to have to solve that problem." We saw them start to do it at re:Invent, bringing together no-ETL between Aurora and Redshift, and really trying to simplify those worlds. There's going to be more of that. And then Microsoft, they're just making it cheap and easy to use their stuff, you know, despite some of the complaints that we hear in the community, you know, about things like Cosmos, but Eric, your take? >> Yeah, my concern here is that Snowflake and Databricks are fighting each other, and it's allowing AWS and Microsoft to kind of catch up against them, and I don't know if that's the right move for either of those two companies individually. Azure and AWS are building out functionality. Are they as good? No they're not. The other thing to remember too is that AWS and Azure get paid anyway, because both Databricks and Snowflake run on top of 'em. So (laughing) they're basically collecting their toll, while these two fight it out with each other, and they build out functionality. I think they need to stop focusing on each other, a little bit, and think about the overall strategy. Now for Databricks, we know they came out first as a machine-learning AI tool. They were known better for that spot, and now they're really trying to play catch-up on that data storage compute spot, and inversely for Snowflake, they were killing it with the compute separation from storage, and now they're trying to get into the ML/AI spot. I actually wouldn't be surprised to see them make some sort of acquisition. Frank Slootman has been a little bit quiet, in my opinion there. The other thing to mention is your comment about dbt Labs. 
If we look at our emerging technology survey, last survey when this came out, dbt Labs, number one leader in that data integration space, I'm going to just pull it up real quickly. It looks like they had a 33% overall net sentiment to lead data analytics integration. So they are clearly growing, it's the fourth straight survey that they've grown. The other name we're seeing there a little bit is Cribl, but dbt Labs is by far the number one player in this space. >> All right. Okay, cool. Moving on, let's go to number nine. Automation makes a resurgence in 2023. We're showing, again, data. The x axis is overlap or presence in the dataset, and the vertical axis is shared net score. Net score is a measure of spending momentum. As always, you've seen UiPath and Microsoft Power Automate up and to the right, that red line, that 40% line is generally considered elevated. UiPath is really separating, creating some distance from Automation Anywhere, they, you know, previous quarters they were much closer. Microsoft Power Automate came on the scene in a big way, they loom large with this "Good enough" approach. I will say this, somebody sent me results of a (indistinct) survey, which showed UiPath actually had more mentions than Power Automate, which was surprising, but I think that's not been the case in the ETR data set. We're definitely seeing a shift from back office to front office kind of workloads. Having said that, software testing is emerging as a mainstream use case, we're seeing ML and AI become embedded in end-to-end automations, and low-code is serving the line of business. And so this, we think, is going to increasingly have appeal to organizations in the coming year, who want to automate as much as possible and not necessarily, we've seen a lot of layoffs in tech, and people... You're going to have to fill the gaps with automation. That's a trend that's going to continue. >> Yep, agreed. 
At first that comment about Microsoft Power Automate having less citations than UiPath, that's shocking to me. I'm looking at my chart right here where Microsoft Power Automate was cited by over 60% of our entire survey takers, and UiPath at around 38%. Now don't get me wrong, 38% pervasion's fantastic, but you know you're not going to beat an entrenched Microsoft. So I don't really know where that comment came from. So UiPath, looking at it alone, it's doing incredibly well. It had a huge rebound in its net score this last survey. It had dropped going through the back half of 2022, but we saw a big spike in the last one. So it's got a net score of over 55%. A lot of people citing adoption and increasing. So that's really what you want to see for a name like this. The problem is that Microsoft is just doing its playbook. At the end of the day, I'm going to do a POC, why am I going to pay more for UiPath, or even take on another separate bill, when we know everyone's consolidating vendors, if my license already includes Microsoft Power Automate? It might not be perfect, it might not be as good, but what I'm hearing all the time is it's good enough, and I really don't want another invoice. >> Right. So how does UiPath, you know, and Automation Anywhere, how do they compete with that? Well, the way they compete with it is they got to have a better product. They got a product that's 10 times better. You know, they- >> Right. >> they're not going to compete based on we're the lowest cost, Microsoft's got that locked up, or we're the easiest to, you know, Microsoft basically gives it away for free, and that's their playbook. So that's, you know, up to UiPath. UiPath brought on Rob Enslin, I've interviewed him. Very, very capable individual, is now Co-CEO. So he's kind of bringing that adult supervision in, and really tightening up the go to market. 
So, you know, we know this company has been a rocket ship, and so getting some control on that and really getting focused like a laser, you know, could be good things ahead there for that company. Okay. >> One of the problems, if I could real quick Dave, is what the use cases are. When we first came out with RPA, everyone was super excited about like, "No, UiPath is going to be great for super powerful "projects, use cases." That's not what RPA is being used for. As you mentioned, it's being used for mundane tasks, so it's not automating complex things, which I think UiPath was built for. So if you were going to get UiPath, and choose that over Microsoft, it's going to be 'cause you're doing it for more powerful use case, where it is better. But the problem is that's not where the enterprise is using it. The enterprise are using this for base rote tasks, and simply, Microsoft Power Automate can do that. >> Yeah, it's interesting. I've had people on theCube that are both Microsoft Power Automate customers and UiPath customers, and I've asked them, "Well you know, "how do you differentiate between the two?" And they've said to me, "Look, our users and personal productivity users, "they like Power Automate, "they can use it themselves, and you know, "it doesn't take a lot of, you know, support on our end." The flip side is you could do that with UiPath, but like you said, there's more of a focus now on end-to-end enterprise automation and building out those capabilities. So it's increasingly a value play, and that's going to be obviously the challenge going forward. Okay, my last one, and then I think you've got some bonus ones. Number 10, hybrid events are the new category. Look it, if I can get a thousand inbounds that are largely self-serving, I can do my own here, 'cause we're in the events business. (Eric chuckling) Here's the prediction though, and this is a trend we're seeing, the number of physical events is going to dramatically increase. 
That might surprise people, but most of the big giant events are going to get smaller. The exception is AWS with re:Invent, I think Snowflake's going to continue to grow. So there are examples of physical events that are growing, but generally, most of the big ones are getting smaller, and there's going to be many more smaller intimate regional events and road shows. These micro-events, they're going to be stitched together. Digital is becoming a first class citizen, so people really got to get their digital acts together, and brands are prioritizing earned media, and they're beginning to build their own news networks, going direct to their customers. And so that's a trend we see, and I, you know, we're right in the middle of it, Eric, so you know we're going to, you mentioned RSA, I think that's perhaps going to be one of those crazy ones that continues to grow. It's shrunk, and then it, you know, 'cause last year- >> Yeah, it did shrink. >> right, it was the last one before the pandemic, and then they sort of made another run at it last year. It was smaller but it was very vibrant, and I think this year's going to be huge. Mobile World Congress is another one, we're going to be there end of Feb. That's obviously a big big show, but in general, the brands and the technology vendors, even Oracle is going to scale down. I don't know about Salesforce. We'll see. You had a couple of bonus predictions. Quantum and maybe some others? Bring us home. >> Yeah, sure. I got a few more. I think we touched upon one, but I definitely think the data prep tools are facing extinction, unfortunately, you know, the Talends, the Informaticas, some of those names. The problem there is that the BI tools are kind of including data prep into it already. You know, an example of that is Tableau Prep Builder, and then in addition, Advanced NLP is being worked in as well. 
ThoughtSpot, Tellius, both often say that as their selling point, Tableau has Ask Data, Qlik has Insight Bot, so you don't have to really be intelligent on data prep anymore. A regular business user can just self-query, using either the search bar, or even just speaking into what it needs, and these tools are kind of doing the data prep for it. I don't think that's a, you know, an out in left field type of prediction, but the time is nigh. The other one I would also state is that I think knowledge graphs are going to break through this year. Neo4j in our survey is growing in pervasion in mindshare. So more and more people are citing it, AWS Neptune's getting its act together, and we're seeing that spending intentions are growing there. TigerGraph is also growing in our survey sample. I just think that the time is now for knowledge graphs to break through, and if I had to do one more, I'd say real-time streaming analytics moves from the very, very rich big enterprises to downstream, to more people are actually going to be moving towards real-time streaming, again, because the data prep tools and the data pipelines have gotten easier to use, and I think the ROI on real-time streaming is obviously there. So those are three that didn't make the cut, but I thought deserved an honorable mention. >> Yeah, I'm glad you did. Several weeks ago, we did an analyst prediction roundtable, if you will, a cube session power panel with a number of data analysts and that, you know, streaming, real-time streaming was top of mind. So glad you brought that up. Eric, as always, thank you very much. I appreciate the time you put in beforehand. I know it's been crazy, because you guys are wrapping up, you know, the last quarter survey in- >> Been a nuts three weeks for us. (laughing) >> job. I love the fact that you're doing, you know, the ETS survey now, I think it's quarterly now, right? Is that right? >> Yep. >> Yep. So that's phenomenal. >> Four times a year. 
I'll be happy to jump on with you when we get that done. I know you were really impressed with that last time. >> It's unbelievable. There is so much data at ETR. Okay. Hey, that's a wrap. Thanks again. >> Take care Dave. Good seeing you. >> All right, many thanks to our team here, Alex Myerson is on production, he manages the podcasts for us. Ken Schiffman as well is a critical component of our East Coast studio. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief. He's at siliconangle.com. He does some great editing for us. Thank you all. Remember, all these episodes are available as podcasts, wherever you listen, podcast is doing great. Just search "Breaking analysis podcast." Really appreciate you guys listening. I publish each week on wikibon.com and siliconangle.com, or you can email me directly if you want to get in touch, david.vellante@siliconangle.com. That's how I got all these. I really appreciate it. I went through every single one with a yellow highlighter. It took some time, (laughing) but I appreciate it. You could DM me at dvellante, or comment on our LinkedIn post and please check out etr.ai. Its data is amazing. Best survey data in the enterprise tech business. This is Dave Vellante for theCube Insights, powered by ETR. Thanks for watching, and we'll see you next time on "Breaking Analysis." (upbeat music beginning) (upbeat music ending)

Published Date : Jan 29 2023



Analyst Predictions 2023: The Future of Data Management

(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speak) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. 
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. 
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. 
So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. 
>> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. 
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating end users. It's very cutting edge. I'd say the top, leading edge, 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. 
It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. 
It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? 
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's gotten almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. 
And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see the Informaticas, the Fivetrans, and all the other players there have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is where data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. 
So, there's some good moves in this direction. I expect to see more of them this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats, to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two-for-one, but in other words, get two products or two services for the price of one and a half. I see a lot of potential for this. And it's to me, if the cloud was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. 
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. 
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods, certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in the class and you were forced to take one side and defend it. 
So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. 
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. 
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. 
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? 
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. 
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. 
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. 
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. 
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL HeatWave. 
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023



Breaking Analysis: CIOs in a holding pattern but ready to strike at monetization

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Recent conversations with IT decision makers show a stark contrast between exiting 2023 versus the mindset when we were leaving 2022. CIOs are generally funding new initiatives by pushing off or cutting lower priority items, while security efforts are still being funded. Those that enable business initiatives that generate revenue or taking priority over cleaning up legacy technical debt. The bottom line is, for the moment, at least, the mindset is not cut everything, rather, it's put a pause on cleaning up legacy hairballs and fund monetization. Hello, and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis, we tap recent discussions from two primary sources, year-end ETR roundtables with IT decision makers, and CUBE conversations with data, cloud, and IT architecture practitioners. The sources of data for this breaking analysis come from the following areas. Eric Bradley's recent ETR year end panel featured a financial services DevOps and SRE manager, a CSO in a large hospitality firm, a director of IT for a big tech company, the head of IT infrastructure for a financial firm, and a CTO for global travel enterprise, and for our upcoming Supercloud2 conference on January 17th, which you can register free by the way, at supercloud.world, we've had CUBE conversations with data and cloud practitioners, specifically, heads of data in retail and financial services, a cloud architect and a biotech firm, the director of cloud and data at a large media firm, and the director of engineering at a financial services company. Now we've curated commentary from these sources and now we share them with you today as anecdotal evidence supporting what we've been reporting on in the marketplace for these last couple of quarters. 
On this program, we've likened the economy to the slingshot effect when you're driving, when you're cruising along at full speed on the highway, and suddenly you see red brake lights up ahead, so, you tap your own brakes and then you speed up again, and traffic is moving along at full speed, so, you think nothing of it, and then, all of a sudden, the same thing happens. You slow down to a crawl and you start wondering, "What the heck is happening?" And you become a lot more cautious about the rate of acceleration when you start moving again. Well, that's the trend in IT spend right now. Back in June, we reported that despite the macro headwinds, CIOs were still expecting 6% to 7% spending growth for 2022. Now that was down from 8%, which we reported at the beginning of 2022. That was before Ukraine, and Fed tightening, but given those two factors, you know that that seemed pretty robust, but throughout the fall, we began reporting consistently declining expectations where CIOs are now saying Q4 will come in at around 3% growth relative to last year, and they're expecting, or should we say hoping that it pops back up in 2023 to 4% to 5%. The recent ETR panelists, when they heard this, are saying based on their businesses and discussions with their peers, they could see low single digit growth for 2023, so, 1%, 2%, 3%, so, this sort of slingshotting, or sometimes we call it a seesaw economy, has caught everyone off guard. Amazon is a good example of this, and there are others, but Amazon entered the pandemic with around 800,000 employees. It doubled that workforce during the pandemic. Now, right before Thanksgiving in 2022, Amazon announced that it was laying off 10,000 employees, and, Jassy, the CEO of Amazon, just last week announced that number is now going to grow to 18,000. Now look, this is a rounding error at Amazon from a headcount standpoint and their headcount remains far above 2019 levels. 
Its stock price, however, does not and it's back down to 2019 levels. The point is that visibility is very poor right now and it's reflected in that uncertainty. We've seen a lot of layoffs, obviously, the stock market's choppy, et cetera. Now importantly, not everything is on hold, and this downturn is different from previous tech pullbacks in that the speed at which new initiatives can be rolled out is much greater thanks to the cloud, and if you can show a fast return, you're going to get funding. Organizations are pausing on the cleanup of technical debt, unless it's driving fast business value. They're holding off on modernization projects. Those business enablement initiatives are still getting funded. CIOs are finding the money by consolidating redundant vendors, and they're stealing from other pockets of budget, so, it's not surprising that cybersecurity remains the number one technology priority in 2023. We've been reporting that for quite some time now. It's specifically cloud, cloud native security container and API security. That's where all the action is, because there's still holes to plug from that forced march to digital that occurred during COVID. Cloud migration, kind of showing here on number two on this chart, still a high priority, while optimizing cloud spend is definitely a strategy that organizations are taking to cut costs. It's behind consolidating redundant vendors by a long shot. There's very little evidence that cloud repatriation, i.e., moving workloads back on prem is a major cost cutting trend. The data just doesn't show it. What is a trend is getting more real time with analytics, so, companies can do faster and more accurate customer targeting, and they're really prioritizing that, obviously, in this down economy. Real time, we sometimes lose it, what's real time? Real time, we sometimes define as before you lose the customer. 
Now in the hiring front, customers tell us they're still having a hard time finding qualified site reliability engineers, SREs, Kubernetes expertise, and deep analytics pros. These job markets remain very tight. Let's stay with security for just a moment. We said many times that, prior to COVID, zero trust was this undefined buzzword, and the joke, of course, is, if you ask three people, "What is zero trust?" You're going to get three different answers, but the truth is that virtually every security company that was resisting taking a position on zero trust in an attempt to avoid... They didn't want to get caught up in the buzzword vortex, but they're now really being forced to go there by CISOs, so, there are some good quotes here on cyber that we want to share that came out of the recent conversations that we cited up front. The first one, "Zero trust is the highest ROI, because it enables business transformation." In other words, if I can have good security, I can move fast, it's not a blocker anymore. Second quote here, "ZTA," zero trust architecture, "Is more than securing the perimeter. It encompasses strong authentication and multiple identity layers. It requires taking a software approach to security instead of a hardware focus." The next one, "I'd love to have a security data lake that I could apply to asset management, vulnerability management, incident management, incident response, and all aspects for my security team. I see huge promise in that space," and the last one, I see NLP, natural language processing, as the foundation for email security, so, instead of searching for IP addresses, you can now read emails at light speed and identify phishing threats, so, look at, this is a small snapshot of the mindset around security, but I'll add, when you talk to the likes of CrowdStrike, and Zscaler, and Okta, and Palo Alto Networks, and many other security firms, they're listening to these narratives around zero trust. 
I'm confident they're working hard on skating to this puck, if you will. A good example is this idea of a security data lake and using analytics to improve security. We're hearing a lot about that. We're hearing about architectures, there are acquisitions in that regard, and so, that's becoming real, and there are many other examples, because data is at the heart of digital business. This is the next area that we want to talk about. It's obvious that data, as a topic, gets a lot of mind share amongst practitioners, but getting data right is still really hard. It's a challenge for most organizations to get ROI and expected return out of data. Most companies still put data at the periphery of their businesses. It's not at the core. Data lives within silos or different business units, different clouds, it's on-prem, and increasingly it's at the edge, and it seems like the problem is getting worse before it gets better, so, here are some instructive comments from our recent conversations. The first one, "We're publishing events onto Kafka, having those events be processed by Dataproc." Dataproc is a Google managed service to run Hadoop, and Spark, and Flink, and Presto, and a bunch of other open source tools. "We're putting them into the appropriate storage models within Google, and then normalize the data into BigQuery, and only then can you take advantage of tools like ThoughtSpot," so, here's a company like ThoughtSpot, and they're all about simplifying data, democratizing data, but to get there, you have to go through some pretty complex processes, so, this is a good example. All right, another comment. "In order to use Google's AI tools, we have to put the data into BigQuery. They haven't integrated in the way AWS and Snowflake have with SageMaker. 
Moving the data is too expensive, time consuming, and risky," so, I'll just say this, sharing data is a killer supercloud use case, and firms like Snowflake are on top of it, but it's still not pretty across clouds, and Google's posture seems to be, "We're going to let our database product competitiveness drive the strategy first, and the ecosystem is going to take a backseat." Now, in a way, I get it, owning the database is critical, and Google doesn't want to capitulate on that front. Look, BigQuery is really good and competitive, but you can't help but roll your eyes when a CEO stands up, and look, I'm not calling out Thomas Kurian, every CEO does this, and talks about how important their customers are, and they'll do whatever is right by the customer, so, look, I'm telling you, I'm rolling my eyes on that. Now let me also comment, AWS has figured this out. They're killing it in database. If you take Redshift for example, it's still growing, as is Aurora, really fast growing services and other data stores, but AWS realizes it can make more money in the long-term partnering with the Snowflakes and Databricks of the world, and other ecosystem vendors versus suboptimizing their relationships with partners and customers in order to sell more of their own homegrown tools. I get it. It's hard not to feature your own product. IBM chose OS/2 over Windows, and tried for years to popularize it. It failed. Lotus, go way back to Lotus 1-2-3, they refused to run on Windows when it first came out. They were running on DEC VAX. Many of you young people in the United States have never even heard of DEC VAX. IBM wanted to run everything only in its cloud, the same with Oracle, originally. VMware, as you might recall, tried to build its own cloud, but, eventually, when the market speaks and reveals what seemed obvious to analysts years before, the vendors come around, they face reality, and they stop wasting money fighting a losing battle. 
"The trend is your friend," as the saying goes. All right, last pull quote on data, "The hardest part is transformations, moving traditional Informatica, Teradata, or Oracle infrastructure to something more modern and real time, and that's why people still run apps in COBOL. In IT, we rarely get rid of stuff, rather we add on another coat of paint until the wood rots out or the roof is going to cave in. All right, the last key finding we want to highlight is going to bring us back to the cloud repatriation myth. Followers of this program know it's a real sore spot with us. We've heard the stories about repatriation, we've read the thoughtful articles from VCs on the subject, we've been whispered to by vendors that you should investigate this trend. It's really happening, but the data simply doesn't support it. Here's the question that was posed to these practitioners. If you had unlimited budget and the economy miraculously flipped, what initiatives would you tackle first? Where would you really lean into? The first answer, "I'd rip out legacy on-prem infrastructure and move to the cloud even faster," so, the thing here is, look, maybe renting infrastructure is more expensive than owning, maybe, but if I can optimize my rental with better utilization, turn off compute, use things like serverless, get on a steeper and higher performance over time, and lower cost Silicon curve with things like Graviton, tap best of breed tools in AI, and other areas that make my business more competitive. Move faster, fail faster, experiment more quickly, and cheaply, what's that worth? Even the most hard-o CFOs understand the business benefits far outweigh the possible added cost per gigabyte, and, again, I stress "possible." Okay, other interesting comments from practitioners. "I'd hire 50 more data engineers and accelerate our real-time data capabilities to better target customers." Real-time is becoming a thing. 
AI is being injected into data and apps to make faster decisions, perhaps, with less or even no human involvement. That's on the rise. Next quote, "I'd like to focus on resolving the concerns around cloud data compliance," so, again, despite the risks of data being spread out in different clouds, organizations realize cloud is a given, and they want to find ways to make it work better, not move away from it. The same thing in the next one, "I would automate the data analytics pipeline and focus on a safer way to share data across the states without moving it," and, finally, "The way I'm addressing complexity is to standardize on a single cloud." MonoCloud is actually a thing. We're hearing this more and more. Yes, my company has multiple clouds, but in my group, we've standardized on a single cloud to simplify things, and this is a somewhat dangerous trend, because it's creating even more silos and it's an opportunity that needs to be addressed, and that's why we've been talking so much about supercloud as a cross-cloud, unifying, architectural framework, or, perhaps, it's a platform. In fact, that's a question that we will be exploring later this month at Supercloud2, live from our Palo Alto studios. Is supercloud an architecture or is it a platform? And in this program, we're featuring technologists, analysts, and practitioners to explore the intersection between data and cloud and the future of cloud computing, so, you don't want to miss this opportunity. Go to supercloud.world. You can register for free and participate in the event directly. All right, thanks for listening. That's a wrap. I'd like to thank Alex Myerson, who's on production and manages our podcast, Ken Schiffman as well, Kristen Martin and Cheryl Knight, they helped get the word out on social media and in our newsletters, and Rob Hof is our editor-in-chief over at siliconangle.com. He does some great editing. Thank you, all. Remember, all these episodes are available as podcasts wherever you listen. 
All you've got to do is search "breaking analysis podcasts." I publish each week on wikibon.com and siliconangle.com, and you can email me directly at david.vellante@siliconangle.com or DM me, @dvellante, or comment on our LinkedIn posts. By all means, check out etr.ai. They get the best survey data in the enterprise tech business. We'll be doing our annual predictions post in a few weeks, once the data comes out from the January survey. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis." (upbeat music)

Published Date : Jan 7 2023


Mattia Ballerio, Elmec Informatica | The Path to Sustainable IT

(upbeat music) >> We're back talking about the path to sustainable IT and now we're going to get the perspective from Mattia Ballerio, who is with Elmec Informatica, an IT services firm in the beautiful Lombardy region of Italy, north of Milano. Mattia, welcome to theCUBE. Thanks so much for coming on. >> Thank you very much, Dave. Thank you. >> All right, before we jump in, tell us a little bit more about Elmec Informatica. What's your focus? Talk about your unique value add to customers. >> Yeah! So basically Elmec Informatica is a midsize company from the north part of Italy, and it is a managed service provider in the IT area. Okay, so the main focus of Elmec is to bring digital transformation and innovation to our clients, with a focus on infrastructure services, workplace services, and also cybersecurity services, okay. And we try to follow the path of our clients to digital transformation and innovation through technology and sustainability. >> Yeah, obviously very hot topics right now. Sustainability, environmental impact, they're growing areas of focus among leaders across all industries, particularly acute right now in Europe, with, you know, the energy challenges. You've talked about things like sustainable business. What does that mean? What does that term, you know, speak to, and what can others learn from it? >> Yeah, at Elmec, our approach to sustainability is grounded in science and values. It is also customer and territory centered, but also employee centered. I mean, we conduct regular assessments to understand the most significant environmental and social issues for our business, with the goal of prioritizing what we do for a sustainable future. Our service delivery methodology, employee care, and relationships with local suppliers, the local area, and institutions are major factors for us in building such a responsibility strategy. 
Specifically during the past year, we have been particularly focused on defining sustainability governance in the company based on stakeholder engagement, defining material issues, establishing quantitative indicators to monitor, and setting medium to long term goals. >> Okay, so you have a lot of data. You can go into a customer, you can do an assessment, you can set a baseline, and then you have other data by which you can compare that and understand what's achievable. So what's your vision for sustainable business? You know, that strategy, you know, how has it affected your business in terms of the evolution? 'Cause this hasn't always been as hot a topic as it is today, and is it a competitive advantage for you? >> Yeah, yeah. For all intents and purposes, sustainability is a competitive advantage for Elmec. I mean, it's so because, at a time of profound transformation in the world of work, CSR issues make a company more attractive when searching for new talent to enter the workforce of our company. In addition, efforts to ensure people's proper work life balance are a strong retention factor. And regarding our business proposition, Elmec attempts to meet a high standard of sustainability and reliability. Our green data center, as you said, is a prime example of this approach, as, at the same time, is the reconditioning activity that is done to give a second life to technology devices that come back from rental. I mean, our customer inquiries with respect to Elmec sustainability are increasingly frequent and in depth, which is why we monitor our performance and invest in certifications such as EcoVadis or ISO 14001. Okay? >> Got it! So in a previous life, I actually did some work with power companies, and there were two big factors in IT that affected the power consumption. Obviously virtualization was a big one, if you could consolidate servers, you know, that was huge. 
But the other was the advent of flash storage, and we used to actually go in with the engineers and the power company and put in alligator clips to measure the power draw of an all flash array versus, you know, the spinning disk, and it was a big impact. So, I want to talk about your experience with Pure Storage. You use FlashArray and the Evergreen architecture. Can you talk about your experience there? Why did you make that decision to select Pure Storage? How does that help you meet sustainability and operational requirements? Do those benefits scale as your customers grow? What's your experience been? >> Yeah! It was basically an easy answer to our business needs. Okay, because you said before that in Elmec we manage a lot of data, okay. And in the past we saw that managing so much data was very, very difficult in terms of power consumption or simply the space for storing the data. And when Pure came to us and shared their products and their vision for the data management journey of Elmec Informatica, it was very easy to choose Pure. Why? With values and numbers, we created a business case, and we saw that our power consumption was much less, more than 90% less than the previous technology that we used in the past. Okay? And so of course you have to manage a gradual deployment of flash storage technology, but it was a good target. So we have tried to monitor the adoption of flash technology, and also to monitor the power consumption and the efficiency that the Pure technology brings to our IT systems, and of course the IT systems of our clients. And so this is the first part, the first good part of our trip with Pure. And after that, we also approached the long term sustainability of choosing Pure storage technology. 
You mentioned the Evergreen model of Pure, and of course this was a game changer for us, because it allows us to extend the life cycle management of our data centers, but it also allows us to improve the ease of using the technology from our technical side, okay. So we are much more efficient than in the past with the choice of Pure Storage technologies, okay. Of course, this ease of use, let me say, allows us to bring this value to all our clients that put their data in our data centers. >> So, you talked about how you've seen a 90% improvement relative to previous technologies. I have to put you on the spot. Because I was on Pure's website, and I saw in their ESG report some, you know, it was a comparison with a generic competitor. I'm presuming that competitor was not, you know, a 2010 spinning disk system. But so I'm curious as to the results that you're seeing with Pure in terms of footprint and power usage. You're referencing some of that. We heard some metrics from Nicole and Ajay earlier in the program. Do you think, again I'm going to put you on the spot, do you think that Pure's architecture, and the way they've applied, whether it's machine intelligence or the Evergreen model, et cetera, is more competitive than other platforms that you've seen? >> Yeah, of course. It is more competitive. Because basically it allows service providers to make a much more efficient value proposition and offer services that bring more value to the customers. Okay, so the customer is always at the center of the proposition of a service provider. And trying to adopt the methodology and also the values that Pure has inside, by design, in the technology is for us very, very important and very, very strategic. 
Because, like a glass, we can try to transfer the values of Pure technologies to our service provider clients. >> Okay Mattia, let's wrap and talk about sort of near term 2023 and then longer term. It looks like sustainability is a topic that's here to stay. Unlike when we were putting alligator clips on storage arrays, trying to help customers get rebates, that just didn't have legs. It was too complicated. Now it's a topic that everybody's measuring. What's next for Elmec in its sustainability journey? What advice might you have for sustainability leaders that want to make a meaningful impact on the environment but also on the bottom line? >> Okay. So, sustainability is fortunately a widely spread concept. And our role in this great game is to define a strategy aligned with the common and fundamental goals for the future of the planet, and capable of expressing our inclinations and particularities. Elmec sustainability goals in the near future, I can say, will be basically three. One, define a sustainability plan, okay. It's fundamental to define a sustainability plan. Then it's very important to monitor emissions, and we will calculate our carbon footprint, okay. And last but not least, produce a certifiable and comprehensive sustainability report, with respect to the demands of customers, suppliers, and also partners. Okay, so I can say that these three targets will be our direction in the future. Okay? >> Yeah, so I mean, pretty straightforward. Make a plan. You've got to monitor and measure. You can't improve what you can't measure. So you're going to set a baseline, you're going to report on that. You're going to analyze the data and you're going to make continuous improvement. >> Yep. >> Mattia, thanks so much for joining us today and sharing your perspectives from the northern part of Italy. Really appreciate it. >> Yep. Thank you for having me on board. Thank you very much. 
>> It was really our pleasure. Okay, in a moment, I'm going to be back to wrap up the program, and share some resources , that could be valuable in your sustainability journey. Keep it right there. (upbeat music)

Published Date : Dec 7 2022


Pure Storage The Path to Sustainable IT

>>In the early part of this century, we're talking about the 2005 to 2007 timeframe, there was a lot of talk about so-called green IT. And at that time there was some organizational friction. Like for example, the line was that the CIO never saw the power bill, so he or she didn't care, or that the facilities folks, they rarely talked to the IT department. So it was kind of that split brain. And then the '07-'08 financial crisis really created an inflection point in a couple of ways. First, it caused organizations to kind of pump the brakes on IT spending, and then they took their eye off the sustainability ball. And the second big trend, of course, was the cloud model, you know, kind of became a benchmark for IT simplicity and automation and efficiency, the ability to dial down and dial up capacity as needed. >>And the third was by the end of the first decade of the two thousands, the technology of virtualization was really hitting its best stride. And then you had innovations like flash storage, which largely eliminated the need for these massive farms of spinning mechanical devices that sucked up a lot of power. And so really these technologies began their march to mainstream adoption. And as we progressed through the 2020s, the effects of climate change have really come into focus as a critical component of ESG: environmental, social, and governance. Shareholders have come to demand metrics around sustainability. Employees are often choosing employers based on their ESG posture. And most importantly, companies are finding that savings on power, cooling, and footprint have a bottom line impact on the income statement. Now you add to that the energy challenges around the world, particularly facing Europe right now, the effects of global inflation and even more advanced technologies like machine intelligence. >>
Hello and welcome to the Path to Sustainable It Made Possible by Pure Storage and Collaboration with the Cube. My name is Dave Valante and I'm one of the host of the program, along with my colleague Lisa Martin. Now, today we're gonna hear from three leaders on the sustainability topic. First up, Lisa will talk to Nicole Johnson. She's the head of Social Impact and Sustainability at Pure Storage. Nicole will talk about the results from a study of around a thousand sustainability leaders worldwide, and she'll share some metrics from that study. And then next, Lisa will speak to AJ Singh. He's the Chief Product Officer at Pure Storage. We've had had him on the cube before, and not only will he share some useful stats in the market, I'll also talk about some of the technology innovations that customers can tap to address their energy consumption, not the least of which is ai, which is is entering every aspect of our lives, including how we deal with energy consumption. And then we'll bring it back to our Boston studio and go north of Italy with Mattia Ballero of Elec Informatica, a services provider with deep expertise on the topic of sustainability. We hope you enjoyed the program today. Thanks for watching. Let's get started >>At Pure Storage, the opportunity for change and our commitment to a sustainable future are a direct reflection of the way we've always operated and the values we live by every day. We are making significant and immediate impact worldwide through our environmental sustainability efforts. The milestones of change can be seen everywhere in everything we do. Pure's Evergreen Storage architecture delivers two key environmental benefits to customers, the reduction of wasted energy and the reduction of e-waste. Additionally, Pure's implemented a series of product packaging redesigns, promoting recycled and reuse in order to reduce waste that will not only benefit our customers, but also the environment. 
Pure is committed to doing what is right and leading the way with innovation. That has always been the Pure difference, making a difference by enabling our customers to drive out energy usage in their data storage systems by up to 80%. Today, more than 97% of Pure arrays purchased six years ago are still in service. And tomorrow, our goal for the future is to reduce Scope 3 emissions. Pure is committing to further reducing our sold products emissions by 66% per petabyte by 2030. All of this means what we said at the beginning, change that is simple, and that is what it has always been about. Pure has a vision for the future: today, tomorrow, forever. >>Hi everyone, welcome to this special event, Pure Storage, The Path to Sustainable IT. I'm your host, Lisa Martin. Very pleased to be joined by Nicole Johnson, the head of Social Impact and Sustainability at Pure Storage. Nicole, welcome to theCUBE. >>Thanks for having me, Lisa. >>Sustainability is such an important topic to talk about and I understand that Pure just announced a report today about sustainability. What can you tell me? What nuggets are in this report? >>Well, actually quite a few really interesting nuggets, at least for us. And I think probably for you and your viewers as well. So we actually commissioned a survey of about a thousand sustainability leaders across the globe to understand, you know, what are their sustainability goals, what are they working on, and what are the impacts of buying decisions, particularly around infrastructure, when it comes to sustainability goals. I think one of the things that was really interesting for us was the fact that around the world we did not see a significant variation in terms of sustainability being a top priority. I'm sure you've heard about the energy crisis that's happening across Europe. And so, you know, there was some thought that perhaps that might play into EMEA being a larger, you know, having sustainability goals that were more significant. 
But we actually did not find that. We found sustainability to be really important no matter where the respondents were located. >>So, very interesting. At Pure, sustainability is really at the heart of what we do and has been since our founding. It's interesting because we set out to make storage really simple, but it turns out really simple is also really sustainable. And the products and services that we bring to our customers have really powerful outcomes when it comes to decreasing their own carbon footprints. And so, you know, we often hear from customers that we've actually really helped them to significantly improve their storage performance, but also allowed them to save on space, power, and cooling costs, and their footprint. So really significant findings. One example of that is a company called Cengage, which is a global education technology company. They recently shared with us that they have actually been able to reduce their overall storage footprint by 80% while doubling to tripling the performance of their storage systems. So it's really critical for companies who are thinking about their sustainability goals to consider the dynamic between their sustainability program and their IT teams who are making these buying decisions. >>Right? Those two teams need to be really inextricably linked these days. You talked about the fact that there was really consistency across the regions in terms of sustainability being of high priority for organizations. You had a great customer story that you shared that showed significant impact can be made there by bringing sustainability together with IT. But I'm wondering, why are we seeing that so much of the vendor selection process still isn't revolving around sustainability, or it's overlooked? What are some of the things that you received despite so many people saying sustainability, huge priority? 
>>Well, in this survey, the most commonly cited challenge was really around the fact that there was a lack of management buy-in. 40% of respondents told us this was the top roadblock. So, I think, getting that out of the way. And then we also just heard that sustainability teams were not brought into tech purchasing processes until after it's already rolling, right? So they're not even looped in. And that being said, you know, we know that IT has been identified as one of the key departments to supporting a company's sustainability goals. So we really want to ensure that these two teams are talking more to each other. When we look even closer at the data from the respondents, we see some really positive correlations. We see that 65% of respondents reported that they're on track to meet their sustainability goals. And for those 65%, IT is significantly engaged with reporting data for those sustainability initiatives. We saw that for those who did report that sustainability is a top priority for vendor selection, they were twice as likely to be on track with their goals, and their sustainability directors said that they were getting involved at the beginning of the tech purchasing program, or process, I'm sorry, rather than towards the end. And so, you know, we know that to curb the impact of the climate crisis, we really need to embrace sustainability from a cross-functional viewpoint. >>Definitely has to be cross-functional. So, strong correlations there in the report that organizations that had closer alignment between the sustainability folks and the IT folks were farther along in their sustainability program development, execution, et cetera. Those correlations, were they a surprise? >>Not entirely. You know, when we look at some of the statistics that come from, you know, places like the World Economic Forum, they say that digitization generated 4% of greenhouse gas emissions in 2020. 
So, and that, you know, that's now almost three years ago, digital data only accelerates, and by 2025, we expect that number could be almost double. And so we know that that communication and that correlation is gonna be really important because data centers are taking up such a huge footprint when companies are looking at their emissions. And it's, I mean, quite frankly, a really interesting opportunity for IT to be a trailblazer in the sustainability journey. And, you know, perhaps people that are in IT haven't thought about how they can make an impact in this area, but there really are some incredible ways to help us work on cutting carbon emissions, both from your company's perspective and from the world's perspective, right? >>Like, we're all doing this because it's something that we know we have to do to drive down climate change. So I think when you think about how to be a trailblazer, how to do things differently, how to differentiate your own department, it's a really interesting connection that IT and sustainability work together. I would also say, you know, I'll just note that of the respondents to the survey we were discussing, over half of those respondents expect to see closer alignment between the organization's IT and sustainability teams as they move forward. >>And that's really a tip of the hat to those organizations embracing cultural change. That's always hard to do, but for those two, for sustainability and IT, to come together as part of really the overall ethos of an organization, that's huge. And it's great to see the data demonstrating that that close alignment is really on its way to helping organizations across industries make a big impact. I wanna dig in a little bit to Pure's ESG goals. What can you share with us about that? >>Absolutely. 
So as I mentioned, Pure's kind of at the beginning of our formal ESG journey, but really has been working on the sustainability front for a long time. It's funny, as we're doing a lot of this work and kind of building our own profile around this, we're coming back to some of the things that we have done in the past that consumers weren't necessarily interested in then but are now, because the world has changed, becoming more and more invested in. So that's exciting. So we did a baseline Scope one, two, and three analysis and discovered, interestingly enough, that 70% of our emissions comes from use of sold products. So, our customers' work running our products in their data centers. So we know that we've made some ambitious goals around our Scope one and two emissions, which is our own office, our utilities; you know, those only account for 6% of our emissions. So we know that to really address the issue of climate change, we need to work on the use of sold products. So we've also made a really ambitious commitment to decrease our carbon emissions by 66% per petabyte by 2030 in our product. So, decreasing our own carbon footprint, but also affecting our customers as well. And we've also committed to the Science Based Targets initiative and are road mapping how to achieve the ambitious goals set out in the Paris Agreement. >>That's fantastic. It sounds like you really dialed in on where is the biggest opportunity for us as Pure Storage to make the biggest impact across our organization, across our customers' organizations. Those are lofty goals that Pure set, but knowing what I know about Pure, you guys are probably well on track to accomplish those goals in record time. >>I hope so.
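The emissions breakdown Nicole describes here can be checked with a few lines of arithmetic. The following is a minimal sketch rather than Pure's methodology: the 70%, 6%, and 66% figures are the ones quoted in the conversation, while the "other Scope three" share is merely derived from them, and the baseline of 100 units per petabyte is a hypothetical number chosen for illustration.

```python
# Shares of total emissions quoted in the interview (percent of total).
USE_OF_SOLD_PRODUCTS = 70  # Scope three: customers running the products
SCOPE_ONE_AND_TWO = 6      # own offices and utilities

# Whatever remains must fall under other Scope three categories.
# This share is derived here; it was not quoted in the interview.
other_scope_three = 100 - USE_OF_SOLD_PRODUCTS - SCOPE_ONE_AND_TWO


def target_intensity(baseline_per_pb: float, reduction_pct: float = 66.0) -> float:
    """Return emissions per petabyte after a percentage reduction."""
    return baseline_per_pb * (100.0 - reduction_pct) / 100.0


if __name__ == "__main__":
    print(f"Other Scope three share: {other_scope_three}%")
    # Hypothetical baseline of 100 units per petabyte, for illustration only.
    print(f"Target intensity: {target_intensity(100.0):.1f} units per petabyte")
```

Read this way, a 66% per-petabyte reduction leaves roughly a third of the baseline intensity, applied to the category that accounts for most of the footprint.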
>>Talk a little bit about advice that you would give to viewers who might be at the very beginning of their sustainability journey and really wondering, what are the core elements, besides IT and sustainability team alignment, that I need to bring into this program to make it actually successful? >>Yeah, so I think, you know, understanding that you don't have to pick between really powerful technology and sustainable technology. There are opportunities to get both, and not just in storage, right, in your entire IT portfolio. We know that, you know, we're in a place in the world where we have to look at things from the bigger picture. We have to solve new challenges and we have to approach business a little bit differently. So adopting solutions and services that are environmentally efficient can actually help to scale and deliver more effective and efficient IT solutions over time. So I think that that's something that we need to really remind ourselves, right? We have to go about business a little bit differently, and that's okay. We also know that data centers utilize an incredible amount of energy and carbon. And so everything that we can do to drive that down is going to address the sustainability goals for us individually as well as, again, drive down that climate change. So we need to get out of the mindset that data centers are about reliability or cost, et cetera, and really think about efficiency and carbon footprint when you're making those business decisions. I'll also say that, you know, the earlier that we can get sustainability teams into the conversation, the more impactful your business decisions are going to be, helping you to guide sustainable decision making.
>>So shifting sustainability and IT left, almost together, really shows the correlation between those folks getting together in the beginning with intention. The report shows, and the successes that Pure's had demonstrate, that that's very impactful for organizations to actually be able to implement even the cultural change that's needed for sustainability programs to be successful. My last question for you goes back to that report. You mentioned in there that the data show a lot of organizations are hampered by management buy-in where sustainability is concerned. How can Pure help its customers navigate around those barriers so that they get that management buy-in and they understand the value in it for them? >>Yeah, so I mean, I think that for me, my advice is always to speak to hearts and minds, right? And help the management to understand, first of all, the impact, right, on climate change. So I think that's the kind of hearts piece. On the minds piece, I think it's addressing the sustainability goals that these companies have set for themselves and helping management understand, you know, how their IT buying decisions can actually really help them to reach these goals. We also, you know, we always run kind of TCOs for customers to understand what is the actual cost of the equipment. And so, you know, especially if you're in a location in which energy costs are rising, I mean, I think we're seeing that around the world right now with inflation. Better understanding your energy costs can really help your management to understand, again, the bigger picture and what that total cost is gonna be. Often we see, you know, that maybe the person who's buying the IT equipment isn't the same person who's paying the electricity bills, right? And so sometimes even those two teams aren't talking.
And there's a great opportunity there, I think, to just, you know, look at it from a more high-level lens to better understand what total cost of ownership is. >>That's a great point. Great advice. Nicole, thank you so much for joining me on the program today, talking about the new report on sustainability that Pure put out. Some really compelling nuggets in there, but really also some great successes that you've already achieved internally on your own ESG goals and what you're helping customers to achieve in terms of driving down their carbon footprint and emissions. We so appreciate your insights and your thoughts. >>Thank you, Lisa. It's been great speaking with you. >>AJ Singh joins me, the Chief Product Officer at Pure Storage. AJ, it's great to have you back on the program. >>Great to be back on, Lisa, good morning. >>Good morning. And sustainability is such an important topic to talk about. So we're gonna really unpack what Pure is doing, we're gonna get your viewpoints on what you're seeing, and you're gonna leave the audience with some recommendations on how they can get started on their ESG journey. First question: we've been hearing a lot from Pure, AJ, about the role that technology plays in organizations achieving sustainability goals. What's been the biggest environmental impact associated with customers achieving that, given the massive volumes of data that keep being generated? >>Absolutely, Lisa, you can imagine that the data is only growing and exploding, and there's a good reason for it. You know, data is the new currency. Some people call it the new oil. And the opportunity to go process this data and gain insights is really helping customers drive an edge in the digital transformation. It's gonna make a difference between them being on the leaderboard a decade from now, when the digital transformation kind of pans out, versus, you know, being kind of somebody that, you know, quite missed the boat.
So data is super critical, and obviously as part of that we see all these big benefits, but it has to be stored, and that means it's gonna consume a lot of resources, and therefore data center usage has only accelerated, right? You can imagine the amount of data being generated; you know, a recent study pointed to roughly, by 2025, 175 zettabytes, where each zettabyte is a billion terabytes. So just think of that size and scale of data. That's huge. And they also say that, you know, pretty soon, today, in fact, in the developed world, every person is having an interaction with the data center literally every 18 seconds. So whether it's on Facebook or Twitter or, you know, your email, people are constantly interacting with data. So you can imagine this data is only exploding. It has to be stored and it consumes a lot of energy. In fact, >>It, oh, go ahead. Sorry. >>No, I was saying, in fact, you know, some studies have shown that data center usage literally consumes 1 to 2% of global energy consumption. So there's one place we could really help climate change and all those aspects, if you can kind of really, you know, tamp down the data center energy consumption. Sorry, you were saying, >>I was just gonna say, it's an incredibly important topic, and the stats on data that you provided, and also I like how you talked about, you know, every 18 seconds we're interacting with a data center, whether we know it or not. We think about the long-term implications, the fact that data is growing massively, as you shared with the stats that you mentioned. If we think about, though, the responsibility that companies have, every company in today's world needs to be a data company, right? And we consumers expect it. We expect that you are gonna deliver these relevant, personalized experiences, whether we're doing a transaction in our personal lives or in business.
But what requirements do technology companies have to really start dialing down their carbon footprints? >>No, absolutely. If you can think about it, just to kind of finish up the data story a little bit, the explosion is to the point where, in fact, it was just recently in the news that Ireland went up and said, sorry, we can't have any more data centers here. We just don't have the power to supply them. That was big in the news and, you know, all the hyperscalers were scratching their heads. I know they've come around that and figured out a way around it, but it's getting there. Some organizations and areas, jurisdictions, are saying pretty much no data centers allowed, you know, we just can't do it. And so, as you said, so companies like Pure, I mean, our view is that IT has an opportunity here to really do our bit for climate change and be able to, you know, drive a sustainable environment. >>And at Pure we believe that, you know, today's data success really ultimately hinges on energy efficiency, you know, so to really be energy efficient means you are gonna be successful long term with data. Because if you think of classic data infrastructures, the legacy infrastructures, you know, we've got disk infrastructures, hybrid infrastructures, flash infrastructures, low-end systems, medium-end systems, high-end systems. So a lot of silos, you know, a lot of inefficiency across the silos, 'cause the data doesn't get used across that. In fact, you know, today a lot of data centers are not really built with kind of the efficiency and environmental mindset. So there's a big opportunity there. >>So AJ, talk to me about some of the steps that Pure is implementing. As its chief product officer, would love to get your thoughts: what steps is it implementing to help Pure's customers become more sustainable? >>No, absolutely.
So essentially we are all inherently motivated, like Pure and everybody else, to solve problems for customers and really forward the status quo, right? You know, innovation, you know, that's what we are all about. And while we are doing that, the challenge is, how do you make technology and the data we feed into it faster, smarter, scalable obviously, but more importantly sustainable? And you can do all of that, but if you miss the sustainability bit, you're kind of missing the boat. And I also feel, from an ethical perspective, that's really important for us. Not only do you do all the other things, but you also kind of make it sustainable. In fact, companies are realizing this: 80% of companies today in fact report out on sustainability, which is great. In fact, 80% of leadership at companies, you know, CEOs and senior executives, say they've been impacted by some climate change event, you know, whether it's a fire in the place they had to evacuate, or floods or storms or hurricanes, you name it, right? >>So mitigating the carbon impact can in fact today be a competitive advantage for companies, because that's where the puck is going and everybody's, you know, wanting to skate towards the puck, and it's good business too to be sustainable and meet these, you know, customer requirements. In fact, the recent survey that we released today is saying that more and more organizations are kickstarting their sustainability initiatives, and many are aiming to make significant progress against that over the next decade. So that's really, you know, part of the big picture. So our view is that IT infrastructure, you know, can really make a big push towards greener IT, and not just kind of greenwash it, but actually, you know, make things greener and really take the lead in ESG.
And so it's important that organizations can reach alignment with their IT teams and challenge their IT teams to continue to lead, you know, for the organization, the sustainability aspects. >>I'm curious, AJ, when you're in customer conversations, are you seeing that it's really the C-suite plus IT coming together, and how does Pure help facilitate that? To your point, IT needs to be able to deliver this, but it's a board-level objective these days. >>Absolutely. We're seeing increasingly, especially in Europe with, you know, the war in Ukraine and the energy crisis that, you know, that's unleashed, we definitely see it's becoming a bigger and bigger board-level objective for a lot of companies. And we definitely see customers starting to do that. So in particular, I do want to touch briefly on what steps we are taking as a company, you know, to make it sustainable. And obviously customers are doing all the things we talked about, and we're also helping them become smarter with data. But the key difference is, you know, we have a big focus on efficiency, which is really optimizing performance per watt with unmatched storage density. So you can reduce the footprint and dramatically lower the power required. And how efficient is that? You know, compared to other all-flash systems, we tend to take one fifth the power, and substantially lower compared to spinning disk. >>So you can imagine, you know, if data center consumption is 2% of global consumption, roughly 40% of that tends to be storage, 'cause of all the spinning disk. So you're at about, you know, 0.8% of global consumption, and if you can cut that by four fifths, you know, you can already start to make an impact. So we feel we can do that. And also we're quite a bit denser, 10 times denser.
So imagine one fifth the power, one tenth the density, but then we take it a step further, because okay, you've got the storage system in the data center, but what about the end-of-life aspect? What about the waste and reclamation? So we also have something called non-disruptive upgrades. Using our AI technology in Pure1, we can start to sense when a particular part is going to fail, and just before it goes to failure, we actually replace it in a non-disruptive fashion. So the customer's data is not impacted, and then we recycle that, so you get a full end-to-end life cycle, you know, all the way from the time you deploy, much lower power, much lower density, but then also at the back end, you know, reduction in e-waste and those kind of things. >>That's a great point that you bring up in terms of the reclamation process. It sounds like Pure does that on its own; the customer doesn't have to be involved in that. >>That's right. And we do that, it's a part of our Evergreen, you know, service that we offer. A lot of customers sign up for that service, and in fact they don't even, we tell them, hey, you know, that part's about to go, we're gonna come in, we're gonna swap it out, and then we actually recycle that part. >>The power of AI. Love that. What are some of the things that companies can do if they're early in this journey on sustainability? What are some of the specific steps companies can take to get started and maybe accelerate that journey, as climate change and things are becoming just more and more of a daily topic on the news? >>No, absolutely. There's a lot of things companies can do. In fact, there are four items that we're gonna highlight. The first one is, you know, they can just start by doing a materiality assessment, and a materiality assessment essentially engages all the stakeholders to find out which specific issues are important for the business, right?
So you identify your key priorities that intersect with what the stakeholders want, you know, your different groups from sales, customers, partners, you know, different departments in the organization. And for example, for us, when we conducted our materiality assessment, our product, we felt, was the biggest area of focus that could contribute a lot towards, you know, making an impact from a sustainability standpoint. That's number one. I think, number two, companies can also think about taking an as-a-service approach. The beauty of the as-a-service approach is that your customers, they're buying outcomes with SLAs, and when you are starting to buy outcomes with SLAs, you can start small and then grow as you consume more. >>So that way you don't have systems sitting idle waiting for you to consume more, right? And that's the beauty of the as-a-service approach. And so for example, for us, you know, we have something called Evergreen One, which is our as-a-service offer, where essentially customers are able to only use and have systems turned on as much as they're consuming. So that reduces the waste associated with underutilized systems, right? That's number two. Number three is, you can also optimize your supply chains end to end, right? Basically by making sure you're recycling packaging and eliminating waste in that chain, so you can recycle it back to your suppliers. And you can also choose a sustainable supplier network that's following sort of good practices, you know, across the globe, and such supply chains that are responsive and diverse can really help you. Also, there's a big business benefit.
>>You can also handle surges in demand. For example, for us during the pandemic, with these global supply chain shortages, you know, whereas most of our competitors', you know, lead times went to 40, 50 weeks, our lead times went from three to six weeks, 'cause, you know, we had this sustainable, you know, supply chain. And so all of these things, you know, those three things are important, but the fourth thing, I'd say, is more cultural, and the cultural thing is, how do you actually begin to have sustainability become a core part of your ethos at the company, you know, across all the departments? And at Pure, definitely it's big for us, you know, around sustainability, starting with product design, but all of the areas as well. If you follow those four items, that'll be a great place to start. >>That's great advice, great recommendations. You talk about the supply chain, sustainable supply chain optimization. We've been having a lot of conversations with businesses and vendors alike about that and how important it is. You bring up a great point too on supplier diversity; we could have a whole conversation on that. Yes. But I'm also glad that you brought up culture. That's huge for organizations to adopt an ESG strategy and really drive sustainability in their business. It has to become, to your point, part of their ethos. Yes. It's challenging. Cultural change management is challenging. Although I think with climate change and the things that are so public, it's more top of mind for folks. But it's a great point that the organization really as a whole needs to embrace the sustainability mindset so that it, as an organization, lives and breathes that. Yes. And last question for you is advice. So you outlined the four steps organizations can take. I love how you made that quite simple.
What advice would you give organizations who are on that journey to adopting those actions, as you said, as they look to really build and deploy and execute an ESG strategy? >>No, absolutely. And so obviously, you know, the advice is gonna come from, you know, a company like Pure, you know, our background kind of being a supplier of products. And so, you know, our advice is for companies that have products. Usually the products that you sell to your customers, especially if they've got hardware components in them, tend to be, you know, the biggest generator of e-waste, and the biggest factor from a sustainability standpoint. So it's really important to have an intentional design approach towards your products with sustainability in mind. So it's not something that you can handle at the very back end. You design it up front in the product, and so that sustainable design becomes very intentional. So for us, for example, doing these non-disruptive upgrades had to be designed up front so that, you know, one of our repair people could go into a customer shop and be able to pull out a card and put in a new card without any change in the customer system. >>That non-disruptive approach, it has to be designed into the hardware and software systems to be able to pull that off. And that intentional design enables you to recover pieces just when they're about to fail and then put them through a recovery, you know, waste recovery process. So that's kind of the one thing I would say. That philosophy, again, it comes down to, if that is, you know, seeping into the culture, into your core ethos, you will start to do, you know, that type of work.
So I mean, it's an important thing. You know, look, this year, you know, with the spike in energy prices, you know, gas prices going up, it's super important that all of us, you know, do our bit in there and start to drive products that are fundamentally sustainable, not just at the initial, you know, install point, but from an end-to-end, full life cycle standpoint. >>Absolutely. And I love that you brought up intention. Everything that Pure's doing is with such thought and intention, and really, for organizations in any industry to become more sustainable, to develop an ESG strategy, to your point, it all needs to start with intention. And of course that cultural adoption. AJ, it's been so great to have you on the program talking about what Pure is doing to help organizations really navigate that path to sustainable IT. We appreciate your insights and your time. >>Thank you, Lisa. Pleasure being on board. >>At Pure Storage, the opportunity for change and our commitment to a sustainable future are a direct reflection of the way we've always operated and the values we live by every day. We are making significant and immediate impact worldwide through our environmental sustainability efforts. The milestones of change can be seen everywhere in everything we do. Pure's Evergreen storage architecture delivers two key environmental benefits to customers: the reduction of wasted energy and the reduction of e-waste. Additionally, Pure's implemented a series of product packaging redesigns, promoting recycle and reuse in order to reduce waste, that will not only benefit our customers, but also the environment. Pure is committed to doing what is right and leading the way with innovation. That has always been the Pure difference, making a difference by enabling our customers to drive out energy usage in their data storage systems by up to 80%. Today, more than 97% of Pure arrays purchased six years ago are still in service.
And tomorrow, our goal for the future is to reduce Scope three emissions. Pure is committing to further reducing our sold products' emissions by 66% per petabyte by 2030. All of this means what we said at the beginning: change that is simple, and that is what it has always been about. Pure has a vision for the future: today, tomorrow, forever. >>We're back talking about the path to sustainable IT, and now we're gonna get the perspective from Mattia Valerio, who is with Elmec Informatica, an IT services firm in the beautiful Lombardy region of Italy, north of Milano. Mattia, welcome to the Cube. Thanks so much for coming on. >>Thank you very much, Dave. Thank you. >>All right, before we jump in, tell us a little bit more about Elmec Informatica. What's your focus? Talk about your unique value add to customers. >>Yeah, so basically Elmec Informatica is a mid-sized company from the north part of Italy and is a managed service provider in the IT area. Okay. So the main focus area of Elmec is to bring digital transformation and innovation to our clients, with a focus on infrastructure services, workplace services, and also cybersecurity services. Okay. And we try to follow the path of our clients to digital transformation and innovation through technology and sustainability. >>Yeah. Obviously very hot topics right now. Sustainability, environmental impact, they're growing areas of focus among leaders across all industries. It's particularly acute right now in Europe with, you know, the energy challenges. You've talked about things like sustainable business. What does that mean? What does that term, you know, speak to, and what can others learn from it? >>Yeah. Our approach to sustainability is grounded in science and values, and also in customer territory, but is also employee centered.
I mean, we conduct regular assessments to understand the most significant environmental and social issues for our business, with the goal of prioritizing what we do for a sustainable future. Our service delivery methodology, employee care, and relationship with the local suppliers, local area, and institutions are major factors for us to build such a responsibility strategy. Specifically, during the past year, we have been particularly focused on defining sustainability governance in the company based on stakeholder engagement, defining material issues, establishing quantitative indicators to monitor, and setting medium to long-term goals. >>Okay, so you have a lot of data. You can go into a customer, you can do an assessment, you can set a baseline, and then you have other data by which you can compare that and understand what's achievable. So what's your vision for sustainable business? You know, that strategy, you know, how has it affected your business in terms of the evolution? 'Cause this hasn't always been as hot a topic as it is today. And is it a competitive advantage for you? >>Yeah, yeah. For all intents and purposes, sustainability is a competitive advantage for Elmec. I mean, it's so because, at a time of profound transformation in the world of work, CSR issues make a company more attractive when searching for new talent to enter the workforce of our company. In addition, efforts to ensure people's proper work-life balance are a strong retention factor. And regarding our business proposition, Elmec attempts to meet high standards of sustainability and reliability. Our green data center is a prime example of this approach, as, at the same time, is the reconditioning activity that is done to give a second life to technology devices that come back from rental.
I mean, our customer inquiries with respect to sustainability are increasingly frequent and in depth, which is why we monitor our performance and invest in certifications such as EcoVadis or ISO 14001. Okay. >>Got it. So in a previous life I actually did some work with power companies, and there were two big factors in IT that affected the power consumption. Obviously virtualization was a big one; if you could consolidate servers, you know, that was huge. But the other was the advent of flash storage, and we used to actually go in with the engineers, and the power company put in alligator clips to measure an all-flash array versus, you know, the spinning disk, and it was a big impact. So I wanna talk about your experience with Pure Storage. You use FlashArray and the Evergreen architecture. Can you talk about your experience there? Why did you make that decision to select Pure Storage? How does that help you meet sustainability and operational requirements? Do those benefits scale as your customers grow? What's your experience been? >>Yeah, it was basically an easy answer to our business needs. Okay. Because you said before that in Elmec we manage a lot of data, okay? And in the past we saw that managing so much data was very, very difficult in terms of power consumption, or simply for the space of storing the data. And when Pure came to us and shared their products and their vision for the data management journey for Elmec Informatica, it was very easy to choose Pure. Why? With values and numbers. We created a business case and we saw that our power consumption usage was much less, more than 90% less than the previous technology that we used in the past. Okay. And so of course you have to manage a gradual deployment of flash technology storage, but it was a good target.
>>So we have tried to monitor the adoption of flash technology, and to monitor also the power consumption and the efficiency that the Pure technology brings to our IT systems and, of course, the IT systems of our clients. And so this is the first part, the first good part of our trip with Pure. And after that we also approached the long-term sustainability of choosing Pure technology storage. You mentioned the Evergreen models of Pure, and of course this was, again, a challenge for us, because it allows us to extend the life cycle management of our data centers, but it also allows us to improve the ease of using technology from our technical side. Okay. So we are much more efficient than in the past with the choice of Pure Storage technologies. Okay. Of course, this ease of use, easy usage mode, let me say it, allows us to bring this value to all our clients that put their data in our data centers. >>So you talked about how you've seen a 90% improvement relative to previous technologies. I'm gonna put you on the spot. Yeah, because I was on Pure's website and I saw in their ESG report some, you know, it was a comparison with a generic competitor, presuming that competitor was not, you know, a 2010 spinning disk system. But so I'm curious as to the results that you're seeing with Pure in terms of footprint and power usage. You're referencing some of that. We heard some metrics from Nicole and AJ earlier in the program. Do you think, again, I'm gonna put you on the spot, do you think that Pure's architecture and the way they've applied, whether it's machine intelligence or the Evergreen model, et cetera, is more competitive than other platforms that you've seen? >>Yeah, of course.
It's more competitive because basically it allows a service provider to make a much more efficient value proposition and offer services that bring more value to the customers. Okay. So the customer is always at the center of the proposition of a service provider, and trying to adopt the methodology and also the value that Pure has inside by design in the technology is for us very, very important and very strategic because, like a glass, we can ourselves try to transfer the values of Pure technologies to our service provider clients. >>Okay. Mattia, let's wrap and talk about sort of near-term 2023, and then longer term it looks like sustainability is a topic that's here to stay. Unlike when we were putting alligator clips on storage arrays, trying to help customers get rebates; that just didn't have legs. It was too complicated. Now it's a topic that everybody's measuring. What's next for Elmec in its sustainability journey? What advice might you have for sustainability leaders that wanna make a meaningful impact on the environment, but also on the bottom line? >>Okay, so sustainability is fortunately a widely spread concept. And our role in this great game is to define a strategy aligned with the common and fundamental goals for the future of the planet, and capable of expressing our inclinations and particularities. And our sustainability goals in the near future, I can say, will be basically three. One, define a sustainability plan. Okay? It's fundamental to define a sustainability plan. Then it's very important to monitor emissions, and we will calculate our carbon footprint. Okay? And last but not least, produce a certifiable and comprehensive sustainability report with respect to the demands of customers, suppliers, and also partners. Okay. So I can say that these three targets will be our direction in the future. Okay. >>Yeah.
So I mean, pretty straightforward. Make a plan. You've got to monitor and measure; you can't improve what you can't measure. So you're gonna set a baseline, you're gonna report on that. Yep. You're gonna analyze the data and you're gonna make continuous improvement. >>Yep. >>Mattia, thanks so much for joining us today and sharing your perspectives from the northern part of Italy. Really appreciate it. >>Yeah, thank you for having me aboard. Thank you very much. >>It was really our pleasure. Okay, in a moment, I'm gonna be back to wrap up the program and share some resources that could be valuable in your sustainability journey. Keep it right there. >>Sustainability is becoming increasingly important and is hitting more RFPs than ever before as a critical decision point for customers. Environmental benefits are not the only impetus. Rather, bottom-line cost savings are proving that sustainability actually means better business. You can make a strong business case around sustainability, and you should. Many more organizations are setting mid- and long-term goals for sustainability and putting forth published metrics for shareholders and customers. Whereas early green IT initiatives at the beginning of this century were met with skepticism and somewhat disappointing results, today vendor R&D is driving innovation in system design, semiconductor advancements, automation, and machine intelligence that's really beginning to show tangible results. Thankfully. Now remember, all these videos are available on demand at thecube.net. So check them out at your convenience, and don't forget to go to siliconangle.com for all the enterprise tech news of the day. You'll also want to check out purestorage.com. There are a ton of resources there. As an aside, Pure is the only company I can recall to allow you to access resources like a Gartner Magic Quadrant without forcing you to fill out a lead gen form. So thank you for that, Pure Storage, I love that. There's no squeeze page on that. 
No friction. It's kind of on brand there for Pure. Well done. But to the topic today, sustainability: there's some really good information on the site around ESG, Pure's Environmental, Social and Governance mission. So there's more in there than just sustainability. You'll see some transparent statistics on things like gender and ethnic diversity, and of course you'll see that Pure has some work to do there. But kudos for publishing those stats transparently and setting goals so we can track your progress. And there's plenty on the sustainability topic as well, including some competitive benchmarks, which are interesting to look at and may give you some other things to think about. We hope you've enjoyed The Path to Sustainable IT, made possible by Pure Storage, produced with the Cube, your leader in enterprise and emerging tech coverage.

Published Date : Dec 5 2022

ENTITIES

Entity	Category	Confidence
Nicole	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Nicole Johnson	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Mattia Ballero	PERSON	0.99+
Elec Informatica	ORGANIZATION	0.99+
Mattia	PERSON	0.99+
AJ Singh	PERSON	0.99+
40	QUANTITY	0.99+
Mattia Valerio	PERSON	0.99+
Europe	LOCATION	0.99+
Dave	PERSON	0.99+
Lisa	PERSON	0.99+
0.8%	QUANTITY	0.99+
Al Meca	ORGANIZATION	0.99+
2020	DATE	0.99+
three	QUANTITY	0.99+
90%	QUANTITY	0.99+
Alma Informatica	ORGANIZATION	0.99+
10 times	QUANTITY	0.99+
2005	DATE	0.99+
6%	QUANTITY	0.99+
2010	DATE	0.99+
4%	QUANTITY	0.99+
first	QUANTITY	0.99+
2030	DATE	0.99+
2%	QUANTITY	0.99+
70%	QUANTITY	0.99+
ELEX	ORGANIZATION	0.99+
2025	DATE	0.99+
80%	QUANTITY	0.99+
Pure Storage	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
twice	QUANTITY	0.99+
two teams	QUANTITY	0.99+
65%	QUANTITY	0.99+
Lombardy	LOCATION	0.99+
tomorrow	DATE	0.99+
second	QUANTITY	0.99+
Matea	PERSON	0.99+
Pure	ORGANIZATION	0.99+
2007	DATE	0.99+
demand@thecube.net	OTHER	0.99+
Cengage	ORGANIZATION	0.99+
First question	QUANTITY	0.99+
AJ	PERSON	0.99+
Element Informatica	ORGANIZATION	0.99+
today	DATE	0.99+
first part	QUANTITY	0.99+
six weeks	QUANTITY	0.99+
more than 97%	QUANTITY	0.99+
one	QUANTITY	0.99+
First	QUANTITY	0.99+
third	QUANTITY	0.99+
Today	DATE	0.99+
twenty twenty five	QUANTITY	0.99+
2020s	DATE	0.99+
two	QUANTITY	0.99+
two thousands	QUANTITY	0.99+
six years ago	DATE	0.99+
both	QUANTITY	0.99+

Breaking Analysis: re:Invent 2022 marks the next chapter in data & cloud

>> From the Cube Studios in Palo Alto and Boston, bringing you data-driven insights from the Cube and ETR, this is "Breaking Analysis" with Dave Vellante. >> The ascendancy of AWS under the leadership of Andy Jassy was marked by a tsunami of data and corresponding cloud services to leverage that data. Now, those services mainly came in the form of primitives, i.e., basic building blocks that were used by developers to create more sophisticated capabilities. AWS in the 2020s, being led by CEO Adam Selipsky, will be marked by four high-level trends, in our opinion: one, a rush of data that will dwarf anything we've previously seen; two, a doubling or even tripling down on the basic elements of cloud: compute, storage, database, security, et cetera; three, a greater emphasis on end-to-end integration of AWS services to simplify and accelerate customer adoption of cloud; and four, significantly deeper business integration of cloud, beyond IT, as an underlying element of organizational operations. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, we extract and analyze nuggets from John Furrier's annual sit-down with the CEO of AWS. We'll share data from ETR and other sources to set the context for the market and competition in cloud, and we'll give you our glimpse of what to expect at re:Invent in 2022.
Now, before we get into the core of our analysis, Alibaba has announced earnings. They always announce after the big three, you know, a month later, and we've updated our Q3/November hyperscale computing forecast for the year, as seen here. We're not going to spend a lot of time on this, as most of you have seen the bulk of it already, but suffice to say Alibaba's cloud business is hitting that same macro trend that we're seeing across the board, but with a more substantial slowdown than we expected and more substantial than its peers. They're facing China headwinds, they've been restructuring their cloud business, and it's led to significantly slower growth, in the low double digits as opposed to where we had it at 15%. This puts our year-end estimates for 2022 revenue at $161 billion, still a healthy 34% growth, with AWS surpassing $80 billion in 2022 revenue. Now, on a related note, one of the big themes in cloud that we've been reporting on is how customers are optimizing their cloud spend. It's a technique that they use when the economy looks a little shaky. Here's a graphic that we pulled from AWS's website, which shows the various pricing plans at a high level. As you know, they're much more granular than that and more sophisticated, but for simplicity we'll just keep it here. Basically there are four levels. The first one here is on-demand, i.e., pay by the drink. Now we're going to jump down to what we've labeled as number two, spot instances. That's like the right place at the right time: I can use that extra capacity in the moment. The third is reserved instances, or RIs, where I pay up front to get a discount. And the fourth is sort of optimized savings plans, where customers commit to a one- or three-year term for a better price. Now, you'll notice we labeled the choices in a different order than AWS presented them on its website, and that's because we believe that the order that we chose is the natural progression for customers. They start on demand, they maybe experiment with
spot instances, they move to reserved instances when the cloud bill becomes too onerous, and if you're large enough you lock in for one or three years. Okay, the interesting thing is the order in which AWS presents them. We believe that on-demand accounts for the majority of AWS customer spending. Now, if you think about it, those on-demand customers are also at-risk customers. Yeah, sure, there are some switching costs like egress and learning curve, but many customers have multiple clouds and they've got experience, so they're kind of already up the learning curve, and if you're not married to AWS with a longer-term commitment, there's less friction to switch. Now, AWS here presents the most attractive plan from a financial perspective second, after on-demand, and it's also the plan that makes the greatest commitment from a lock-in standpoint. Now, in fairness to AWS, it's also true that there is a trend towards subscription-based pricing, and we have some data on that. This chart is from an ETR drill-down survey. The N is 300.
Pay attention to the bars on the right; the left side is sort of busy. The pink is subscription, and you can see the trend upward. The light blue is consumption-based or on-demand-based pricing, and you can see there's a steady trend toward subscription. Now, we'll dig into this in a later episode of Breaking Analysis, but we'll share with you some tidbits. With the data that ETR provides, you can select which segment, IaaS and PaaS, or you can go up the stack, et cetera. So when you choose IaaS and PaaS, 44% of customers either prefer or are required to use on-demand pricing, whereas around 40% of customers say they either prefer or are required to use subscription pricing. Again, that's for IaaS. Now, the further you move up the stack, the more prominent subscription pricing becomes, often with 60% or more for the software-based offerings that require or prefer subscription. And interestingly, cybersecurity tracks along with software, at around 60% that prefer subscription. It's likely because, as with software, you're not shutting down your cyber protection on demand. All right, let's get into the expectations for re:Invent, and we're going to start with an observation on data. In his 2018 book Seeing Digital, author David Moschella made the point that whereas most companies apply data on the periphery of their business, kind of as an add-on function, successful data companies like Google and Amazon and Facebook have placed data at the core of their operations. They've operationalized data, and they apply machine intelligence to that foundational element. Why is this? The fact is, it's not easy to do what the internet giants have done: very, very sophisticated engineering and cultural discipline. And this brings us to re:Invent 2022 and the future of cloud. Machine learning and AI will increasingly be infused into applications. We believe the data stack and the application stack are coming together as organizations build data apps and data products. Data
expertise is moving from the domain of highly specialized individuals to everyday business people, and we are just at the cusp of this trend. This will, in our view, be a massive theme of not only re:Invent 22 but of cloud in the 2020s. The vision of data mesh, we believe, Zhamak Dehghani's principles, will be realized in this decade. Now, what we'd like to do is share with you a glimpse of the thinking of Adam Selipsky from his sit-down with John Furrier. Each year John has a one-on-one conversation with the CEO of AWS. He's been doing this for years, and the outcome is a better understanding of the directional thinking of the leader of the number one cloud platform. So we're now going to share some direct quotes. I'm going to run through them with some commentary and then bring in some ETR data to analyze the market implications. Here we go. This is from Selipsky, quote: IT in general and data are moving from departments into becoming intrinsic parts of how businesses function. Okay, we're talking here about deeper business integration. Let's go on to the next one, quote: in time we'll stop talking about people who have the word analyst (we inserted data; he meant data analyst) in their title. Rather, we'll have hundreds of millions of people who analyze data as part of their day-to-day job, most of whom will not have the word analyst anywhere in their title. We're talking about graphic designers and pizza shop owners and product managers, and data scientists as well. He threw that in. I'm going to come back to that. Very interesting. So he's talking here about democratizing data, operationalizing data. Next quote: customers need to be able to take an end-to-end integrated view of their entire data journey, from ingestion to storage to harmonizing the data to being able to query it, doing business intelligence and human-based analysis, and being able to collaborate and share data, and we've been putting together (we being Amazon) a broad suite of tools from database to analytics to
business intelligence to help customers with that. And this last statement, it's true: Amazon has a lot of tools, and, you know, they're beginning to become more and more integrated, but again, under Jassy there was not a lot of emphasis on that end-to-end integrated view. We believe it's clear from these statements that Selipsky's customer interactions are leading him to underscore that the time has come for this capability. Okay, continuing, quote: if you have data in one place, you shouldn't have to move it every time you want to analyze that data. Couldn't agree more. It would be much better if you could leave that data in place and avoid all the ETL, which has become a nasty three-letter word. More and more we're building capabilities where you can query that data in place, end quote. Okay, this we see a lot in the marketplace: Oracle with MySQL HeatWave, the entire trend toward converged database, Snowflake, [ __ ] extending their platforms into transaction and analytics respectively, and so forth. A lot of the partners are doing things as well in that vein. Let's go on to the next quote: the other phenomenon is infusing machine learning into all those capabilities. Yes, the comments from the Moschella graphic come into play here: infusing AI and machine intelligence everywhere. Next one, quote: it's not a data cloud, it's not a separate cloud, it's a series of broad but integrated capabilities to help you manage the end-to-end life cycle of your data. There you go: we, AWS, are the cloud. We're going to come back to that in a moment as well. Next set of comments, around data. Very interesting here, quote: data governance is a huge issue. Really what customers need is to find the right balance for their organization between access to data and control, and if you provide too much access, then you're nervous that your data is going to end up in places that it shouldn't, viewed by people who shouldn't be viewing it, and you feel like you lack security around that data. And by the way, what happens then is
people overreact and they lock it down so that almost nobody can see it. It's those handcuffs. Is data an asset or a liability? We've talked about that for years. Okay, very well put by Selipsky, but this is a gap, in our view, within AWS today, and we're hoping that they close it at re:Invent. It's not easy to share data in a safe way within AWS today outside of your organization, so we're going to look for that at re:Invent 2022. Now, all this leads to the following statement by Selipsky, quote: data clean room is a really interesting area, and I think there's a lot of different industries in which clean rooms are applicable. I think that clean rooms are an interesting way of enabling multiple parties to share and collaborate on the data while completely respecting each party's rights and their privacy mandate. Okay, again, this is a gap currently within AWS, in our view, and we know Snowflake is well down this path, and Databricks with Delta Sharing is also on this curve. So AWS has to address this and demonstrate this end-to-end data integration and the ability to safely share data, in our view. Now let's bring in some ETR spending data to put some context around these comments, with reference points in the form of AWS itself and its competitors and partners. Here's a chart from ETR that shows Net Score, or spending momentum, on the y-axis, and overlap, or pervasiveness in the survey, on the x-axis. So, spending momentum by pervasiveness, or sort of share within the data set. The table that's inserted there, with the reds and the greens, informs us how the dots are positioned: it's Net Score and then the shared Ns that determine how the plots are placed. Now, we've filtered the data on the three big data segments: analytics, database, and machine learning/AI, and we've only selected one company with fewer than 100 Ns in the survey, and that's Databricks. You'll
see why in a moment. The red dotted line indicates highly elevated customer spend at 40%. Now, as usual, Snowflake outperforms all players on the y-axis with a Net Score of 63%, off the charts. All three big U.S. cloud players are above that line, with Microsoft and AWS dominating the x-axis, so it's very impressive that they have such spending momentum and they're so large. You see a number of other emerging data players, like Grafana and Datadog; MongoDB is there in the mix; and then more established data players like Splunk and Tableau. Now you've got Cisco, and it's adjacent to their core networking business, but they're definitely into the analytics business. Then the really established players in data, like Informatica, IBM, and Oracle, all with strong presence, but you'll notice they're in the red from a momentum standpoint. Now, what you're going to see in a moment is we put red highlights around Databricks, Snowflake, and AWS. Why? Let's bring that back up, Alex, if you would, and we'll explain. There's no way AWS is going to hit the brakes on innovating at the base service level, what we called primitives earlier. Selipsky told Furrier as much in their sit-down: that AWS will serve the technical user and data science community, the traditional domain of Databricks, and at the same time address the end-to-end integration, data sharing, and business line requirements that Snowflake is positioned to serve. Now, people often ask Snowflake and Databricks, how will you compete with the likes of AWS? And we know the answer: focus on data exclusively; they have their multi-cloud plays. Perhaps the more interesting question is how will AWS compete with the likes of specialists like Snowflake and Databricks? And the answer is depicted here in this chart. AWS is going to serve both the technical and developer communities and the data science audience, and through end-to-end integrations and future services that simplify
the data journey, they're going to serve the business lines as well. But the nuance is in all the other dots, in the hundreds or hundreds of thousands that are not shown here, and that's the AWS ecosystem. You can see AWS has earned the status of the number one cloud platform that everyone wants to partner with. As they say, it has over a hundred thousand partners, and that ecosystem, combined with these capabilities that we're discussing, while perhaps behind in areas like data sharing and integrated governance, can wildly succeed by offering the capabilities and leveraging its ecosystem. Now, for their part, the Snowflakes of the world have to stay focused on the mission, build the best products possible, and develop their own ecosystems to compete and attract the mind share of both developers and business users. And that's why it's so interesting to hear Selipsky basically say it's not a separate cloud, it's a set of integrated services. Well, Snowflake is, in our view, building a supercloud on top of AWS, Azure, and Google. When great products meet great sales and marketing, good things can happen, so it will be really fun to watch what AWS announces in this area at re:Invent. All right, one other topic that Selipsky talked about was the correlation between serverless and container adoption. And, you know, I don't know if this gets into their hybrid play, certainly; maybe it starts to get into their multi-cloud, we'll see. But we have some data on this. So again, we're talking about the correlation between serverless and container adoption, but before we get into that, let's go back to 2017 and listen to what Andy Jassy said on the Cube about serverless. Play the clip. >> Very, very earliest days of AWS, Jeff used to say a lot, if I were starting Amazon today, I'd have built it on top of AWS. We didn't have all the capability and all the functionality at that very moment, but he knew what was coming, and he saw what people were still able to accomplish even with where the services were at that point. I
think the same thing is true here with Lambda, which is, I think if Amazon were starting today, it's a given they would build it on the cloud, and I think with a lot of the applications that comprise Amazon's consumer business, we would build those on our serverless capabilities. Now, we still have plenty of capabilities and features and functionality we need to add to Lambda and our various serverless services, so that may not be true from the get-go right now. But I think if you look at the hundreds of thousands of customers who are building on top of Lambda, and lots of real applications, you know, FINRA has built a good chunk of their market watch application on top of Lambda, and Thomson Reuters has built, you know, one of their key analytics apps. People are building real, serious things on top of Lambda, and the pace of iteration you'll see there will increase as well, and I really believe that to be true over the next year or two. >> So, years ago Jassy gave a road map that serverless was going to be a key developer platform going forward, and Selipsky referenced the correlation between serverless and containers in the Furrier sit-down, so we wanted to test that within the ETR data set. Now, here's a screen grab of the view across 1,300 respondents from the October ETR survey, and what we've done here is we've isolated on the cloud computing segment. Okay, so you can see right there, cloud computing segment. Now we've taken the functions from Google, AWS Lambda, and Microsoft Azure Functions, all the serverless offerings, and we've got Net Score on the vertical axis. Oh, by the way, 40% is highly elevated, remember that. And then on the horizontal axis we have the presence in the data set, the overlap. Okay, that's relative to each other. So remember, 40%: all these guys are above that 40% mark. Okay, so you see that. Now, this is just for serverless, and what we're going to do is we're going to turn on containers
to see the correlation and see what happens. So watch what happens when we click on containers. Boom, everything moves to the right. You can see all three move to the right. Google drops a little bit, but all the others... Now, the filtered N drops as well, so you don't have as many people that are aggressively leaning into both, but all three move to the right. So watch again: containers off, and then containers on. Containers off, containers on. So you can see a really major correlation between containers and serverless. Okay, so to get a better understanding of what that means, I called my friend and former Cube co-host Stu Miniman. What he said was people generally used to think of VMs, containers, and serverless as distinctly different architectures, but the lines are beginning to blur. Serverless makes things simpler for developers who don't want to worry about underlying infrastructure. As Selipsky and the data from ETR indicate, serverless and containers are coming together. But as Stu and I discussed, there's a spectrum, where on the left you have kind of native cloud VMs, in the middle you've got AWS Fargate, and the rightmost anchor is AWS Lambda. Now, traditionally in the cloud, if you wanted to use containers, developers would have to build a container image, they'd have to select and deploy the EC2 instances that they wanted to use, they'd have to allocate a certain amount of memory and then fence off the apps in a virtual machine, and then run the EC2 instances against the apps and then pay for all those EC2 resources. Now, with AWS Fargate, you can run containerized apps with less infrastructure management, but you still have some, you know, things that you can do with the infrastructure. So with Fargate, what you do is you'd build the container images, then you'd allocate your memory and compute resources, then run the app and pay for the resources only when they're used. So Fargate lets you control the runtime environment while at the same time
simplifying the infrastructure management. You don't have to worry about isolating the app and other stuff like choosing server types and patching; AWS does all that for you. Then there's Lambda. With Lambda, you don't have to worry about any of the underlying server infrastructure; you're just running code as functions, so the developer spends their time worrying about the applications and the functions that they're calling. The point is, there's a movement, and we saw it in the data, towards simplifying the development environment and allowing the cloud vendor, AWS in this case, to do more of the underlying management. Now, some folks will still want to turn knobs and dials, but increasingly we're going to see more higher-level service adoption. Now, re:Invent is always a fire hose of content, so let's do a rapid rundown of what to expect. We talked about optimizing data and the organization. We talked about cloud optimization; there'll be a lot of talk on the show floor about best practices and customers sharing data. Selipsky is leading AWS into the next phase of growth, and that means moving beyond IT transformation into deeper business integration and organizational transformation. Not just digital transformation, organizational transformation. So he's leading a multi-vector strategy: serving the traditional peeps who want fine-grained access to core services, so we'll see continued innovation in compute, storage, AI, et cetera, and simplification through integration and horizontal apps further up the stack; Amazon Connect is an example that's often cited. Now, as we've reported many times, Databricks is moving from its stronghold realm of data science into business intelligence and analytics, whereas Snowflake is coming from its data analytics stronghold and moving into the world of data science. AWS is going down a path of Snowflake meets Databricks with an underlying cloud IaaS and PaaS layer that puts these three companies on a very interesting trajectory, and you can expect AWS to go right
after the data sharing opportunity, and in doing so it will have to address data governance. They go hand in hand. Okay, price performance: that is a topic that will never go away, and it's something that we haven't mentioned today. Silicon: it's an area we've covered extensively on Breaking Analysis, from Nitro to Graviton to the AWS acquisition of Annapurna, its secret weapon, and new specialized capabilities like Inferentia and Trainium. We'd expect something more at re:Invent, maybe new Graviton instances. David Floyer, our colleague, said he's expecting at some point a complete system on a chip, an SoC, from AWS, and maybe an Arm-based server to eventually include high-speed CXL connections to devices and memories, all to address next-gen applications: data-intensive applications with low power requirements and lower cost overall. Now, of course, every year Swami gives his usual update on machine learning and AI, building on Amazon's years of SageMaker innovation: perhaps a focus on conversational AI or better support for vision, and maybe better integration across Amazon's portfolio of, you know, large language models, neural networks, generative AI, really infusing AI everywhere. Of course, security is always high on the list at re:Invent, and Amazon even has re:Inforce, a conference dedicated to security. Now, here we'd like to see more on supply chain security and perhaps how AWS can help there, as well as tooling to make the CIO's life easier. But the key so far is AWS is much more partner-friendly in the security space than, say, for instance, Microsoft traditionally, so firms like Okta and CrowdStrike and Palo Alto have plenty of room to play in the AWS ecosystem. We'd expect, of course, to hear something about ESG. It's an important topic, and hopefully not only how AWS is helping the environment, that's important, but also how they help customers save money and drive inclusion and diversity. Again, very important topics. And finally, coming back to it, re:Invent is an ecosystem event.
it's the Super Bowl of tech events and the ecosystem will be out in full force every tech company on the planet will have a presence and the cube will be featuring many of the partners from the serial floor as well as AWS execs and of course our own independent analysis so you'll definitely want to tune into thecube.net and check out our re invent coverage we start Monday evening and then we go wall to wall through Thursday hopefully my voice will come back we have three sets at the show and our entire team will be there so please reach out or stop by and say hello all right we're going to leave it there for today many thanks to Stu miniman and David floyer for the input to today's episode of course John Furrier for extracting the signal from the noise and a sit down with Adam solipski thanks to Alex Meyerson who was on production and manages the podcast Ken schiffman as well Kristen Martin and Cheryl Knight helped get the word out on social and of course in our newsletters Rob hoef is our editor-in-chief over at siliconangle does some great editing thank thanks to all of you remember all these episodes are available as podcasts wherever you listen you can pop in the headphones go for a walk just search breaking analysis podcast I published each week on wikibon.com at siliconangle.com or you can email me at david.valante at siliconangle.com or DM me at di vallante or please comment on our LinkedIn posts and do check out etr.ai for the best survey data in the Enterprise Tech business this is Dave vellante for the cube insights powered by ETR thanks for watching we'll see it reinvent or we'll see you next time on breaking analysis [Music]

Published Date : Nov 26 2022

ENTITIES

Entity	Category	Confidence
David michella	PERSON	0.99+
Alex Meyerson	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Dave Vellante	PERSON	0.99+
David Floyer	PERSON	0.99+
Kristen Martin	PERSON	0.99+
John	PERSON	0.99+
sixty percent	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
Adam Selipsky	PERSON	0.99+
John Furrier	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
2022	DATE	0.99+
Andy Jassy	PERSON	0.99+
Google	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
2017	DATE	0.99+
Palo Alto	LOCATION	0.99+
40 percent	QUANTITY	0.99+
alibaba	ORGANIZATION	0.99+
Lambda	TITLE	0.99+
63 percent	QUANTITY	0.99+
1300 respondents	QUANTITY	0.99+
Super Bowl	EVENT	0.99+
80 billion	QUANTITY	0.99+
John furrier	PERSON	0.99+
Thursday	DATE	0.99+
Cisco	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
Monday evening	DATE	0.99+
Jesse	PERSON	0.99+
Stu Miniman	PERSON	0.99+
siliconangle.com	OTHER	0.99+
October	DATE	0.99+
thecube.net	OTHER	0.99+
fourth	QUANTITY	0.99+
a month later	DATE	0.99+
third	QUANTITY	0.99+
hundreds of thousands	QUANTITY	0.99+
fargate	ORGANIZATION	0.99+

Lie 3, Today’s Modern Data Stack Is Modern | Starburst

(energetic music) >> Okay, we're back with Justin Borgman, CEO of Starburst, Richard Jarvis, the CTO of EMIS Health, and Teresa Tung, the cloud first technologist from Accenture. We're on to lie number three. And that is the claim that today's "Modern Data Stack" is actually modern. So (chuckles), I guess that's the lie. Or, is it that it's not modern? Justin, what do you say? >> Yeah, I think new isn't modern. Right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So, it's the same general stack, just, y'know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >> So, let me come back to you Justin. Okay, but there are differences, right? You can scale. You can throw resources at the problem. You can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different. Is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >> Well, it is. It's certainly taking, y'know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that exist with the old enterprise data warehouse model on-prem still exist. Just, yes, a little bit more elastic now because the cloud offers that. >> So Teresa, let me go to you, 'cause you have cloud-first in your title.
So, what say you to this conversation? >> Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right? That's going to very much follow the same thing. There's going to be more edge. There's going to be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So, there's a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >> Okay, so Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure, it's on the books. You don't want to just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the Modern Data Stack? >> Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology. But if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low?
Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >> Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model. Derisked experimentation, you know, we talked about the ability to scale up, scale down, but I'm taking away that that's not enough. Based on what Richard just said, the modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack, what does that look like? What are some of the attributes and principles there? >> Of how it should look, or how >> Yeah. What it should be? >> Yeah. Yeah. Well, I think, you know, Teresa mentioned this in a previous segment, about the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end all be all. It's not the central single source of truth.
And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >> So thank you. So Teresa, based on what Justin just said, my takeaway there is it's inclusive: whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include, Teresa, on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's data lake and maybe it's using Glue.
You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >> I mean, I think it's a killer case for data mesh. The fact that you have valuable data sources on-prem, and yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance. It needs, you know, something that is cheaper, or maybe just tapping into better analytics to get better insights, right? So you're going to be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, that is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >> Okay. Thank you. So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >> So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings.
So information about patients, demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on. And then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >> So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously got to be governed as well. You mentioned appropriately provided internally. >> Yeah. >> But also, you know, to external folks as well. So you've architected that capability today? >> We have, and because the data is standard it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables, it's understanding what it means to redact the data or to remove certain rows from it or to interpret what a date actually means. Is it the start of the contract or the start of the treatment or the date of birth of a patient?
These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case. >> Yeah, it makes sense. It's got the context. If the domains own the data, you know, you've got to cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >> Yeah. So I think for us it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems. Ultimately giving them that optionality, you know, and optionality provides the ability to reduce costs, store more in a data lake rather than data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >> Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. Big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website and they host some really thought-provoking interviews and they have awesome resources. Lots of data mesh conversations over there and really good stuff in the resource section. So check that out.
Thanks for watching the "Data Doesn't Lie... or Does It?" made possible by Starburst data. This is Dave Vellante for the Cube, and we'll see you next time. (upbeat music)

Published Date : Aug 22 2022

ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Teresa Tung	PERSON	0.99+
Justin	PERSON	0.99+
Teresa	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
40 years	QUANTITY	0.99+
Theresa	PERSON	0.99+
Starburst	ORGANIZATION	0.99+
JPMC	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
both worlds	QUANTITY	0.99+
today	DATE	0.99+
EMIS Health	ORGANIZATION	0.99+
first technologist	QUANTITY	0.98+
one element	QUANTITY	0.98+
both	QUANTITY	0.98+
first thing	QUANTITY	0.98+
five seven years	QUANTITY	0.98+
one	QUANTITY	0.97+
Teradata	ORGANIZATION	0.97+
Oracle	ORGANIZATION	0.97+
cube.net	OTHER	0.96+
Mongo	ORGANIZATION	0.95+
one size	QUANTITY	0.93+
Cube	ORGANIZATION	0.92+
Preem	TITLE	0.92+
both world	QUANTITY	0.91+
one place	QUANTITY	0.91+
Today’s	TITLE	0.89+
Fivetran	ORGANIZATION	0.86+
Data Doesn't Lie... or Does It?	TITLE	0.86+
single location	QUANTITY	0.85+
HelloFresh	ORGANIZATION	0.84+
first place	QUANTITY	0.83+
CEO	PERSON	0.83+
Lie	TITLE	0.82+
single source	QUANTITY	0.79+
first	QUANTITY	0.75+
one node	QUANTITY	0.72+
Snowflake	ORGANIZATION	0.66+
Snowflake	TITLE	0.66+
three	QUANTITY	0.59+
CTO	PERSON	0.53+
Data Stack	TITLE	0.53+
Redshift	TITLE	0.52+
starburst.io	OTHER	0.48+
COVID	TITLE	0.37+

Starburst The Data Lies FULL V2b

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it, more than a decade later organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to "Data Doesn't Lie... or Does It?", a series of conversations produced by the Cube and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths.
We're debating questions like: is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK. Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in the Cube studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience really, that we, as a business, have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like. People who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either those source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those are useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles back to those. You got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially, you know, doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so we're really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, if we go back 10 or 12 years, with the advent of the first data lakes, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro, which were created to ultimately deliver performance out of that. So, okay, we got largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source, with open source tooling, and it's not proprietary. You're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes, and every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, again, of going back 10 or 12 years, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today but are customers nonetheless? I think lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the gas price. I drive up and then I say, oh, that's the cash price. With a credit card, I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before. At least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down.
Cuz I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, cuz I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is this: are the data warehouse systems, or the proprietary systems, gonna deliver a greater ROI sooner? And is that an allure that customers are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what, I think it's actually harder to play in data engineering, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open, license-free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And normally it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle. There the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business, and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching the Cube, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst. Richard Jarvis is the CTO of EMIS Health, and Theresa Tung is the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. But okay, there are differences, right? I mean, you can scale. You can throw resources at the problem. You can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is certainly taking what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you, cuz you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty and data gravity, and because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at, very much aligned to data products and data mesh, is how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past five, seven years: cloud obviously has given a different pricing model. It de-risked experimentation. We talked about the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're in early days. So what are the critical aspects, if you had to think about maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look like, or how >>It's, yeah, what it should be. >>Yeah. Well, I think Theresa mentioned this in a previous segment, that the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So Theresa, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Theresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and putting that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also to external folks as well.
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right, Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the fastest time to insight, accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Theresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie, or Does It?", made possible by Starburst Data. This is Dave Vellante for the Cube, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time, management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their clusters' operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture. Tomorrow. Starbust enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere and Starburst enterprise was created to give you the enterprise grade performance, connectivity, security management, and support your company needs organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022

SUMMARY :

famously said the best minds of my generation are thinking about how to get people to the data warehouse ever have featured parody with the data lake or vice versa is So, you know, despite being the industry leader for 40 years, not one of their customers truly had So Richard, from a practitioner's point of view, you know, what, what are your thoughts? although if you were starting from a Greenfield site and you were building something brand new, Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, I, I think you gotta have centralized governance, right? So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, And you can think of them Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, you know, for many, many years to come. But I think the reality is, you know, the data mesh model basically says, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing that the mesh actually allows you to use all of them. But it creates what I would argue are two, you know, Well, it absolutely depends on some of the tooling and processes that you put in place around those do an analytic queries and with data that's all dispersed all over the, how are you seeing your the best to, to create, you know, data as a product ultimately to be consumed. open platforms are the best path to the future of data But what if you could spend less you create a single point of access to your data, no matter where it's stored. give you the performance and control that you can get with a proprietary system. 
I remember in the very early days, people would say, you you'll never get performance because And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, know it takes six or seven it is an evolving, you know, spectrum, but, but from your perspective, And what you don't want to end up So Jess, let me play devil's advocate here a little bit, and I've talked to Shaak about this and you know, And I think similarly, you know, being able to connect to an external table that lives in an open data format, Well, that's interesting reminded when I, you know, I see the, the gas price, And I think, you know, I loved what Richard said. not as many te data customers, but, but a lot of Oracle customers and they, you know, And so for those different teams, they can get to an ROI more quickly with different technologies that strike me, you know, the data brick snowflake, you know, thing is, oh, is a lot of fun for analysts So the advice that I saw years ago was if you have open source technologies, And in world of Oracle, you know, normally it's the staff, easy to discover and consume via, you know, the creation of data products as well. really modern, or is it the same wine new bottle? And with Starburst, you can perform analytics anywhere in light of your world. And that is the claim that today's So it's the same general stack, just, you know, a cloud version of it. So lemme come back to you just, but okay. So a lot of the same sort of structural constraints that exist with So Theresa, let me go to you cuz you have cloud first in your, in your, the data staff needs to be much more federated. you know, a microservices layer on top of leg legacy apps. So I think the stack needs to support a scalable So you think about the past, you know, five, seven years cloud obviously has given What it should be. And I think that's the paradigm shift that needs to occur. 
data that lives outside of the data warehouse, maybe living in open data formats in a data lake seen in data mesh, frankly really aren't, you know, adhering to So the mesh allows you to have the best of both worlds. So Richard, you know, talking about data as product, wonder if we could give us your perspectives is expecting means that you generate the wrong insight. But also, you know, around the data to say in a very clear business context, It's got the context. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, This is Dave Valante for the cube, and we'll see you next time. You need a solution that easily fits with what you have today and can adapt

ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Dave Lanta	PERSON	0.99+
Jess Borgman	PERSON	0.99+
Justin	PERSON	0.99+
Theresa	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Teresa	PERSON	0.99+
Jeff Ocker	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Dave Valante	PERSON	0.99+
Justin Boardman	PERSON	0.99+
six	QUANTITY	0.99+
Dani	PERSON	0.99+
Massachusetts	LOCATION	0.99+
20 cents	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Jamma	PERSON	0.99+
UK	LOCATION	0.99+
FINRA	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
Kurt Monash	PERSON	0.99+
20%	QUANTITY	0.99+
two	QUANTITY	0.99+
five	QUANTITY	0.99+
Jess	PERSON	0.99+
2011	DATE	0.99+
Starburst	ORGANIZATION	0.99+
10	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
seven years	QUANTITY	0.99+
thousands	QUANTITY	0.99+
pythons	TITLE	0.99+
Boston	LOCATION	0.99+
GDPR	TITLE	0.99+
Today	DATE	0.99+
two models	QUANTITY	0.99+
Zolando Comcast	ORGANIZATION	0.99+
Gemma	PERSON	0.99+
Starbust	ORGANIZATION	0.99+
JPMC	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Javas	TITLE	0.99+
today	DATE	0.99+
AWS	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
first lie	QUANTITY	0.99+
10	DATE	0.99+
12 years	QUANTITY	0.99+
one place	QUANTITY	0.99+
Tomorrow	DATE	0.99+

Starburst The Data Lies FULL V1

Published Date : Aug 20 2022

Starburst panel Q3

>>Okay. We're back with Justin Borgman, CEO of Starburst. Richard Jarvis is the CTO of EMIS Health, and Teresa Tung is the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say? >>Yeah, I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >>Well, it is. It's certainly taking what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist. Just, yes, a little bit more elastic now because the cloud offers that.
>>So Teresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? >>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing that, very much aligned to data products and data mesh.
How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Oh, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model. It de-risked experimentation. You know, we talked about the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm, you know, a big fan of the data mesh concepts, even though we're in the early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, and Teresa mentioned this in a previous segment, the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that.
So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So it's creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay. I get that. Does that include, Teresa, on-prem data?
Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh frankly really aren't, you know, adhering to the philosophy there. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh. The fact is that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is needed by the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you. So Richard, you know, you're talking about data as product. I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings.
So information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured and managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned appropriately provided internally, yeah, but also, you know, to external folks as well. So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean?
And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the fastest time to insight, accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today. >>Excellent. Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie... Or Does It?", made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time.
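
Editor's note: Richard's point that a data product has to carry the meaning of each date and the rules for redaction can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The record fields (`patient_id`, `date_of_birth`, `treatment_start`, `consented_to_research`), the `publish_for_research` helper, and the sample rows are all hypothetical and were not described by the panel; they do not represent any EMIS Health or Starburst API.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record: each date field is named for what it means,
# so a consumer cannot confuse a birth date with a treatment date.
@dataclass(frozen=True)
class TreatmentRecord:
    patient_id: str
    date_of_birth: date
    treatment_start: date
    consented_to_research: bool


def publish_for_research(records):
    """Return a research-safe view of the records (illustrative rules only)."""
    published = []
    for rec in records:
        if not rec.consented_to_research:
            continue  # remove rows that must not leave the owning domain
        published.append({
            "patient_id": "REDACTED",              # redact the direct identifier
            "birth_year": rec.date_of_birth.year,  # coarsen date of birth to a year
            "treatment_start": rec.treatment_start.isoformat(),
        })
    return published


# Hypothetical sample rows, invented for this sketch.
records = [
    TreatmentRecord("p-001", date(1980, 5, 17), date(2021, 3, 2), True),
    TreatmentRecord("p-002", date(1975, 11, 30), date(2021, 4, 9), False),
]
research_view = publish_for_research(records)
print(research_view)
```

The design choice here is that the redaction and row-removal rules live with the product definition rather than with each consumer, which is one plausible reading of what "product management around the data" means in practice.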

Published Date : Aug 2 2022

SUMMARY :

And that is the claim that today's So it's the same general stack, So lemme come back to you just this, but okay. So a lot of the same sort of structural So Theresa, let me go to you cuz you have cloud first in your, in your, So the centralized cloud, as we know it, maybe data lake data warehouse in the central place, a, you know, of microservices layer on top of leg legacy apps. you can get more people to use your data, to generate you more value for the business. So you think about the past, you know, five, seven years cloud obviously has given And I think that's the paradigm shift that needs to occur. from the inevitable change that you will, you won't encounter over time. and data mesh, frankly really aren't, you know, adhering to So the mesh allows you to have the best of both world. So Richard, you know, you're talking about data as product. that data or having the data not presented in the way that the user But also, you know, external folks as well. the proper product management around the data to say in a very clear business It's got the context. So we're trying to help enable the data mesh, you know, I big believers in the, in the data mesh concept, and I think, you know,

ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Theresa	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Justin	PERSON	0.99+
Justin Boorman	PERSON	0.99+
Dave	PERSON	0.99+
AWS	ORGANIZATION	0.99+
five	QUANTITY	0.99+
40 years	QUANTITY	0.99+
Starburst	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
JPMC	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Justin Teresa	PERSON	0.99+
both worlds	QUANTITY	0.99+
today	DATE	0.98+
first thing	QUANTITY	0.98+
Teresa	PERSON	0.98+
first technologist	QUANTITY	0.98+
Oracle	ORGANIZATION	0.98+
first	QUANTITY	0.98+
one element	QUANTITY	0.97+
Informatica	ORGANIZATION	0.97+
cube.net	OTHER	0.97+
Mongo	ORGANIZATION	0.97+
starburst.io	OTHER	0.96+
seven years	QUANTITY	0.95+
one	QUANTITY	0.95+
data Mart	ORGANIZATION	0.91+
one place	QUANTITY	0.88+
both world	QUANTITY	0.85+
COVID	TITLE	0.83+
single location	QUANTITY	0.8+
OnPrem	ORGANIZATION	0.8+
Terra	ORGANIZATION	0.77+
single source	QUANTITY	0.74+
one size	QUANTITY	0.73+
EMI health	ORGANIZATION	0.73+
July number	DATE	0.7+
data	ORGANIZATION	0.64+
five trend	QUANTITY	0.63+
money	QUANTITY	0.51+
three	QUANTITY	0.37+

Rik Tamm Daniels, Informatica & Peter Ku, Informatica | Snowflake Summit 2022

>>Hey everyone. Welcome back to theCUBE. Lisa Martin here with Dave Vellante. We're covering Snowflake Summit 22. This is day two of our wall-to-wall CUBE coverage of three days. We've been talking with a lot of customers and partners, and we've got some more partners to talk with us next: Informatica. Two of our guests are back with us on the program. Rik Tamm Daniels joins us, the GVP of global ecosystems and technology at Informatica, and Peter Ku, vice president and chief strategist, banking and financial services. Welcome, guys. >>Thank you guys. Thanks for having us. >>Peter, talk to us about what some of the trends are that you're seeing in the financial services space with respect to cloud and data and AI. >>Absolutely. You know, I'd say 10 years ago, the conversation around cloud was, what is that? Right? How do we actually... or, no way, because there were a lot of concerns about privacy and security and so forth. You know, now, as you see organizations modernizing their business capabilities, they're investing in cloud solutions for analytics and applications, as well as data. Data is not only just a byproduct of transactions and interactions in financial services, it truly fuels business success. But we have a term here at Informatica, where data really has no value unless it's fit for business use. Data has to be accessible in the systems and applications you use to run your business. It has to be clean. It has to be valid. It has to be transparent. People need to understand where it comes from, where it's going, how it's used, and who's using it. It also has to be understood by the business. >>You can have all the data in the world in your business applications, but if people don't know what they need to use it for or how they should use it, it has no value as well.
And then lastly, it has to be protected when it matters most. What we're seeing across financial services is that with the evolution of cloud now really being the center of focus for many of the net new investments, data is scattered everywhere, not just in one cloud environment, but in multiple cloud environments, and they're still dealing with many of the on-premise systems that have been running this industry for many, many years. So organizations need to have the ability to understand what they need to do with their data and, more importantly, tie that to a measurable business outcome. So we're seeing the data conversation really at the board level, right? It's an asset of the business. It's no longer just owned by IT. Data governance brings business, technology, and data leaders together to really understand, how do we use, manage, govern, and really leverage data for positive business outcomes? So we see that as an imperative that cuts across all sectors of financial services, both for large firms as well as for the mid-market. >>Quick follow-up, if I may. You say it's board level. I totally agree. Is it also a line-of-business level? Are you seeing increasingly that lines of business are leaning in, owning the data, building data products and the like? >>Absolutely. Because at the end of the day, business needs information in order to be successful, and data ownership now really belongs in the front office. Business executives understand that data, again, is not just a bunch of zeros and ones. These are critical elements for them to make decisions and to run their business, whether it's to improve customer experience, whether it's to grow wallet share, whether it's to comply with regulations, manage risks in today's environment, and of course being agile. Business knows that data's important. They have ownership of it, and technology and data organizations help facilitate the solutions.
And of course the investments to ensure that business can make the decisions and take the appropriate actions. >>A lot of asks and requirements on data. That's a big challenge for organizations. You mentioned... well, one of the things that we've mentioned many times on this program recently is that every company has to be a data company. It's not an option anymore if you wanna be successful. How does Informatica help customers navigate all of the requirements on data for them to be able to extract that business value and create new products and services in a timely fashion? >>So Informatica announced what we call the Intelligent Data Management Cloud platform. The platform has capabilities to help organizations access the data that they need, share it across the applications that run their business, and be able to identify and deal with data quality issues and requirements, being able to provide that transparency, the lineage that people need across multiple environments. So we've been investing in this platform that really allows our customers to take advantage of these critical data management, data governance, and data privacy requirements, all in one single solution. So we're no longer out there just selling piecemeal products. The platform is the offering that we provide across all industries. >>So how has that affected the way Informatica does business over the last several years? Snowflake is relatively new. You guys have been around a long time. How has your business evolved, and specifically, how are you serving the joint Snowflake customers with Informatica? >>Yeah, I think when I've been talking with folks here at the event, there are two big areas that keep coming up. So, data governance, data governance, data governance, right? It's such a hot topic out there. And as Peter was mentioning, data governance is a critical enabler of access to data.
In fact, there is an IDC study from last year that said that, you know, 84% of executives, no surprise, right, wanna have data-driven outcomes, data-driven organizations, but only 30% of practitioners actually use data to make decisions. There's a huge gap there. And really that's where governance comes in, creating trust around data, and not only creating trust, but delivering data to end users. So that's one big trend. The other one is departmental user adoption. We're seeing a huge push towards agility and rapid startup of new projects, new data-driven transformations that are happening at the departmental level, you know, individual contributors, that sort of thing. So at Informatica, we made an announcement yesterday with Snowflake of a whole host of innovations that are really targeting those two big trend areas. >>I wanna get into the announcements, but you know, the point about governance and business users being reluctant, it's kind of chicken and egg, isn't it? If I don't have the governance, I'm afraid to use it. But even if I do have it, the architecture of my company, my data organization, you know, may not facilitate that. And so I'm gonna change the architecture, but then it's a wild west. So it has to be governed. Isn't that a challenge for companies? >>Absolutely. And governance is a lot more than just technology, right? It's a people and process problem. And there really is a community or an ecosystem inside every organization for governance. So it's really important that when you think about deploying governance and being successful, every stakeholder has the ability to interact with this common framework, right? They get what they need out of it. It's tailored for how they wanna work. You've got your IT folks, you've got your chief data officer and data stewards, you have your privacy folks, and you have your business users. They're all different personas.
So we really focus on creating a holistic, single-pane-of-glass view with our cloud data governance and catalog offering that really goes all the way from the raw technical data and actually delivers data in a shopping-cart-like experience for actual enterprise users. Right? And so I think historically, data governance was seen as an impediment. It was seen as a tax, I think, but now it's really an accelerator, an enabler, driving consumption of data, which in turn, for our friends here at Snowflake, is exactly what they're looking for. >>Talk about the news. So Data Loader, what does that do? >>Well, it's all in the name. The Data Loader is a free utility that we announced here at Snowflake Summit that allows any user to sign up. It's completely free, no capacity limits. You just need an email address, and in three simple steps you start rapidly loading data into Snowflake. Right? So that first step is just get data in there. Start working with Snowflake. Informatica is investing in making that easy for every single user out there, and especially those departmental users who wanna get started quickly. >>Yeah. So, I mean, that's a key point, getting data into the Snowflake data cloud, right? It's like any cloud, you gotta get data in. How does it work with customers? I mean, you guys have a long history of, you know, extract, transform, load, ETL. How does it work in the Snowflake world? Is it different? You remember the Hadoop days, it was ELT, right? How are customers doing that today in this environment? >>Yeah, it's different. I mean, a lot of the same patterns are still in play. There's a lot more rapid data loading, right? That's a key theme. Just get it into Snowflake and then work on the data, transform it inside of Snowflake. So it's a flavor of ELT, right?
But it's really pushing down to the Snowflake Data Cloud as opposed to Hadoop with Spark or something like that. Right. So that's definitely how customers are using it. And, you know, the majority of our customers with Snowflake are actually using our cloud technology, but we're also helping customers who are on-premises customers automate the migration from our on-premises technology to our cloud-native platform as well. Yeah. >>And I'd say, you know, in addition to that, if you think about building a Snowflake environment, Informatica helps with our Data Loader solution, but that's not enough. Now you need to get value out of your data. So you can put raw data into the Snowflake environment, but then you realize the data's not actually fit for business use. What do we need to do? Actually transform it, clean it, govern it. And our customers that use Informatica with Snowflake are managing the entire data management and data governance process so that they can allow the business to get value out of the Snowflake investment. >>How quickly can you enable a business to get value from that data, to be able to make business decisions that can transform, right, deliver competitive advantage? >>I think it really depends on the organization, on a case-by-case basis. At the end of the day, you need to understand why you are doing this in the first place, right? What's the business outcome that you're trying to achieve? Next, identify what data elements you actually need to capture, govern, and manage in order to support the decisions and the actions that the business needs to take. If you don't have those things defined, and that's where data governance comes into play, then all you're doing is setting up a technical environment with a bunch of zeros and ones that no one knows what to do with.
So we talk about data governance more holistically and say, you need to align it to your business outcomes, but ensure that you have people, processes, roles, and responsibilities, and the underlying technology, to not just load data into Snowflake, but to leverage it, again, for the business needs across the organization. >>Oh, good, please. >>I just wanted to add to that real quickly. Yeah. One of the things Informatica is philosophically focused on is how do you accelerate the entire business of data management? So with our cloud platform, we have what's called our CLAIRE AI engine, right? So we use AI techniques, machine learning recommendations, to accelerate with the knowledge of the metadata of what's going on in the organization. For example, when we discover data assets and figure out, is this customer data, is it product data, that dramatically shortens the time to find data assets and deliver them. And so across our whole portfolio, we're taking things that traditionally took months to do. We're taking 'em down to weeks and days and even hours, right? So that's the whole goal: just accelerate that entire journey and life cycle through cloud-native approaches and AI. Yeah. >>You kind of just answered my question, I think, Rik. So you have this joint value statement together: "We help customers..." This is Informatica and Snowflake together. "We help customers modernize their data architecture, enable the most critical workloads, provide AI-driven data governance, and accelerate added value with advanced analytics." I mean, you definitely touched on some of those, but kind of unpack the rest of that. What do you mean by modernize? What is their data architecture? What is that? Let's start there. What does that look like? Modernizing a data architecture. Yeah. >>So, with so many customers, right, they built data warehouses, core data and analytics systems, on premises, right? They're using ETL technology, using those warehouse appliances or databases.
And what they're looking for is they wanna move to a cloud-native model, right? And all the benefits of cloud in terms of TCO, elasticity, instant scale-up, agility, all those benefits. So what we're looking to do with our modernization programs, for our current customer base that is on premises, is automate the process to get them to fully cloud native, which means they can now do hybrid. They can do multi-cloud elastic processing. And it's all also in a consumption-based model that we introduced about a year and a half ago. So they're looking for all those elements of a cloud-native platform, but they're solving the same problems, right? We still have to connect data. We still have to transform data, prepare it, cleanse it. All those things exist, but in a cloud-native footprint, and that's what we're helping them get to. >>And the modern architecture these days, quite honestly, it's no longer about getting best-of-breed tools and stitching them together and hoping that it will actually work. And Informatica's value proposition is that our platform has all those capabilities as services. So our customers don't have to deal with the costs and the risks of trying to make everything work behind the scenes. And what we've done with IDMC, or Intelligent Data Management Cloud, for financial services, retail, CPG, and healthcare and life sciences: in addition to our core capabilities and our CLAIRE AI machine learning engine, we also have industry accelerators. Prebuilt data quality rules for certain regulations within banking. We've got master data management customer models for the healthcare and insurance industries, all prebuilt. So these are accelerators that we've actually built over the years, and we're now making them available to our customers who adopt Informatica's Intelligent Data Management Cloud for their data management and governance needs.
>>And then the other part of this statement that's interesting is "provide AI-driven data governance." You know, we are seeing a move toward, you know, decentralized data architectures and organizations. And we talk to Snowflake about that. They go, yeah, we're a globally distributed cloud. Okay, great. So that's decentralized, but what we see a lot of customers doing is to say, okay, we're gonna give lines of business responsibility for data. We're gonna argue about who owns what. And then once we settle that, here's your own data lake. Maybe they'll try to cobble together a catalog or a super catalog. Right. And then they'll try to figure out, you know, some algorithms to determine data quality, you know: best, okay, don't use. Right, right. So if I understand it, you automate all that. >>So what we're doing with AI and machine learning is really helping the data professional, whether in the business, in technology, or in between, not only to get the job done faster, better, and cheaper, but actually to do it intelligently. What do we mean by that? For example, our AI and machine learning engine will look at data patterns and determine not only what's wrong with your data, but how you should fix it, and recommend data quality rules to actually apply and get those errors addressed. We also infer data relationships across a multi-cloud environment where those definitions were never there in the beginning. So we have the ability to scan the metadata and determine, hey, this data set is actually related to that data set across multiple clouds. It makes the organization more productive, but more importantly, it increases the confidence level that these organizations have the right infrastructure in place to manage and govern their data for what they're trying to do from a business perspective. >>And I'd add to that as well. I think you're talking a lot about data mesh architectures, right?
Those are really kind of popular right now. And I think they live or die on data governance. Right? If you don't have data governance, a shared taxonomy, these things, it's very hard to, I think, scale those individual working groups. But if you have a platform where the data owners can publish out visibility into what their data means, how to use it, how to interpret it, and get that insight, that context, directly to the data consumers, that's game-changing. Right? And that's exactly what we're doing with our cloud data governance and catalog. >>Well, the data mesh, you talk about data mesh, there are four principles, right? It's like decentralized architecture, data products. So once you figure out those two, yep, you've just created two more problems, which are the other two parts of the four principles: self-service infrastructure and computational governance. And that's like the hardest part, federated computational governance. That's the hardest part. That's the problem that you're solving. >>Yeah. Yeah, absolutely. I mean, think about the whole decentralization and self-service. Well, I may be able to access my data in a mesh architecture, but if I don't know what it means, how to use it, for what purpose, when not to use it, you're creating more problems than you originally expected to solve. So what we're doing is addressing the data management and the governance requirements, regardless of what the architecture is, whether it's a mesh architecture, a fabric architecture, or a traditional data lake or a data store. >>Yeah. I mean, I'd say, I think data mesh is more of an organizational construct than it is... I'm not quite sure what data fabric is. I think Gartner confused the issue. Data fabric was an old NetApp term. Yeah. You were probably working at NetApp at the time, and it made sense in the NetApp context.
And then I think Gartner didn't like the fact that Zhamak Dehghani co-opted this cool term. So they created data fabric, but whatever. But my point being, I think when I talk to customers, they're trying to get more value out of data, and they recognize that going through all these hyper-specialized roles is time consuming and it's not working for them. And they're frustrated. To your points and your joint statement, they want to accelerate that. And they're realizing the only way to do that is to distribute responsibility, get more people involved in the process. >>And that kind of dovetails with some of the announcements we made on data governance for Snowflake, right? You're taking these operational controls at the Snowflake layer that are typically managed by SQL, and in that decentralized architecture, the data owner doesn't know how to set those patterns and things like that. Right. So we're saying, all right, we're creating these deep integrations so that, again, we have a fit-for-persona type experience where they can publish data assets, they can set the rules and policies, and we're gonna push that down to Snowflake. So when it actually comes to provisioning data and doing data sharing through Snowflake, it's all a seamless experience for the end user and the data owner. Yeah. >>That's great. Beautiful. >>Seamless experience, absolutely necessary these days for everybody. Guys, thanks so much for joining Dave and me today, talking about Informatica, what's new, what you're doing with Snowflake, and what you're enabling customers to do in terms of really extracting value from that data. We appreciate your insights. >>Thank you. Yep. >>Thank you for having us. >>For our guests and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's coverage of Snowflake Summit, day two of theCUBE's coverage. Stick around. Dave and I will be right back with our next guest.

Published Date : Jun 15 2022

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Rick	PERSON	0.99+
Peter	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Gartner	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
Rik Tamm Daniels	PERSON	0.99+
two parts	QUANTITY	0.99+
Peter Ku	PERSON	0.99+
two	QUANTITY	0.99+
last year	DATE	0.99+
30%	QUANTITY	0.99+
both	QUANTITY	0.99+
yesterday	DATE	0.99+
NetApp	TITLE	0.99+
Rick TA Daniels	PERSON	0.99+
three simple steps	QUANTITY	0.99+
today	DATE	0.99+
first step	QUANTITY	0.98+
10 years ago	DATE	0.98+
first	QUANTITY	0.98+
one	QUANTITY	0.98+
80, 84%	QUANTITY	0.97+
IDMC	ORGANIZATION	0.97+
Snowflake Summit 2022	EVENT	0.94+
about a year and a half ago	DATE	0.94+
two more problems	QUANTITY	0.93+
Princip four	OTHER	0.93+
four principles	QUANTITY	0.91+
G P	ORGANIZATION	0.9+
two big areas	QUANTITY	0.89+
single pane	QUANTITY	0.89+
one single solution	QUANTITY	0.87+
day two	QUANTITY	0.87+
years	DATE	0.85+
Wallace	PERSON	0.85+
One	QUANTITY	0.85+
one cloud	QUANTITY	0.83+
IDC	ORGANIZATION	0.83+
two of our guests	QUANTITY	0.8+
two big trend areas	QUANTITY	0.79+
Jamma Dani	PERSON	0.79+
Dave ante	PERSON	0.77+
COO	PERSON	0.77+
about	DATE	0.75+
every single user	QUANTITY	0.71+
zeros	QUANTITY	0.69+
SQL	TITLE	0.68+
last	DATE	0.67+
once	QUANTITY	0.58+
Hado	TITLE	0.52+
vice	PERSON	0.51+
ones	QUANTITY	0.5+
summit 22	LOCATION	0.44+
Hadoop	EVENT	0.37+

Matthew Carroll, Immuta | Snowflake Summit 2022

(Upbeat music) >> Hey everyone. Welcome back to theCUBE's continuing coverage, day two, Snowflake Summit '22, live from Caesars Forum in Las Vegas. Lisa Martin here with Dave Vellante, bringing you wall-to-wall coverage yesterday, today, and tomorrow. We're excited to welcome Matthew Carroll to the program. The CEO of Immuta, we're going to be talking about removing barriers to secure data access. Matthew, welcome. >> Thank you for having me, appreciate it. >> Talk to the audience a little bit about Immuta. You're a Snowflake premier technology partner, but give them an overview of Immuta, what you guys do, your vision, all that good stuff. >> Yeah, absolutely, thanks. Yeah, if you think about what Immuta at its core is, we're a data security platform for the modern data stack, right? So what does that mean? It means that we embed natively into Snowflake and we enforce policies on data, right? So, the rules to be able to use it, to accelerate data access, right? So, that means connecting to the data very easily, controlling it with any regulatory or security policy on it, as well as contractual policies, and then being able to audit it. So, that way, any corporation of any size can leverage their data and share that data without risking leaking it or potentially violating a regulation. >> What are some of the key, as we look at industry by industry, challenges that Immuta is helping those customers address, and obviously quickly, since everything is accelerating? >> Yeah. And you're seeing it 'cause the big guys like Snowflake are verticalizing, right? You're seeing a lot of industry-specific, you know, concepts. With us, if you think of, like, where we live, obviously policies on data, regulated, right? So healthcare, how do we automate HIPAA compliance? How do we redesign clinical trial management post COVID, right? If you're going to have billions of users and you're collecting that data, pharmaceutical companies can't wait to collect that data.
They need to remove those barriers. So, they need to be able to collect it, secure it, and be able to share it. Right? So, double- and triple-blinded studies being redesigned in the cloud. Government organizations, how do we share security information globally with different countries instantaneously? Right? So these are some of the examples where we're helping organizations transform and be able to kind of accelerate their adoption of data. >> Matt, I don't know if you remember, I mean, I know you remember coming to our office. But we had an interesting conversation and I was telling Lisa. Years ago I wrote a piece on, you know, how to build on top of AWS. You know, there's so much opportunity. And we had a conversation at our office, theCUBE studios in Marlborough, Massachusetts. And we both, sort of, agreed that there was this new workload emerging. We said, okay, there's AWS, there's Snowflake at the time, we were thinking, and you bring machine learning, at the time we were using Databricks, >> Yeah. >> As the example, of course now it's been a little bit- >> Yeah. Careful. >> More of a battle, right, with those guys. But, and so, you see them going in their different directions, but the premise stands that there's an ecosystem developing, new workloads developing, on top of the hyperscale infrastructure. And you guys play a part in that. So, describe what you're seeing there, 'cause you were right on in that conversation. >> Yeah. Yeah. >> It's nice to be right. >> Yeah. So when you think of this design pattern, right, you have a data lake, you have a warehouse, and you have an exchange, right? And this architecture is what you're seeing around you now. Every single organization in the world is adopting this design pattern. The challenge, where we fit into kind of a sliver of this, is the way we used to do it before was application design, right?
And we would build lots of applications, and we would build all of our business logic to enforce security controls and policies inside each app. And you'd go through security and get it approved. In this paradigm, any user could potentially access any data. There's just too many data sources, too many users, and too many things that can go wrong. And to scale that is really hard. So, like, with Immuta, what we've done, versus what everyone else has done, is we natively embedded into every single one of those compute partners. So, Snowflake, Databricks, BigQuery, Redshift, Synapse, on and on. Natively underneath the covers, so that as those BI tools, those data science tools hit Snowflake, they don't have to rewrite any of their code, but we automatically enforce policy without them having to do anything. And then we consistently audit that. I call that the separation of policy from platform. So, just like in the world of big data, when we had to separate compute from storage, in this world, because we're global, right? So we have a distributed workforce and our data needs to abide by all these new security rules and regulations. We provide a flexible framework for them to be able to operate at that scale. And we're the only ones in the world doing it. >> Dave Vellante: See, the key there is, I mean, Snowflake is obviously building out its data cloud and the functions that it's building in are quite impressive. >> Yeah. >> Dave Vellante: But you know, at some point a customer's going to say, look, I have other stuff, whether it's in an Oracle database, or data lake, or wherever, and that should just be a node on this global, whatever you want to call it, mesh or fabric. And then if I'm hearing you right, you participate in all of that. >> Correct. Yeah. We kind of, we were able to just natively inject into each, and then be able to enforce that policy consistently, right? So, hey, can you access HIPAA data? Who are you? Are you authorized to use this?
For what purpose do you want to query this data? Is it for fraud? Is it for marketing? So, what we're trying to do as part of this new design paradigm is ensure that we can automate nearly the entire data access process, but with confidence, and de-risk it. That's kind of the key thing. But the one thing I will mention is I think we talk a lot about the core compute, but I think, especially at this summit, data sharing is everything. Right? And this concept of no-copy data sharing, because the data is too big and there's too many sets to share, that's the keys to the kingdom. You got to get your lake and your warehouse set with good policy, so you can effectively share it. >> Yeah, so, I wanted to just follow up, if I may. So, you'd mentioned separating compute from storage, and a lot of VC money poured into that. A lot of VC money poured into cloud database. How do you see, do you see Snowflake differentiating substantially from all the other cloud databases? And how so? >> I think it's the ease of use, right? Apple produces a phone that isn't much different than other competitors. Right? But what they do is, end to end, they provide an experience that's very simple. Right? And so yes. Are there other warehouses? Are there other ways to, you know, you heard about their analytic workloads now, you know, through Unistore, where they're going to be able to process analytical workloads as well as their ad hoc queries. I think other vendors are obviously going to have the same capabilities, but I think the user experience of Snowflake right now is top tier. Right? Whether I'm a small business, I can load my data in there and build an app really quickly. Or if I'm a JP Morgan or, you know, a Wesfarmers, I can move legacy, you know, monolithic architectures in there in months. I mean, these are six-month transitions. When you think about it, 20 years of work is now being transitioned to the cloud in six months. That's the difference.
>> So measuring ease of use and time to value, time to market. >> Yeah. Everything is time to value. No one wants to manage the infrastructure. In the Hadoop world, no one wants to have expensive customized engineers that are, you know, keeping up your Hadoop infrastructure any longer. Those days are completely over. >> Can you share an example of a joint customer, where really the joint value proposition that Immuta and Snowflake bring is delivering some pretty substantial outcomes? >> Yeah. What we're seeing is, and we're obviously highly incentivized to get them in there because it's easier on us, right? Because we can leverage their row- and column-level security. We can leverage their features that they've built in to provide a better experience to our customers. And so when we talk about large banks, they're trying to move Teradata workloads into Snowflake. When we talk about clinical trial management, they're trying to get away from physical copies of data, and leverage the exchange as a mechanism, so you can manage data contracts, right? So like, you know, when we think of even a company like Latch, right? Like, Latch uses us to be able to oversee all of the consumer data they have. Without, like, a Snowflake, what ends up happening is they end up having to double down and invest in their own people building out all their own infrastructure. And they don't have the capital to invest in third-party tools like us that keep them safe, prevent data leaks, allow them to do more and get more value out of their data, which is what they're good at. >> So TCO reduction, I'm hearing. >> Matthew Carroll: Yes, exactly. >> Matt, where are you as a company? You've obviously made a lot of progress since we last talked. Maybe give us the update on, you know, the headcount, and fundraising, and- >> Yeah, we're just at about 250 people, which scares me every day, but it's awesome.
But yeah, we've just raised 100 million dollars- >> Lisa Martin: Saw that, congratulations. >> Series E, thank you, with NightDragon leading it. And NightDragon was very tactical as well. We found that with data governance, I think what you're seeing in the market now is the catalog players are really maturing, and they're starting to add a suite of features around governance, right? So quality control, observability, and just traditional asset management around their data. What we are finding is that there's a new gap in this space, right? So if you think about legacy, we had infrastructure security. We had the four walls and we protected our four walls. Then we moved to network security. We said, oh, the adversary is inside, zero trust. So, let's protect all of our endpoints, right? But what we're seeing now is data is the security flaw. Anyone could potentially access it in this organization. So how do we protect data? And so what we have matured into is a data security company. What we have found is, there's this next generation of data security products that are missing. And it's this blend between authentication, like an Okta or an Auth0, and auth... I'm sorry, authorization, like Immuta, where we're authorizing certain access. And we have to pair that together with modern observability, like a Datadog, to provide a layer above this modern data stack, to protect the data, to analyze the users, to look for threats. And so Immuta has transformed with this capital. And we brought Dave DeWalt onto our board because he's a cybersecurity expert. He gives us that understanding of what it is like to sell into this modern cyber environment. So now, we have this platform where we can discover data, analyze it, tag it, understand its risk, secure it, author and enforce policies. And then monitor. The key thing is monitoring. Who is using the data? Why are they using the data? What are the risks to that? In order to enforce the security.
So, we are a data security platform now with this raise. >> Okay. Well, that's a new, you know, vector for you guys. I always saw you as an adjacency, but you're saying smack dab in the heart. >> Matthew Carroll: Yes. Yeah. We're jumping right in. What we've seen is there is a massive global gap. Data is no longer just in one country. So it is, how do we automate policy enforcement of regulatory oversight, like GDPR or CCPA, which I think got this whole category going. But then what we quickly realized is, well, we have data jurisdiction. So, where does that data have to live? Where can I send it to? Because from Europe to us, what's the export treaty? We don't have defined laws anymore. So we needed a flexible framework to handle that. And now what we're seeing is data leaks upon data leaks, and you know, the Snowflakes and the other cloud compute vendors, the last thing they ever want is a data leak out of their ecosystem. So, the security aspects are now becoming more and more important. It's going to be an insider threat. It's someone that already has access to that data and has the rights to it. That's going to be the risk. And there is no pattern for a data scientist. There's no zero trust model for data. So we have to create that. >> Last question, how are you going to be using the 100 million raised in Series E funding, which you mentioned? How are you going to be leveraging that investment to turn the volume up on data security? >> Well, and we still have also another 80 million in the bank from our last raise, so 180 million now, and potentially more soon, we'll kind of throw that out there. But, the first thing is M&A. I believe in a recessing market, we're going to see these platforms consolidate. Larger customers of ours are driving us to say, hey, we need less tools. We need to make this easier, so we can go faster. Even in a recessing market, these customers are not going to go slower.
They're moving to the cloud as fast as possible, but it needs to be easier, right? It's going back to the mid-nineties, kind of Lego blocks, right? Like the IBM, the SAP, the Informatica, right? So that's number one. Number two is investing globally. Customer success, engineering, support, 24 by seven support globally. Global infrastructure on cloud, moving to true SaaS everywhere in the world. That's where we're going. So sales, engineering, and customer success globally. And the third is doubling down on R&D. That monitor capability, we're going to be building software around it. How do we monitor and understand the risk of users, third parties? So how do you handle data contracts? How do you handle data use agreements? So those are the three areas we're focused on. >> Dave Vellante: How are you scaling go-to-market at this point? I mean, I presume you are. >> Yeah, well, I think we're leveraging these types of engagements, so like, our partners are the big cloud compute vendors, right? Those data clouds. We're injecting as much as we can into them and helping them get more workloads onto their infrastructure, because it benefits us. And then obviously we're working with GSIs and then RSIs to kind of help with this transformation, but we're all in. We're actually deprecating support of legacy connectors. And we're all in on cloud compute. >> How did the pivot to all in on security, how did it affect your product portfolio? I mean, is that more positioning, or were there other product extensions where you had to test product-market fit? >> Yeah. This comes out of customer drive. So we've been holding customer advisory boards across Europe, Asia, and the U.S. And what we just saw was a pattern. Some of the largest banks and pharmaceutical companies and insurance companies in the world were saying, hey, we need to understand who is actually on our data. We have a better understanding of our data now, but we don't actually understand why they're using our data.
Why are they running these types of queries? Is this machine, you know, logic that we're running on this now? We invested all this money in AI. What's the risk? They just don't know. And so, yeah, it's going to change our product portfolio. We modularized our platform into three components over the past year, specifically so we can start building custom applications on top of it for specific users, like the CSO, like, you know, the legal department, and like third-party regulators to come in, as well as, going back to data sharing, to build data use agreements between one or many entities, right? So an S&P Global can expose their data to third parties and have one consistent digital contract. No more long memo where you have to read the contract. Like, Immuta can automate those data contracts between one or many entities. >> Dave Vellante: And make it a checkbox item. >> It's just a checkbox, but then you can audit it all, right? >> The key thing is this, I always tell people, there's negligence and gross negligence. Negligence, you can go back and fix something. Gross negligence, you don't have anything to put into controls. Regulators want you to be at least negligent. Grossly negligent, they get upset. (laughs) >> Matthew, it sounds like great stuff is going on at Immuta, lots of money in the bank. And it sounds like a very clear and strategic vision and direction. We thank you so much for joining us on theCUBE this morning. >> Thank you so much. >> For our guest and Dave Vellante, I'm Lisa Martin, you're watching theCUBE's coverage of day two, Snowflake Summit '22, coming at ya live from the show floor in Las Vegas. Be right back with our next guest. (Soft music)

Published Date : Jun 15 2022

Nagaraj Sastry, HCL Technologies | Snowflake Summit 2022

>>Welcome back to theCUBE's continuing coverage of day one of Snowflake Summit '22, live from Caesars Forum in Las Vegas. I'm Lisa Martin. My co-host for the week is Dave Vellante. Dave and I are pleased to welcome Nagaraj Sastry to the program, the vice president of data and analytics at HCL Technologies. Welcome. Great to have you. >>Same here. Thank you for inviting me here. >>Isn't it great to be back in person? >>Oh, love it. >>The keynote this morning, I don't know if you had a chance to see it, was standing room only. There were overflow rooms. People are ready for this, and it was a jam-packed morning of announcements. >>Absolutely. >>Talk to us a little bit about the HCL-Snowflake partnership. But for anybody in the audience who may not be familiar with HCL, give us a little bit of background, vision, mission, differentiation, and then that Snowflake duo. >>Sure, sure. So let me first start off with talking about HCL. We are an 11.5 billion organization. We have three modes of working. Mode one is everything to do with our infrastructure business and application services and maintenance. Mode two is anything that we do in the cutting-edge ecosystem, whether it is cloud, whether it is application modernization, ERPs, SA, all of those put together is mode two. Data and analytics is part of our mode two culture. The whole ecosystem is called the digital services business, and within digital services, one of the arms is data and analytics. We are about a billion dollars in terms of revenues from a data and analytics perspective, of the 11 billion that I was talking to you about. And mode three is everything to do with our software services. So we have got our own software products, and that's a third of our business. So that's about HCL. On the HCL and Snowflake relationship, we are an elite partner with Snowflake. We are one of the fastest growing partners.
We achieved the elite level within 18 months of us signing up as a Snowflake partner. We're close to about 50-plus implementations worldwide, and about 800 people who are Snowflake professionals within the HCL ecosystem, for the large customers that we serve. >>And how long have you been partners? >>About 18 to 20 months now. >>Okay. So during the last couple of tumultuous years, why Snowflake? What was it about their vision, their strategy, their leadership that really spoke to HCL as, this is a partner for us? >>So one of the biggest things that we realized, probably about four years ago, was, you know, you had all the application databases, or RDBMSs, MPPs, the Hadoop ecosystems, which were getting expensive, not in terms of the cost, but in terms of the processing times, the way the queries were getting created. And we knew that there is something that is going to come. >>And we knew that, you know, there will be a hyperscaler that will come. And of course Azure was already there. AWS was there, Google was just picking it up. And at that point in time, we realized that, you know, there will be a cloud data warehouse, because we had started reading about Snowflake at that point in time. So fast forward a couple of years after that, and we realized that if we are to be in this business, you know, the right way of doing it is by partnering with the right tooling company. And Snowflake brings that to the table. We all know that now. And with what the keynote speakers were also saying, right, from a 150-member team about five years ago in conference to about 12,000 people now. So you know that this is the right thing to do, and this is the right place to be at.
So we devised a methodology in terms of saying, let's get into the partnership, let's get our resources trained and certified on the Snowflake ecosystem, and let's take a point of view to our customers in terms of how data migrations and transformations have to be done in the Snowflake arena. >>When you think about your modes, you talked about modes one, two, and three. I feel like Snowflake touches on each of those, maybe not so much the infrastructure and the apps, although maybe going forward it does increasingly. So that's my question: where do you see Snowflake vectoring into your modes? >>So it does in both of the first two modes, and mode three also, and I'll give you the reasons why. Mode one is predominantly because you can do application development on the data cloud now, which basically means that I can have an application run on Snowflake. Eventually that's the goal. Second is in mode two, because it is a cloud data warehouse, it fits in exactly, because the application data is in Snowflake. I've got my regular data sets within Snowflake. Both are talking to each other. There is zero lapse time from a user perspective. >>It's a direct >>Tip. And then mode three, the reason why I said mode three, which is software as a service or software services and products, is because I can do powered by Snowflake. I can implement that. So that's why it cuts across our entire ecosystem. >>The whole thing is called your digital business, correct? Is that right? So this is the next wave of digital business that we're seeing here, because digital is data, right? (laughs) That's really what it's about. It's about putting that data to work. >>So the president of our digital business, Anand Birje, who had done a session in the afternoon today, he says the D in digital is data. >>There it is, right.
>>And that's what we are seeing with our customers, large implementations that we do in this ecosystem. There is one other thing that we are focusing very heavily on, which is industrial solutions, or industry-led solutions, whether it is for healthcare, whether it's for retail or financial services, name a vertical. And we have got our own capabilities around industrialized solutions that fit certain use cases. >>So in thinking about the D in digital is really data. If you think about the operating model for data, it's obviously evolved. You mentioned Hadoop, then it went to the cloud and all the data went to the cloud. But today you've got an application development model, you've got the database, which is sort of hardened, and then you've got your data pipeline and your data stack, and that's kind of the operating model. They're sort of siloed to a great degree. How is that operating model changing as a result of data? >>So I'll answer it in two parts. Part one is, if you realize, over the years what used to happen is you had a CIO in an organization, and then you had enterprise architecture teams, application development teams, support teams, and so on and so forth. In the last 36 months, if you see, there is an emergence of a new role, which is called the CDAO, chief data and analytics officer. So the data and analytics officer is a role that has been created. And the purpose of creating that role is to ensure that organizations will pull out, or call out, resources within the CIO organizations who are enterprise architects, data architects, application architects, or security architects, and bring them into the ecosystem of the data office from an operating model perspective, so that innovations can be driven. >>Data-driven enterprises could be created and innovations can come through there. The other part of that is the use cases get prioritized when you start innovating.
And then it is a factory model in terms of how those use cases get built, which is a no-brainer in my mind, at least. But that is how the operating model is coming up from a people perspective. From a technology perspective also, there is an operating model that is emerging. If you see all the hyperscalers that are there today, and Snowflake with its latest and greatest announcements, if you see the way the industry is going, everything will be housed in one ecosystem, and that is the beauty of this entire thing. And you'll be able to fathom it effectively, right? Because if I'm in a multi-cloud kind of an environment and I'm on Snowflake, I don't care, because I'm on Snowflake, which can work across the multiple clouds. So my data is in one place. >>Effectively. Yeah. It's interesting what you were saying about the chief data officer. That role emerged out of the ashes, like a phoenix, of, you know, compliance and data quality in healthcare and financial services and government, the highly regulated industries. And then it took a while, but it increasingly became, wow, this is really a front-of-the-board-level role, if you will, you know, data. And now you're seeing it, it is integrated with digital. >>Absolutely. And there is one other point. If you think about it, the emergence of the chief data officer came in because there were issues associated with data quality. There were issues associated with data cataloging, as to how data is cataloged. And there were issues in terms of trustability of the data. Now, the trustability of the data can be in two places. One is data quality: bad data, garbage in, garbage out. But then the other aspect of trustability is in terms of, can I do the seven Cs of data quality and say, okay, I can hallmark this data platinum or gold or silver or bronze, or un-hallmarked data.
And with Snowflake, the advantage is, if you have a hallmarked data set that is, say, platinum or gold, then thanks to the virtual warehouse, the same data set gets propagated across the enterprise. That's the beauty with which it comes. And then of course the metadata aspect of it, bringing in the technical metadata and the business metadata together for the purpose of creating the data catalogs, is another cool thing, enabled again by Snowflake. >>When you're in customer conversations, what are some of the myths or misconceptions that customers have historically had when it comes to creating a data strategy, and then what is your recommendation for those folks, since every company these days, to be competitive, has to be a data company? >>Yeah. So around data structures, the whole thought process has to be, hitherto, in the past, we used to go from source applications: we would gather requirements, then we would figure out what sources are there, do a profiling of the data, and then say, okay, the target data model should be this. >>Too slow. >>Too slow, right? Now, fast forward to the digital transformation. There are producers of data: the applications that are being modernized today are producers of data. They're actually telling you, I'm producing this kind of data, this is the kind of events that I'm producing, and this is my structure. Now the whole deal is, I don't need to figure out what the requirements are. I know what use case the application is going to be helping me with, so therefore the entire data model is supported. But at the same point in time, the newer-generation applications that are getting created are not only getting created in terms of the customer experience.
Of course, that is very critical, but they're also taking into account aspects around metadata: the technical metadata associated with an application, and the data quality rules or business rules that are implemented within an application. All of that is getting documented. As a result, the whole timeline from source to profile to model, which used to be X number of days in the past, is X minus at least 20% now, or 30% actually. So that is how the data structures are coming into play. A futuristic thought process would be, there will be producers of data and there will be consumers of data. Where is ETL then, or ELT? There is not going to be any ETL or ELT, because a producer is going to say, I'm producing the data for this, and a consumer says, okay, I want to consume the data for this purpose. They meet through an API layer. So ETL is eventually going to go away. >>Well, and if you think about the way it works today, the data operating model, if you will, the transaction systems and other systems throw off a bunch of exhaust that gets thrown over the fence to the analytics system. The data pipeline, the data systems are not operationalized in the way that they need to be. And obviously Snowflake's trying to change that. >>So data >>That's a big change. Please. >>Yeah. Sorry. Didn't mean to cut you off. My apologies. >>No, no. >>So data operations is a very, very critical aspect. And if you think about it holistically, we used to have ETL pipelines, ELT pipelines. And then we used to have queries being written on top of Teradata or MPPs and Hadoop and all of that, and reporting tools that would have a number of reports that were created, and certain self-service BI reports, in the ecosystem. Now, when you think in terms of a cloud data warehouse, what is happening?
It is this: the way you are architecting your solution today in terms of data pipelines, those data pipelines are self-manageable or self-healing and do not need the same number of people. In the past, there was no documentation in terms of what ETL pipelines were written on certain ETL tools, or why something is failing. Nobody knew why something was failing because this was age-old code. But take it forward to today. >>What happens is, organizations are migrating from on-prem to cloud and to the cloud data warehouse, and the overall cost of ownership is decreasing. The reason is the way we are implementing the data pipelines, the way the data operations are being done. You know, even before a pipeline is kicked off, there is a check process to say whether the source application is ready or not ready. So such small things, which are part and parcel of the entire data operations lifecycle, are taking center stage. As a result, self-healing mechanisms are coming in, and because of those self-healing mechanisms, metrics are being captured. As a result, you know exactly where to focus and where not to focus, and the number of resources needed to support gets reduced. Cost of ownership >>Is low. Much higher trust, self-service infrastructure, data context in the hands of business users. Data is now more discoverable, it's governed, so you can now create data products more quickly. So speed and scale become extremely important. >>Absolutely. And in fact, one of the things that is changing is the way search is getting implemented. Hitherto, in the past, you created an index and then, you know, the data is searchable, but now it is contextual search. Can I contextualize the entire search?
Can I create a machine learning algorithm that will actually say, okay, Nagaraj as a persona was looking for this kind of data, and then Nagaraj as a persona comes back again and looks for some different kind of data? Can the machine learning algorithm go and figure out what is going on in Nagaraj's mind, what he is trying to look at, and then, you know, improve the whole learnability of the entire algorithm? That's how search is also going to change. >>Excellent. Nagaraj, thank you so much for joining us, talking about data modernization at speed and scale, HCL, what you're doing with Snowflake, and it sounds like incredible power that you're enabling. And we're only just scratching the surface. I have a feeling there's a lot more under there that you guys are going to uncover. >>Sure. So we have a tool, or an accelerator. We call it an accelerator in HCL parlance, but it's actually a tool. So when you think about data modernization onto Snowflake, it is predominantly migrating the data set from your existing ecosystem onto Snowflake. That is one aspect of it. The second aspect is the modernization of the ETL or ELT pipelines. The third aspect, associated with the data that is there within these ecosystems, is reconciliation: the older legacy platform gives me result X; does Snowflake give me result X? That kind of a reconciliation has to be done, data reconciliation and testing. And then the fourth layer is the reporting and visualization. So these four layers are part and parcel of something that we call Advantage Migrate. Advantage Migrate will convert your Teradata data model into a Snowflake-understandable data model automatically, whether it's Teradata, whether it is Oracle Exadata, Greenplum, (inaudible), you name an ecosystem.
>>We have the mechanism to convert a data model from whatever it is into a Snowflake-readable, understandable data model. The second aspect is the ETL/ELT pipeline. Whether you want to go from Informatica to dbt, or Informatica to something else, or DataStage to something else, doesn't matter. There is a tool for the ETL pipeline; we call it Gateway Suite. Gateway Suite actually converts the code. It reads the code that is there on the left-hand side, which is the legacy code, reverse engineers it, and understands the logic. And then we take that logic that has been called out and turn it into Spark code or dbt or any other tool of your choice, from a customer standpoint. That's the second layer. The third layer I talked about, which is basically automated data testing and data reconciliation. And last, but not least, is the reporting, because older ways of reporting and visualization differ from current-day reporting and visualization, which is more persona-based; the art of visualization is something different in this aspect. Come over to our booth at 2114, and you'll see Advantage Migrate in the works. >>Advantage Migrate. There you go. Nagaraj, thank you so much for joining us on the program and unpacking HCL, giving us really that technical dissection of what you guys are doing together with Snowflake. We appreciate your time. >>Thank you. My pleasure. Thank you. >>For our guest and Dave Vellante, this is Lisa Martin, live from the show floor of Snowflake Summit '22. Dave and I will be right back with our final guest of day one in just a minute.

Published Date : Jun 15 2022
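
The reconciliation step described in the interview above (the legacy platform gives result X; does Snowflake give result X?) can be illustrated with a small sketch. This is not HCL's tooling and makes no claim about how Advantage Migrate works; the function name `reconcile`, the report keys, and the sample rows are all hypothetical. It assumes both query results have already been fetched into memory as rows and that row order does not matter.

```python
from collections import Counter
from typing import Iterable, Sequence


def reconcile(legacy_rows: Iterable[Sequence], migrated_rows: Iterable[Sequence]) -> dict:
    """Compare two query results as multisets of rows (order-insensitive).

    Hypothetical helper for illustration only: duplicates are counted, so a
    row that appears twice in the legacy result must appear twice after
    migration for the results to match.
    """
    legacy = Counter(tuple(row) for row in legacy_rows)
    migrated = Counter(tuple(row) for row in migrated_rows)

    # Counter subtraction keeps only positive counts, which gives the
    # rows (with multiplicity) present on one side but not the other.
    missing = legacy - migrated
    unexpected = migrated - legacy

    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing_in_migrated": list(missing.elements()),
        "unexpected_in_migrated": list(unexpected.elements()),
        "match": not missing and not unexpected,
    }


# Made-up sample data: the migrated result has lost one duplicate row.
legacy_result = [("north", 120), ("south", 75), ("south", 75)]
migrated_result = [("south", 75), ("north", 120)]

report = reconcile(legacy_result, migrated_result)
print(report)
```

A multiset comparison is used here rather than a plain set so that dropped or doubled rows are caught; for large tables a real migration test would more likely compare row counts and per-column aggregates or checksums computed inside each database.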

Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers that are embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such we see a battle royale is brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight.
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up for two major conferences. There are several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering; fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science. So they were adding SQL extensions; the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon.
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things: a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in terms of that it has been an evolution, just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory with all the data stewards, who are basically concerned about the sprawl and the lack of control and governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves.
To believe the Databricks hype, it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL because, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with basically the same data targets, say cloud object storage, you might not apportion the task to different compute engines. 
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to, let's say, the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science, machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off. 
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats resonate far more with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks investing in something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake is really good at data warehouse, simplifying data warehouse. Databricks is really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot, and cogent analysis from Sanjeev there, is that the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, to minimize data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts basically? So different focus, at the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg in your view something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lake in any world but Databricks. I'm hearing a lot of buzz about Apache Iceberg. 
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement, you had MapR was the most closed. You had Hortonworks the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market has pretty much associated Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform, it's kind of not really relevant to this discussion totally. But in a sense it is, because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed, obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it really comes down to that type of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. There's really a great leveler today, which is that, I mean, basically Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL centric companies and shops are going to do more data science on their database centric platform. The data science driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you Sanjeev. 'Cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be, business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, the trans-analytical databases, and that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed, to Doug's earlier point. 
You can't be best of breed at everything, wouldn't data mesh advocate, data lakes do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why I want to go back to something that Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space, or do they risk alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that, whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say, basically mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's gone nowhere in the market, it has no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before, about basically the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption, obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to, of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other, Trino let's say, or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been sort of a victim of its own success in some ways, they made it so easy to spin up single node instances, multi node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. 
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They also have been trying to position themselves as being the most price friendly, in that they will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store that's kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. Nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata basically to do some machine learning or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be under the microscope for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated, I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS at the last moment at one of their re:Invents said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy, who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. Databricks, their top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, I asked for their biggest data science customer, they cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL centric, ETL style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things to what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with basically putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both sides of the HTAP use case, online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022

ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Tony Baer	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Breaking Analysis: Enterprise Technology Predictions 2022

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The pandemic has changed the way we think about and predict the future. As we enter the third year of a global pandemic, we see the significant impact that it's had on technology strategy, spending patterns, and company fortunes. Much has changed. And while many of these changes were forced reactions to a new abnormal, the trends that we've seen over the past 24 months have become more entrenched, and point to what's coming ahead in the technology business. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we welcome our partner and colleague and business friend, Erik Porter Bradley, as we deliver what's becoming an annual tradition for Erik and me, our predictions for Enterprise Technology in 2022 and beyond. Erik, welcome. Thanks for taking some time out. >> Thank you, Dave. Luckily we did pretty well last year, so we were able to do this again. So hopefully we can keep that momentum going. >> Yeah, you know, I want to mention that, you know, we get a lot of inbound predictions from companies and PR firms that help shape our thinking. But one of the main objectives that we have is we try to make predictions that can be measured. That's why we use a lot of data. Now not all will necessarily fit that parameter, but if you've seen the grading of our 2021 predictions that Erik and I did, you'll see we do a pretty good job of trying to put forth prognostications that can be declared correct or not, you know, as black and white as possible. Now let's get right into it. Our first prediction, we're going to go right into spending, something that ETR surveys for quarterly. And we've reported extensively on this. 
We're calling for tech spending to increase somewhere around 8% in 2022, we can see there on the slide. Erik, we predicted spending last year would increase by 4%. IDC, last check, came in at five and a half percent. Gartner was somewhat higher, but in general, you know, not too bad. But looking ahead, we're seeing an acceleration from the ETR September surveys, as you can see in the yellow versus the blue bar in this chart. Many of the SMBs that were hard hit by the pandemic are picking up spending again. And the ETR data is showing acceleration above the mean for industries like energy, utilities, retail, and services, and also, notably, in the Forbes largest 225 private companies. These are companies like Mars or Koch Industries. They're predicting well above average spending for 2022. So Erik, please weigh in here. >> Yeah, a lot to bring up on this one, I'm going to be quick. So 1200 respondents on this, over a third of which were at the C-suite level. So really good data that we brought in, the usual bucket of, you know, Fortune 500, Global 2000 make up the meat of that median, but it's 8.3% and rising with momentum as we see. What's really interesting right now is that energy and utilities. This is usually like, you know, an orphan stock dividend type of play. You don't see them at the highest point of tech spending. And the reason why right now is really because the state of tech infrastructure in our energy infrastructure needs help. And it's obvious, remember the Florida municipality breach last year? When they took over the water systems or they had the ability to? And this is a real issue, you know, there's bad nation state actors out there, and I'm no alarmist, but the energy and utility has to spend this money to keep up. It's really important. And then you also hit on the retail consumer. Obviously what's happened, the work from home shift created a shop from home shift, and the trends that are happening right now in retail. 
If you don't spend and keep up, you're not going to be around much longer. So I think the really two interesting things here to call out are energy utilities, usually a laggard in IT spend and it's leading, and also retail consumer, a lot of changes happening. >> Yeah. Great stuff. I mean, I recall when we entered the pandemic, really ETR was the first to emphasize the impact that work from home was going to have, so I really put a lot of weight on this data. Okay. Our next prediction is we're going to get into security, it's one of our favorite topics. And that is that the number one priority that needs to be addressed by organizations in 2022 is security and you can see, in this slide, the degree to which security is top of mind, relative to some other pretty important areas like cloud, productivity, data, and automation, and some others. Now people may say, "Oh, this is obvious." But I'm going to add some context here, Erik, and then bring you in. First, organizations, they don't have unlimited budgets. And there are a lot of competing priorities for dollars, especially with the digital transformation mandate. And depending on the size of the company, this data will vary. For example, while security is still number one at the largest public companies, and those are of course of the biggest spenders, it's not nearly as pronounced as it is on average, or in, for example, mid-sized companies and government agencies. And this is because midsized companies or smaller companies, they don't have the resources that larger companies do. Larger companies have done a better job of securing their infrastructure. So these mid-size firms are playing catch up and the data suggests cyber is even a bigger priority there, gaps that they have to fill, you know, going forward. And that's why we think there's going to be more demand for MSSPs, managed security service providers. And we may even see some IPO action there. 
And then of course, Erik, you and I have talked about events like the SolarWinds hack, there's more ransomware attacks, other vulnerabilities. Just recently, like Log4j in December. All of this has heightened concerns. Now I want to talk a little bit more about how we measure this, you know, relatively, okay, it's an obvious prediction, but let's stick our necks out a little bit. And so in addition to the rise of managed security services, we're calling for M&A and/or IPOs, we've specified some names here on this chart, and we're also pointing to the digital supply chain as an area of emphasis. Again, Log4j really shone a light on that. And this is going to help the likes of Auth0, which is now Okta, SailPoint, which is called out on this chart, and some others. We're calling some winners in endpoint security. Erik, you're going to talk about sort of that lifecycle, that transformation that we're seeing, that migration to new endpoint technologies that are going to benefit from this reset refresh cycle. So Erik, weigh in here, let's talk about some of the elements of this prediction and some of the names on that chart. >> Yeah, certainly. I'm going to start right with Log4j top of mind. And the reason why is because we're seeing a real paradigm shift here where things are no longer being attacked at the network layer, they're being attacked at the application layer, and in the application stack itself. And that is a huge shift left. And that's taking in DevSecOps now as a real priority in 2022. That's a real paradigm shift over the last 20 years. That's not where attacks used to come from. And this is going to have a lot of changes. You called out a bunch of names in there that are, they're either going to work. I would add to that list Wiz. I would add Orca Security. Two names in our emerging technology study, in addition to the ones you added that are involved in cloud security and container security. These names are either going to get gobbled up. 
So the traditional legacy names are going to have to start writing checks and, you know, legacy is not fair, but they're in the data center, right? They're on-prem, they're not cloud native. So these are the names that money is going to be flowing to. So they're either going to get gobbled up, or we're going to see some IPOs. And on the other thing I want to talk about too, is what you mentioned. We have CrowdStrike on that list, we have SentinelOne on the list. Everyone knows them. Our data was so strong on Tanium that we actually went positive for the first time just today, just this morning, where that was released. The trifecta of these are so important because of what you mentioned, under resourcing. We can't have security just tell us when something happens, it has to automate, and it has to respond. So in this next generation of EDR and XDR, an automated response has to happen because people are under-resourced, salaries are really high, there's a skill shortage out there. Security has to become responsive. It can't just monitor anymore. >> Yeah. Great. And we should call out too. So we named some names, Snyk, Aqua, Arctic Wolf, Lacework, Netskope, Illumio. These are all sort of IPO, or possibly even M&A candidates. All right. Our next prediction goes right to the way we work. Again, something that ETR has been on for a while. We're calling for a major rethink in remote work for 2022. We had predicted last year that by the end of 2021, there'd be a larger return to the office with the norm being around a third of workers permanently remote. And of course the variants changed that equation and, you know, gave more time for people to think about this idea of hybrid work and that's really come into focus. So we're predicting that is going to overtake fully remote as the dominant work model with only about a third of the workers back in the office full-time. And Erik, we expect a somewhat lower percentage to be fully remote. 
It's now sort of dipped under 30%, at around 29%, but it's still significantly higher than the historical average of around 15 to 16%. So still a major change, but this idea of hybrid and getting hybrid right, has really come into focus. Hasn't it? >> Yeah. It's here to stay. There's no doubt about it. We started this in March of 2020, as soon as the virus hit. This is the 10th iteration of the survey. No one, no one ever thought we'd see a number where only 34% of people were going to be in office permanently. That's a permanent number. They're expecting only a third of the workers to ever come back fully in office. And against that, there's 63% that are saying their permanent workforce is going to be either fully remote or hybrid. And this, I can't really explain how big of a paradigm shift this is. Since the start of the industrial revolution, people leave their house and go to work. Now they're saying that's not going to happen. The economic impact here is so broad, on so many different areas. And, you know, the reason is like, why not? Right? The productivity increase is real. We're seeing the productivity increase. Enterprises are spending on collaboration tools, productivity tools. We're seeing an increased perception in productivity of their workforce. And the CFOs can cut down an expense item. I just don't see a reason why this would end, you know, I think it's going to continue. And I also want to point out these results, as high as they are, were before the Omicron wave hit us. I can only imagine what these results would have been if we had sent the survey out just two or three weeks later. >> Yeah. That's a great point. Okay. Next prediction, we're going to look at the supply chain, specifically in how it's affecting some of the hardware spending and cloud strategies in the future. So in this chart, ETR asks buyers, have you experienced problems procuring hardware as a result of supply chain issues? 
And, you know, despite the fact that some companies are, you know, I would call out Dell, for example, doing really well in terms of delivering, you can see that in the numbers, it's pretty clear, there's been an impact. And that's not an across the board, you know, thing where vendors are able to deliver, especially acute in PCs, but also pronounced in networking, also in firewall servers and storage. And what's interesting is how companies are responding and reacting. So first, you know, I'm going to call the laptop and PC demand staying well above pre-COVID norms. It had peaked in 2012. Pre-pandemic it kept dropping and dropping and dropping, in terms of, you know, unit volume, where the market was contracting. And we think it can continue to grow in double digits in 2022. But what's interesting, Erik, is when you survey customers, is despite the difficulty they're having in procuring network hardware, there's not as much of a migration away from existing networks to the cloud. You could probably comment on that. Their networks are more fossilized, but when it comes to firewalls and servers and storage, there's a much higher propensity to move to the cloud. 30% of customers that ETR surveyed will replace security appliances with cloud services and 41% and 34% respectively will move to cloud compute and storage in 2022. So cloud's relentless march on traditional on-prem models continues. Erik, what do you make of this data? Please weigh in on this prediction. >> As if we needed another reason to go to the cloud. Right here, here it is yet again. So this was added to the survey by client demand. They were asking about the procurement difficulties, the supply chain issues, and how it was impacting our community. So this is the first time we ran it. And it really was interesting to see, you know, the move there. 
And storage particularly I found interesting because it correlated with a huge jump that we saw on one of our vendor names, which was Rubrik, had the highest net score that it's ever had. So clearly we're seeing some correlation with some of these names that are there, you know, really well positioned to take storage, to take data into the cloud. So again, you didn't need another reason to, you know, hasten this digital transformation, but here we are, we have it yet again, and I don't see it slowing down anytime soon. >> You know, that's a really good point. I mean, it's not necessarily bad news for the... I mean, obviously you wish that it had no change, would be great, but things, you know, always going to change. So we'll talk about this a little bit later when we get into the Supercloud conversation, but this is an opportunity for people who embrace the cloud. So we'll come back to that. And I want to hang on cloud a bit and share some recent projections that we've made. The next prediction is the big four cloud players are going to surpass 167 billion in IaaS and PaaS revenue in 2022. We track this. Observers of this program know that we try to create an apples to apples comparison between AWS, Azure, GCP and Alibaba in IaaS and PaaS. So we're calling for 38% revenue growth in 2022, which is astounding for such a massive market. You know, AWS is probably not going to hit a hundred billion dollar run rate, but they're going to be close this year. And we're going to get there by 2023, you know they're going to surpass that. Azure continues to close the gap. Now they're about two thirds of the size of AWS and Google, we think is going to surpass Alibaba and take the number three spot. Erik, anything you'd like to add here? >> Yeah, first of all, just on a sector level, we saw our sector, new survey net score on cloud jumped another 10%. It was already really high at 48. Went up to 53. This train is not slowing down anytime soon. 
And we even added an edge compute type of player, like Cloudflare into our cloud bucket this year. And it debuted with a net score of almost 60. So this is really an area that's expanding, not just the big three, but everywhere. We even saw Oracle and IBM jump up. So even they're having success, taking some of their on-prem customers and then selling them to their cloud services. This is a massive opportunity and it's not changing anytime soon, it's going to continue. >> And I think the operative word there is opportunity. So, you know, the next prediction is something that we've been having fun with and that's this Supercloud becomes a thing. Now, the reason I say we've been having fun is we put this concept of Supercloud out and it's become a bit of a controversy. First, you know, what the heck's the Supercloud right? It's sort of a buzz-wordy term, but there really is, we believe, a thing here. We think there needs to be a rethinking or at least an evolution of the term multicloud. And what we mean is that in our view, you know, multicloud from a vendor perspective was really cloud compatibility. It wasn't marketed that way, but that's what it was. Either a vendor would containerize its legacy stack, shove it into the cloud, or a company, you know, they'd do the work, they'd build a cloud native service on one of the big clouds and they did do it for AWS, and then Azure, and then Google. But there really wasn't much, if any, leverage across clouds. Now from a buyer perspective, we've always said multicloud was a symptom of multi-vendor, meaning I got different workloads, running in different clouds, or I bought a company and they run on Azure, and I do a lot of work on AWS, but generally it wasn't necessarily a prescribed strategy to build value on top of hyperscale infrastructure. There certainly was somewhat of a, you know, reducing lock-in and hedging the risk. But we're talking about something more here. 
We're talking about building value on top of the hyperscale gift of hundreds of billions of dollars in CapEx. So in addition, we're not just talking about transforming IT, which is what the last 10 years of cloud have been like. And, you know, doing work in the cloud because it's cheaper or simpler or more agile, all of those things. So that's beginning to change. And this chart shows some of the technology vendors that are leaning toward this Supercloud vision, in our view, building on top of the hyperscalers that are highlighted in red. Now, Jerry Chen at Greylock, they wrote a piece called Castles in the Cloud. It got our thinking going, and he and the team at Greylock, they're building out a database of all the cloud services and all the sub-markets in cloud. And that got us thinking that there's a higher level of abstraction coalescing in the market, where there's tight integration of services across clouds, but the underlying complexity is hidden, and there's an identical experience across clouds, and even, in my dreams, on-prem for some platforms. So what's new or new-ish and evolving are things like location independence, you've got to include the edge on that, metadata services to optimize locality of reference and data source awareness, governance, privacy, you know, application independent and dependent, actually, recovery across clouds. So we're seeing this evolve. And in our view, the two biggest things that are new are the technology is evolving, where you're seeing services truly integrate cross-cloud. And the other big change is digital transformation, where there's this new innovation curve developing, and it's not just about making your IT better. It's about SaaS-ifying and automating your entire company workflows. So Supercloud, it's not just a vendor thing to us. It's the evolution of, you know, the Marc Andreessen quote, "Every company will be a SaaS company." Every company will deliver capabilities that can be consumed as cloud services. 
So Erik, the chart shows spending momentum on the y-axis and net score, or presence in the ETR data set, or market share on the x-axis. We've talked about Snowflake as the poster child for this concept where the vision is you're in their cloud and sharing data in that safe place. Maybe you could make some comments, you know, what do you think of this Supercloud concept and this change that we're sensing in the market? >> Well, I think you did a great job describing the concept. So maybe I'll support it a little bit on the vendor level and then kind of give examples of the ones that are doing it. You stole the lead there with Snowflake, right? There is no better example than what we've seen with what Snowflake can do. Cross-portability in the cloud, the ability to be able to be, you know, completely agnostic, but then build those services on top. They're better than anything they could offer. And it's not just there. I mean, you mentioned edge compute, that's a whole nother layer where this is coming in. And Cloudflare, the momentum there is out of control. I mean, this is a company that started off just doing CDN and trying to compete with Akamai. And now they're giving you a full soup to nuts with security and actual edge compute layer, but it's a fantastic company. What they're doing, it's another great example of what you're seeing here. I'm going to call out HashiCorp as well. They're more of an infrastructure services, a little bit more of an open-source freemium model, but what they're doing as well is completely cloud agnostic. It's dynamic. It doesn't care if you're in a container, it doesn't matter where you are. They recently IPO'd and they're down 25%, but their data looks so good across both of our emerging technology and TSIS survey. It's certainly another name that's playing on this. And another one that we mentioned as well is Rubrik. 
If you need storage, compute, and in the cloud layer and you need to be agnostic to it, they're another one that's really playing in this space. So I think it's a great concept you're bringing up. I think it's one that's here to stay and there's certainly a lot of vendors that fit into what you're describing. >> Excellent. Thank you. All right, let's shift to data. The next prediction, it might be a little tough to measure. Before I said we're trying to be a little black and white here, but it relates to Data Mesh, which is, the ideas behind that term were created by Zhamak Dehghani of ThoughtWorks. And we see Data Mesh is really gaining momentum in 2022, but it's largely going to be, we think, confined to a more narrow scope. Now, the impetus for change in data architecture in many companies really stems from the fact that their Hadoop infrastructure really didn't solve their data problems and they struggle to get more value out of their data investments. Data Mesh prescribes a shift to a decentralized architecture and domain ownership of data and a shift to data product thinking, beyond data for analytics, but data products and services that can be monetized. Now these ideas are very powerful in our view, but they're difficult for organizations to get their heads around and further decentralization creates the need for a self-service platform and federated data governance that can be automated. And not a lot of standards around this. So it's going to take some time. At our power panel a couple of weeks ago on data management, Tony Baer predicted a backlash on Data Mesh. And I don't think it's going to be so much of a backlash, but rather the adoption will be more limited. Most implementations we think are going to use a starting point of AWS and they'll enable domains to access and control their own data lakes. And while that is a very small slice of the Data Mesh vision, I think it's going to be a starting point. 
And the last thing I'll say is, this is going to take a decade to evolve, but I think it's the right direction. And whether it's a data lake or a data warehouse or a data hub or an S3 bucket, these are really, the concept is, they'll eventually just become nodes on the data mesh that are discoverable and access is governed. And so the idea is that the stranglehold that the data pipeline and process and hyper-specialized roles that they have on data agility is going to evolve. And decentralized architectures and the democratization of data will eventually become a norm for a lot of different use cases. And Erik, I wonder if you'd add anything to this. >> Yeah. There's a lot to add there. The first thing that jumped out to me was that that mention of the word backlash you said, and you said it's not really a backlash, but what it could be is these are new words trying to solve an old problem. And I do think sometimes the industry will notice that right away and maybe that'll be a little pushback. And the problems are what you already mentioned, right? We're trying to get to an area where we can have more assets in our data site, more deliverable, and more usable and relevant to the business. And you mentioned that as self-service with governance laid on top. And that's really what we're trying to get to. Now, there's a lot of ways you can get there. Data fabric is really the technical aspect and data mesh is really more about the people, the process, and the governance, but the two of those need to meet, in order to make that happen. And as far as tools, you know, there's even cataloging names like Informatica that play in this, right? Istio plays in this, Snowflake plays in this. So there's a lot of different tools that will support it. But I think you're right in calling out AWS, right? They have AWS Lake, they have AWS Glue. They have so much that's trying to drive this. But I think the really important thing to keep here is what you said. 
It's going to be a decade long journey. And by the way, we're on the shoulders of giants a decade ago that have even gotten us to this point to talk about these new words because this has been an ongoing type of issue, but ultimately, no matter which vendors you use, this is going to come down to your data governance plan and the data literacy in your business. This is really about workflows and people as much as it is tools. So, you know, the new term of data mesh is wonderful, but you still have to have the people and the governance and the processes in place to get there. >> Great, thank you for that, Erik. Some great points. All right, for the next prediction, we're going to shine the spotlight on two of our favorite topics, Snowflake and Databricks, and the prediction here is that, of course, Databricks is going to IPO this year, as expected. Everybody sort of expects that. And while, but the prediction really is, well, while these two companies are facing off already in the market, they're also going to compete with each other for M&A, especially as Databricks, you know, after the IPO, you're going to have, you know, more prominence and a war chest. So first, these companies, they're both looking pretty good, the same XY graph with spending velocity and presence and market share on the horizontal axis. And both Snowflake and Databricks are well above that magic 40% red dotted line, the elevated line, to us. And for context, we've included a few other firms. So you can see kind of what a good position these two companies are really in, especially, I mean, Snowflake, wow, it just keeps moving to the right on this horizontal picture, but maintaining the next net score in the Y axis. Amazing. So, but here's the thing, Databricks is using the term Lakehouse implying that it has the best of data lakes and data warehouses. And Snowflake has the vision of the data cloud and data sharing. 
And Snowflake, they've nailed analytics, and now they're moving into data science in the domain of Databricks. Databricks, on the other hand, has nailed data science and is moving into the domain of Snowflake, in the data warehouse and analytics space. But to really make this seamless, there has to be a semantic layer between these two worlds and they're either going to build it or buy it or both. And there are other areas like data clean rooms and privacy and data prep and governance and machine learning tooling and AI, all that stuff. So the prediction is they'll not only compete in the market, but they'll step up and in their competition for M&A, especially after the Databricks IPO. We've listed some target names here, like Atscale, you know, Iguazio, Infosum, Habu, Immuta, and I'm sure there are many, many others. Erik, you care to comment? >> Yeah. I remember a year ago when we were talking Snowflake when they first came out and you, and I said, "I'm shocked if they don't use this war chest of money" "and start going after more" "because we know Slootman, we have so much respect for him." "We've seen his playbook." And I'm actually a little bit surprised that here we are, at 12 months later, and he hasn't spent that money yet. So I think this prediction's just spot on. To talk a little bit about the data side, Snowflake is in rarefied air. It's all by itself. It is the number one net score in our entire TISA universe. It is absolutely incredible. There's almost no negative intentions. Global 2000 organizations are increasing their spend on it. We maintain our positive outlook. It's really just, you know, stands alone. Databricks, however, also has one of the highest overall net sentiments in the entire universe, not just its area. And this is the first time we're coming up positive on this name as well. It looks like it's not slowing down. 
Really interesting comment you made though that we normally hear from our end-user commentary in our panels and our interviews. Databricks is really more used for the data science side. The MLAI is where it's best positioned in our survey. So it might still have some catching up to do to really have that caliber of usability that you know Snowflake is seeing right now. That's snowflake having its own marketplace. There's just a lot more to Snowflake right now than there is Databricks. But I do think you're right. These two massive vendors are sort of heading towards a collision course, and it'll be very interesting to see how they deploy their cash. I think Snowflake, with their incredible management and leadership, probably will make the first move. >> Well, I think you're right on that. And by the way, I'll just add, you know, Databricks has basically said, hey, it's going to be easier for us to come from data lakes into data warehouse. I'm not sure I buy that. I think, again, that semantic layer is a missing ingredient. So it's going to be really interesting to see how this plays out. And to your point, you know, Snowflake's got the war chest, they got the momentum, they've got the public presence now since November, 2020. And so, you know, they're probably going to start making some aggressive moves. Anyway, next prediction is something, Erik, that you and I have talked about many, many times, and that is observability. I know it's one of your favorite topics. And we see this world screaming for more consolidation it's going all in on cloud native. These legacy stacks, they're fighting to stay relevant, but the direction is pretty clear. And the same XY graph lays out the players in the field, with some of the new entrants that we've also highlighted, like Observe and Honeycomb and ChaosSearch that we've talked about. Erik, we put a big red target around Splunk because everyone wants their gold. So please give us your thoughts. 
>> Oh man, I feel like I've been saying negative things about Splunk for too long. I've got a bad rap on this name. The Splunk shareholders come after me all the time. Listen, it really comes down to this. They're a fantastic company that was designed to do logging and monitoring and had some great tool sets around what you could do with it. But they were designed for the data center. They were designed for prem. The world we're in now is so dynamic. Everything I hear from our end user community is that all net new workloads will be going to cloud native players. It's that simple. So Splunk is entrenched. It's going to continue doing what it's doing and it does it really, really well. But if you're doing something new, the new workloads are going to be in a dynamic environment and that's going to go to the cloud native players. And in our data, it is extremely clear that that means Datadog and Elastic. They are by far number one and two in net score, increase rates, adoption rates. It's not even close. Even New Relic actually is starting to, you know, entrench itself really well. We saw New Relic's adoption going up, which is super important because they went to that freemium model, you know, to try to get their little bit of an entrenched customer base and that's working as well. And then you made a great list here, of all the new entrants, but it goes beyond this. There's so many more. In our emerging technology survey, we're seeing Sentry, Catchpoint, Securonix, Lucidworks. There are so many options in this space. And let's not forget, the biggest data that we're seeing is with Grafana. And Grafana Labs has yet to turn on their enterprise. Elastic did it, why can't Grafana Labs do it? They have an enterprise stack. So when you look at how crowded this space is, there has to be consolidation. I recently hosted a panel and every single guy on that panel said, "Please give me a consolidation." 
Because they're the end users trying to actually deploy these and it's getting a little bit confusing. >> Great. Thank you for that. Okay. Last prediction. Erik, might be a little out of your wheelhouse, but you know, you might have some thoughts on it. And that is that hybrid events become the new digital model and a new category in 2022. You got these pure play digital or virtual events. They're going to take a back seat to in-person hybrids. The virtual experience will eventually give way to metaverse experiences and that's going to take some time, but the physical hybrid is going to drive it. And metaverse is ultimately going to define the virtual experience because the virtual experience today is not great. Nobody likes virtual. And hybrid is going to become the business model. Today's pure virtual experience has to evolve. You know, theCUBE first delivered hybrid mid last decade, but nobody really wanted it. We did Mobile World Congress last summer in Barcelona in an amazing hybrid model, which we're showing in some of the pictures here. Alex, if you don't mind bringing that back up. And every physical event that we're doing now has a hybrid and virtual component, including the pre-records. You can see in our studios, you see the green screen. I don't know. Erik, what do you think about, you know, the Zoom fatigue and all this? I know you host regular events with your round tables, but what are your thoughts? >> Well, first of all, I think you and your company here have just done an amazing job on this. So that's really your expertise. I spent 20 years of my career hosting intimate Wall Street idea dinners. So I'm better at navigating a wine list than I am navigating a conference floor. But I will say that, you know, the trend just goes along with what we saw. If 35% are going to be fully remote, if 70% are going to be hybrid, then our events are going to be as well. I used to host round table dinners on, you know, one or two nights a week. 
Now those have gone virtual. They're now panels. They're now one-on-one interviews. You know, we do chats. We do submitted questions. We do what we can, but there's no reason that this is going to change anytime soon. I think you're spot on here. >> Yeah. Great. All right. So there you have it, Erik and I. Listen, we always love the feedback. Love to know what you think. Thank you, Erik, for your partnership, your collaboration, and love doing these predictions with you. >> Yeah. I always enjoy them too. And I'm actually happy. Last year you made us do a baker's dozen, so thanks for keeping it to 10 this year. >> (laughs) We've got a lot to say. I know, you know, we cut out. We didn't do much on crypto. We didn't really talk about SaaS. I mean, I got some thoughts there. We didn't really do much on containers and AI. >> You want to keep going? I've got another 10 for you. >> RPA... All right, we'll have you back and then let's do that. All right. All right. Don't forget, these episodes are all available as podcasts, wherever you listen. All you have to do is search Breaking Analysis podcast. Check out ETR's website at etr.plus, they've got a new website out. It's the best data in the industry, and we publish a full report every week on wikibon.com and siliconangle.com. You can always reach out on email, David.Vellante@siliconangle.com. I'm @DVellante on Twitter. Comment on our LinkedIn posts. This is Dave Vellante for the Cube Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (mellow music)

Published Date : Jan 22 2022



Predictions 2022: Top Analysts See the Future of Data

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner Analyst and Principal at SanjMo. Tony Baer is Principal at dbInsight, Carl Olofson is a well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia, and Doug Henschen is Vice President and Principal Analyst at Constellation Research. 
Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and we are not governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more of streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. 
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog is going to exceed that of a BI tool. That will take time and we've already seen that trajectory. Right now if you look at BI tools, I would say there are a hundred users of a BI tool to one of a data catalog. And I see that evening out over a period of time and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on with a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years, it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines, these are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed on industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. 
We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex. The big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases like Purview for getting data quality right or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that data governance, like any other initiative, did not succeed. Even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now. 
With GDPR, the floodgates opened. Now, you know, California, you know, has CCPA, but even CCPA is being superseded by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements, and data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliance is beating down hard, so I think the time might be right. >> Yeah so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there, that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh and, you know, coming off of governance. I mean, wow, you know the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> Well, put it this way, data mesh, you know, the idea at least as proposed by ThoughtWorks. You know, basically it was at least a couple of years ago and the press has been almost uniformly uncritical. 
A good reason for that is all the problems that basically Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had our Hadoop data clusters, it's even more of a problem now that data is out in the cloud, where the data is not only your data lake, is not only S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea that we need to debate, you know, who are the folks that really know best about governance? It's the domain experts. So basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold hard reality. Because if you do a Google search, basically the published work, the articles on data mesh have been largely, you know, pretty uncritical so far. Basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary because we've talked about ideas like this. Brad, now you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought of, if we're deconstructing apps in cloud native to microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search if you look at Google search trends, and that now companies, like enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. 
The reason why I think that you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there's still some pretty major gaps. The biggest gap is in the area of federated governance. Now federated governance itself is not a new issue. With federated governance, we're still figuring out how we can strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and the groups that understand the data. You know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle: from selecting the data, to building the pipelines, to determining your access control, to looking at quality, to looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive the first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data-mesh-wash their products. Anybody in the data management space, whether it's basically a pipelining tool, whether it's basically ELT, whether it's a catalog or federated query tool, they're all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool because data mesh is not a technology. We're going to see one other thing come out of this. 
And this harks back to the metadata that Sanjeev was talking about and the catalog, just as he was talking about. Which is that there's going to be a new focus, a renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric because we all, at the end of the day, you know, need to read from the same sheet of music. >> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust. 
Our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. The point is this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh thank you Dave. Sanjeev, we just got about a couple of minutes on this topic, but I know you're speaking or maybe you've already spoken on a panel with (indistinct) who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it, there are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh? And I'm like, I can't compare the two because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what does a data product look like? How do we handle shared data across domains and governance? And I think we are going to see more of that in 2022, the "operationalization" of data mesh. 
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on. Let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm looking forward at the graph database market, which of course we haven't defined yet. So obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain before I go any further what a graph database is because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them which add additional informative material that makes it richer, that's called a property graph. There are two principal use cases for graph databases. There's semantic graphs, which are used to break down human language texts into the semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough? 
So a relational database defines... It supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value, a foreign key value, that relates one table to another and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity, there's supply chain, which is a big one actually. There is explainable AI and this is going to become important too because a lot of people are adopting AI. But they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add one other thing or just a few other things to this list, metadata management. So Sanjeev, here you go, this is your engine. 
Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bill of materials, you know, so like parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and replacement? What are your thoughts on this space? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg. 
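The node, edge, and property structure Carl describes, along with his "who reports to whom" example of a recursive relationship, can be sketched in a few lines of Python. This is only an illustrative in-memory toy: the `PropertyGraph` class, its method names, the `REPORTS_TO` label, and the sample people are all hypothetical and do not correspond to any real graph database's API.

```python
class PropertyGraph:
    """Toy property graph: nodes and edges both carry properties (hypothetical API)."""

    def __init__(self):
        self.nodes = {}       # node id -> dict of node properties
        self.out_edges = {}   # source id -> list of (label, target id, edge properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges.setdefault(source, []).append((label, target, props))

    def chain(self, start, label):
        """Follow edges with the given label upward from start, one hop at a time."""
        path, current, seen = [], start, {start}
        while True:
            targets = [t for (l, t, _) in self.out_edges.get(current, []) if l == label]
            if not targets or targets[0] in seen:   # end of chain, or a cycle
                return path
            current = targets[0]
            seen.add(current)
            path.append(current)


# Recursive relationship: employees report to other employees (same node type).
g = PropertyGraph()
g.add_node("ana", title="Engineer")
g.add_node("raj", title="Director")
g.add_node("mei", title="CEO")
g.add_edge("ana", "REPORTS_TO", "raj", since=2020)
g.add_edge("raj", "REPORTS_TO", "mei", since=2018)

managers = g.chain("ana", "REPORTS_TO")
print(managers)        # ['raj', 'mei']
print(len(managers))   # 2 levels up the chain
```

The point of the sketch is that the traversal follows the relationships themselves, so adding or removing a reporting line changes the answer without any schema change. In a relational database the same question would typically need a recursive query or application code.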
You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique, and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? SPARQL. Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to like treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? 
>> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. db-engines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles' heel, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS, it's SQL, we know the story, but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem, that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of the saying I learned a bunch of years ago. And somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. 
So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do. And it wasn't bad and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction, that this is going to change, we're going to have streaming data be the default way in which we deal with data, and how you label it and what you call it, you know, maybe these databases and data platforms just evolve to be able to handle it. But we're going to deal with data in a different way. And our research shows that already, about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow. 
If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and we can go in and pick it up right now. You know, that's the world we live in and that's spilling over into the enterprise IT world We have to provide those same types of capabilities. So that's my prediction, historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right thank you David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical because as soon as you put data in it, it's now history. They'll no longer reflect the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this Dave 'cause you know, as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default. 
And I agree 100% with you that we do need historical databases you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example where you're using the data warehouse as its context and the streaming data as the present and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got a great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it right. You know, you hit it right on the head there. Which is that, what I'm seeing is that essentially. Then based on I'm going to split this one down the middle is that I don't see that basically streaming is the default. What I see is streaming and basically and transaction databases and analytics data, you know, data warehouses, data lakes whatever are converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing... And this is where it leads in or maybe doing some of that real time predictive analytics to take a look at, well look, we're looking at this customer journey what's happening with what the customer is doing right now and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this and because of basically the speed of the infrastructure then you can basically bring these together and kind of orchestrate them sort of a loosely coupled manner. 
The other parts that the use cases are demanding, and this is part of it goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at let's say Smart Utility products, when you look at any type of operational problem, it has a real time component and it has an historical component. And having predictive and so like, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real time sort of predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very different because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in case of streaming, the mindset changes because the rules are normally the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seem to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think its... >> It'll be more and more common between now and then, but it becomes the default. And I also want Sanjeev at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole nother set of challenges. 
>> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes and, for example, with technologies like AutoML, can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, it's trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across that full life cycle of building an AI outcome, to start at the very beginning of data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that. 
And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out of the box flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter, this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you needed to do in the line of business. So basically, you're an SAP user, you look up to turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! It's going, that's great. Well, I don't know, I think that's terrifying. 
In some ways I think it is the future that AI is going to disappear like that, but I'm absolutely terrified of it because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia, responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera, et cetera, et cetera. It takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. You'd be able to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are nontrivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand. 
And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices that every company that's going to consume this invisible AI can make use of. And one of the things that, you know, sort of started, that Google kicked off a few years back, that's picking up some momentum, and the companies I just mentioned are starting to use it, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English and therefore me as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our Future of Intelligence group regarding, in particular, the moral and legal implications of having a fully automated, you know, AI driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why, which is, you know, because they were being historically discriminated, Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that. 
And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases. To some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us and we are a very flawed species. And so if you look at all of the really fantastic, magical looking supermodels we see, like GPT-3, and four that's coming out, they're xenophobic and hateful because the data they're built upon, the algorithms, and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right, let's move on. Doug, you mentioned, you know, a lot of people said that data lake, that term, is not going to live on, but it seems we have some lakes here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering. I say offering, that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents. In 2021, SAP, Oracle, and several of these fabric, virtualization, and mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information. 
And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse. New really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements. Meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you Doug. 
Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean I remember as far back as, I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we consider your data lake. Fast forward, and the thing is, what I was seeing at the time is that you were seeing them sort of blending at the edges. That was, say, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all, you know, in a single place or do we virtualize? And I think it's always going to be a union, yeah, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. That, you know, what do you need for your performance there, or for your query performance characteristics? Do you need, for instance, high concurrency? 
Do you need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute your processing, you know, as far as possible, to essentially do a kind of a brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination. It's a relatively new term introduced by Databricks a couple of years ago. This is the culmination of basically what's been a long time trend. And what we see in the cloud is that we start seeing data warehouses, as a checkbox item, say, "Hey, we can basically source data in cloud storage, in S3, "Azure Blob Store, you know, whatever, "as long as it's in certain formats, "like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean a lot of these, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term, we talked about, you know, data mesh washing, what are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it. 
So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here, database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> As hell. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners would say, or advocates would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. 
It's just that, you know, the difference being is that if we use the same physical data store, but everybody's logically, you know, basically governing it differently, you know? Data mesh, in essence, it's not a technology, it's processes, it's governance process. So essentially, you know, I basically see that, you know, as I was saying before, that this is basically the culmination of a long time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, like, upserts, I need like high concurrency or something like that. There are certain things that I'm not going to be able to efficiently get out of a data lake. And, you know, I'm doing a system where I'm just doing really brute forcing, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug, that we are seeing basically a confluence of requirements, that we need to essentially have basically either the element, you know, the ability of a data lake and the data warehouse, these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform and they're saying, "Hey, we'll handle all the edge cases "of the stuff that isn't in that center of data gravity "but that is off distributed in a cloud "or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging, Snowflake has introduced support for unstructured data. So obviously we're literally splitting hairs here. 
Now what Databricks is saying is that "aha, but it's easier to go from data lake to data warehouse "than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, has got what? 15 data stores. Are they going to converge 15 data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do, like, one word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, we really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so magic database. >> Yeah, I would actually, I've said this to people too. I kind of look at it as a Swiss Army knife of data because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when the document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case. 
But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add? >> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts, we'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point for whether it's a lake or a warehouse. 
And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues on data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)

Published Date : Jan 7 2022


ENTITIES

Entity	Category	Confidence
Dave Menninger	PERSON	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Doug Henschen	PERSON	0.99+
David	PERSON	0.99+
Brad Shimmin	PERSON	0.99+
Doug	PERSON	0.99+
Tony Baer	PERSON	0.99+
Dave Velannte	PERSON	0.99+
Tony	PERSON	0.99+
Carl	PERSON	0.99+
Brad	PERSON	0.99+
Carl Olofson	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
2014	DATE	0.99+
Sanjeev Mohan	PERSON	0.99+
Ventana Research	ORGANIZATION	0.99+
2022	DATE	0.99+
Oracle	ORGANIZATION	0.99+
last year	DATE	0.99+
January of 2022	DATE	0.99+
three	QUANTITY	0.99+
381 databases	QUANTITY	0.99+
IDC	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
2021	DATE	0.99+
Google	ORGANIZATION	0.99+
Omdia	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
SanjMo	ORGANIZATION	0.99+
79%	QUANTITY	0.99+
second question	QUANTITY	0.99+
last week	DATE	0.99+
15 data stores	QUANTITY	0.99+
100%	QUANTITY	0.99+
SAP	ORGANIZATION	0.99+

Rik Tamm-Daniels, Informatica | AWS re:Invent 2021

>>Hey everyone. Welcome back to the Cube, live in Las Vegas. Lisa Martin, with Dave Nicholson; we are covering AWS re:Invent 2021. This is probably one of the most important and largest hybrid tech events this year with AWS and its enormous ecosystem of partners. We're going to be talking with a hundred guests in the next couple of days. We started a couple of days ago, and it's really about the innovation that's going to be going on in the cloud and tech in the next decade. We're pleased to welcome Rik Tamm-Daniels as our next guest, VP of strategic ecosystems at Informatica. Rik, welcome to the program. >>Thank you for having me. It's a pleasure to be back. >>Isn't it nice to be back in person? >>Oh, it's amazing. >>All these conversations you just can't replicate by video conferencing. >>Absolutely. Great to reconnect with folks I haven't seen in a few years as well. >>Absolutely. That's been the sentiment, I think, one of the sentiments that we've heard the last three days. So one of the things thematically that we've also been hearing about, in between all of the plethora of AWS announcements, typical re:Invent, is that every company has to become a data company, public sector, private sector, small business, large business. Talk to us about how Informatica and AWS are helping companies become data companies so that they don't get left behind. >>But one of the biggest things that we're hearing at re:Invent is that customers are really concerned with data fragmentation, data silos, access to trusted data, and how do they get that information to really effect data-led transformation? In fact, we did a survey earlier in the year of chief data officers, where we found that almost 80% of organizations had 50% or more of their data in hybrid or multi-cloud environments. And also 79% are looking to leverage more than 100 data sources. And 30% are looking to leverage more than 1,000 data sources. 
So at Informatica, with our Intelligent Data Management Cloud, we're really focused on enabling customers to bring together the data assets, no matter where they live, what format they're in, on-premise, cloud, multi-cloud, bringing that all together. >>Well, we saw this massive scatter 22 months ago now, right? Of everyone, just, and the edge exploded and data exploded and volumes and data sources exploded. Hard for organizations to get their heads around that, that the data is going to be living in all these different places. You talked about a lot of customers in every industry being hybrid, multi-cloud, because based on strategy, based on acquisition. But to get their arms around that data and to be able to actually extract value from it fast is going to be the difference between those businesses that succeed and those that don't. >>Absolutely. And our partnership with AWS, that's a long-standing partnership, and we're very much focused on addressing the challenges you're talking about. And in fact, earlier this year we announced our cloud-first, our cloud-native data governance and data catalog on AWS, which is really focused on creating that central point of trusted data access and visibility for the organization. And just today, we had an announcement about how we're bringing data democratization and really accelerating data democratization for AWS Lake Formation. >>We talk about data democratization often. What does that mean to you? What does that mean to Informatica? How do you deliver that to customers so that they can really be able to extract as much value as they can? >>Yeah. So a great question. And really that whole data management journey is a big piece of this. So it starts with data discovery. How do I even begin to find my data assets? How do I get them from where they are to where they need to go in the cloud? How do I make sure they're clean, they're ready to use? I trust them. 
I understand where they came from. And so the solution that we announced today is really focused on how do we provide business users with a self-service way of getting access to data lake data, sitting in Amazon S3 with Lake Formation governance, but doing it in a way that doesn't create an undue burden on those business users around data compliance and data policies. And so what we've done is we brought our business user-friendly self-service experience in Axon Data Marketplace together with AWS Lake Formation. >>So Informatica has had a long history in the data world. I think of terms like MDM and ETL. >>Yes. >>Where does Informatica's history dovetail with the present day in terms of cloud? Does the concept of extract, translate, load, I think that's what ETL stood for, too many TLAs running around, as far as trying to transform, where does that play in today's world? Are you focused separately on cloud from on-premise data center, or do you link the two? >>Yeah, so we focus on addressing data management no matter where the data lives. So on-premise, cloud, multi-cloud. Our Intelligent Data Management Cloud platform is the industry's first end-to-end, cloud-native, as-a-service data management platform that delivers all those capabilities I mentioned before to customers. So we can manage all those workloads that are distributed from a single cloud-based, as-a-service data management platform. >>So the platform is as a service in the cloud, but you could be managing data assets that are in traditional on-premises data centers and the like? >>Absolutely. >>Okay. >>So congratulations on the IPO. Of course we can't talk to Informatica without that. I imagine the momentum is probably pretty great right about now. When I think of AWS, I always think of momentum. 
I mean, the volume of announcements. But also when I think about AWS, I think about their absolute focus on the customer, that working backwards approach. From a partnership perspective, is there alignment there? I imagine, like I said, with the IPO, a lot of momentum right now, probably a lot of excitement. Is Informatica also as focused and customer obsessed as AWS is? >>Yeah. So, first of all, thank you so much for the congratulations. So we had a very successful IPO last month. And in fact, just yesterday, our CEO Amit Walia presented our Q3 results, which showcase the continued growth of our subscription revenue, our cloud revenue. And in fact, our cloud revenue grew 44% year over year, which is really reflective of our big shift to being a cloud-first company and also the success of our Intelligent Data Management Cloud platform. Right. And that platform, again, as I mentioned, it's spanning all those aspects of data management, and we're delivering that for more than 5,000 customers globally. And just from an adoption perspective, we process about 23 trillion transactions a month for customers in our cloud platform. And that's doubling every six to 12 months. So it's an incredible amount of adoption. Some of the biggest enterprises in the world, like Unilever, Sanofi, folks like that, are using the cloud as their preferred data management platform of choice in the cloud. >>Well, you know, of course, congratulations are in order for the IPO, but also really on what you just mentioned, the trajectory of where Informatica is going, because Informatica wasn't born yesterday. Right. And we shouldn't overlook the fact that there are challenges associated with moving from the world as it exists on premises for still 80% of IT spend at least. Navigating that transition, going from private to public, getting the right kind of investment where people realize that cloud is a significant barrier to entry for a lot of companies. 
I think, you know, you have a lot of folks cheering for you as you navigate this transition. >>Well, one thing I do say is, yes, we have been in the business of data for a long time, but we've also been in the business of cloud quite a long time. So this is true. This is the 10th re:Invent. This is also the ten-year anniversary of the Informatica AWS partnership, right? So we've been working in the cloud with AWS for that long, innovating all of these different core services. And from that perspective, you know, I think we're doing a tremendous amount of innovation together, you know, solutions like the one we talked about for Lake Formation. But we also announced today a couple of key programs that we partnered with AWS around, around modernization and migration, right? So that's a big area of focus as well: how do we help customers modernize and take advantage of all the great services that AWS offers? So that's why we announced our membership in what's called the workload migration program and also the data-led migrations program, which is part of the public sector focus at AWS as well. >>The modernization perspective, that was talked a lot about by Adam yesterday. And we've talked about it a lot today. Every organization needs to modernize, even some of those younger ones that you think, oh, aren't they already, you know, fairly modern? But where are your customer conversations happening from a modernization perspective? Is that elevated up to the C-suite, that we've got to modernize our organization, get a better handle on our data, be able to use it more, protect it, secure it, so that we can be competitive and deliver outstanding customer experiences? >>What happens is the pain points that the legacy infrastructure has from the business perspective really do elevate the conversation to the C-suite. They're looking at saying, Hey, especially with the pandemic, right? We have to transform our business. We have to have data. 
We have to have trust in data. How do we do that? And we're not going to get there on rigid on-premise infrastructure. We need to be in a cloud-native footprint. And so we've been focused on helping customers get to those cloud-native end points, but also to truly cloud-native data management that, as we talked about earlier, can manage all those different workloads, right? From a single SaaS, serverless type experience. >>Right. What have been some of the interesting conversations that you've had here? Again, we are in person. >>Yep. >>Fresh off the IPO, lots of announcements coming out. You guys made announcements today. What's been the sentiment from those customers and partners that you've talked to? >>Well, I'll give you guys actually a little sneak preview of another announcement we have coming tomorrow with our friends at Databricks. So we are announcing a data democratization solution with Databricks, addressing some of the same challenges we were talking about here, but in the Databricks Lakehouse environment. So around that, I had a great conversation with some partners here, some of the global system integrators, and they're just so happy to see that, right, because a lot of the infrastructure that's around data lakes or Lake Formation, it's pretty technical, it's for a technical audience. And Informatica has really been focused on how do we expand the base of users that are able to tap into data, and that's through no-code experiences, right? It's through visual experiences. And when we bring that tightly coupled together with the performance and the power and scale of platforms like Databricks and AWS Redshift and S3, it's really transformative for customers. >>Here we are wrapping up the 10th re:Invent, almost, as of tomorrow, but also wrapping up the end of 2021. 
There's obviously a lot of momentum with Informatica right now. From a partnership perspective, you just gave us some breaking news. Thank you. We always love that. What are some of the things that you're looking forward to in 2022 that you think are really going to help Informatica customers just be incredibly competitive and utilize data in the cloud and on-prem to their maximum? >>Well, I think as we go into the next year, data complexity, data fragmentation, it's going to continue to grow. It's exploding out there. And one of the key components of our platform, the IDMC platform, is what we call CLAIRE, which is the industry's first kind of metadata-driven AI engine. And what we've done is we've taken the intelligence of machine learning and AI, and brought that to the business of data management. And we truly believe that the way customers are going to tame that data, address those problems, and continue to scale and keep up is by leveraging the power of AI in a cloud-native, cloud-first data management platform. >>Excellent. Rik, thank you so much for joining us today. Again, congratulations on last month's Informatica IPO, and the great, solid, strong, deep partnership with AWS. We thank you for your insights and best of luck next year. >>Awesome. Thank you so much. Pleasure being here. >>Our pleasure to have you. For my co-host David Nicholson, I'm Lisa Martin. You're watching the Cube, the global leader in live tech coverage.

Published Date : Dec 2 2021


ENTITIES

Entity	Category	Confidence
David Nicholson	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Dave Nicholson	PERSON	0.99+
Rick	PERSON	0.99+
Unilever	ORGANIZATION	0.99+
44%	QUANTITY	0.99+
Sanofi	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
Lisa Martin	PERSON	0.99+
2022	DATE	0.99+
yesterday	DATE	0.99+
Las Vegas	LOCATION	0.99+
50%	QUANTITY	0.99+
Martin	PERSON	0.99+
next year	DATE	0.99+
tomorrow	DATE	0.99+
two	QUANTITY	0.99+
Adam	PERSON	0.99+
first	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
more than 1000 data sources	QUANTITY	0.99+
more than 100 data sources	QUANTITY	0.99+
today	DATE	0.99+
79%	QUANTITY	0.99+
last month	DATE	0.99+
last month	DATE	0.99+
more than 5,000 customers	QUANTITY	0.99+
Rick Tam Daniel	PERSON	0.99+
Rik Tamm-Daniels	PERSON	0.99+
Wailea	ORGANIZATION	0.99+
ten-year	QUANTITY	0.99+
22 months ago	DATE	0.98+
12 months	QUANTITY	0.98+
30%	QUANTITY	0.98+
first company	QUANTITY	0.98+
this year	DATE	0.97+
earlier this year	DATE	0.97+
2021	DATE	0.97+
one	QUANTITY	0.97+
Informatica AWS	ORGANIZATION	0.96+
next decade	DATE	0.96+
end of 2021	DATE	0.95+
up to 80	QUANTITY	0.95+
almost 80%	QUANTITY	0.94+
about 23 trillion transactions a month	QUANTITY	0.91+
next couple of days	DATE	0.88+
single	QUANTITY	0.88+

Manish Sood, Reltio | AWS re:Invent 2021

(upbeat music) >> We're back at AWS re:Invent 2021. You're watching The Cube, I'm Dave Vellante with my co-host Dave Nicholson. David Nicholson, I'm Dave he's David. >> We're trying something new here at The Cube. A little stand up cube. You've heard of the pop-up cube, maybe. We're going to stand up. I work at a standing desk at my office, so let's try it. Four days, two sets, a hundred plus guests. Why not? So Manish Sood is here, he's the founder and CTO of Reltio, Cube alum. >> Dave: Manish, thank you for standing and good to see you again. >> Dave, It's great to see you again, and David, thank you for having me here. >> So, tell us a little bit about yourself, your background. I'm always interested to ask founders why you started your company, but tell us the background. >> Yeah, so a little bit of my background and the company's history. Most of my background has been in data management and creating products for data management. I was at a company called Informatica, came through an acquisition through Informatica, back in 2010. And started Reltio in 2011. The reason why we started Reltio was that, if you look at the enterprise space and how things have been evolving, there has been an explosion of applications. There's almost an application for every little business process that you can possibly imagine. Enterprise customers who used to struggle with 12 or 24 different systems, are now coming to us and saying they have 300 or 500 different applications that they use to run their business. And that's at the lower end of the spectrum. Even a business like Reltio today, runs on a hundred plus SaaS applications, end to end. And that is creating one of the biggest opportunities, as well as one of the biggest friction points in the enterprise. Because in order to create better, efficient business outcomes, you have silos of data and you don't know where the source of truth is. And that is something that we saw early on in 2011. 
At the same time, we also saw that digital transformation or cloud transformation type of requirements, were going to drive a larger need for this kind of capability, where Reltio type of products could act as that single source of truth to unify all of the multi-source siloed information. So, that's what got us started down this journey. >> So, okay. So, when people hear single source of truth, they think, oh, database, right? But that's not what you guys do, right? I mean, can I call it master data management? But it's really modern master data management. You're kind of recreating a new or creating a new category that- >> Manish: A little bit. >> solves a similar problem. Maybe you could explain that. >> Yeah. A little bit of background. So the term master data management came about in the 1920s. (Dave laughing) You believe that? When during the pandemic, the U.S. government was trying to figure out how to know who is still alive versus, you know, not there anymore. And they created something called the death master. Now a very ominous name, for a concept of just bringing data together and figuring out what's going on in the economy, but that need, or problem hasn't gone away. It has just become a harder problem to solve because now we have so many different systems to deal with. And both internal as well as third-party data sources that companies have to work with. And that's where the need has been around, but the technical capabilities to really keep solving the problem and delivering the solution in a manner where it can keep pace with the evolving needs, that capability has been missing. And that's where the "aha" moment for us was that we really needed to build it out as a foundation that would continue to grow and scale, with the magnitude of the problem that we were going to see in the future. 
>> Okay, so this idea of single version of the truth, obviously critically important for reporting, financials, you can't, you can't tell an auditor one thing, you know, your, your customers are another thing, your consumers, it's got to be consistent. And especially in regulated industries. Is there a difference Manish, between sort of that type of data and the data maybe that's in the line of business that doesn't necessarily affect the rest of the business? Can they have their own version of the truth, which is just their version, their, their, their single version? It doesn't necessarily have to affect anything else. Do you, are you seeing that changing data landscape, where things are getting more distributed and ownership is becoming more distributed? >> So, the change in the paradigm that we are seeing is because of the proliferation of the data, there is a need to establish, what is the aggregated view of the information. Aggregated and unified, which means that, you know, if there is a record for Dave Vellante or David Vellante. It's the same person. Establishing that fact as the truth across any number of systems that you have, versus the multiple versions of the truth, where somebody comes in and says, for compliance reasons, I want the entire collection of data versus for marketing reasons, I only want one third the slice of this information. So that's where this concept of aggregate once, unify that information, but then make it ready and available for multiple consumers to partake from that. That's becoming the norm. >> Dave: Got it. 
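The "aggregate once, unify, then make it available to multiple consumers" idea described in the exchange above can be sketched in a few lines of Python. This is only an illustrative sketch, not Reltio's implementation: the record fields, the source names, the nickname table, and the matching rule (normalized first and last name) are all assumptions made up for the example, and the contact details are placeholders.

```python
# Hypothetical nickname table; real matching would be far richer than this.
NICKNAMES = {"dave": "david"}


def match_key(record):
    """Normalize a 'First Last' name so variants of one person collide."""
    first, last = record["name"].lower().split()
    return (NICKNAMES.get(first, first), last)


def unify(records):
    """Aggregate once: merge records from every source into one profile per person."""
    golden = {}
    for rec in records:
        profile = golden.setdefault(match_key(rec), {"sources": []})
        profile["sources"].append(rec["source"])
        for field, value in rec.items():
            # Keep the first non-empty value seen for each attribute.
            if field != "source" and value:
                profile.setdefault(field, value)
    return list(golden.values())


def view(profile, fields):
    """Serve a consumer only the slice of the unified profile it asked for."""
    return {f: profile[f] for f in fields if f in profile}


# Two made-up source systems holding the same person under different names.
records = [
    {"source": "crm", "name": "Dave Vellante", "email": "dv@example.com", "phone": ""},
    {"source": "billing", "name": "David Vellante", "email": "", "phone": "555-0100"},
]

profiles = unify(records)
compliance_view = profiles[0]                          # the entire collection of data
marketing_view = view(profiles[0], ["name", "email"])  # only a slice of it
```

The point of the design is that matching and merging happen once, and each consumer (compliance, marketing, and so on) reads its own slice from the same unified profile instead of maintaining a separate version of the truth.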
>> And you mentioned something, Dave, that analytics, reporting, BI, data science, those have been some of the traditional playgrounds for this kind of information to be unified, because if you're trying to roll up the revenue for, you know, the business that you do with Coke or Coca-Cola, you know, you don't know which name you used, then you have to go back to the analytics warehouse and aggregate all of that information and do the reporting. But the same problem is coming up in real time, digital experiences as well. The only difference is, that instead of having the luxury of a few hours, now you have to make the decision in a few milliseconds. >> So, when you talk about those silos of data and seeking to have a unification of those silos, how has that changed in the era of cloud? Is it that Reltio is integrating those disparate sources that now exist in cloud, or is it that you are leveraging cloud to address the problem that's been with us for a long time? And I have to say that Dave Vellante, take him off the death master. He's definitely still with us. (Manish and Dave laugh) >> Dave: Another good day. >> I'm pretty sure too. But how have things changed, you know, with the dawn of cloud? >> With the dawn of cloud, there are two things that have become available to us. One is using the power of the cloud compute to solve the problem, so that you can keep growing with the footprint of the problem itself and have a solution that scales along with it. But at the same time, you have systems of record, could be your mainframe systems, could be your SAP, ERP type of deployments that you have. Some of those functional applications, they're not going away anytime soon, they're there to stay. But at the same time, you also need the new digital experiences to be delivered on. 
The glue between those two worlds is the source of truth data that sits in the middle and the best place for it to sit is the cloud, because you have to open it up to the rest of the ecosystem that sits in the cloud, but you also have to maintain a connection to the on the ground type of systems. Putting it behind the firewall and trying to do that is next to impossible, but doing it in the cloud opens up all the doors that you need for your transformation to take place. >> You know Dave, there was a time when I was part of an industry where coding, not writing code, but coding data to basically say, look, this field here is the person's last name. This field is the address where the mortgage is being held. How much of that is still manual, as opposed to applying some form of AI to the problem? Let's say you have 200 different sources of information, where Dave Vellante's name shows up in a variety of contexts. Are we still having to go in manually and sift through to make those correlations? How much of that has been automated at this point? >> So, there are systems of capture where some of that information, because your loan mortgage application was entered by somebody into a system, will still be captured in those places, but we'll take in that information. That's the starting point, but if there are other sources, then we will apply AIML type of capabilities to bring on those new emerging sources. Because at the same time, think about this equation where, you started with five systems or, you know, a dozen systems. Now you're talking about 300 plus systems. You cannot keep doing this manually for every system possible. And this number is only going to grow as we move forward. So AIML definitely has a role to play and further automate this landscape. >> I had to, I saw an amazing stat the other day, the source was the Sand Hill Econometrics, you know, a Silicon valley company. 
And the stat was that 70% of the series A, B and C companies fail to return at least one X to their investors. So you've made it through that knothole. Congratulations, you just raised a $120 million round. That's got to be super exciting for you. >> David: No pressure by the way. >> Dave: Tell us about that. Well, I mean, you'd think the industry would have de-risked by now, right. But anyway, so, tell us about that raise. Where are you guys at? Very exciting times for you. >> Yeah, really, really exciting time for us. We just raised $120 million. The company was valued at $1.7 billion. >> Dave: Awesome. Congratulations. >> And the round was, you know, all of our existing investors participated in it. We also had a new investor join in the process, as well. >> Dave: They wanted their pro-rata. (Dave and Manish laugh) >> Everybody, everybody wanted their pro-rata. >> Dave: That's great. >> But you know, one of the things that we have been very careful about in this whole process and journey, is something that you and I were talking about, the step function of scale. We're making sure that we are efficient stewards of capital and applying it in a manner where we are at every turn, looking at what's the next step function that we need to graduate to, because we want to make use of this capital to efficiently grow our business and be a Rule of 40 growth company. And that's something that you don't typically hear these days from a lot of the growth companies, but we are certainly focused on building long-term value and focusing on that Rule of 40 growth efficiency. >> Yeah, so Rule of 40 is growth plus EBITDA, or sometimes they use other metrics, but is that how you look at it? Growth plus EBITDA. >> Yes. Yeah. >> Great. >> And that's the formula that we are driving for. 
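For readers unfamiliar with the Rule of 40 mentioned above, the "growth plus EBITDA" formula can be written out as a small calculation. This is a minimal sketch under stated assumptions: it takes "EBITDA" to mean EBITDA margin as a percentage of revenue, uses the conventional threshold of 40, and the function names and sample numbers are hypothetical, not figures from Reltio.

```python
def rule_of_40_score(revenue_growth_pct, ebitda_margin_pct):
    """Sum of year-over-year revenue growth (%) and EBITDA margin (%)."""
    return revenue_growth_pct + ebitda_margin_pct


def meets_rule_of_40(revenue_growth_pct, ebitda_margin_pct, threshold=40.0):
    """True when growth plus margin reaches the threshold (40 by convention)."""
    return rule_of_40_score(revenue_growth_pct, ebitda_margin_pct) >= threshold


# Made-up example: fast growth can offset a negative margin.
fast_grower = meets_rule_of_40(55.0, -10.0)   # 55 + (-10) = 45
slow_grower = meets_rule_of_40(20.0, 10.0)    # 20 + 10 = 30
```

As the interviewer notes, other profitability metrics are sometimes substituted for EBITDA margin; swapping one in only changes the second argument.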
And most of our investments with this round of capital are going to be not only pushing forward with the go-to-market strategy, because we have a lot of growth opportunity, we have been North America focused, now we can take this global. At the same time, looking at the verticals where we need to double down and invest more, given that we have been a horizontal platform that is core to our capabilities, that we have built with Reltio. But at the same time, making sure that we are investing in the key verticals that we are present in. >> Yeah. So, you were explaining to me that you started in the pharmaceutical industry, that's where you got go-to-market fit. And then you went to other industries. When you went to those other industries, were there similar patterns, or do you almost have to start from ground zero again, to get that product market fit? >> No. So from the very beginning, the concept has been that this is a horizontal data problem. And at the heart of it, it's information about people, organizations, product, locations, and most of the businesses run on that type of information. That's the core part of the data that they build their business on. Life sciences was a perfect starting point for us, because it had examples of all of those data. When you start with commercial operations, which is sales and marketing, you have people, organization, product type of information. When you go into clinical trials, you have site investigators and patient type of information. When you go into R and D within that same space, you have drugs, compounds, substances, finished products, type of information, all coming from multiple sources. So it was a perfect place for us to prove out all of the capabilities end to end, which we like to call multi-domain capabilities. And then we looked at what other verticals have similar patterns. And that's why we went after healthcare, financial services, insurance, retail, high tech. 
Those are some of the key verticals that we are in right now. >> That's awesome. Great vision. Last question, could you give us a sense of the futures, where you're going? Well, first of all, what are you doing with the money? Is it go-to-market, throwing gas on the fire? And what can we expect in the coming year and years? >> Go-to-market expansion is a key area of investment, but also doubling down on the customer experience that we deliver, how we invest in the product, what are some of the adjacent capabilities that we need to invest in. Because you know, data is a great starting point and data should not hold businesses back. Data should be the accelerant to the business. And that's our philosophy that we are trying to bring to life. So making sure that we are making the data readily available, accessible, and usable for all of our customers is the key goal to aim for. And that's where all the investment is going. >> Well, Manish, it was a pleasure having you on at the AWS Startup Showcase, and then subsequently you became a unicorn. So congratulations on that. Really excited to watch the continued progress. Thanks for coming back on The Cube. >> Well, thank you so much, Dave and David, thanks for having me. >> David: Thanks for validating that Mr. Vellante is still with us. >> (laughs) He's going to be with us for a long time. >> I hope so, I hope so, I got one more to put through college. Thank you for watching this edition of The Cube, at AWS re:Invent. I'm Dave Vellante, for Dave Nicholson. We are The Cube, the leader in high-tech coverage. Be right back. (somber music)

Published Date : Dec 1 2021


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Dave	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Dave Vellante	PERSON	0.99+
David Vellante	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
David Nicholson	PERSON	0.99+
2011	DATE	0.99+
12	QUANTITY	0.99+
2010	DATE	0.99+
$120 million	QUANTITY	0.99+
five systems	QUANTITY	0.99+
70%	QUANTITY	0.99+
200 different sources	QUANTITY	0.99+
Manish Sood	PERSON	0.99+
300	QUANTITY	0.99+
Vellante	PERSON	0.99+
Four days	QUANTITY	0.99+
Manish	PERSON	0.99+
a dozen systems	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Sand Hill Econometrics	ORGANIZATION	0.99+
two things	QUANTITY	0.99+
two sets	QUANTITY	0.99+
North America	LOCATION	0.99+
Reltio	ORGANIZATION	0.99+
$1.7 billion dollars	QUANTITY	0.99+
$120 million dollars	QUANTITY	0.99+
Silicon valley	LOCATION	0.98+
two worlds	QUANTITY	0.98+
1920s	DATE	0.98+
both	QUANTITY	0.98+
one	QUANTITY	0.98+
One	QUANTITY	0.98+
Rule of 40	OTHER	0.97+
Coke	ORGANIZATION	0.97+
24 different systems	QUANTITY	0.97+
one third	QUANTITY	0.97+
500 different applications	QUANTITY	0.97+
Rule of 40	OTHER	0.96+
single source	QUANTITY	0.95+
The Cube	TITLE	0.92+
2021	DATE	0.9+
The Cube	TITLE	0.9+
a hundred plus guests	QUANTITY	0.9+
Coca-Cola	ORGANIZATION	0.89+
pandemic	EVENT	0.89+
Reltio	PERSON	0.89+
single version	QUANTITY	0.89+
today	DATE	0.89+
40	QUANTITY	0.87+
Invent	EVENT	0.87+
U.S. government	ORGANIZATION	0.86+
about 300 plus systems	QUANTITY	0.86+
a hundred	QUANTITY	0.8+
at least one X	QUANTITY	0.78+
ground zero	QUANTITY	0.77+

Mark Hill, Digital River and Dave Vellante with closing thoughts

(upbeat music) >> Dave Vellante: Okay. We're back with Mark Hill, who's the Director of IT Operations at Digital River. Mark, welcome to the Cube. Good to see you. >> Thanks for having me. I really appreciate it. >> Hey, tell us a little bit more about Digital River. People know you as a payment platform, you've got marketing expertise. How do you differentiate from other e-commerce platforms? >> Well, I don't think people realize it, but Digital River was founded about 27 years ago, primarily as a one-stop shop for e-commerce, right? And so we offered site development, hosting, order management, fraud, export controls, tax, physical and digital fulfillment, as well as multilingual customer service, advanced reporting, and email marketing campaigns, right? So it was really just kind of a broad base for e-commerce. People could just go there. Didn't have to worry about anything. What we found over time, as e-commerce has matured, we've really pivoted to a more focused API offering, specializing in just our global seller services. And to us that means payment, fraud, tax, and compliance management. So our global footprint allows companies to outsource that risk management and expand their markets internationally very quickly, and with low cost of entry. >> Yeah. It's an awesome business. And, you know, to your point, you were founded way before there was such a thing as the modern cloud, and yet you're a cloud native business. >> Yeah. >> Which I think talks to the fact that incumbents can evolve. They can reinvent themselves from a technology perspective. I wonder if you could first paint a picture of how you use the cloud. You use AWS, you know, I'm sure you got S3 in there. Maybe we could talk about that a little bit. >> Yeah, exactly. So when I think of a cloud native business, you kind of go back to the history. Well, 27 years ago, there wasn't a cloud, right? There wasn't any public infrastructure. 
It was, we basically stood our own data center up in a warehouse. And so over our history, we've managed our own infrastructure and colocated data centers over time through acquisitions and just how things worked. You know, those were over 10 data centers globally. For us it was expensive, both from a software and hardware perspective, as well as, you know, getting the operational teams and expertise up to speed too. So it was really difficult to maintain and ultimately not core to our business, right? Nowhere in our mission statement does it say that our goal is to manage data centers. So about five years ago, we started the journey from our hosted environment into AWS. It was a hundred percent lift-and-shift plan, and we were able to complete that migration in a little over two years, right? Amazon was really just a fit for us. It was a natural place for us to land, and they made it really easy for us. Not to say it wasn't difficult, but once in the public cloud, we really adopted a cloud-first vision, meaning that we'll not only consume their infrastructure as a service, but we'll also purposely evaluate and migrate to software as a service. So I come from a database background. So an example would be migrating from self-deployed and managed relational databases over to AWS RDS, Relational Database Service. You know, you're able to utilize the backups, the standby, and the patching tools automagically, you know, with a click of the button. And that's pretty cool. And so we moved away from the time-consuming operational tasks and really put our resources into revenue-generating new products, you know, like pivoting to an API offering. I always like to say that we stopped being busy and started being productive. >> Ha ha. I love that. >> That's really what the cloud has done for us. >> Is that what you mean by cloud native? I mean, being able to take advantage of those primitives and native APIs? So what does that mean for your business? 
>> Yeah, exactly. I think, well, the first step for us was just to consume the infrastructure, right? But now we're looking at targeted services that they have in there too. So, you know, we have our data stream of services. So log analytics, for example, we used to put it locally on the machine. Now we're just dumping into an S3 bucket and we're using Kinesis to consume that data, put it in Elastic, and go from there. And none of the services are managed by Digital River. We're just utilizing the capabilities that AWS has there too. >> And as an e-commerce player, retail company, were you ever concerned about moving to AWS as a possible competitor, or did you look at other clouds? What can you tell us about that? >> Yeah. And so I think e-commerce has really matured, right? And so we got squeezed out by the Amazons of the world. It's just not something that we were doing, but we had really a good area of expertise with our global seller services. So we evaluated Microsoft. We evaluated AWS as well as Google. And, you know, back when we did that, Microsoft was Windows-based. Google was just coming into the picture, really didn't fit for what we were doing, but Amazon was just a natural fit. So we made a business decision, right? It was financially really the best decision for us. And so we didn't really put our feelings into it, right? We just had to move forward and it's better than where we're at. And we've been delighted actually. >> Yeah. It makes sense. Best cloud, best tech. >> Yeah. >> Yeah. I want to talk about ChaosSearch. A lot of people describe it as a data lake for log analytics. Do you agree with that? You know, what does that even mean? >> Well, from our perspective, because our self-managed solutions were costly and difficult to maintain, you know, we had older versions of self-deployed tools using Splunk, other things like that, too. 
So over time, we made a conscious decision to limit our data retention to generally seven days. But in a lot of cases, it was zero. We just couldn't consume that log data because of the cost, which was intimidating in itself. Because of this limit, you know, we've lost important data points used for incident triage, problem management, trending, and other things too. So ChaosSearch has offered us a manageable and cost-effective opportunity to store months, or even years of data that we can use for operations, as well as trending automation. And really the big thing that we're pushing into is an event-driven architecture so that we can proactively manage our services. >> Yeah. You mentioned Elastic, I know I've talked to people who use the ELK Stack. They say there's this exponential growth in the amount of data. So you have to cut it off at whatever. I think you said seven days or less. You're saying you're not finding that with ChaosSearch? >> Yeah. Yeah, exactly. And that was one of the huge benefits here too. So, you know, we were losing out if there was a lower priority incident, for example, and people didn't get to it until eight, nine days later. Well, all the breadcrumbs are gone. So it was really just kind of a best guess, or the incident really wasn't resolved. We didn't find a root cause. >> Yeah. Like my video camera down there. My, you know, my other house, somebody breaks in and I don't find out for two weeks and then the video's gone. That kind of same thing. >> Yep. >> So, can you give us some more detail on how you use your data lake and ChaosSearch specifically? >> Yeah, yeah. Yep. And so there's many different areas, but what we found is we were able to easily consolidate data from multiple regions into a single pane of glass for our customers, internally and externally. You know, it relieves us of that operational support for the extract, transform, load process, right? 
It also offered us a seamless transition for the users who were familiar with ElasticSearch, right? It wasn't difficult to move over. And so all these are a lot of selling points, benefits. And so now that we have all this data that we're able to capture and utilize, it gives us an opportunity to use machine learning, predictive analysis. And like I said, you know, driving to an event-driven architecture. >> Okay. >> So that's really what it's offered. And it's been a huge benefit. >> So you're saying that you can speak the language of Elastic. You don't have to move the data out of an S3 bucket and you can scale more easily. Is that right? >> Yeah, yeah, absolutely. And so for us, just because we're running in multiple regions to drive more high availability, having that data available from multiple regions in a single pane of glass, or a single way to utilize it, is a huge benefit as well. Just, you know, not to mention actually having the data. >> What was the initial catalyst to sort of rethink what you were doing with log analytics? Was it cost? Was it flexibility? Scale? >> I think all of those went into it. One of the main drivers: last year we had a huge project. So we have our ELK Stack and it's probably from a decade ago, right? And, you know, a version point oh two or something, you know, anyways, it's very old, and we went through a whole project to get that upgraded and migrated over. And we found it impossible to do internally, right? And so this was a method for us to get out of that business, to get rid of the security risks, the support risk, and have a way for people to easily migrate over. And it was just a nightmare here, consolidating the data across regions. And so that was a huge thing, but yeah, it has also been the cost, right? We were finding it cheaper to use ChaosSearch and have more data available versus what we're doing currently in AWS. >> Got it. 
I wonder if you could share maybe any stories that you have or examples that underscore the impact that this approach to analytics is having on your business, maybe your team's everyday activities, any metrics you can provide or even just anecdotal information. >> Yeah. Yeah. And I think, you know, one, coming from an Oracle background here, so Digital River historically has been an Oracle shop, right? And we've been developing a reporting and analytics environment on Oracle, and that's complicated and expensive, right? We had to use advanced features in Oracle, like partitioning and materialized views, and bring in other supporting software like Informatica, Hyperion, Essbase, right? And all of these required a large team with a wide set of expertise in these separate focus areas, right? And the amount of data that we were pushing at ChaosSearch would simply have overwhelmed this legacy method for data analysis in a relational database, right? Not to mention the human toll of the stress of supporting that Oracle environment, a 24 by seven by 365 environment, you know, which requires little or no downtime. So, just that alone, it's a huge thing. So it's allowed us to break away from Oracle, it's allowed us to use new technologies that make sense to solve business problems. >> You know, ChaosSearch is a really interesting company to me. I'm sure, like me, you see a lot of startups, I'm sure they're knocking on your door every day. And I always like to say, okay, where are they going after? Are they going after a big market? How are they getting product-market fit? And it seems like ChaosSearch has really looked hard at log analytics and kind of maybe disrupting the ELK Stack. But I see, you know, other potential use cases, you know, beyond analyzing logs. I wonder if you agree, are there other use cases that you see in your future? >> Yeah, exactly. 
So I think one area would be Splunk, for example, we have that here too. So we use Splunk versus, you know, flat file analysis or other ways to capture that data, just because from a PCI perspective, it needs to be secured for our compliance and certification, right? So ChaosSearch allows us to do that. There's different types of authentication, really a hodgepodge of authentication that we used in our old environment, but ChaosSearch has a more easily usable one, one that we could set up, one that can really segregate the data and allow us to satisfy our PCI requirements too. So, Splunk, but I think also really deprecating all of our ElasticSearch environments, our homegrown ones, but then also taking a hard look at what we're doing with relational databases, right? 27 years ago, there were only relational databases: Oracle and SQL Server. So we've been logging into those types of databases, and that's not cost-effective, it's not supportable. And so really getting away from that and putting the data where it belongs, where it's easily accessible in a secure environment, and allowing us to push our business forward. >> Yep. When you say where the data belongs, right? It sounds like you're putting it in the bit bucket, S3, leaving it there, because it's the most cost-effective way to do it, and then sort of adding value on top of it. That's what's interesting about ChaosSearch to me. >> Yeah, exactly. Yup. Yup. Versus the high-priced storage, you know, that you have to use for a relational database, you know, not to mention the standbys, the backups. So, you know, you're duplicating, triplicating all this data too in an expensive manner, so yeah. Yeah. >> Yeah. Copy. Create. Moving data around and it gets expensive. It's funny what you say about databases, it's true. But database used to be such a boring market. Now it's exploded. Then you had the whole NoSQL movement and SQL, SQL became the killer app. 
You know, it's like full circle, right? >> Yeah, exactly. >> Well, anyway, good stuff, Mark, really, really appreciate you coming on the Cube and sharing your perspectives. We'd love to have you back in the future. >> Oh yeah, no problem. Thanks for having me. I really appreciate it. (upbeat music) >> Okay. So that's a wrap. You know, we're seeing a new era in data and analytics. For example, we're moving from a world where data lives in a cloud object store and needs to be extracted, moved into a new data store, transformed, cleansed, structured into a schema, and then analyzed. This cumbersome and expensive process is being revolutionized by companies like ChaosSearch that leave the data in place and then interact with it in a multi-lingual fashion with tooling that's familiar to analytic pros. You know, I see a lot of potential for this technology beyond just log analytics use cases, but that's a good place to start. You know, really, if I project out into the future, we see a trend of the global data mesh really taking hold, where a data warehouse or data hub or a data lake or an S3 bucket is just a discoverable node on that mesh. And that's governed by automated computational processes. And I do see ChaosSearch as an enabler of this vision, you know, but for now, if you're struggling to scale with existing tools or you're forced to limit your retention because data is exploding at too rapid a pace, you might want to check these guys out. You can schedule a demo just by clicking the button on the site. Or stop by the ChaosSearch booth at AWS re:Invent. The Cube is also going to be there. We'll have two sets, a hundred guests. I'm Dave Vellante. You're watching the Cube, your leader in high-tech coverage.

Published Date : Nov 15 2021


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Mark Hill	PERSON	0.99+
Dave Volante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Mark	PERSON	0.99+
Digital River	ORGANIZATION	0.99+
ChaosSearch	ORGANIZATION	0.99+
seven days	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
two weeks	QUANTITY	0.99+
last year	DATE	0.99+
Sequel	ORGANIZATION	0.99+
two sets	QUANTITY	0.99+
24	QUANTITY	0.99+
zero	QUANTITY	0.99+
first step	QUANTITY	0.98+
Informatica	ORGANIZATION	0.98+
hundred percent	QUANTITY	0.98+
a decade ago	DATE	0.98+
365	QUANTITY	0.98+
over two years	QUANTITY	0.98+
27 years ago	DATE	0.97+
ElasticSearch	TITLE	0.97+
one	QUANTITY	0.96+
first	QUANTITY	0.96+
One	QUANTITY	0.96+
single way	QUANTITY	0.95+
Amazons	ORGANIZATION	0.95+
over 10 data centers	QUANTITY	0.94+
eight	DATE	0.94+
Elastic	TITLE	0.94+
S3	TITLE	0.94+
seven	QUANTITY	0.93+
first vision	QUANTITY	0.91+
single pane	QUANTITY	0.9+
nine days later	DATE	0.88+
one area	QUANTITY	0.87+
Hyperion	ORGANIZATION	0.85+
Windows	TITLE	0.83+
Cube	COMMERCIAL_ITEM	0.81+
Kinesis	TITLE	0.79+
about five years ago	DATE	0.77+
about 27 years ago	DATE	0.76+
Sbase	ORGANIZATION	0.74+
ELK Stack	COMMERCIAL_ITEM	0.74+
Eastic	LOCATION	0.73+
ChaosSearch	TITLE	0.72+
hundred guests	QUANTITY	0.72+
Splunk	ORGANIZATION	0.71+
S3	COMMERCIAL_ITEM	0.71+
one-	QUANTITY	0.7+
Elastic	ORGANIZATION	0.68+

Sanjeev Mohan, SanjMo & Nong Li, Okera | AWS Startup Showcase

(cheerful music) >> Hello everyone, welcome to today's session of theCUBE's presentation of AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics, Cloud Management Tools, featuring Okera from the cloud management migration track. I'm John Furrier, your host. We've got two great special guests today, Nong Li, founder and CTO of Okera, and Sanjeev Mohan, principal at SanjMo, and former research vice president of big data and advanced analytics at Gartner. He's a legend, been around the industry for a long time, seen the big data trends from the past, present, and knows the future. Got a great lineup here. Gentlemen, thank you for this, so, life in the trenches, lessons learned across compliance, cloud migration, analytics, and use cases for Fortune 1000s. Thanks for joining us. >> Thanks for having us. >> So Sanjeev, great to see you, I know you've seen this movie, I was saying that in the open, you've seen all the visionaries at Gartner, the leaders, you know everything about this space. It's changing extremely fast, and one of the big topics right out of the gate is not just innovation, we'll get to that, that's the fun part, but it's the regulatory compliance and audit piece of it. It's keeping people up at night, and frankly if not done right, slows things down. This is a big part of the showcase here, is to solve these problems. Share your thoughts with us, what's your take on this wide-ranging issue? >> So, thank you, John, for bringing this up, and I'm so happy you mentioned the fact that there's this notion that it can slow things down. Well, I have to say that the old way of doing governance slowed things down, because it was very much about control and command. But the new approach to data governance is actually, in my opinion, liberating data. 
If you want to democratize or monetize, whatever you want to call it, you cannot do it 'til you know you can trust said data and it's governed in some ways, so data governance has actually become very interesting, and today if you want to talk about three different areas within regulatory compliance, for example, we all know about the EU GDPR, we know California has CCPA, and in fact California is now getting an even more stringent version called CPRA in a couple of years, which is more aligned to GDPR. That is a first area we know we need to comply with, we don't have any way out. But then, there are other areas, there is insider trading, there is how you secure the data that comes from third parties, you know, vendors, partners, suppliers, so Nong, I'd love to hand it over to you, and see if you can maybe throw some light into how our customers are handling these use cases. >> Yeah, absolutely, and I love what you said about balancing agility and liberating, in the face of what may be seen as things that slow you down. So we work with customers across verticals with old and new regulations, so you know, you brought up GDPR. One of our clients is using this to great effect to power their ecosystem. They are a very large retail company that has operations and customers across the world, obviously the importance of GDPR, and the regulations it imposes on them, are very top of mind, and at the same time, being able to do effective targeting analytics on customer information is equally critical, right? So they're exactly at that spot where they need this customer insight for powering their business, and then the regulatory concerns are extremely prevalent for them. So in the context of GDPR, you'll hear about things like consent management and right to be forgotten, right? I, as a customer of that retailer, should say "I don't want my information used for this purpose," right? "Use it for this, but not this." 
And you can imagine at a very, very large scale, when you have a billion customers, managing that, all the data you've collected over time through all of your devices, all of your telemetry, really, really challenging. And they're leveraging Okera embedded into their analytics platform so they can do both, right? Their data scientists and analysts who need to do everything they're doing to power the business, not have to think about these kind of very granular customer filtering requirements that need to happen, and then they leverage us to do that. So that's kind of new, right, GDPR, relatively new stuff at this point, but we obviously also work with customers that have regulations from a long long time ago, right? So I think you also mentioned insider trading and that supply chain, so we'll talk to customers, and they want really data-driven decisions on their supply chain, everything about their production pipeline, right? They want to understand all of that, and of course that makes sense, whether you're the CFO, if you're going to make business decisions, you need that information readily available, and supply chains as we know get more and more and more complex, we have more and more integrated into manufacturing and other verticals. So that's your, you're a little bit stuck, right? You want to be data-driven on those supply chain analytics, but at the same time, knowing the details of all the supply chain across all of your dependencies exposes your internal team to very high blackout periods or insider trading concerns, right? For example, if you knew Apple was buying a bunch of something, that's maybe information that only a select few people can have, and the way that manifests into data policies, 'cause you need the ability to have very, very scalable, per employee kind of scalable data restriction policies, so they can do their job easier, right? 
If we talk about speeding things up, instead of a very complex process for them to get approved, and approved on SEC regulations, all that kind of stuff, you can now go give them access to the part of the supply chain that they need, and no more, and limit their exposure and the company's exposure and all of that kind of stuff. So one of our customers was able to do this, getting two orders of magnitude, a 100x reduction in the policies to manage the system like that. >> When I hear you talking like that, I think the old days of "Oh yeah, regulatory, it kind of slows down innovation, got to go faster," pretty basic variables, not a lot of combination of things to check. Now with cloud, there seems to be combinations. Sanjeev, how complicated has the regulatory compliance and audit environment gotten in the past few years? Because I hear security in a supply chain, I hear insider threats, I mean these are security channels, not just compliance department G&A kind of functions. You're talking about large-scale, potentially combinations of access, distribution, I mean it seems complicated. How much more complicated is it now than it was just a few years ago? >> So, you know the way I look at it is, I'm just mentioning these companies just as an example, when PayPal or eBay, all these companies started, they started in California. Anybody who ever did business on eBay or PayPal, guess where that data was? In the US in some data center. Today you cannot do it. Today, data residency laws are really tough, and so now these organizations have to really understand what data needs to remain where. On top of that, we now have so many regulations. You know, earlier on if you were healthcare, you needed to be HIPAA compliant, or banking PCI DSS, but today, in the cloud, you really need to know, what data I have, what sensitive data I have, how do I discover it? So that data discovery becomes really important. 
What roles I have, so for example, let's say I work for a bank in the US, and I decide to move to Germany. Now, the old school is that a new rule will be created for me, because of German... >> John: New email address, all these new things happen, right? >> Right, exactly. So you end up with this really, a mass of rules and... And these are all static. >> Rules and tools, oh my god. >> Yeah. So Okera actually makes a lot of this dynamic, which reduces your cloud migration overhead, and Nong used some great examples, in fact, sorry if I take just a second, without mentioning any names, there's one of the largest banks in the world is going global in the digital space for the first time, and they're taking Okera with them. So... >> But what's the point? This is my next topic in cloud migration, I want to bring this up because, complexity, when you're in that old school kind of data center, waterfall, these old rules and tools, you have to roll this out, and it's a pain in the butt for everybody, it's a hassle, huge hassle. Cloud gives the agility, we know that, and cloud's becoming more secure, and I think now people see the on-premise, certainly things that'd be on-premises for secure things, I get that, but when you start getting into agility, and you now have cloud regions, you can start being more programmatic, so I want to get you guys' thoughts on the cloud migration, how companies who are now lifting and shifting, replatforming, what's the refactoring beyond that, because you can replatform in the cloud, and still some are kind of holding back on that. Then when you're in the cloud, the ones that are winning, the companies that are winning are the ones that are refactoring in the cloud. Doing things different with new services. Sanjeev, you start. >> Yeah, so you know, in fact lot of people tell me, "You know, we are just going to lift and shift into the cloud." But you're literally using cloud as a data center. 
You still have all the, if I may say, junk you had on-prem, you just moved it into the cloud, and now you're paying for it. In cloud, nothing is free. Every storage, every processing, you're going to pay for it. The most successful companies are the ones that are replatforming, they are taking advantage of the platform as a service or software as a service, so that includes things like, you pay as you go, you pay for exactly the amount you use, so you scale up and scale down or scale out and scale in, pretty quickly, you know? So you're handling that demand, so without replatforming, you are not really utilizing your- >> John: It's just hosting. >> Yeah, you're just hosting. >> It's basically hosting if you're not doing anything right there. >> Right. The reason why people sometimes resist replatforming is because there's a hidden cost that we don't really talk about, PaaS adds 3x to IaaS cost. So, some organizations that are very mature, and they have a few thousand people in the IT department, for them, they're like "No, we just want to run it in the cloud, we have the expertise, and it's cheaper for us." But in the long run, to get the most benefit, people should think of using cloud as a service. >> Nong, what's your take, because you see examples of companies, I'll just call one out, Snowflake for instance, they're essentially a data warehouse in the cloud, they refactored and they replatformed, they have a competitive advantage with the scale, so they have things that others don't have, more than just hosting. Or even on-premise. The new model developing where there's real advantages, and how should companies think about this when they have to manage these data lakes, and they have to manage all these new access methods, but they want to maintain that operational stability and control and growth? >> Yeah, so. No? Yeah. >> There's a few topics that are all (indistinct) this topic. 
(indistinct) enterprises moving to the cloud, they do this maybe for some cost savings, but a ton of it is agility, right? The motor that the business can run at is just so much faster. So we'll work with companies in the context of cloud migration for data, where they might have a data warehouse they've been using for 20 years, and building policies over that time, right? And it's taking a long time to go proof of access and those kind of things, made more sense, right? If it took you months to procure a physical infrastructure, get machines shipped to your data center, then this data access taking so long feels okay, right? That's kind of the same rate that everything is moving. In the cloud, you can spin up new infrastructure instantly, so you don't want approvals for getting policies, creating rules, all that stuff that Sanjeev was talking about, that being slow is a huge, huge problem. So this is a very common environment that we see where they're trying to do that kind of thing. And then, for replatforming, again, they've been building these roles and processes and policies for 20 years. What they don't want to do is take 20 years to go migrate all that stuff into the cloud, right? That's probably an experience nobody wants to repeat, and frankly for many of them, people who did it originally may or may not be involved in this kind of effort. So we work with a lot of companies like that, they have their, they want stability, they got to have the business running as normal, they got to get moving into the new infrastructure, doing it in a new way that, you know, with all the kind of lessons learned, so, as Sanjeev said, one of these big banks that we work with, that classical story of on-premise data warehousing, maybe a little bit of Hadoop, moved onto AWS, S3, Snowflake, that kind of setup, extremely intricate policies, but let's go reimagine how we can do this faster, right? 
What we like to talk about is, you're an organization, you need a design that, if you onboarded 1000 more data users, that's got to be way, way easier than the first 10 you onboarded, right? You got to get it to be easier over time, in a really, really significant way. >> Talk about the data authorization safety factor, because I can almost imagine all the intricacies of these different tools creates specialism amongst people who operate them. And each one might have their own little authorization nuance. Trend is not to have that siloed mentality. What's your take on clients that want to just "Hey, you know what? I want to have the maximum agility, but I don't want to get caught in the weeds on some of these tripwires around access and authorization." >> Yeah, absolutely, I think it's real important to get the balance of it, right? Because if you are an enterprise, or if you have diversive teams, you want them to have the ability to use tools as best of breed for their purpose, right? But you don't want to have it be so that every tool has its own access and provisioning and whatever, that's definitely going to be a security, or at least, a lot of friction for you to get things going. So we think about that really hard, I think we've seen great success with things like SSO and Okta, right? Unifying authentication. We think there's a very, very similar thing about to happen with authorization. You want that single control plane that can integrate with all the tools, and still get the best of what you need, but it's much, much easier (indistinct). >> Okta's a great example, if people don't want to build their own thing and just go with that, same with what you guys are doing. That seems to be the dots that are connecting you, Sanjeev. The ease of use, but yet the stability factor. >> Right. Yeah, because John, today I may want to bring up a SQL editor to go into Snowflake, just as an example. Tomorrow, I may want to use the Azure Bot, you know? 
I may not even want to go to Snowflake, I may want to go to an underlying piece of data, or I may use Power BI, you know, for some reason, and come from the Azure side, so the point is that, unless we are able to control, in some sort of a centralized manner, we will not get that consistency. And security you know is all or nothing. You cannot say "Well, I secured my Snowflake, but if you come through HDFS, Hadoop, or some, you know, that is outside of my realm, or my scope," what's the point? So that is why it is really important to have a watertight way, in fact I'm using just a few examples, maybe tomorrow I decide to use a data catalog, or I use Denodo as my data virtualization and I run a query. I'm the same identity, but I'm using different tools. I may use it from home, over VPN, or I may use it from the office, so you want this kind of flexibility, all encompassed in a policy, rather than a separate rule if you do this and this, if you do that, because then you end up with literally thousands of rules. >> And it's never going to stop, either, it's like fashion, the next tool's going to come out, it's going to be cool, and people are going to want to use it, again, you don't want to have to then move the train from the compliance side this way or that way, it's a lot of hassle, right? So we have that one capability, you can bring on new things pretty quickly. Nong, am I getting it right, this is kind of like the trend, that you're going to see more and more tools and/or things that are relevant or, certain use cases that might justify it, but yet, AppSec review, compliance review, I mean, good luck with that, right? >> Yeah, absolutely, I mean we certainly expect tools to continue to get more and more diverse, and better, right? Most innovation in the data space, and I think we... This is a great time for that, a lot of things that need to happen, and so on and so forth. 
So I think one of the early goals of the company, when we were just brainstorming, is we don't want data teams to not be able to use the tools because it doesn't have the right security (indistinct), right? Often those tools may not be focused on that particular area. They're great at what they do, but we want to make sure they're enabled, they do some enterprise investments, they see broader adoption much easier. A lot of those things. >> And I can hear the sirens in the background, that's someone who's not using your platform, they need some help there. But that's the case, I mean if you don't get this right, there are some consequences, and I think one of the things I would like to bring up on next track is, to talk through with you guys is, the persona pigeonhole role, "Oh yeah, a data person, the developer, the DevOps, the SRE," you start to see now, developers and with cloud developers, and data folks, people, however they get pigeonholed, kind of blending in, okay? You got data services, you got analytics, you got data scientists, you got more democratization, all these things are being kicked around, but the notion of a developer now is a data developer, because cloud is about DevOps, data is now a big part of it, it's not just some department, it's actually blending in. Just a cultural shift, can you guys share your thoughts on this trend of data people versus developers now becoming kind of one, do you guys see this happening, and if so, how? >> So when, John, I started my career, I was a DBA, and then a data architect. Today, I think you cannot have a DBA who's not a developer. That's just my opinion. Because there is so much of CICD, DevOps, that happens today, and you know, you write your code in Python, you put it in version control, you deploy using Jenkins, you roll back if there's a problem. And then, you are interacting, you're building your data to be consumed as a service. 
People in the past, you would have a thick client that would connect to the database over TCP/IP. Today, people don't want to connect over TCP/IP necessarily, they want to go by HTTP. And they want an API gateway in the middle. So, if you're a data architect or DBA, now you have to worry about, "I have a REST API call that's coming in, how am I going to secure that, and make sure that people are allowed to see that?" And that was just yesterday. >> Exactly. Got to build an abstraction layer. You got to build an abstraction layer. The old days, you have to worry about schema, and do all that, it was hard work back then, but now, it's much different. You got serverless, functions are going to show way... It's happening. >> Correct, GraphQL, and semantic layer, that just blows me away because, it used to be, it was all in database, then we took it out of database and we put it in a BI tool. So we said, like BusinessObjects started this whole trend. So we're like "Let's put the semantic layer there," well okay, great, but that was when everything was surrounding BusinessObjects and Oracle Database, or some other database, but today what if somebody brings Power BI or Tableau or Qlik, you know? Now you don't have a semantic layer access. So you cannot have it in the BI layer, so you move it down to its own layer. So now you've got a semantic layer, then where do you store your metrics? Same story repeats, you have a metrics layer, then the data scientists want to do feature engineering, where do you store your features? You have a feature store. 
And before you know, this stack has disaggregated over and over and over, and then you've got layers and layers of specialization that are happening, there's query accelerators like Dremio or Trino, so you've got your data here, which Nong is trying really hard to protect, and then you've got layers and layers and layers of abstraction, and networks are fast, so the end user gets great service, but it's a nightmare for architects to bring all these things together. >> How do you tame the complexity? What's the bottom line? >> Nong? >> Yeah, so, I think... So there's a few things you need to do, right? So, we need to re-think how we express security permissions, right? I think you guys have just maybe in passing (indistinct) talked about creating all these rules and all that kind of stuff, that's been the way we've done things forever. We've got to think about policies and mechanisms that are much more dynamic, right? You need to really think about not having to do any additional work, for the new things you add to the system. That's really, really core to solving the complexity problem, right? 'Cause that gets you those orders of magnitude reduction, system's got to be more expressive and map to those policies. That's one. And then second, it's got to be implemented at the right layer, right, to Sanjeev's point, close to the data, and it can service all of those applications and use cases at the same time, and have that uniformity and breadth of support. So those two things have to happen. >> Love this universal data authorization vision that you guys have. Super impressive, we had a CUBE Conversation earlier with Nick Halsey, who's a veteran in the industry, and he likes it. That's a good sign, 'cause he's seen a lot of stuff, too, Sanjeev, like yourself. 
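To make the contrast between thousands of static rules and one dynamic policy concrete, here is a minimal sketch in Python. It is not Okera's implementation; the class names, attributes, and the `can_read` function are all hypothetical, and a real policy engine would consider far more than department and country.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    department: str
    country: str


@dataclass(frozen=True)
class Dataset:
    name: str
    owner_department: str
    allowed_countries: frozenset


def can_read(user: User, dataset: Dataset) -> bool:
    """One policy, evaluated from attributes at request time.

    The same check applies whichever tool sent the query, so adding a
    new tool or moving a user does not require writing a new rule.
    """
    return (
        user.department == dataset.owner_department
        and user.country in dataset.allowed_countries
    )


# Hypothetical data: a US-only dataset owned by retail banking.
accounts = Dataset("us_accounts", "retail_banking", frozenset({"US"}))
analyst = User("analyst", "retail_banking", "US")
print(can_read(analyst, accounts))

# The analyst relocates to Germany: only the attribute changes.
moved = User("analyst", "retail_banking", "DE")
print(can_read(moved, accounts))
```

The point of the design is that the policy is written once against attributes, so the bank employee who moves from the US to Germany is handled by updating one attribute rather than creating a new set of rules.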
This is a new thing, you're seeing compliance being addressed, and with programmatic, I'm imagining there's going to be bots someday, very quickly with AI that's going to scale that up, so they kind of don't get in the innovation way, they can still get what they need, and enable innovation. You've got cloud migration, which is only going faster and faster. Nong, you mentioned speed, that's what CloudOps is all about, developers want speed, not things in days or hours, they want it in minutes and seconds. And then finally, ultimately, how's it scale up, how does it scale up for the people operating and/or programming? These are three major pieces. What happens next? Where do we go from here, what's, the customer's sitting there saying "I need help, I need trust, I need scale, I need security." >> So, I just wrote a blog, if I may diverge a bit, on data observability. And you know, so there are a lot of these little topics that are critical, DataOps is one of them, so to me data observability is really having a transparent view of, what is the state of your data in the pipeline, anywhere in the pipeline? So you know, when we talk to these large banks, these banks have like 1000, over 1000 data pipelines working every night, because they've got that hundred, 200 data sources from which they're bringing data in. Then they're doing all kinds of data integration, they have, you know, we talked about Python or Informatica, or whatever data integration, data transformation product you're using, so you're combining this data, writing it into an analytical data store, something's going to break. So, to me, data observability becomes a very critical thing, because it shows me something broke, walk me down the pipeline, so I know where it broke. Maybe the data drifted. And I know Okera does a lot of work in data drift, you know? So this is... Nong, jump in any time, because I know we have use cases for that. 
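As a rough illustration of the "something broke, walk me down the pipeline" idea, the sketch below checks a batch of records against an expected schema and reports where it departs. This covers only schema drift (missing fields, new fields, changed types), not statistical drift, and it is an assumption-laden example rather than how Okera or any observability product works; `find_drift` and the sample fields are hypothetical.

```python
def find_drift(expected_schema, batch):
    """Return (row index, field, problem) for each departure from the schema."""
    problems = []
    for index, record in enumerate(batch):
        # Fields the pipeline expects but that are absent or of the wrong type.
        for name, expected_type in expected_schema.items():
            if name not in record:
                problems.append((index, name, "missing"))
            elif not isinstance(record[name], expected_type):
                problems.append((index, name, "type changed"))
        # Fields that appeared upstream without being declared.
        for name in record:
            if name not in expected_schema:
                problems.append((index, name, "unexpected"))
    return problems


# Hypothetical nightly batch: the second record has drifted.
expected = {"account_id": int, "balance": float}
batch = [
    {"account_id": 1, "balance": 10.5},
    {"account_id": "2", "balance": 7.0, "branch": "NYC"},
]
for problem in find_drift(expected, batch):
    print(problem)
```

Running a check like this at each stage of a pipeline is one way to learn which stage introduced the change, instead of leaving downstream users to discover it.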
>> Nong, before you get in there, I just want to highlight a quick point. I think you're onto something there, Sanjeev, because we've been reporting, and we believe, that data workflows is intellectual property. And has to be protected. Nong, go ahead, your thoughts, go ahead. >> Yeah, I mean, the observability thing is critically important. I would say when you want to think about what's next, I think it's really effectively bridging tools and processes and systems and teams that are focused on data production, with the data analysts, data scientists, that are focused on data consumption, right? I think bridging those two, which cover a lot of the topics we talked about, that's kind of where security almost meets, that's kind of where you got to draw it. I think for observability and pipelines and data movement, understanding that is essential. And I think broadly, on all of these topics, where all of us can be better, is if we're able to close the loop, get the feedback loop of success. So data drift is an example of the loop rarely being closed. It drifts upstream, and downstream users can take forever to figure out what's going on. And we'll have similar examples related to buy-ins, or data quality, all those kind of things, so I think that's really a problem that a lot of us should think about. How do we make sure that loop is closed as quickly as possible? >> Great insight. Quick aside, as the founder CTO, how's life going for you, you feel good? I mean, you started a company, doing great, it's not drifting, it's right in the stream, mainstream, right in the wheelhouse of where the trends are, you guys have a really crosshairs on the real issues, how you feeling, tell us a little bit about how you see the vision. 
>> Yeah, I obviously feel really good, I mean we started the company a little over five years ago, there are kind of a few things that we bet would happen, and I think those things were out of our control, I don't think we would've predicted GDPR security and those kind of things being as prominent as they are. Those things have really matured, probably as best as we could've hoped, so that feels awesome. Yeah, (indistinct) really expanded in these years, and it feels good. Feels like we're in the right spot. >> Yeah, it's great, data's competitive advantage, and certainly has a lot of issues. It could be a blocker if not done properly, and you're doing great work. Congratulations on your company. Sanjeev, thanks for kind of being my cohost in this segment, great to have you on, been following your work, and you continue to unpack it at your new place that you started. SanjMo, good to see your Twitter handle taking on the name of your new firm, congratulations. Thanks for coming on. >> Thank you so much, such a pleasure. >> Appreciate it. Okay, I'm John Furrier with theCUBE, you're watching today's session presentation of AWS Startup Showcase, featuring Okera, a hot startup, check 'em out, great solution, with a really great concept. Thanks for watching. (calm music)

Published Date : Sep 22 2021

Breaking Analysis: How JPMC is Implementing a Data Mesh Architecture on the AWS Cloud

>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> A new era of data is upon us, and we're in a state of transition. You know, even our language reflects that. We rarely use the phrase big data anymore, rather we talk about digital transformation or digital business, or data-driven companies. Many have come to the realization that data is not the new oil, because unlike oil, the same data can be used over and over for different purposes. We still use terms like data as an asset. However, that same narrative, when it's put forth by the vendor and practitioner communities, includes further discussions about democratizing and sharing data. Let me ask you this, when was the last time you wanted to share your financial assets with your coworkers or your partners or your customers? Hello everyone, and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis, we want to share our assessment of the state of the data business. We'll do so by looking at the data mesh concept and how a leading financial institution, JP Morgan Chase, is practically applying these relatively new ideas to transform its data architecture. Let's start by looking at what the data mesh is. As we've previously reported many times, data mesh is a concept and set of principles that was introduced in 2018 by Zhamak Deghani, who's director of technology at ThoughtWorks, a global consultancy and software development company. And she created this movement because her clients, who were some of the leading firms in the world, had invested heavily in predominantly monolithic data architectures that had failed to deliver desired outcomes in ROI. So her work went deep into trying to understand that problem. 
And her main conclusion that came out of this effort was the world of data is distributed and shoving all the data into a single monolithic architecture is an approach that fundamentally limits agility and scale. Now a profound concept of data mesh is the idea that data architectures should be organized around business lines with domain context. That the highly technical and hyper specialized roles of a centralized cross functional team are a key blocker to achieving our data aspirations. This is the first of four high level principles of data mesh. So first again, that the business domain should own the data end-to-end, rather than have it go through a centralized big data technical team. Second, a self-service platform is fundamental to a successful architectural approach where data is discoverable and shareable across an organization and an ecosystem. Third, product thinking is central to the idea of data mesh. In other words, data products will power the next era of data success. And fourth data products must be built with governance and compliance that is automated and federated. Now there's lot more to this concept and there are tons of resources on the web to learn more, including an entire community that is formed around data mesh. But this should give you a basic idea. Now, the other point is that, in observing Zhamak Deghani's work, she is deliberately avoided discussions around specific tooling, which I think has frustrated some folks because we all like to have references that tie to products and tools and companies. So this has been a two-edged sword in that, on the one hand it's good, because data mesh is designed to be tool agnostic and technology agnostic. On the other hand, it's led some folks to take liberties with the term data mesh and claim mission accomplished when their solution, you know, maybe more marketing than reality. So let's look at JP Morgan Chase in their data mesh journey. 
That's why I got really excited when I saw this past week, a team from JPMC held a meetup to discuss what they called data lake strategy via data mesh architecture. I saw that title, I thought, well, that's a weird title. And I wondered, are they just taking their legacy data lakes and claiming they're now transformed into a data mesh? But in listening to the presentation, which was over an hour long, the answer is a definitive no, not at all in my opinion. A gentleman named Scott Hollerman organized the session that comprised these three speakers here, James Reid, who's a divisional CIO at JPMC, Arup Nanda, who is a technologist and architect, and Serita Bakst, who is an information architect, again, all from JPMC. This was the most detailed and practical discussion that I've seen to date about implementing a data mesh. And this is JP Morgan's approach, and we know they're extremely savvy and technically sound. And they've invested, it has to be billions in the past decade, on data architecture across their massive company. And rather than dwell on the downsides of their big data past, I was really pleased to see how they're evolving their approach and embracing new thinking around data mesh. So today, we're going to share some of the slides that they use and comment on how it dovetails into the concept of data mesh that Zhamak Deghani has been promoting, at least as we understand it. And dig a bit into some of the tooling that is being used by JP Morgan, particularly around its AWS cloud. So the first point is it's all about business value, JPMC, they're in the money business, and in that world, business value is everything. So James Reid, the CIO, showed this slide and talked about their overall goals, which centered on a cloud first strategy to modernize the JPMC platform. I think it's simple and sensible, but there's three factors on which he focused, cut costs, always, sure, you got to do that. 
Number two was about unlocking new opportunities, or accelerating time to value. But I was really happy to see number three, data reuse. That's a fundamental value ingredient in the slide that he's presenting here. And his commentary was all about aligning with the domains and maximizing data reuse, i.e. data is not like oil and making sure there's appropriate governance around that. Now don't get caught up in the term data lake, I think it's just how JP Morgan communicates internally. It's invested in the data lake concept, so they use water analogies. They use things like data puddles, for example, which are single project data marts or data ponds, which comprise multiple data puddles. And these can feed in to data lakes. And as we'll see, JPMC doesn't strive to have a single version of the truth from a data standpoint that resides in a monolithic data lake, rather it enables the business lines to create and own their own data lakes that comprise fit for purpose data products. And they do have a single truth of metadata. Okay, we'll get to that. But generally speaking, each of the domains will own end-to-end their own data and be responsible for those data products, we'll talk about that more. Now the genesis of this was sort of a cloud first platform, JPMC is leaning into public cloud, which is ironic since the early days, in the early days of cloud, all the financial institutions were like never. Anyway, JPMC is going hard after it, they're adopting agile methods and microservices architectures, and it sees cloud as a fundamental enabler, but it recognizes that on-prem data must be part of the data mesh equation. Here's a slide that starts to get into some of that generic tooling, and then we'll go deeper. And I want to make a couple of points here that tie back to Zhamak Deghani's original concept. The first is that unlike many data architectures, this puts data as products right in the fat middle of the chart. 
The data products live in the business domains and are at the heart of the architecture. The databases, the Hadoop clusters, the files and APIs on the left-hand side, they serve the data product builders. The specialized roles on the right-hand side, the DBAs, the data engineers, the data scientists, the data analysts, we could have put in quality engineers, et cetera, they serve the data products. Because the data products are owned by the business, they inherently have the context that is the middle of this diagram. And you can see at the bottom of the slide, the key principles include domain thinking and end-to-end ownership of the data products. They build it, they own it, they run it, they manage it. At the same time, the goal is to democratize data with self-service as a platform. One of the biggest points of contention of data mesh is governance. And as Serita Bakst said on the Meetup, metadata is your friend, and she kind of made a joke, she said, "This sounds kind of geeky, but it's important to have a metadata catalog to understand where data resides and the data lineage in overall change management." So to me, this really passed the data mesh stink test pretty well. Let's look at data as products. CIO Reid said the most difficult thing for JPMC was getting their heads around data product, and they spent a lot of time getting this concept to work. Here's the slide they use to describe their data products as it related to their specific industry. They said a common language and taxonomy is very important, and you can imagine how difficult that was. He said, for example, it took a lot of discussion and debate to define what a transaction was. But you can see at a high level, these three product groups around wholesale credit risk, party, and trade and position data as products, and each of these can have sub products, like, party will have know your customer, KYC, for example. 
So a key for JPMC was to start at a high level and iterate to get more granular over time. So lots of decisions had to be made around who owns the products and the sub-products. The product owners interestingly had to defend why that product should even exist, what boundaries should be in place and what data sets do and don't belong in the various products. And this was a collaborative discussion, I'm sure there was contention around that between the lines of business. And which sub products should be part of these circles? They didn't say this, but tying it back to data mesh, each of these products, whether in a data lake or a data hub or a data pond or data warehouse, data puddle, each of these is a node in the global data mesh that is discoverable and governed. And supporting this notion, Serita said that, "This should not be infrastructure-bound, logically, any of these data products, whether on-prem or in the cloud can connect via the data mesh." So again, I felt like this really stayed true to the data mesh concept. Well, let's look at some of the key technical considerations that JPM discussed in quite some detail. This chart here shows a diagram of how JP Morgan thinks about the problem, and some of the challenges they had to consider were how to write to various data stores, can you and how can you move data from one data store to another? How can data be transformed? Where's the data located? Can the data be trusted? How can it be easily accessed? Who has the right to access that data? These are all problems that technology can help solve. And to address these issues, Arup Nanda explained that the heart of this slide is the data in ingestor instead of ETL. All data producers and contributors, they send their data to the ingestor and the ingestor then registers the data so it's in the data catalog. It does a data quality check and it tracks the lineage. 
Then, data is sent to the router, which persists the data in the data store based on the best destination as informed by the registration. This is designed to be a flexible system. In other words, the data store for a data product is not fixed, it's determined at the point of inventory, and that allows changes to be easily made in one place. The router simply reads that optimal location and sends it to the appropriate data store. Now you see the schema inferrer there, which is used when there is no clear schema on write. In this case, the data product is not allowed to be consumed until the schema is inferred, and then the data goes into a raw area, and the inferrer determines the schema and then updates the inventory system so that the data can be routed to the proper location and properly tracked. So that's some of the detail of how the sausage factory works in this particular use case, it was very interesting and informative. Now let's take a look at the specific implementation on AWS and dig into some of the tooling. As described in some detail by Arup Nanda, this diagram shows the reference architecture used by this group within JP Morgan, and it shows all the various AWS services and components that support their data mesh approach. So start with the authorization block right there underneath Kinesis. Lake Formation is the single point of entitlement and has a number of buckets including, you can see there, the raw area that we just talked about, a trusted bucket, a refined bucket, et cetera. Depending on the data characteristics at the data catalog registration block, where you see the Glue catalog, that determines in which bucket the router puts the data. 
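The ingestor, router, and schema inferrer flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not JPMC's code: the `Catalog` class, the `ingest`, `route`, and `infer_schema` functions, the in-memory `stores` dictionary, and the `destination` parameter are all hypothetical stand-ins for the catalog, data stores, and routing decision in the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical inventory: data product -> destination, schema, lineage."""
    entries: dict = field(default_factory=dict)

    def register(self, product, destination, schema, source):
        entry = self.entries.setdefault(product, {"lineage": []})
        entry.update(destination=destination, schema=schema)
        entry["lineage"].append(source)  # track where the data came from


def infer_schema(records):
    """Toy inferrer: field name -> type name, taken from the first record."""
    return {key: type(value).__name__ for key, value in records[0].items()}


def route(catalog, stores, product, records):
    """The router only reads the destination recorded in the catalog."""
    destination = catalog.entries[product]["destination"]
    stores.setdefault(destination, []).extend(records)


def ingest(catalog, stores, product, records, source,
           destination="trusted", schema=None):
    if not records:
        raise ValueError("data quality check failed: empty batch")
    if schema is None:
        # No schema on write: land the data in the raw area, then infer.
        stores.setdefault("raw", []).extend(records)
        schema = infer_schema(records)
    # Minimal data quality check: every record has exactly the schema's fields.
    if any(set(record) != set(schema) for record in records):
        raise ValueError("data quality check failed: fields do not match schema")
    catalog.register(product, destination, schema, source)
    route(catalog, stores, product, records)


catalog, stores = Catalog(), {}
trades = [{"trade_id": 1, "amount": 250.0}]
ingest(catalog, stores, "trades", trades, source="trading_desk_feed")
print(catalog.entries["trades"])
print(sorted(stores))
```

Because the router looks up the destination in the catalog rather than hard-coding it, changing where a data product lives means updating one catalog entry, which mirrors the "changes made in one place" point in the talk.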
And you can see the many AWS services in use here, identity, EMR, the Elastic MapReduce cluster from the legacy Hadoop work done over the years, Redshift Spectrum and Athena, JPMC uses Athena for single threaded workloads and Redshift Spectrum for nested types so they can be queried independent of each other. Now remember very importantly, in this use case, there is not a single Lake Formation; rather, multiple lines of business will be authorized to create their own lakes, and that creates a challenge. So how can that be done in a flexible and automated manner? And that's where the data mesh comes into play. So JPMC came up with this federated Lake Formation accounts idea, and each line of business can create as many data producer or consumer accounts as they desire and roll them up into their master line of business Lake Formation account. And they cross-connect these data products in a federated model. And these all roll up into a master Glue catalog so that any authorized user can find out where a specific data element is located. So this is like a superset catalog that comprises multiple sources and syncs up across the data mesh. So again to me, this was a very well thought out and practical application of data mesh. Yes, it includes some notion of centralized management, but much of that responsibility has been passed down to the lines of business. It does roll up to a master catalog, but that's a metadata management effort that seems compulsory to ensure federated and automated governance. As well at JPMC, the office of the chief data officer is responsible for ensuring governance and compliance throughout the federation. All right, so let's take a look at some of the suspects in this world of data mesh and bring in the ETR data. Now, of course, ETR doesn't have a data mesh category, there's no such thing as a data mesh vendor, you build a data mesh, you don't buy it. 
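The federated accounts rolling up into a master catalog can be pictured with a small Python sketch. It models only the metadata roll-up, not AWS Lake Formation or Glue themselves; the class names, the `publish`, `sync`, and `locate` methods, and the sample product names and locations are hypothetical.

```python
class LineOfBusinessCatalog:
    """Hypothetical catalog owned by one line of business."""

    def __init__(self, name):
        self.name = name
        self.products = {}  # data product -> location within that lake

    def publish(self, product, location):
        self.products[product] = location


class MasterCatalog:
    """Superset catalog: holds metadata only, the data stays with its owner."""

    def __init__(self):
        self.index = {}  # data product -> (owning line of business, location)

    def sync(self, lob):
        for product, location in lob.products.items():
            self.index[product] = (lob.name, location)

    def locate(self, product):
        return self.index.get(product)


credit = LineOfBusinessCatalog("wholesale_credit_risk")
credit.publish("credit_exposure", "refined/credit_exposure")

trading = LineOfBusinessCatalog("trade_and_position")
trading.publish("daily_positions", "trusted/daily_positions")

master = MasterCatalog()
for lob in (credit, trading):
    master.sync(lob)

print(master.locate("daily_positions"))
print(master.locate("unknown_product"))
```

Each line of business keeps ownership of its products and locations, while the master catalog answers only the question of where a given data element lives, which is the split between federated ownership and centralized metadata described in the talk.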
So, what we did is we used the ETR dataset to select and filter on some of the culprits that we thought might contribute to the data mesh to see how they're performing. This chart depicts a popular view that we often like to share. It's a two dimensional graphic with net score or spending momentum on the vertical axis and market share or pervasiveness in the data set on the horizontal axis. And we filtered the data on sectors such as analytics, data warehouse, and the adjacencies to things that might fit into data mesh. And we think that these pretty well reflect participation in data mesh, though it's certainly not all-encompassing. And it's a subset, obviously, of all the vendors who could play in the space. Let's make a few observations. Now as is often the case, Azure and AWS, they're almost literally off the charts with very high spending velocity and large presence in the market. Oracle, you can see, also stands out because much of the world's data lives inside of Oracle databases. It doesn't have the spending momentum or growth, but the company remains prominent. And you can see Google Cloud doesn't have nearly the presence in the dataset, but its momentum is highly elevated. Remember that red dotted line there, that 40% line, anything over that indicates elevated spending momentum. Let's go to Snowflake. Snowflake is consistently shown to be the gold standard in net score in the ETR dataset. It continues to maintain highly elevated spending velocity in the data. And in many ways, Snowflake, with its data marketplace and its data cloud vision and data sharing approach, fits nicely into the data mesh concept. Now, a caution: Snowflake has used the term data mesh in its marketing, but in our view, it lacks clarity, and we feel like they're still trying to figure out how to communicate what that really is. But there is really, we think, a lot of potential there in that vision.
Databricks is also interesting because the firm has momentum and we expect further elevated levels in the vertical axis in upcoming surveys, especially as it readies for its IPO. The firm has a strong product and managed service, and is really one to watch. Now we included a number of other database companies for obvious reasons, like Redis and Mongo, MariaDB, Couchbase and Teradata. SAP as well is in there, but that's not all database, but SAP is prominent so we included them. As is IBM, more of a traditional database player, also with a big presence. Cloudera includes Hortonworks, and HPE Ezmeral comprises the MapR business that HPE acquired. So these guys got the big data movement started, between Cloudera, Hortonworks, which was born out of Yahoo, which was the early big data, sorry, early Hadoop innovator, and MapR, which kind of went its own course, and now that's all kind of come together in various forms. And of course, we've got Talend and Informatica in there, two data integration companies that are worth noting. We also included some of the AI and ML specialists and data science players in the mix like DataRobot, who just did a monster $250 million round, Dataiku, H2O.ai and ThoughtSpot, which is all about democratizing data and injecting AI, and I think fits well into the data mesh concept. And you know, we put VMware Cloud in there for reference because it really is the predominant on-prem infrastructure platform. All right, let's wrap with some final thoughts here. First, thanks a lot to the JP Morgan team for sharing this data. I really want to encourage practitioners and technologists, go watch the YouTube video of that meetup, we'll include the link with this session. And thank you to Zhamak Dehghani and the entire data mesh community for the outstanding work that you're doing, challenging the established conventions of monolithic data architectures.
The JPM presentation, it gives you real credibility, it takes Data Mesh well beyond concept, it demonstrates how it can be and is being done. And you know, this is not a perfect world, you're going to start somewhere and there's going to be some failures. The key is to recognize that shoving everything into a monolithic data architecture won't support the massive scale and agility that you're after. It's maybe fine for smaller use cases in smaller firms, but if you're building a global platform in a data business, it's time to rethink data architecture. Now much of this is enabled by the cloud, but cloud first doesn't mean cloud only, it doesn't mean you'll leave your on-prem data behind. On the contrary, you have to include non-public cloud data in your Data Mesh vision just as JPMC has done. You've got to get some quick wins, that's crucial so you can gain credibility within the organization and grow. And one of the key takeaways from the JP Morgan team is, there is a place for dogma, like organizing around data products and domains and getting that right. On the other hand, you have to remain flexible because technology is going to come, technology is going to go, so you've got to be flexible in that regard. And look, if you're going to embrace the metaphor of water like puddles and ponds and lakes, we suggest, maybe a little tongue in cheek, but still we believe in this, that you expand your scope to include data ocean, something John Furrier and I have talked about and laughed about extensively on theCUBE. Data oceans, it's huge. It's the new data lake, go transcend data lake, think oceans. And think about this, just as we're evolving our language, we should be evolving our metrics. Much of the last decade of big data was around just getting the stuff to work, getting it up and running, standing up infrastructure and managing massive, how much data you got? Massive amounts of data.
And there were many KPIs built around, again, standing up that infrastructure, ingesting data, a lot of technical KPIs. This decade is not just about enabling better insights, it's more than that. Data mesh points us to a new era of data value, and that requires new metrics around monetizing data products, like how long does it take to go from data product conception to monetization? And how does that compare to what it is today? And what is the time to quality? If the business owns the data, and the business has the context, the quality that comes out of them, out of the chute, should be, at a basic level, pretty good, and at a higher mark than out of a big data team with no business context. Automation, AI, and very importantly, organizational restructuring of our data teams will heavily contribute to success in the coming years. So we encourage you, learn, lean in and create your data future. Okay, that's it for now, remember these episodes, they're all available as podcasts wherever you listen, all you got to do is search, breaking analysis podcast, and please subscribe. Check out ETR's website at etr.plus for all the data and all the survey information. We publish a full report every week on wikibon.com and siliconangle.com. And you can get in touch with us, email me david.vellante@siliconangle.com, you can DM me @dvellante, or you can comment on my LinkedIn posts. This is Dave Vellante for theCUBE insights powered by ETR. Have a great week everybody, stay safe, be well, and we'll see you next time. (upbeat music)

Published Date : Jul 12 2021

ENTITIES

Entity	Category	Confidence
JPMC	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
2018	DATE	0.99+
Zhamak Deghani	PERSON	0.99+
James Reid	PERSON	0.99+
JP Morgan	ORGANIZATION	0.99+
JP Morgan	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Serita Bakst	PERSON	0.99+
IBM	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Scott Hollerman	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
40%	QUANTITY	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
Serita	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Arup Nanda	PERSON	0.99+
each	QUANTITY	0.99+
ThoughtWorks	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
each line	QUANTITY	0.99+
Terradata	ORGANIZATION	0.99+
Redis	ORGANIZATION	0.99+
$250 million	QUANTITY	0.99+
first point	QUANTITY	0.99+
three factors	QUANTITY	0.99+
Second	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
Talend	ORGANIZATION	0.99+
John Furry	PERSON	0.99+
Zhamak Deghani	PERSON	0.99+
first platform	QUANTITY	0.98+
YouTube	ORGANIZATION	0.98+
fourth	QUANTITY	0.98+
single	QUANTITY	0.98+
One	QUANTITY	0.98+
Third	QUANTITY	0.97+
Couchbase	ORGANIZATION	0.97+
three speakers	QUANTITY	0.97+
two data	QUANTITY	0.97+
first strategy	QUANTITY	0.96+
one	QUANTITY	0.96+
one place	QUANTITY	0.96+
Jr Reid	PERSON	0.96+
single lake	QUANTITY	0.95+
SAP	ORGANIZATION	0.95+
wikibon.com	OTHER	0.95+
siliconangle.com	OTHER	0.94+
Azure	ORGANIZATION	0.93+

Breaking Analysis: Unpacking Oracle’s Autonomous Data Warehouse Announcement

(upbeat music) >> On February 19th of this year, Barron's dropped an article declaring Oracle a cloud giant, and the article explained why the stock was a buy. Investors took notice and the stock ran up 18% over the next nine trading days and it peaked on March 9th, the day before Oracle announced its latest earnings. The company beat consensus earnings on both top-line and EPS last quarter, but investors, they did not like Oracle's tepid guidance and the stock pulled back. But it's still, as you can see, well above its pre-Barron's article price. What does all this mean? Is Oracle a cloud giant? What are its growth prospects? Now many parts of Oracle's business are growing, including Fusion ERP, Fusion HCM, NetSuite, we're talking deep into the double digits, 20 plus percent growth. Its OnPrem legacy license business, however, continues to decline, and that moderates the overall company growth because that OnPrem business is so large. So overall, Oracle's growing in the low single digits. Now what stands out about Oracle is its recurring revenue model. That figure, the company says, now represents 73% of its revenue and that's going to continue to grow. Now two other things stood out on the earnings call to us. First, Oracle plans on increasing its CapEx by 50% in the coming quarter, that's a lot. Now it's still far less than AWS, Google or Microsoft spend on capital, but it's a meaningful data point. Second, Oracle's consumption revenue for Autonomous Database and Cloud Infrastructure, OCI or Oracle Cloud Infrastructure, grew at 64% and 139% respectively, and these two factors combined with the CapEx spend suggest that the company has real momentum. I mean look, it's possible that the CapEx announcement may be just optics, in that they're front-loading some spend to show the street that it's a player in cloud, but I don't think so. Oracle's Safra Catz is usually pretty disciplined when it comes to spending.
Now today, on March 17th, Oracle announced updates to its Autonomous Data Warehouse, and with me is David Floyer, who has extensively researched Oracle over the years, and today we're going to unpack the Oracle Autonomous Data Warehouse, ADW, announcement. What it means to customers, but we also want to dig into Oracle's strategy. We want to compare it to some other prominent database vendors, specifically AWS and Snowflake. David Floyer, welcome back to The Cube, thanks for making some time for me. >> Thank you Vellante, great pleasure to be here. >> All right, I want to get into the news but I want to start with this idea of the autonomous database which Oracle's announcement today is building on. Oracle uses the analogy of a self-driving car. It's obviously a powerful metaphor, as they call it the self-driving database, and my takeaway is that this means that the system automatically provisions, it upgrades, it does all the patching for you, it tunes itself. Oracle claims that all of this reduces labor costs or admin costs by 90%. So I ask you, is this the right interpretation of what Oracle means by autonomous database? And is it real? >> Is that the right interpretation? It's a nice analogy. It's a test to that analogy, isn't it? I would put it as: the first stage of the Autonomous Data Warehouse was to do the things that you talked about, which was the tuning, the provisioning, all of that sort of thing. The second stage is actually, I think, more interesting in that what they're focusing on is making it easy to use for the end user. Eliminating the requirement for IT staff to be there to help in the actual using of it, and that is a very big step for them but an absolutely vital step because all of the competition is focusing on ease of use, ease of use, ease of use and cheapness of being able to manage and deploy. So I think that is the really important area that Oracle has focused on, and it seems to have done so very well.
>> So in your view, is this, I mean you don't really hear a lot of other companies talking about this analogy of the self-driving database, is this unique? Is it differentiable for Oracle? If so, why, or maybe you could help us understand that a little bit better. >> Well, the whole strategy is unique in its breadth. It has really brought a whole number of things together and made it, of its type, the best. So it has a single, whole number of data sources and database types. So it's got a very broad range of different ways that you can look at the data, and the second thing that is also excellent is it's a platform. It is fully self provisioned and its functionality is very, very broad indeed. The quality of the original SQL and the query languages, etc, is very, very good indeed, and its ability to do joins, for example, is excellent. So all of the building blocks are there, together with its sharing of the same data with OLTP and inference and in-memory databases as well. All together, the breadth of what they have is unique and very, very powerful. >> I want to come back to this but let's get into the news a little bit and the announcement. I mean, it seems like what's new in the autonomous data warehouse piece for Oracle is new tooling around four areas. So Andy Mendelsohn, the head of this group, he's sort of the guy, this is his baby, he talked about four things. My takeaway: faster, simpler loads, simplified transforms, autonomous machine learning models which are facilitating, what do you call it? Citizen data science, and then faster time to insights. So tooling to make those four things happen. What's your take and takeaways on the news? >> I think those are all correct. I would add the ease of use in terms of being able to drag and drop; the user interface has been dramatically improved.
Again, I think those are, strategically, actually more important. The others are all useful and good components of it, but strategically, I think this is more important. The ease of use, the use of APEX for example, are more important. And, >> Why are they more important strategically? >> Because they focus on the end user's capability. For example, one of the other things that they've started to introduce is Python, together with their spatial databases, for example. It is really important that you reach out to the developers as they are and to the tools they want to use. So those types of ease of use things are respecting what the end users use. For example, they haven't come out with anything like Qlik or Tableau. They've left that there for that marketplace, for the end user to use what they like best. >> You mean, they're not trying to compete with those two tools. They indeed had a laundry list of stuff that they supported: Talend, Tableau, Looker, Qlik, Informatica, IBM, I had IBM there. So their claim was, hey, we're open. But so that's smart. That's just, hey, they realized that people use these tools. >> They're not trying to exclude other people; they're trying to be a platform and be an ecosystem for the end users. >> Okay, so Mendelsohn, who made the announcement, said that Oracle's the smartphone of databases and I think, I actually think Ellison kind of used that or maybe that was us planning to have, I thought he did, like the iPhone, when he announced Exadata way back when, the integrated hardware and software. But is that how you see it, is Oracle the smartphone of databases? >> It is, I mean, they are trying to own the complete stack, the hardware with Exadata all the way up to the databases, the data warehouses and the OLTP databases, the inference databases. They're trying to own the complete stack from top to bottom and that's what makes the autonomous process possible. You can make it autonomous when you control all of that.
Take away all of the requirements for IT in the business itself. So it's democratizing the use of data warehouses. It is pushing it out to the lines of business and it's simplifying it and making it possible to push out so that they can own their own data. They can manage their own data and they do not need an IT person from headquarters to help them. >> Let's stay on this a little bit more and then I want to go into some of the competitive stuff, because Mendelsohn mentioned AWS several times. One of the things that struck me, he said, hey, we're basically one API 'cause we're doing analytics in the cloud, we're doing data in the cloud, we're doing integration in the cloud, and that's sort of a big part of the value proposition. He made some comparisons to Redshift. Of course, I would say, if you can't find a workload where you beat your big competitor then you shouldn't be in this business. So I take those things with a grain of salt, but one of the other things that caught me is that migrating from OnPrem to Oracle, Oracle Cloud, was very simple, and I think he might've made some comparisons to other platforms. And this to me is important because he also brought in that Gartner data. We looked at that Gartner data when they came out with it; in the operational database class, Oracle smoked everybody. They were like way ahead, and the reason why I think that's important is because let's face it, the Mission Critical Workloads, when you look at what's moving into AWS, the Mission Critical Workloads, the high performance, high criticality OLTP stuff, that's not moving in droves. And you've made the point often that companies with their own cloud, particularly Oracle, you've mentioned this about IBM for certain, DB2 for instance, customers are going to, there should be a lower risk environment moving from OnPrem to their cloud, because you could do, I don't think you could get Oracle RAC on AWS.
For example, I don't think Exadata is running in AWS data centers and so that like component is going to facilitate migration. What's your take on all that spiel? >> I think that's absolutely right. Your crown jewels, the most expensive and the most valuable applications, the mission-critical applications. The ones that have got to take a beating, keep on taking. So those types of applications are where Oracle really shines. They own a very high percentage of those Mission Critical Workloads, and you have the choice, if you're going to AWS, for example, of either migrating to Oracle on AWS, and that is frankly not a good fit at all. There are a lot of constraints to running large systems on AWS, large mission critical systems. So that's not an option, and then the option, of course, that AWS will push is move to Aurora, change your way of writing applications, make them tiny little pieces and stitch them all together with microservices, and that's okay if you're a small organization, but that has got a lot of problems of its own, right? Because then you, the user, have to stitch all those pieces together and you're responsible for testing it and you're responsible for looking after it. And that, as you grow, becomes a bigger and bigger overhead. So AWS, in my opinion, needs to have a move towards a tier-one database of its own and it's not in that position at the moment. >> Interesting, okay. So, let's talk about the competitive landscape and the choices that customers have. As I said, Mendelsohn mentioned AWS many times, Larry on the calls often takes shots, it's a compliment to me. When Larry Ellison calls you out, that means you've made it, you're doing well.
We've seen it over the years, whether it's IBM or Workday or Salesforce, even though Salesforce is a big Oracle customer, and AWS, as we know, is an Oracle customer as well, even though AWS tells us they're off Oracle, when you peel the onion >> Five years should be great, some of the workers >> Well, as I said, I believe they're still using Oracle in certain workloads. Anyway, we digress. So AWS though, they take a different approach and I want to push on this a little bit with database. It's got more than a dozen, I think, purpose-built databases. They take this kind of right tool for the right job approach, whereas Oracle, they're converging all this function into a single database: SQL, JSON, graph databases, machine learning, blockchain. I'd love to talk more about blockchain if we have time, but it seems to me that the right tool for the right job, purpose-built, very granular down to the primitives and APIs, that seems to me to be a pretty viable approach versus kind of a Swiss Army approach. How do you compare the two? >> Yes, and it is to many initial programmers who are very interested, for example, in graph databases or in time series databases. They are looking for a cheap database that will do the job for a particular project, and that, for the programmer or for that individual piece of work, is a very sensible way of doing it and they pay for ads on its clear cloud dynamics. The challenge, as you have more and more data and as you're building up your data warehouse and your data lakes, is that you do not want to have to move data from one place to another place. So for example, if you've got Aurora, you have to move the database and it's a pretty complicated thing to do, to move it to Redshift. It's five or six steps to do that and each of those costs money and each of those takes time. More importantly, they take time.
The Oracle approach is a single database in terms of all the pieces. Obviously you have multiple databases, you have different OLTP databases and data warehouse databases, but as a single architecture and a single design, which means that all of the work in terms of moving stuff from one place to another place is within Oracle itself. It's Oracle that's doing that work for you, and as you grow, that becomes very, very important. To me, a very, very important cost saving. The overhead of all those different ones, and the databases themselves originated as open source and they've done very well with it and then there's a large revenue stream behind the, >> The AWS, you mean? >> Yes, the original databases in AWS, and they've done a lot of work in terms of making it set with the panels, etc. But for a larger organization, especially very large ones, and certainly if they want to combine, for example, data warehouse with the OLTP and the inference, which is in my opinion a very good thing that they should be trying to do, then that is incredibly difficult to do with AWS, and in my opinion, AWS has to invest enormously to make the whole ecosystem much better. >> Okay, so innovation required there, maybe it's part of the TAM expansion strategy, but just to sort of digress for a second. So it seems like, and by the way, there are others that are doing, they're taking this converged approach. It seems like that is a trend. I mean, you certainly see it with SingleStore. I mean, the name sort of implies that, formerly MemSQL. I think Monte Zweben of Splice Machine is probably headed in a similar direction, embedding AI. And Microsoft's kind of interesting. It seems like Microsoft is willing to build this abstraction layer that hides that complexity of the different tooling. AWS thus far has not taken that approach, and then sort of looking at Snowflake, Snowflake's got a completely different, I think Snowflake's trying to do something completely different.
I don't think they're necessarily trying to take Oracle head-on. I mean, they're certainly trying to just, I guess, let's talk about this. Snowflake simplified EDW, that's clear. Zero to Snowflake in 90 minutes. It's got this data cloud vision. So you sign on to this Snowflake, speaking of layers, they're abstracting the complexity of the underlying cloud. That's what the data cloud vision is all about. They talk about this Global Mesh but they've not done a good job of explaining what the heck it is. We've been pushing them on that, but we got, >> Aspirational at the moment. >> Well, I guess, yeah, it seems that way. And so, but conceptually, it's I think very powerful, but in reality, what Snowflake is doing with data sharing, a lot of reading, it's probably mostly read-only and I say, mostly read-only, oh, there you go. You'll get better, but it's mostly read and so you're able to share the data, it's governed. I mean, it's exactly, quite genius how they've implemented this with its simplicity. It is a caching architecture. We've talked about that, we can geek out about that. There's good, there's bad, there's ugly, but generally speaking, I guess my premise here, and I would love your thoughts, is Snowflake's trying to do something different. It's trying to be not just another data warehouse. It's not just trying to compete with data lakes. It's trying to create this data cloud to facilitate data sharing, put data in the hands of business owners in terms of a product build, data product builders. That's a different vision than anything I've seen thus far, your thoughts. >> I agree, and even more, going further, being a place where people can sell data. Put it up and make it available to whoever needs it and making it so simple that it can be shared across the country and across the world. I think it's a very powerful vision indeed.
The challenge they have is that the pieces at the moment are very, very easy to use but the quality in terms of the, for example, joins, I mentioned, the joins were very powerful in Oracle. They don't try and do joins. They, they say >> They being Snowflake, Snowflake. Yeah, they don't even write it. They would say use another Postgres >> Yeah. >> Database to do that. >> Yeah, so then they have a long way to go. >> Complex joins anyway, maybe simple joins, yeah. >> Complex joins, so they have a long way to go in terms of the functionality of their product, and also in my opinion, they should be going to have more types of databases inside it, including OLTP, and they can do that. They have obviously got a great market cap and they can do that by acquisition as well as they can >> They've started. I think, I think they support JSON, right. >> Do they support JSON? And graph, I think there's a graph database that's either coming or it's there, I can't keep all that stuff in my head, but there's no reason they can't go in that direction. I mean, in speaking to the founders of Snowflake, they were like, look, we're kind of new. We would focus on simple. A lot of them came from Oracle so they know all database and they know how hard it is to do things like facilitate complex joins and do complex workload management, and so they said, let's just simplify, we'll put it in the cloud and it will spin up a separate data warehouse. It's a virtual data warehouse every time you want one. So that's how they handle those things. So different philosophy, but again, coming back to some of the mission critical work and some of the larger Oracle customers, they said they have a thousand autonomous database customers. I think it was autonomous database, not ADW, but anyway, a few stood out: Aon, Lyft, I think Deloitte stood out, and obviously, hundreds more. So we have people who misunderstand Oracle, I think. They got a big install base.
They invest in R and D, and they talk about lock-in, sure, but the CIOs that I talk to, and you talk to, David, they're looking for business value. I would say that 75 to 80% of them will gravitate toward business value over the fear of lock-in and I think at the end of the day, they feel like, you know what? If our business is performing, it's a better business decision, it's a better business case. >> I fully agree, they've been very difficult to do business with in the past. Everybody's in dread of the >> The audit. >> The knock on the door from the auditor. >> Right. >> And that, from a purchasing point of view, has been a really bad experience for many, many customers. The users of the database itself are very happy indeed. I mean, you talk to them and they understand what they're paying for. They understand the value, in terms of availability and all of the tools for complex multi-dimensional types of applications. It's pretty well the only game in town. It's only DB2 and SQL that had any hope of doing >> Doing Microsoft, Microsoft SQL, right. >> Okay, SQL >> Which, okay, yeah, definitely competitive for sure. DB2, no, IBM, look, IBM lost its dominant position in database. They kind of ceded that. Oracle had to fight hard to win it. It wasn't obvious in the 80s who was going to be the database King and all had to fight. And to me, I always tell people the difference is that the chairman of Oracle is also the CTO. They spend money on R and D and they throw off a ton of cash. I want to say something about, >> I was just going to make one extra point. The simplicity and the capability of their cloud versions of all of this is incredibly good. They are better in terms of spending what you need or what you use, much better than AWS, for example, or anybody else. So they have really come full circle in terms of attractiveness in a cloud environment. >> You mean charging you for what you consume. Yeah, Mendelsohn talked about that.
He made a big point about the granularity, you pay for only what you need. If you need 33 CPUs, with the other databases you've got shapes; if you need 33, you've got to go to 64. I don't know that that's true for everyone. I'm not sure if that's true too for Snowflake. It may be, I got to dig into that a little bit, but maybe >> Yes, Snowflake has got a front end to hide behind. >> Right, but I do want to push on that a little bit because I want to go look at their pricing strategies, because I still think they make you buy, I may be wrong. I thought they make you still do a one-year or two-year or three-year term. I don't know if you can just turn it off at any time. They might allow, I should hold off. I'll do some more research on that, but I wanted to make a point about the audits, you mentioned audits before. A big mistake that a lot of Oracle customers have made many times, and we've written about this: negotiating with Oracle, you've got to bring your best and your brightest when you negotiate with Oracle. Some of the things that people didn't pay attention to, and I think they've sort of caught onto this, is that Oracle's SOWs adjudicate over the MSA. A lot of legal departments and procurement departments say, oh, do we have an MSA with Oracle? Yes, you do, okay, great, and because they think, if they have an MSA, they can rubber stamp it, but the SOW really dictates, and Oracle's got you there and they're really smart about that. So you've got to bring your best and the brightest and you've got to really negotiate hard with Oracle, or you get in trouble. >> Sure. >> So it is what it is, but coming back to Oracle, let's sort of wrap on this. Dominant position in mission critical, we saw that from the Gartner research, especially for operational, giant customer base, there's the cloud-first notion, there's investing in R and D, open, we'll put a question mark around that, but hey, they're doing some cool stuff with Michael stuff.
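The "need 33, go to 64" point made a moment ago is simple arithmetic, and a short sketch shows how much headroom a fixed shape ladder can force you to buy. The power-of-two ladder below is an assumption chosen only because it reproduces the 33-to-64 example from the conversation; it is not any vendor's actual price list, and the function names are hypothetical.

```python
def next_shape(cpus_needed: int) -> int:
    """Smallest shape on an assumed power-of-two ladder that covers the request."""
    if cpus_needed < 1:
        raise ValueError("cpus_needed must be at least 1")
    shape = 1
    while shape < cpus_needed:
        shape *= 2
    return shape


def overprovision_ratio(cpus_needed: int) -> float:
    """CPUs paid for under fixed shapes, divided by CPUs actually needed."""
    return next_shape(cpus_needed) / cpus_needed


needed = 33
print(next_shape(needed))                     # 64
print(round(overprovision_ratio(needed), 2))  # 1.94
```

Under per-CPU granularity the ratio would be 1.0 for any request; under the assumed ladder, a 33-CPU workload pays for nearly twice what it uses.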
>> Ecosystem, I put that, ecosystem, they're promoting their ecosystem. >> Yeah, and look, I mean, for a lot of their customers, we've talked to many, they say, look, there's actually, a tail at the tail way, this saves us money and we don't have to migrate. >> Yeah. >> So interesting, so I'll give you the last word. We started sort of focusing on the announcement. So what do you want to leave us with? >> My last word is that there are platforms with a certain key application or key parts of the infrastructure, which I think can differentiate themselves from the Azures or the AWSs, and Oracle owns one of those, SAP might be another one, but there are certain platforms which are big enough and important enough that they, in my opinion, will succeed in that cloud strategy for this. >> Great, David, thanks so much, appreciate your insights. >> Good to be here. >> Thank you for watching everybody, this is Dave Vellante for The Cube. We'll see you next time. (upbeat music)

Published Date : Mar 17 2021


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Mendelsohn	PERSON	0.99+
Andy Mendelsohn	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
David Floyer	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
March 9th	DATE	0.99+
February 19th	DATE	0.99+
five	QUANTITY	0.99+
Deloitte	ORGANIZATION	0.99+
75	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Larry Ellison	PERSON	0.99+
Mendelssohn	PERSON	0.99+
two	QUANTITY	0.99+
each	QUANTITY	0.99+
90%	QUANTITY	0.99+
one-year	QUANTITY	0.99+
Gartner	ORGANIZATION	0.99+
73%	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
two tools	QUANTITY	0.99+
Michael	PERSON	0.99+
64%	QUANTITY	0.99+
two factors	QUANTITY	0.99+
more than a dozen	QUANTITY	0.99+
last quarter	DATE	0.99+
SQL	TITLE	0.99+

Rik Tamm-Daniels, Informatica & Rick Turnock, Invesco | AWS re:Invent 2020

>> Announcer: From around the globe, it's "theCUBE" with digital coverage of AWS "re:Invent" 2020. Sponsored by Intel, AWS and our community partners. >> Hi, everyone, welcome back to theCUBE's virtual coverage of AWS "re:Invent" virtual 2020. It's not an in-person event this year. It's remote, it's virtual, "theCUBE" is virtual and our guests and our interviewers will be remote as well. And so we're here covering the event for the next three weeks, throughout the next three cause we're weaving in commentary from "theCUBE", check out the cube.net and all of our coverage. And here at AWS we have special feature programming, we got a great segment here talking about big data in the cloud, governance, data lakes, all that good stuff. Rik Tamm-Daniels, vice-president strategic ecosystems and technology for Informatica, and Rick Turnock, head of enterprise data services, Invesco, customer of Informatica. Welcome to the cube. >> Hey John, thanks for having us. >> So Rik, with a K from Informatica, I want to ask you first, we've been covering the company journey for many, many years. Always been impressed with the focus on data and specifically cloud and all the things that you guys have been announcing over the years, have been spot on the mark. You know, AI with CLAIRE, you know, making things, cloud native, all that's kind of playing out now with the pandemic, "re:Invent", that's the story here. Building blocks with high level services, cloud native, but data is the critical piece again. More machine learning, more AI, more data management. What's your take on this year's "re:Invent". >> Absolutely John and again, we're always excited to be here at "re:Invent", we've been here since the very first one. You know, it's a deep talk to a couple of key trends there, especially the era of the global pandemic here. There's so many challenges that so many enterprises are experiencing. 
I think the big surprise has been, that has actually translated into a tremendous amount of demand for digital transformation, and cloud modernization in particular. So we've seen a huge uptake in our cloud relationships with AWS when it comes to transformational architecture solutions around data and analytics, and using data as a fundamental asset for digital transformation. And so some of those solution areas are things like data warehouse modernization in the cloud, or end-to-end data governance. That's a huge topic as well for many enterprises today. >> Before coming into "re:Invent", I had a chance to sit down for an exclusive interview with Andy Jassy. I just spoke with Matt Garman who's now heading up sales and marketing, who ran EC2. Rick, you're a customer of Informatica. Their big talking point to me and validation to the trends is, there's no excuse to go slow anymore because there's a reason to go fast 'cause there's consequences and the pandemic has highlighted that you got to move faster otherwise, you know, you're going to be on the wrong side of history and necessity is the mother of all invention. Okay, great. I buy that by the way. So I have no complaints on talking point there from Amazon Web Services. The problem is, you got to manage the data. (John chuckles) To go fast. The gas in the tank is data, and if it's screwed up, it's not going to go well, all right? So it's like putting gas in a car. So, this is where I see the data lake coming into the cloud and all the benefits and look at the successes of companies. The cloud is a real enabler. What's your take on this? The importance of data governance, because cloud scale is here, people want to go faster, data is like the key thing. >> Yeah. The data governance was a critical component when we started our enterprise data platform and looking at, you know, how can we build a modern-day architecture with scale, bringing our enterprise data, but doing it in a governed fashion. 
So, when we did it, we kind of looked at, you know, what are critical partners? How can we apply data governance and the full catalog capabilities of knowing what data's coming in, identifying it, and then really controlling the quality of it to meet the needs of the organization. It was a critical component for us because typically it's been difficult to get access to that right data. And as we look in the future and even current needs, we really need to understand our data and bring the right data in and make it easily accessible and governance, and quality of that is a critical component of it. >> I want to just follow up with that if you don't mind cause you know, I've done so many of these interviews, I've been on the block now 30 years in the industry, I've seen the waves come and go, and you see a lot of these mandates, you know, "Data governance, we're adding data governance." From the Ivory tower, or you hear, "Everything got to be a service." But when you peel back and look under the hood to make that happen, it's complicated. You've got to have put things in place and it's got to be set up properly, you got to do your work. How important it is to have... And well what's under the covers to this? Cause governance, yeah, it's a talking point, I get that. But to make it actually happen well, it's hard. >> We started really with the operating models from the start. So I kind of took over data governance seven years ago and had a governing global architecture that's been around for 40 years, and it was hard. So this was really our shot and time to get it right. So we did an operating model, a governance model, and it really ingrained it through the whole build and execution process. And so it was part with technology and it was foundational to the process to really deliver it. So it wasn't governance from a governance saying, it was really part of our operating model and process to build this out and really succeed at it. 
>> Rik, on the Informatica side, I got to get your take on the new solution you guys announced, "The Governed Data Lake", I think it was solution. Does this tie into that? Take a minute to explain the announcement, and how does this tie in? >> Yeah, absolutely John. So I think you take a step back, look at... We talked about some of the drivers of why companies are investing in cloud data lakes. And I think what comes down to is, when you think about that core foundation of data analytics, you know, they're really looking at, you know, how do we go ahead and create a tremendous leap forward in terms of their ability to leverage data as an asset. And again, as we talked about, one of the biggest challenges is trust around the data. And what the solution does though, is it really looks to say, "Okay, first and foremost, "let's create that foundation of trust "not just for the cloud data lake, "but for the entire enterprise. "To ensure that when we start to build this "new architecture, one, we understand the data assets "that are coming in at the very get-go." Right? It's much harder to add data governance after the fact, but you put it in upfront, you understand your existing data landscape. And once that data is there, you make sure you understand the quality of the information, you cleanse the data, you also make sure you put it under the right data management policies. So many policies that enterprises are subject to now like CCPA and GDPR. They have to be concerned about consumer privacy and being able as part of your governance foundational layer, to make sure that you're in compliance as data moves through your new architectures. It's fundamental having that end trust and confidence to be able to use that data downstream. So our solution looks to do that end-to-end across a cloud environment, and again, not just the cloud environment, but the full enterprise as well. 
One thing I do want to touch on if you don't mind is on the AI side of things and the tooling side of things. Because I think data governance has been around a while, as you said, it's not that it's a new concept. But how do you do it efficiently in today's world? >> John: Yeah. >> And this is where Informatica is focused on a concept of data 4.0. Which is the use of metadata and AI and machine learning and intelligence, to make this process much, much more efficient. >> Yeah that's a good point, Rik, with these two Ricks, one with a K, one with a CK. So Rick with the CK, from Invesco, I want to just check that with you because you're a customer of Informatica, and they brought up a good point about governance. And I saw this movie before, we've all seen this before, people just slap on solutions or tooling to a pre-existing architecture. You see that with security, you know, now it's, you can't have a conversation without saying, "Oh security's got to be baked in from the beginning." Okay cool, I get that. There's no debate there. Governance, same kind of thing, you know, you're hearing this over and over again, if you don't bake governance into the beginning of everything, you're going to be screwed. Okay? So how important is that foundation of trust for this piece? (Rick mumbling) >> It's critical to do it at the beginning, right? So you're profiling the data, you're defining entitlements and who has access to it, you're defining data quality rules that you can validate that, you define the policies, is there PII data, all of that, as you do that from the start, then you have a well-governed and documented data catalog and taxonomy that has the policies and the controls in place to allow that use. If you do it after the fact, then you're always going to be catching up. So a part of our process and policies and where the Informatica tools really delivered for us is to make it part of that process. 
And to use that as we continue to build out our data platform with the quality controls and all the governance processes built in. >> I got to ask on your journey, that's seven years ago, you took over the practice. You were probably right in the middle of the sea change when everyone kind of woke up and said, "Hey, you know, Amazon, you go back seven years, "look at Amazon where they were to where they are today." Okay? Significantly strong then, total bellwether now in terms of value opportunity. So, how did you look at the cloud migration? How do you think about the cloud architecture? Because I'm sure, and I'd love to get your story here about how you thought about cloud, in the midst of architecting the data foundational platform there. >> Yeah, we're a global company that had architecture, we grew it by acquisition. So a lot of our data architecture was on-prem, difficult really to pull that enterprise data together to meet the business needs. So when we started this, we really wanted to leverage cloud technology. We wanted a modern stack. We wanted scale, flexibility, agility, security, all the things that the cloud brought to us. So we did a search, and looking at that, and looked at competitors, but really landed on Amazon just by the core capabilities and scale they have, the innovation, and the services to bring a lot of what we were looking for and really deliver on what we wanted from a platform. >> Why Informatica and AWS, why the combination? Can you share some of the reasons why you went with Informatica with AWS? >> Yeah, again, when we started this off, we looked at the competitors, right? And we were using IDQ. So we had an Informatica product on-prem, but we looked at a lot of the different governance competitors, and really the integrated platform that Informatica brings to us was the key deliverer, right? 
So we can really have the technical metadata with EDC, the Enterprise Data Catalog, scan our sources, our files, understanding the data and lineage of what it is. And we can tie that into Axon and the governance tools to really define business terms. We were very critical of defining all our key data elements business glossary, and then we can see where that is by linking that to the technical metadata. So we know where our PII data is, where all our data is and how it flows, both technically and from a business process. And then the IDQ. So when we've defined and understand the data, we want to bring in the delight and how we want to conform it, to make it easily accessible, we can define data quality rules within the governance tool, and then execute that with IDQ, and really have a well-defined data quality process that really takes it from governance in theory to governance in really execution. >> That's awesome. Hey, you are using the data, you're using the cloud, you're getting everything you need out of it. That's the whole idea, isn't it? >> Yeah. >> That's good stuff, Rik at Informatica, tell us about what's going on, you mentioned data 4.0, I think people should pay attention to some of the interviews I've done with your team. They're online also, it's part of that next-gen, next level thinking. Here at "re:Invent", what should customers pay attention to, that you guys are doing? Great customer example here of cloud scale. What's the story for "re:Invent" this year for Informatica. 
And I think particularly at "re:Invent", first of all, I just welcome folks to... They want to learn more about our data governance solution. Please come by our virtual booth. We also have a great interactive experience that encouraged folks to check out. One of the key components of our solution is our enterprise data catalog. And attendees at "re:Invent" can actually get hands on with our data catalog through the demo jam, the AWS demo jam as part of "re:invent". So I'd encourage folks to check that out as well, just to see what we're talking about yet actually. >> Awesome. Final question for you guys, as "re:Invent" is going on, a lot app stores are popping up, you seeing obviously the same trends, machine learning and you know, outpost is booming, so a cloud operations is clearly here, Rick from Invesco, what do you think the most important story is for your peers as they're here? It's a learning conference and you guys have seven years in the cloud working together with Informatica, in your opinion, what should people be paying attention to as they looked at this pandemic and what they got to get through? And then coming out of it with the growth strategy, it's all got to be more about the data, there's more data coming in, more sources, IoT data, certainly the work at home is causing these workloads, workplace, workflows, everything's changed, the future of work. What's your advice to peers out there on what to pay attention to and what to think about? >> We really started with a top-down strategy, right? To really the vision and the future. What do we want to get out of our data? Data is just data, right? But it's the information, it's the analytics, it's really delivering value for our clients, shareholders, and employees to really do their job, simplify our architecture. So really defining that vision of what you want and approach, and then executing on it, right? 
So how do you build it in a way to make it flexible and scalable, and how do you build an operating and governance model really to deliver on it because, you know, garbage in is garbage out, and you really got to have those processes, I think to really get the full value of what you're building. >> Get the data out there at the right place, at the right time and the right quality data. That's a lot more involved now and you need to be agile. And I think agile data is a way to go. Rick Turnock... >> And then with channels and capabilities that makes it easier, right? It makes it doable. And I think that's what cloud and the Informatica tools, right? Where in the past, you know, it was people hard coding and doing it right? The capability of that cloud and these tools give us makes it really achievable. >> You know, we have an old saying here in our CUBE team, you know, "If there's a problem, "you got to see if it's important, "and then look at the consequences "of not solving that problem, quantify the value of "solving or not solving that problem, "and then look and deploy solutions to do it." I think now with the data, you can actually do that and say, "This ain't quite the consequences of not doing this "or doing this, have a quantifiable value." I just loved that because it brings the whole ROI back to the table. And, you know, it's a dark art, it used to be, you mentioned the old days, you know, you got to do all this custom work, it was like a dark art. Oh yeah, the ROI calculation, payback. I mean, it was a moving train. That's the way it used to be. Not anymore. >> You got to do it to survive, really, if you're not doing it, you know, I don't know. >> Necessity is the mother of all inventions I think, now more than ever, data's going to be the key. Rik final word from Informatica. What should people pay attention to? >> Yeah, I mean, I think as you mentioned there, data is obviously a critical asset, right? 
And, and to your point with cloud, you can not only realize ROI quickly, but, you can actually iterate so much more quickly, where you can get that ROI immediately or you can validate that ROI, you can adjust your approach. But again, from an Informatica standpoint, we are seeing such a huge uptake in demand for customers who want to go to the cloud, who are modernizing. Every day we're investing heavily and how do we make sure that customers can get there quickly? They can maximize the ROI from their data assets, and we're doing it with all things, data management, from traditional data integration, all the way to the data governance, all the capabilities we talked about today. >> Yeah. Congratulations. That's the benefit of investing in a platform and having a set of out of the box tooling with SaaS, platform as a service, really it can enable success. And I think the pandemic is pretty obvious who's taking advantage of it, so congratulations and continued success. Thanks for coming on. Appreciate it. Rick Turnock, head of data service, enterprise data services at Invesco, customer of Informatica sharing his insight. Great insight there. Necessity is the mother of all inventions, baking it in from the beginning data governance foundational, it's not a bolt on, that's the message. I'm John Furrier with theCUBE. Thanks for watching. (soft music)

Published Date : Dec 2 2020


ENTITIES

Entity	Category	Confidence
AWS	ORGANIZATION	0.99+
Rick	PERSON	0.99+
John	PERSON	0.99+
Rick Turnock	PERSON	0.99+
Matt Garman	PERSON	0.99+
Rik Tamm-Daniels	PERSON	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Invesco	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Andy Jassy	PERSON	0.99+
seven years	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Rik	PERSON	0.99+
two	QUANTITY	0.99+
one	QUANTITY	0.99+
GDPR	TITLE	0.99+
seven years ago	DATE	0.99+
first	QUANTITY	0.98+
Intel	ORGANIZATION	0.98+
cube.net	OTHER	0.98+
CCPA	TITLE	0.98+
CLAIRE	PERSON	0.98+
first one	QUANTITY	0.97+
CUBE	ORGANIZATION	0.97+
One	QUANTITY	0.96+
this year	DATE	0.96+
both	QUANTITY	0.95+
Rickes	PERSON	0.94+
pandemic	EVENT	0.93+
today	DATE	0.93+
theCUBE	TITLE	0.9+
2020	DATE	0.89+
Ivory tower	LOCATION	0.89+
CK	PERSON	0.89+
re:Invent	EVENT	0.83+
40 years	QUANTITY	0.8+
IDQ	TITLE	0.78+
next three weeks	DATE	0.75+

Breaking Analysis: Five Questions About Snowflake’s Pending IPO

>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> In June of this year, Snowflake filed a confidential document suggesting that it would do an IPO. Now of course, everybody knows about it, found out about it and it had a $20 billion valuation. So, many in the community and the investment community and so forth are excited about this IPO. It could be the hottest one of the year, and we're getting a number of questions from investors and practitioners and the entire Wikibon, ETR and CUBE community. So, welcome everybody. This is Dave Vellante. This is "CUBE Insights" powered by ETR. In this breaking analysis, we're going to unpack five critical questions around Snowflake's IPO or pending IPO. And with me to discuss that is Erik Bradley. He's the Chief Engagement Strategist at ETR and he's also the Managing Director of VENN. Erik, thanks for coming on and great to see you as always. >> Great to see you too. Always enjoy being on the show. Thank you. >> Now for those of you who don't know Erik, VENN is a roundtable that he hosts and he brings in CIOs, IT practitioners, CSOs, data experts and they have an open and frank conversation, but it's private to ETR clients. But they know who the individual is, what their role is, what their title is, et cetera and it's a kind of an ask me anything. And I participated in one of them this past week. Outstanding. And we're going to share with you some of that. But let's bring up the agenda slide if we can here. And these are really some of the questions that we're getting from investors and others in the community. There's really five areas that we want to address. The first is what's happening in this enterprise data warehouse marketplace? The second thing is kind of a one area. What about the legacy EDW players like Oracle and Teradata and Netezza? 
The third question we get a lot is can Snowflake compete with the big cloud players? Amazon, Google, Microsoft. I mean they're right there in the heart, in the thick of things there. And then what about that multi-cloud strategy? Is that viable? How much of a differentiator is that? And then we get a lot of questions on the TAM. Meaning the total available market. How big is that market? Does it justify the valuation for Snowflake? Now, Erik, you've been doing this now. You've run a couple VENNs, you've been following this, you've done some other work that you've done with Eagle Alpha. What's your, just your initial sort of takeaway from all this work that you've been doing. >> Yeah, sure. So my first take on Snowflake was about two and a half years ago. I actually hosted them for one of my VENN interviews and my initial thought was impressed. So impressed. They were talking at the time about their ability to kind of make ease of use of a multi-cloud strategy. At the time although I was impressed, I did not expect the growth and the hyper growth that we have seen now. But, looking at the company in its current iteration, I understand where the hype is coming from. I mean, it's 12 and a half billion private valuation in the last round. The least confidential IPO (laughs) anyone's ever seen (Dave laughs) with a 15 to $20 billion valuation coming out, which is more than Teradata, Margo and Cloudera combined. It's a great question. So obviously the success to this point is warranted, but we need to see what they're going to be able to do next. So I think the agenda you laid out is a great one and I'm looking forward to getting into some of those details. >> So let's start with what's happening in the marketplace and let's pull up a slide that I very much love to use. It's the classic X-Y. On the vertical axis here we show net score. And remember folks, net score is an indicator of spending momentum. 
ETR every quarter does, like clockwork, a survey where they're asking people, "Essentially are you spending more or less?" They subtract the less from the more and come up with a net score. It's more complicated than that, but like NPS, it's a very simple and reliable methodology. That's the vertical axis. And the horizontal axis is what's called market share. Market share is the pervasiveness within the data set. So it's calculated by the number of mentions of the vendor divided by the number of mentions within that sector. And what we're showing here is the EDW sector. And we've pulled out a few companies that I want to talk about. So the big three, obviously Microsoft, AWS and Google. And you can see Microsoft has a huge presence far to the right. AWS, very, very strong. A lot of Redshift in there. And then they're pretty high on the vertical axis. And then Google, not as much share, but very solid in that. Close to 60% net score. And then you can see above all of them from a vertical standpoint is Snowflake with a 77.5% net score. You can see them in the upper right there in the green. One of the highest, Erik, in the entire data set. So, let's start with some sort of initial comments on the big guys and Snowflake. Your thoughts? >> Sure. Just first of all to comment on the data, what we're showing there is just the data warehousing sector, but Snowflake's actual net score is that high amongst the entire universe that we follow. Their data strength is unprecedented and we have forward-looking spending intention. So this bodes very well for them. Now, what you did say very accurately is there's a difference between their spending intentions on a net revenue level compared to AWS, Microsoft. No one's saying that this is an apples-to-apples comparison when it comes to actual revenue. So we have to be very cognizant of that. There is domination (laughs) quite frankly from AWS and from Azure. 
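As a rough illustration of the two axes described above, the following Python sketch computes a simplified net score and market share. It follows only the simplified description given in the conversation (the actual ETR methodology is described as more complicated), and the function names and the sample counts are hypothetical, chosen purely for illustration.

```python
def net_score(spending_more: int, spending_less: int, total_responses: int) -> float:
    """Simplified net score: percent spending more minus percent spending less.

    Assumption: respondents who are flat on spending count toward
    total_responses but toward neither of the other two buckets.
    """
    if total_responses <= 0:
        raise ValueError("total_responses must be positive")
    return 100.0 * (spending_more - spending_less) / total_responses


def market_share(vendor_mentions: int, sector_mentions: int) -> float:
    """Pervasiveness in the data set: vendor mentions over sector mentions, in percent."""
    if sector_mentions <= 0:
        raise ValueError("sector_mentions must be positive")
    return 100.0 * vendor_mentions / sector_mentions


# Hypothetical survey counts, not ETR data.
example_net_score = net_score(spending_more=80, spending_less=5, total_responses=100)
example_share = market_share(vendor_mentions=50, sector_mentions=200)
print(example_net_score, example_share)
```

Plotting market share on the horizontal axis and net score on the vertical axis for each vendor would give the kind of X-Y view discussed here.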
And Snowflake is a necessary component for them not only to help facilitate a multi-cloud, but look what's happening right now in the US Congress, right? We have these tech leaders being grilled on their actual dominance. And one of the main concerns they have is the amount of data that they're collecting. So I think the environment is right to have another player like this. I think Snowflake really has a lot of longevity and our data is supporting that. And the commentary that we hear from our end users, the people that take the survey are supporting that as well. >> Okay, and then let's stay on this X-Y slide for a moment. I want to just pull out a couple of other comments here, because one of the questions we're asking is whither the legacy EDW players. So we've got in here, IBM, Oracle, you can see Teradata and then Hortonworks and MapR. We're going to talk a little bit about Hortonworks 'cause it's now Cloudera. We're going to talk a little bit about Hadoop and some of the data lakes. So you can see there they don't have nearly the net score momentum. Oracle obviously has a huge install base and is investing quite frankly in R&D and doing Exadata and it has its own cloud. So, it's got a lock on its customers and if it keeps investing and adding value, it's not going away. IBM with Netezza, there's really been some questions around their commitment to that base. And I know that a lot of the folks in the VENNs that we've talked to Erik have said, "Well, we're replacing Netezza." Frank Slootman has been very vocal about going after Teradata. And then we're going to talk a little bit about the Hadoop space. But, can you summarize for us your thoughts in your research and the commentary from your community, what's going on with the legacy guys? Are these guys cooked? Can they hang on? What's your take? >> Sure. We focus on this quite a bit actually. 
So, I'm going to talk about it from the data perspective first, and then we'll go into some of the commentary and the panel. You even joined one yesterday. You know that it was touched upon. But, first on the data side, what we're noticing and capturing is a widening bifurcation between these cloud native and the legacy on-prem. It is undeniable. There is nothing that you can really refute. The data is concrete and it is getting worse. That gap is getting wider and wider and wider. Now, the one thing I will say is, nobody's going to rip out their legacy applications tomorrow. It takes years and years. So when you look at Teradata, right? Their market cap's only 2 billion, 2.3 billion. How much revenue growth do they need to stay where they are? Not much, right? No one's expecting them to grow 20%, which is what you're seeing on the left side of that screen. So when you look at the legacy versus the cloud native, there is very clear direction of what's happening. The one thing I would note from the data perspective is if you switched from net score or adoptions and you went to flat spending, you suddenly see Oracle and Teradata move over to that left a little bit, because again what I'm trying to say is I don't think they're going to catch up. No, but also don't think they're going away tomorrow. That these have large install bases, they have relationships. Now to kind of get into what you were saying about each particular one, IBM, they shut down Netezza. They shut it down and then they brought it back to life. How does that make you feel if you're the head of data architecture or you're DevOps and you're trying to build an application for a large company? I'm not going back to that. There's absolutely no way. Teradata on the other hand is known to be incredibly stable. They are known to just not fail. If you need to kind of re-architect or you do a migration, they work. Teradata also has a lot of compliance built in. 
So if you're a financials company, if you have a regulated business or industry, there are still some data sets that you're not going to move up to the cloud. Whether it's for PII compliance or financial reasons, some of that stuff is still going to live on-prem. So Teradata still has a very good niche. And from what we're hearing from our panels, and this is a direct quote if you don't mind me looking off screen for one second. But this is a great one. Basically said, "Teradata is the only one from the legacy camp who is putting up a fight and not giving up." Basically from a CIO perspective, the rest of them aren't an option anymore. But Teradata is still fighting and that's great to hear. They have their own data as a service offering and listen, they're a small market cap compared to these other companies we're talking about. But, to summarize, the data is very clear. There is a widening bifurcation between the two camps. I do not think legacy will catch up. I think all net new workloads are moving to data as a service, moving to cloud native, moving to hosted, but there are still going to be some existing legacy on-prem applications that will be supported with these older databases. And of those, Oracle and Teradata are still viable options. >> I totally agree with you and my colleague David Floyer is actually quite high on Teradata Vantage because he really does believe that a key component, we're going to talk about the TAM in a minute, but a key component of the TAM he believes must include the on-premises workloads. And Frank Slootman has been very clear, "We're not doing on-prem, we're not doing this halfway house." And so that's an opportunity for companies like Teradata. Certainly Oracle, I would put it in that camp, is putting up a fight. Vertica is another one. They're very small, but another one that's sort of battling it out from the old MPP world. But that's great. Let's go into some of the specifics. 
Let's bring up here some of the specific commentary that we've curated here from the roundtables. I'm going to go through these and then ask you to comment. The first one is just, I mean, people are obviously very excited about Snowflake. It's easy to use, the whole thing zero to Snowflake in 90 minutes, but Snowflake is synonymous with cloud-native data warehousing. There are no equals. We heard that a lot from your VENN panelists. >> We certainly did. There was even more euphoria around Snowflake than I expected when we started hosting this series of data warehousing panels. And this particular gentleman that said that happens to be the global head of data architecture for a Fortune 100 financials company. And you mentioned earlier that we did a report alongside Eagle Alpha. And we noticed that among Fortune 100 companies that are also using the big three public cloud companies, Snowflake is growing market share faster than anyone else. They are positioned in a way where even if you're aligned with Azure, even if you're aligned with AWS, if you're a large company, they are gaining share right now. So that particular gentleman's comments were very interesting. He also made a comment that said, "Snowflake is the person who championed the idea that data warehousing is not dead yet. Use that old Monty Python line and you're not dead yet." And back in the day when Hadoop came along and the data lakes turned into a data swamp and everyone said, "We don't need warehousing anymore." Well, that turned out to be a head fake, right? Hadoop was an interesting technology, but it's a complex technology. And it ended up not really working the way people wanted it. I think Snowflake came in at that point at an opportune time and said, "No, data warehousing isn't dead. We just have to separate the compute from the storage layer and look at what I can do. That increases flexibility, security. It gives you that ability to run across multi-cloud." 
So honestly the commentary has been nothing but positive. We can get into some of the commentary about people thinking that there's competition catching up to what they do, but there is no doubt that right now Snowflake is the name when it comes to data as a service. >> The other thing we heard a lot was ETL is going to get completely disrupted, you sort of embedded ETL. You heard one panelist say, "Well, it's interesting to see that guys like Informatica are talking about how fast they can run inside a Snowflake." But Snowflake is making that easy. That data prep is sort of part of the package. And so that does not bode well for ETL vendors. >> It does not, right? So ETL is a legacy of on-prem databases and even when Hadoop came along, it still needed that extra layer to kind of work with the data. But this is really, really disrupting them. Now to Snowflake's credit, they partner well. All the ETL players are partnered with Snowflake, they're trying to play nice with them, but the writing's on the wall: as more and more of these applications and workloads move to the cloud, you don't need the ETL layer. Now, obviously that's going to affect Talend and Informatica the most. We had a recent comment that said, this was a CIO who basically said, "The most telling thing about the ETL players right now is every time you speak to them, all they talk about is how they work in a Snowflake architecture." That's their only metric that they talk about right now. And he said, "That's very telling." That he basically used it as it's their existential identity to be part of Snowflake. If they're not, they don't exist anymore. So it was interesting to have sort of a philosophical comment brought up in one of my roundtables. 
But that's how important playing nice and finding a niche within this new data as a service is for ETL, but to be quite honest, they might be going the same way of, "Okay, let's figure out our niche on these still the on-prem workloads that are still there." I think over time we might see them maybe as an M&A possibility, whether it's Snowflake or one of these new up and comers, kind of bring them in and sort of take some of the technology that's useful and layer it in. But as a large market cap, solo existing niche, I just don't know how long ETL is for this world. >> Now, yeah. I mean, you're right that if it wasn't for the marketing, they're not fighting fashion. But >> No. >> really there're some challenges there. Now, there were some contrarians in the panel and they signaled some potential icebergs ahead. And I guarantee you're going to see this in Snowflake's Red Herring when we actually get it. Like we're going to see all the risks. One of the comments, I'll mention the two and then we can talk about it. "Their engineering advantage will fade over time." Essentially we're saying that people are going to copycat and we've seen that. And the other point is, "Hey, we might see some similar things that happened to Hadoop." The public cloud players giving away these offerings at zero cost. Essentially marginal cost of adding another service is near zero. So the cloud players will use their heft to compete. Your thoughts? >> Yeah, first of all one of the reasons I love doing panels, right? Because we had three gentlemen on this panel that all had nothing but wonderful things to say. But you always get one. And this particular person is a CTO of a well known online public travel agency. We'll put it that way. And he said, "I'm going to be the contrarian here. I have seven different technologies from private companies that do the same thing that I'm evaluating." So that's the pressure from behind, right? The technology, they're going to catch up. 
Right now Snowflake has the best engineering which interestingly enough they took a lot of that engineering from IBM and Teradata if you actually go back and look at it, which was brought up in our panel as well. He said, "However, the engineering will catch up. They always do." Now from the other side they're getting squeezed because the big cloud players just say, "Hey, we can do this too. I can bundle it with all the other services I'm giving you and I can squeeze your pay. Pretty much give it away at cost." So I do think that there is a very valid concern. When you come out with a $20 billion IPO valuation, you need to warrant that. And when you see competitive pressures from both sides, from private emerging technologies and from the more dominant public cloud players, you're going to get squeezed there a little bit. And if pricing gets squeezed, it's going to be very, very important for Snowflake to continue to innovate. That comment you brought up about possibly being the next Cloudera was certainly the best sound bite that I got. And I'm going to use it as clickbait in future articles, because I think everyone who starts looking to buy Snowflake stock and they see that, they're going to need to take a look. But I would take that with a grain of salt. I don't think that's happening anytime soon, but what that particular CTO was referring to was if you don't innovate, the technology itself will become commoditized. And he believes that this technology will become commoditized. So therefore Snowflake has to continue to innovate. They have to find other layers to bring in. Whether that's through their massive war chest of cash they're about to have and M&A, whether that's them buying an analytics company, whether that's them buying an ETL layer, finding a way to provide more value as they move forward is going to be very important for them to justify this valuation going forward. >> And I want to comment on that. 
The Cloudera, Hortonworks, MapRs, Hadoop, et cetera. I mean, there are dramatic differences obviously. I mean, that whole space was so hard, very difficult to stand up. You needed science project guys and lab coats to do it. It was very services intensive. As well companies like Cloudera had to fund all these open source projects and it really squeezed their R&D. I think Snowflake is much more focused and you mentioned some of the background of their engineers, of course Oracle guys as well. However, you will see Amazon's going to trot out a ton of customers using their RA3 managed storage and their flash. I think it's the DC2 piece. They have a ton of action in the marketplace because it's just so easy. It's interesting one of the comments, you asked this yesterday, was with regard to separating compute from storage, which of course is Snowflake's thing, they basically invented it, it was one of their claims to fame. The comment was what AWS has done to separate compute from storage for Redshift is largely a bolt on. Which I thought that was an interesting comment. I've had some other comments. My friend George Gilbert said, "Hey, despite claims to the contrary, AWS still hasn't separated storage from compute. What they have is really primitive." We got to dig into that some more, but you're seeing some data points that suggest there's copycatting going on. May not be as functional, but at the same time, Erik, like I was saying good enough is maybe good enough in this space. >> Yeah, and especially with the enterprise, right? You see what Microsoft has done. Their technology is not as good as all the niche players, but it's good enough and I already have a Microsoft license. So, (laughs) you know why am I going to move off of it. But I want to get back to the comment you mentioned too about that particular gentleman who made that comment about Redshift, their separation is really more of a bolt on than a true offering. 
It's interesting because I know who these people are behind the scenes and he has a very strong relationship with AWS. So it was interesting to me that in the panel yesterday he said he switched from Redshift to Snowflake because of that and some other functionality issues. So there is no doubt from the end users that are buying this. And he's again a fortune 100 financial organization. Not the same one we mentioned. That's a different one. But again, a fortune 100 well known financials organization. He switched from AWS to Snowflake. So there is no doubt that right now they have the technological lead. And when you look at our ETR data platform, we have that adoption reasoning slide that you show. When you look at the number one reason that people are adopting Snowflake is their feature set of technological lead. They have that lead now. They have to maintain it. Now, another thing to bring up on this to think about is when you have large data sets like this, and as we're moving forward, you need to have machine learning capabilities layered into it, right? So they need to make sure that they're playing nicely with that. And now you could go open source with the Apache suite, but Google is doing so well with BigQuery and so well with their machine learning aspects. And although they don't speak enterprise well, they don't sell to the enterprise well, that's changing. I think they're somebody to really keep an eye on because their machine learning capabilities that are layered into the BigQuery are impressive. Now, of course, Microsoft Azure has Databricks. They're layering that in, but this is an area where I think you're going to see maybe what's next. You have to have machine learning capabilities out of the box if you're going to do data as a service. Right now Snowflake doesn't really have that. Some of the other ones do. 
So I had one of my guest panelists basically say to me, because of that, they ended up going with Google BigQuery because he was able to run a machine learning algorithm within hours of getting set up. Within hours. And he said that that kind of capability out of the box is what people are going to have to use going forward. So that's another thing we should dive into a little bit more. >> Let's get into that right now. Let's bring up the next slide which shows net score. Remember this is spending momentum across the major cloud players and plus Snowflake. So you've got Snowflake on the left, Google, AWS and Microsoft. And it's showing three survey timeframes: last October, April '20, which is right in the middle of the pandemic. And then the most recent survey which has just taken place this month in July. And you can see Snowflake very, very high scores. Actually improving from the last October survey. Google, lower net scores, but still very strong. Want to come back to that and pick up on your comments. AWS dipping a little bit. I think what's happening here, we saw this yesterday with AWS's results. 30% growth. Awesome. Slight miss on the revenue side for AWS, but look, I mean massive. And they're so exposed to so many industries. So some of their industries have been pretty hard hit. Microsoft pretty interesting. A little softness there. But one of the things I wanted to pick up on Erik, when you're talking about Google and BigQuery and its ML out of the box was what we heard from a lot of the VENN participants. There's no question about it that Google technically I would say is one of Snowflake's biggest competitors because it's cloud native. Remember >> Yep. >> AWS did a license one time. License deal with ParAccel and had to sort of refactor the thing to be cloud native. And of course we know what's happening with Microsoft. They basically were on-prem and then they put stuff in the cloud and then all the updates happen in the cloud. 
And then they pushed to on-prem. But they have that what Frank Slootman calls that halfway house, but BigQuery no question technically is very, very solid. But again, you see Snowflake right now anyway outpacing these guys in terms of momentum. >> Snowflake is outpacing everyone (laughs) across our entire survey universe. It really is impressive to see. And one of the things that they have going for them is they can connect all three. It's that multi-cloud ability, right? That portability that they bring to you is such an important piece for today's modern CIOs and data architects. They don't want vendor lock-in. They are afraid of vendor lock-in. And this ability to make their data portable and to do that with ease and the flexibility that they offer is a huge advantage right now. However, I think you're a hundred percent right. Google has been so focused on the engineering side and never really focusing on the enterprise sales side. That is why they're playing catch up. I think they can catch up. They're bringing in some really important enterprise salespeople with experience. They're starting to learn how to talk to enterprise, how to sell, how to support. And nobody can really doubt their engineering. How many open source projects have they given us, right? They invented Kubernetes and the entire container space. No one's really going to compete with them on that side if they learn how to sell it and support it. Yeah, right now they're behind. They're a distant third. Don't get me wrong. From a pure hosted ability, AWS is number one. Microsoft Azure, sometimes it looks like it's number one, but you have to recognize that a lot of that is simply because of their hosted 365. It's a SaaS app. It's not a true cloud type of infrastructure as a service. But Google is a distant third, but their technology is really, really great. And their ability to catch up is there. 
And like you said, in the panels we were hearing a lot about their machine learning capability is right out of the box. And that's where this is going. What's the point of having this huge data if you're not going to be supporting it on new application architecture. And all of those applications require machine learning. >> Awesome. So we're. And I totally agree with what you're saying about Google. They just don't have it figured out how to sell the enterprise yet. And a hundred percent AWS has the best cloud. I mean, hands down. But a very, very competitive market as we heard yesterday in front of Congress. Now we're on the point about, can Snowflake compete with the big cloud players? I want to show one more data point. So let's bring up, this is the same chart as we showed before, but it's new adoptions. And this is really telling. >> Yeah. >> You can see Snowflake with 34% in the yellow, new adoptions, down yes from previous surveys, but still significantly higher than the other players. Interesting to see Google showing momentum on new adoptions, AWS down on new adoptions. And again, exposed to a lot of industries that have been hard hit. And Microsoft actually quite low on new adoption. So this is very impressive for Snowflake. And I want to talk about the multi-cloud strategy now Erik. This came up a lot. The VENN participants who are sort of fans of Snowflake said three things: It was really the flexibility, the security which is really interesting to me. And a lot of that had to do with the flexibility. The ability to easily set up roles and not have to waste a lot of time wrangling. And then the third was multi-cloud. And that was really something that came through heavily in the VENN. Didn't it? >> It really did. And again, I think it just comes down to, I don't think you can ever overstate how afraid these guys are of vendor lock-in. They can't have it. They don't want it. 
And it's best practice to make sure your sensitive information is being kind of spread out a little bit. We all know that people don't trust Bezos. So if you're in certain industries, you're not going to use AWS at all, right? So yeah, this ability to have your data portability through multi-cloud is the number one reason I think people start looking at Snowflake. And to go to your point about the adoptions, it's very telling and it bodes well for them going forward. Most of the things that we're seeing right now are net new workloads. So let's go again back to the legacy side that we were talking about, the Teradatas, IBMs, Oracles. They still have the monolithic applications and the data that needs to support that, right? Like an old ERP type of thing. But anyone who's now building a new application, bringing something new to market, it's all net new workloads. There is no net new workload that is going to go to SAP or IBM. It's not going to happen. The net new workloads are going to the cloud. And that's why when you switch from net score to adoption, you see Snowflake really stand out because this is about new adoption for net new workloads. And that's really where they're driving everything. So I would just say that as this continues, as data as a service continues, I think Snowflake's only going to gain more and more share for all the reasons you stated. Now get back to your comment about security. I was shocked by that. I really was. I did not expect these guys to say, "Oh, no. Snowflake enterprise security not a concern." So two panels ago, a gentleman from a fortune 100 financials said, "Listen, it's very difficult to get us to sign off on something for security. Snowflake is past it, it is enterprise ready, and we are going full steam ahead." Once they got that go ahead, there was no turning back. We gave it to our DevOps guys, we gave it to everyone and said, "Run with it." So, when a company that's big, I believe their fortune rank is 28. 
(laughs) So when a company that big says, "Yeah, you've got the green light. That we were okay with the internal compliance aspect, we're okay with the security aspect, this gives us multi-cloud portability, this gives us flexibility, ease of use." Honestly there's a really long runway ahead for Snowflake. >> Yeah, so the big question I have around the multi-cloud piece and I totally and I've been on record saying, "Look, if you're going looking for an agnostic multi-cloud, you're probably not going to go with the cloud vendor." (laughs) But I've also said that I think multi-cloud to date anyway has largely been a symptom as opposed to a strategy, but that's changing. But to your point about lock-in and also I think people are maybe looking at doing things across clouds, but I think that certainly it expands Snowflake's TAM and we're going to talk about that because they support multiple clouds and they're going to be the best at that. That's a mandate for them. The question I have is how much of complex joining are you going to be doing across clouds? And is that something that is just going to be too latency intensive? Is that really Snowflake's expertise? You're really trying to build that data layer. You're probably going to maybe use some kind of Postgres database for that. >> Right. >> I don't know. I need to dig into that, but that would be an opportunity from a TAM standpoint. I just don't know how real that is. >> Yeah, unfortunately I'm going to just be honest with this one. I don't think I have great expertise there and I wouldn't want to lead anyone a wrong direction. But from what I've heard from some of my VENN interview subjects, this is happening. So the data portability needs to be agnostic to the cloud. I do think that when you're saying, are there going to be real complex kind of workloads and applications? Yes, the answer is yes. And I think a lot of that has to do with some of the container architecture as well, right? 
If I can just pull data from one spot, spin it up for as long as I need and then just get rid of that container, that ephemeral layer of compute. It doesn't matter where the cloud lies. It really doesn't. I do think that multi-cloud is the way of the future. I know that the container workloads right now in the enterprise are still very small. I've heard people say like, "Yeah, I'm kicking the tires. We got 5%." That's going to grow. And if Snowflake can make themselves an integral part of that, then yes. I think that's one of those things where, I remember the guy said, "Snowflake has to continue to innovate. They have to find a way to grow this TAM." This is an area where they can do so. I think you're right about that, but as far as my expertise, on this one I'm going to be honest with you and say, I don't want to answer incorrectly. So you and I need to dig in a little bit on this one. >> Yeah, as it relates to question four, what's the viability of Snowflake's multi-cloud strategy? I'll say unquestionably supporting multiple clouds, very viable. Whether or not portability across clouds, multi-cloud joins, et cetera, TBD. So we'll keep digging into that. The last thing I want to focus on here is the last question, does Snowflake's TAM justify its $20 billion valuation? And you think about the data pipeline. You go from data acquisition to data prep. I mean, that really is where Snowflake shines. And then of course there's analysis. You've got to bring in EMI or AI and ML tools. That's not Snowflake's strength. And then you're obviously preparing that, serving that up to the business, visualization. So there's potential adjacencies that they could get into that they may or may not decide to. But so we put together this next chart which is kind of the TAM expansion opportunity. And I just want to briefly go through it. We published this stuff so you can go and look at all the fine print, but it kind of starts with the data lake disruption. 
You called it data swamp before. The Hadoop no schema on write. Basically the ROI of Hadoop became reduction of investment as my friend Abhi Mehta would say. But so they're kind of disrupting that data lake which really was a failure. And then really going after that enterprise data warehouse which is kind of I have it here as a 10 billion. It's actually bigger than that. It's probably more like a $20 billion market. I'll update this slide. And then really what Snowflake is trying to do is be data as a service. A data layer across data stores, across clouds, really make it easy to ingest and prepare data and then serve the business with insights. And then ultimately this huge TAM around automated decision making, real-time analytics, automated business processes. I mean, that is potentially an enormous market. We got a couple of hundred billion. I mean, just huge. Your thoughts on their TAM? >> I agree. I'm not worried about their TAM and one of the reasons why as I mentioned before, they are coming out with a whole lot of cash. (laughs) This is going to be a red hot IPO. They are going to have a lot of money to spend. And look at their management team. Who is leading the way? A very successful, wise, intelligent, acquisitive type of CEO. I think there is going to be M&A activity, and I believe that M&A activity is going to be 100% for the mindset of growing their TAM. The entire world is moving to data as a service. So let's take as a backdrop. I'm going to go back to the panel we did yesterday. The first question we asked was, there was an understanding or a theory that when the virus pandemic hit, people wouldn't be taking on any sort of net new architecture. They're like, "Okay, I have Teradata, I have IBM. Let's just make sure the lights are on. Let's stick with it." Every single person I've asked, and that's now eight different experts, said to us, "Oh, no. Oh, no, no." There is the virus pandemic, the shift to work from home. 
Everything we're seeing right now has only accelerated and advanced our data as a service strategy in the cloud. We are building for scale, adopting cloud for data initiatives. So, across the board they have a great backdrop. So that's going to only continue, right? This is very new. We're in the early innings of this. So for their TAM, that's great because that's the core of what they do. Now on top of it you mentioned the type of things about, yeah, right now they don't have great machine learning. That could easily be acquired and built in. Right now they don't have an analytics layer. I for one would love to see these guys talk to Alteryx. Alteryx is red hot. We're seeing great data and great feedback on them. If they could do that business intelligence, that analytics layer on top of it, the entire suite as a service, I mean, come on. (laughs) Their TAM is expanding in my opinion. >> Yeah, your point about their leadership is right on. And I interviewed Frank Slootman right in the heart of the pandemic >> So impressed. >> and he said, "I'm investing in engineering almost sight unseen. More circumspect around sales." But I will caution people. That a lot of people I think see what Slootman did with ServiceNow. And he came into ServiceNow. I have to tell you. They didn't have their unit economics right, they didn't have their sales model and marketing model. He cleaned that up. Took it from 120 million to 1.2 billion and really did an amazing job. People are looking for a repeat here. This is a totally different situation. ServiceNow drove a truck through BMC's install base and with IT help desk and then created this brilliant TAM expansion. That land and expand model. This is much different here. And Slootman also told me that he's a situational CEO. He doesn't have a playbook. And so that's what is most impressive and interesting about this. 
He's now up against the biggest competitors in the world: AWS, Google and Microsoft and dozens of other smaller startups that have raised a lot of money. Look at a company like Yellowbrick. They've raised I don't know $180 million. They've got a great team. Google, IBM, et cetera. So it's going to be really, really fun to watch. I'm super excited, Erik, but I'll tell you the data right now suggests they've got a great tailwind and if they can continue to execute, this is going to be really fun to watch. >> Yeah, certainly. I mean, when you come out and you are as impressive as Snowflake is, you get a target on your back. There's no doubt about it, right? So we said that they basically created the data as a service. That's going to invite competition. There's no doubt about it. And Yellowbrick is one that came up in the panel yesterday, one of our CIOs was doing a proof of concept with them. We had about seven others mentioned as well that are startups that are in this space. However, none of them despite their great valuation and their great funding are going to have the kind of money and the market lead that Slootman is going to have which Snowflake has as this comes out. And what we're seeing in Congress right now with some antitrust scrutiny around the large data that's being collected by AWS, Azure, Google, I'm not going to bet against this guy either. Right now I think he's got a lot of opportunity, there's a lot of additional layers and because he can basically develop this as a suite service, I think there's a lot of great opportunity ahead for this company. >> Yeah, and I guarantee that he understands well that customer acquisition cost and the lifetime value of the customer, the retention rates. Those are all things that he and Mike Scarpelli, his CFO learned at ServiceNow. Not learned, perfected. (Erik laughs) Well Erik, really great conversation, awesome data. It's always a pleasure having you on. Thank you so much, my friend. 
I really appreciate it. >> I appreciate talking to you too. We'll do it again soon. And stay safe everyone out there. >> All right, and thank you for watching everybody this episode of "CUBE Insights" powered by ETR. This is Dave Vellante, and we'll see you next time. (soft music)

Published Date : Jul 31 2020

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Frank Slootman	PERSON	0.99+
George Gilbert	PERSON	0.99+
Erik Bradley	PERSON	0.99+
Erik	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Mike Scarpelli	PERSON	0.99+
Google	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
David Floyd	PERSON	0.99+
Slootman	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Abby Meadow	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
$180 million	QUANTITY	0.99+
$20 billion	QUANTITY	0.99+
Netezza	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
77.5%	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
10 billion	QUANTITY	0.99+
12 and a half billion	QUANTITY	0.99+
120 million	QUANTITY	0.99+
Oracles	ORGANIZATION	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Yellowbrick	ORGANIZATION	0.99+

Rachini Moosavi & Sonya Jordan, UNC Health | CUBE Conversation, July 2020

>> From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Hello, and welcome to this CUBE conversation, I'm John Furrier, host of theCUBE here, in our Palo Alto, California studios, here with our quarantine crew. We're getting all the remote interviews during this time of COVID-19. We've got two great remote guests here, Rachini Moosavi who's the Executive Director of Analytical Services and Data Governance at UNC Healthcare, and Sonya Jordan, Enterprise Analytics Manager of Data Governance at UNC Health. Welcome to theCUBE, thanks for coming on. >> Thank you. >> Thanks for having us. >> So, I'm super excited. University of North Carolina, my daughter will be a freshman this year, and she is coming, so hopefully she won't have to visit UNC Health, but looking forward to having more visits down there, it's a great place. So, thanks for coming on, really appreciate it. Okay, so the conversation today is going to be about how data and how analytics are helping solve problems, and ultimately, in your case, serve the community, and this is a super important conversation. So, before we get started, talk about UNC Health, what's going on there, how you guys organize, how big is it, what are some of the challenges that you have? >> So UNC Health is comprised of about 12 different entities within our hospital system. We have physician groups as well as hospitals, and we're spread throughout all of North Carolina, and so we serve the patients of North Carolina, and that is our primary focus and responsibility for our mission. As part of the offices Sonya and I are in, we are in the Enterprise Analytics and Data Sciences Office that serves all of those entities and so we are centrally located in the triangle area of North Carolina, which is pretty central to the state, and we serve all of our entities equally from our Analytics and Data Governance needs. 
>> John: You guys got a different customer base, obviously you've got the clinical support, and you got the business applications, you got to be agile, that's what it's all about today, you don't need to rely on IT support. How do you guys do that? What's the framework? How do you guys tackle that problem of being agile, having the data be available, and you got two different customers, you got all the compliance issues with clinical, I can only imagine all the regulations involved, and you've got the business applications. How do you handle those? >> Yeah, so for us in the roles that we are in, we are fully responsible for more of the data and analytics needs of the organization, and so we provide services that truly are balanced across our clinician group, so we have physicians, and nurses, and all of the other ancillary clinical staff that we support, as well as the operational needs as well, so revenue cycle, finance, pharmacy, any of those groups that are required in order to run a healthcare system. So, we balance our time amongst all of those and for the work that we take on and how we continuously support them is really based on governance at the end of the day. How we make decisions around what the priorities are and what needs to happen next, and requires the best insights, is really how we focus on what work we do next. As for the applications that we build, in our office, we truly only build analytical applications or products like visualizations within Tableau as well as we support data governance platforms and services and so we provide some of the tools that enable our end users to be able to interact with the information that we're providing around analytics and insights, at the end of the day. >> Sonya, what's your job? Your title is Analytics Manager of Data Governance, obviously that sounds broad but governance is obviously required in all things. What is your job, what is your day-to-day roles like? What's your focus? 
>> Well, my day-to-day operations is first around building a data governance program. I try to work with identifying customers who we can start partnering with so that we can start getting documentation and utilizing a lot of the programs that we currently have, such as certification, so when we talk about initiatives, this is one of the initiatives that we use to partner with our stakeholders in order to start bringing visibility to the various assets, such as metrics, or universes that we want to certify, or dashboards, algorithms, just various lists of different types of assets that we certify that we like to partner with the customers in order for them to start documenting within the tools, so that we can bring visibility to what's available, really focusing on data literacy, helping people to understand what assets are available, not only what assets are available, but who owns them, and what can they do with it, making sure that we have great documentation in order to be able to leverage literacy as well. >> So, I can only imagine with how much volume you guys are dealing with from a data standpoint, and the diversity, that the data warehouse must be massive, or it must be architected in a way that it can be agile because of the diverse needs. Can you guys share your thoughts on how you guys look at the data warehouse challenge and opportunity, and what you guys are currently doing? >> Well, so- >> Yeah you go ahead, Rachini. >> Go ahead, Sonya. >> Well, last year we implemented a tool, an enterprise warehouse, basically behind a tool that we implemented, and that was an opportunity for Data Governance to really lay some foundation and really bring visibility to the work that we could provide for the enterprise. 
We were able to embed into probably about six or seven of the 13 initiatives, I was actually within that project, and with that we were able to develop our stewardship committee, our data governance council, and because Rachini managed Data Solutions, our data solution manager was able to really help with the architecture and integration of the tools. >> Rachini, your thoughts on running the data warehouse, because you've got to have flexibility for new types of data sources. How do you look at that? >> So, as Sonya just mentioned, we upgraded our data warehouse platform just recently because of these evolving needs, and like a lot of healthcare providers out there, a lot of them are on one or the other of the EMRs that are top in the market. With our EMR, they provide their own data warehouse, so you have to factor in the impact of what they bring to the table in addition to all of those other sources of data that you're trying to co-mingle and bring together into the same data warehouse, and so for us, it was time for us to evolve our data warehouse. We ended up deciding on trying to create a virtual data warehouse, and in doing so, with virtualization, we had to upgrade our platform, which is what created that opportunity that Sonya was mentioning. And by moving to this new platform we are now able to bring all of that into one space and it's enabled us to think about how does the community of analysts interact with the data? How do we make that available to them in a secure way? In a way that they can take advantage of reusable master data files that could be our source of truth within our data warehouse, while also being able to have the flexibility to build what they need in their own functional spaces so that they can get the wealth of information that they need out of the same source and it's available to everyone. >> Okay, so I got to ask the question, and I was trying to get the good stuff out first, but let's get at the reality of COVID-19. 
You got pre-COVID-19 pandemic, we're kind of in the middle of it, and people are looking at strategies to come out of it, obviously the world will be changed, higher with a lot of virtualization, virtual meetings, and virtual workforce, but the data still needs to be, the business still needs to run, but data will be changing different sources, how are you guys responding to that crisis because you're going to be leaned on heavily for more and more support? >> Yeah it's been non-stop since March (laughs). So, I'm going to tell you about the reporting aspects of it, and then I'd love to turn it over to Sonya to tell you about some of the great things that we've actually been able to do to it and enhance our data governance program by not wasting this terrible event and this opportunity that's come up. So, with COVID, when it kicked off back in March, we actually formed a war room to address the needs around reporting analytics and just insights that our executives needed, and so in doing so, we created within the first week, our first weekend actually, our first dashboard, and within the next two weeks we had about eight or nine other dashboards that were available. And we continuously add to that. Information is so critical to our executives, to our clinicians, to be able to know how to address the evolving needs of COVID-19 and how we need to respond. We literally, and I'm not even exaggerating, at this very moment we have probably, let's see, I think it's seven different forecasts that we're trying to build all at the same time to try and help us prepare for this new recovery, this sort of ramp up efforts, so to your point, it started off as we're shutting down so that we can flatten the curve, but now as we try to also reopen at the same time while we're still meeting the needs of our COVID patients, there's this balancing act that we're trying to keep up with and so analytics is playing a critical factor in doing that. >> Sonya, your thoughts. 
First of all, congratulations, and action is what defines the players from the pretenders in my mind, you're seeing that play out, so congratulations for taking great action, I know you're working hard. Sonya, your thoughts, COVID, it's putting a lot of pressure? It highlights the weaknesses and strengths of what's kind of out there, what's your thoughts? >> Well, it just requires a great deal of collaboration and making sure that you're documenting metrics in a way where you're factoring true definition because at the end of the day, this information can go into a dashboard that's going to be visualized across the organization, I think what COVID has done is really enhance the need and the understanding of why data governance is important and also it has allowed us to create a lot of standardization, where we're standardizing a lot of processes that we currently had in place but just enhancing them. >> You know, not to go on a tangent, but I will, it's funny how the reality has kind of pulled back, exposed a lot of things, whether it's the remote work situation, people are VPNing, not under provision with the IT side. On the data side, everyone now understands the quality of the data. I mean, I got my kids talking progression analysis, "Oh, the curves are all wrong," I mean people are now seeing the science behind the data and they're looking at graphs all the time, you guys are in the visualization piece, this really highlights the need of data as a story, because there's an impact, and two, quality data. And if you don't have the data, the story isn't being told and then misinformation comes out of it, and this is actually playing out in real time, so it's not like it's just a use case for the most analytics but this again highlights the value proposition of what you guys do. What's your personal thoughts on all this because this really is playing out globally. >> Yeah, it's been amazing how much information is out there. 
So, we have been extremely blessed at times but also burdened at times by that amount of information. So, there's the data that's going through our healthcare system that we're trying to manage and wrangle and do that data storytelling so that people can drive those insights to very effective decisions. But there's also all of this external data that we're trying to be able to leverage as well. And this is where the whole sharing of information can sometimes become really hard to try and get ahead of, we leveraged the Johns Hopkins data for some time, but even that, too, can have some hiccups in terms of what's available. We try to use our State Department of Health and Human Services data and they just about updated their website and how information was being shared every other week and it was making it impossible for us to ingest that into our dashboards that we were providing, and so there's really great opportunities but also risks in some of the information that we're pulling. >> Sonya, what's your thoughts? I was just having a conversation this morning with the Chief of Analytics and Insight from NOAA, which is the National Oceanic and Atmospheric Administration, about weather data and forecasting weather, and they've got this community model where they're trying to get the edges to kind of come in, this teases out a template. You guys have multiple locations. As you get more democratized in the connection points, whether it's third-party data, having a system managing that is hard, and again, this is a new trend that's emerging, this community connection points, where I think you guys might also be a template, and your multiple locations, what's your general thoughts on that because the data's coming in, it's now connected in, whether it's first-party to the healthcare system or third-party. >> Yeah, well we have been leveraging our data governance tool to try to get that centralized location, making sure that we obtain the documentations. 
Due to COVID, everything is moving very fast, so it requires us to really sit down and capture the information and when you don't have enough resources in order to do that, it's easy to miss some very important information, so really trying to encourage people to understand the reason why we have data governance tools in order for them to leverage, in order to capture the documentation in a way that it can tell the story about the data, but most of all, to be able to capture it in a way so that if that person happened to leave the organization, we're not spending a lot of time trying to figure out how was this information created, how was this dashboard designed, where are the requirements, where are the specifications, where are the key elements, where does that information live, and making sure we capture that up front. >> So, guys, you guys are using Informatica, how are they helping you? Obviously, they have a system they're getting some great feedback on, how are you using Informatica, how is it going, and how has that enabled you guys to be successful? >> Yeah, so we decided on Informatica after doing a really thorough vetting of all of the other vendors in the industry that could provide us these services. We've really loved the capabilities that we've been able to provide to our customers at this point. It's evolving, I think, for us, the ability to partner with a group like Prominence, to be able to really leverage the capabilities of Informatica and then be really super, super hyper focused on providing data literacy back to our end users and making that the full intent of what we're doing within data governance has really enabled us to take the tools and make it something that's specific to UNC Health and the needs that our end users are verbalizing and provide that to them in a very positive way. 
>> Sonya, they talk about this master catalog, and I've talked to the CEO of Informatica and all their leaders, governance is a big part of it, and I've always said, I've always kind of had a hard time, I'm an entrepreneur, I like to innovate, move fast, break things, which is kind of not the way you work in the data world, you don't want to be breaking anything, so how do you balance governance and compliance with innovation? This has been a key topic and I know that you guys are using their enterprise data catalog. Is that helping? How does that fit in, is that part of it? >> Well, yeah, so during our COVID initiatives and building these Tableau dashboards, these visualizations and forecast models for executive leaders, we were able to document in EMPower, which, we rebranded Axon to EMPower, we were able to document a lot of our dashboards, which is a data set, and pretty much document attributes and show lineage from EMPower to EDC, so that users would know exactly when they start looking at the visualization not only what does this information mean, but they're also able to see what other sources that that information impacts as well as the data lineage, where did the information come from in EDC. >> So I got to ask the question to kind of wrap things up, has Informatica helped you guys out now that you're in this crisis? Obviously you've implemented before, now that you're in the middle of it, have you seen any things that jumped out at you that's been helpful, and are there areas that need to be worked on so that you guys continue to fight the good fight, come out of this thing stronger than before you came in? 
>> Yeah, there is a lot of new information, what we consider as "aha" moments that we've been learning about, and how EMPower, yes there's definitely a learning curve because we implemented EDC and EMPower last year during our warehouse implementation, and so there's a lot of work that still needs to be done, but based on where we were the first of the year, I can say we have evolved tremendously due to a lot of the pandemic issues that arose, and we're looking to really evolve even greater, and pilot across the entire organization so that they can start leveraging these tools for their needs. >> Rachini you got any thoughts on your end on what's worked, what you see improvements coming, anything to share? >> Yeah, so we're excited about some of the new capabilities like the marketplace for example that's available in Axon, we're looking forward to being able to take advantage of some of these great new aspects of the tool so that we can really focus more on providing those insights back to our end users. I think for us, during COVID, it's really been about how do we take advantage of the immediate needs that are surfacing. How do we build all of these dashboards in record-breaking time but also make sure that folks understand exactly what's being represented within those dashboards, and so being able to provide that through our Informatica tools and service it back to our end users, almost in a seamless way like it's built into our dashboards, has been a really critical factor for us, and feeling like we can provide that level of transparency, and so I think that's where as we evolve that we would look for more opportunities, too. How do we make it simple for people to get those immediate answers to their questions, of what does the information mean, without it feeling like they're going elsewhere for the information. >> Rachini, thank you so much for your insight, Sonya as well, thanks for the insight, and stay safe. 
Sonya, behind you, I was pointing out, that's your artwork, you painted that picture. >> Yes. >> Looks beautiful. >> Yes, I did. >> You got two jobs, you're an artist, and you're doing data governance. >> Yes, I am, and I enjoy painting, that's how I relax (laughs). >> Looks great, get that on the market soon, get that on the marketplace, let's get that going. Appreciate the time, thank you so much for the insights, and stay safe and again, congratulations on the hard work you're doing, I know there's still a lot more to do, thanks for your time, appreciate it. >> Thank you. >> Thank you. >> It's theCUBE conversation, I'm John Furrier at the Palo Alto studios, for the remote interviews with Informatica, I'm John Furrier, thanks for watching. (upbeat music)

Published Date : Jul 24 2020

ENTITIES

Entity	Category	Confidence
Rachini Moosavi	PERSON	0.99+
Rachini	PERSON	0.99+
John	PERSON	0.99+
National Oceanic Administration	ORGANIZATION	0.99+
Sonya	PERSON	0.99+
March	DATE	0.99+
Sonya Jordan	PERSON	0.99+
John Furrier	PERSON	0.99+
July 2020	DATE	0.99+
Palo Alto	LOCATION	0.99+
Informatica	ORGANIZATION	0.99+
North Carolina	LOCATION	0.99+
last year	DATE	0.99+
two jobs	QUANTITY	0.99+
EMPower	ORGANIZATION	0.99+
State Department of Health and Human Services	ORGANIZATION	0.99+
one	QUANTITY	0.99+
UNC Healthcare	ORGANIZATION	0.99+
UNC Health	ORGANIZATION	0.99+
first dashboard	QUANTITY	0.99+
COVID	OTHER	0.99+
Prominence	ORGANIZATION	0.99+
two	QUANTITY	0.99+
today	DATE	0.99+
theCUBE	ORGANIZATION	0.98+
Palo Alto, California	LOCATION	0.98+
13 initiatives	QUANTITY	0.98+
NOA	ORGANIZATION	0.98+
CUBE	ORGANIZATION	0.98+
COVID-19	OTHER	0.98+
this year	DATE	0.97+
COVID	TITLE	0.97+
one space	QUANTITY	0.97+
Boston	LOCATION	0.97+
first weekend	QUANTITY	0.97+
Sonya	ORGANIZATION	0.97+
first week	QUANTITY	0.96+
Tableau	TITLE	0.96+
first	QUANTITY	0.96+
University of North Carolina	ORGANIZATION	0.96+
nine	QUANTITY	0.95+
about six	QUANTITY	0.94+
EDC	ORGANIZATION	0.94+
Chief	PERSON	0.94+
Axon	ORGANIZATION	0.93+
seven	QUANTITY	0.93+
Johns Hopkins	ORGANIZATION	0.92+
seven different forecasts	QUANTITY	0.92+
two different customers	QUANTITY	0.91+
First	QUANTITY	0.91+
two great remote guests	QUANTITY	0.91+
agile	TITLE	0.91+
pandemic	EVENT	0.9+
Enterprise Analytics and Data Sciences Office	ORGANIZATION	0.9+
about 12 different entities	QUANTITY	0.88+
Analytical Services	ORGANIZATION	0.87+
this morning	DATE	0.87+

Search Results for Informatica World2018: