Opening Session feat. Jon Ramsey, AWS | AWS Startup Showcase S2 E4 | Cybersecurity
>> Hello, everyone. Welcome to the AWS Startup Showcase. This is season two, episode four of the ongoing series covering exciting startups from the AWS ecosystem, here to talk about cybersecurity. I'm your host, John Furrier, and today I'm excited for this keynote presentation. I'm joined by Jon Ramsey, Vice President of AWS Security. Jon, welcome to theCUBE's coverage of the startup community within AWS, and thanks for this keynote presentation. >> Happy to be here. >> So Jon, what do you do at AWS? Take a minute to explain your role, because it's very comprehensive. We saw at the AWS re:Inforce event recently in Boston a broad coverage of topics from Stephen Schmidt, CJ Moses, and a variety of the executives. What's your role in particular at AWS? >> If you look at AWS, there is a shared security responsibility model, and CJ, the CISO for AWS, is responsible for securing the AWS portion of that model. Our customers are responsible for securing their part of it. For me, I provide services to those customers to help them secure their part of that model, and those services come in different categories. The first category is threat detection, with GuardDuty, which does real-time detection and alerting, and Detective, which is then used to investigate those alerts to determine if there is an incident. Then vulnerability management, which is Inspector, which looks for third-party vulnerabilities, and Security Hub, which looks for configuration vulnerabilities. And then Macie, which does sensitive data discovery. So I have those sets of services underneath me to help customers secure their part of the shared security responsibility model. >> Okay, well, thanks for the callout there. I want to get that out there because I think it's important to note that, you know, everyone talks inside-out, outside-in, customer focus. AWS has always been customer-focused.
We've been covering you guys for a long time, but you do have to secure the core cloud that you provide, and you've got great infrastructure, tools, and technology down to the chip level. So that's cool. You're on the customer side, and right now we're seeing from these startups that are serving them, the ones we've interviewed here at the showcase, that there's a huge security transformation going on within the security market. It's like swapping out the engines on a plane at 35,000 feet, as they say. This is huge. What's it take for your customers, the enterprises out there, to be more cyber resilient against threats while at the same time protecting what they've already got? They can't just do a wholesale change overnight. They've got to be reactive but also proactive. What do they need to do to be resilient? That's the question. >> Yeah. So I think it's important to focus on spending your resources. Everyone has constrained security resources, and you have to focus those resources in the areas and the ways that reduce the greatest amount of risk. Risk really can be summed up as: the assets I have that are most valuable, that have a vulnerability that a threat is going to attack. In that world, you want to mitigate the threat or mitigate the vulnerability to protect the asset. If you have an asset that's vulnerable but a threat isn't going to attack it, that's less risky, but that changes over time. The threat and vulnerability windows are continuously evolving as threats develop tradecraft, as vulnerabilities are discovered, as new software is released. So it's a continuous picture, and it's an adaptive picture where you have to continuously monitor what's happening. If you use the NIST Cybersecurity Framework, you identify what you have to protect. That's the asset part. Then you have to protect it.
That's putting controls in place so that you don't have an incident. Then, from a threat perspective, you have to detect an incident, or a breach, or a compromise. And then you respond, and then you remediate, and you have to continuously do that cycle to be in a position to have cyber resiliency. And one of the powers of the cloud is that if you're building your applications in a cloud-native form, your ability to respond can be very surgical, which is very important because then you don't introduce risk when you're responding. And by design, the cloud is architected to be more resilient. So being able to stay cyber resilient in a cloud-native architecture is an important characteristic. >> Yeah. And I think that's, I mean, it sounds so easy. Just identify what's to be protected, monitor it, you're protected, you remediate. Sounds easy, but there's a lot of change going on, and you've got cloud scale. So you've got security, you've got cloud, you guys have a lot of things going on there. How do you think about security, and how does the cloud help customers? Because again, there are two things going on. There's a shared responsibility model, and at the end of the day, the customer's responsible on their side. That's right, right? >> That's right. >> Cloud has some tools. How do you think about going about security, and where does cloud help specifically? >> Yeah, so really it's about a model called observe, orient, decide, and act, the OODA loop, and it was created by John Boyd. He was a fighter pilot in the Korean War, and he knew that if I can observe what the opponent is doing, orient myself to my goals and their goals, make a decision on what the next best action is, and then act, and then follow that OODA loop, or, put another way, sensing, sense-making, deciding, and acting, and if I can do that faster than the enemy, then I will win every fight.
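The OODA loop just described can be sketched as a small control loop. This is a hypothetical illustration only; the telemetry fields, baseline values, and response actions are invented and do not correspond to any AWS API.

```python
# Hypothetical OODA-loop sketch: observe telemetry, orient against a
# baseline, decide on the next best action, act. All names are invented.

def observe(infra):
    """Collect a telemetry snapshot from the environment (placeholder)."""
    return {"open_ports": infra.get("open_ports", []),
            "admin_logins": infra.get("admin_logins", 0)}

def orient(snapshot, baseline):
    """Compare the observation against a known-good baseline."""
    return {k: v for k, v in snapshot.items() if baseline.get(k) != v}

def decide(deviations):
    """Pick the next best action for each deviation (toy policy)."""
    return ["isolate_host" if k == "open_ports" else "alert_analyst"
            for k in deviations]

def act(actions):
    """Execute the chosen responses (here, just print them)."""
    for action in actions:
        print(f"executing: {action}")

baseline = {"open_ports": [22, 443], "admin_logins": 3}
current = {"open_ports": [22, 443, 4444], "admin_logins": 3}

# One pass of the loop; in practice this cycle runs continuously,
# and the goal is to iterate it faster than the adversary.
act(decide(orient(observe(current), baseline)))  # prints: executing: isolate_host
```

The point of the sketch is the cadence, not the placeholder logic: each pass re-observes, so the baseline comparison and the chosen actions stay current as the environment changes.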
So in the cyber world, being in a position where you are observing, and that's where cloud can really help you, because you can interrogate the infrastructure, you can look at what's happening, you can build baselines from it, and then you can look at deviations from the norm. That's just one way to observe. Then orient yourself: does this represent something that increases risk? If it does, then what's the next best action I need to take? Make that decision and then act. And that's also where the cloud is really powerful, because there's this huge control plane that lets you enable or disable resources, or reconfigure resources. And if you're in the situation where you can continuously do that very rapidly, you can outpace and outmaneuver the adversary. >> Yeah. You know, I remember I interviewed Stephen Schmidt in 2014, and at that time everybody was pooh-poohing it: oh man, the cloud is so insecure. He made a statement to me, and we wrote about this: the cloud is more secure and will be more secure, because it can be complicated for the hacker but easy for provisioning. So he kind of opened this discussion around how cloud would be more secure, and it turns out he was right. Now people are saying the cloud's more secure than standalone. What's different now, Jon? Not even going back to 2014; just go back a few years. Cloud is helpful, there's more interrogation, as you mentioned. This is important. What's changed in the cloud, in AWS specifically, that enables customers, and say third parties who are trying to comply and manage risk as well? So you have this shared back and forth. What's different in the cloud now, versus just a few years ago, that's helping security? >> Yeah.
So if you look at the parts of the shared responsibility model, the further up the stack you go, from just infrastructure to platforms, say containers, up to serverless, the more of the responsibility of that stack AWS takes on. And in the process, we are investing resources and capabilities. For example, GuardDuty takes an EKS audit feed for containers to be able to monitor what's happening from a container perspective. And then in serverless, really the majority of what needs to be defended is part of our responsibility model. So that's an important shift, because in that world we have a very large team who knows the infrastructure, who knows the threat, and who knows how to protect customers all the way up to the boundary. And that's a really important consideration when you think about how you design your applications: you want the developers to focus on the business logic and the business value (and still the security of the code they're writing), but let us take over the rest of it so that you don't have to worry about it. >> Great, good insight there. I want to get your thoughts too on another trend here at the showcase. One of the things that's emerging, besides the normal threat landscape and the compliance work, is API protection. I mean, APIs are what made the cloud great, right? And it's not going away; it's only going to get better, because we live in an interconnected digital world. APIs are the lingua franca, as they say. Companies just can't sit back and expect third parties to comply with cyber regulations and best practices. So how do security organizations get proactive? It's not just APIs; it's a signal, in my mind, of more connections. You've got shared responsibility: AWS, your customers, and your customers' partners and customers, all connection points.
So we live in an interconnected world. How do security teams and organizations get proactive on the cyber risk management piece? >> Yeah. So when it comes to APIs, the thing you look for is the trust boundaries. Where are the trust boundaries in the system: between the user and the machine, between the machine and another machine on the network? The API is a trust boundary, and it is a place where you need to facilitate some form of control, because what crosses the trust boundary could be used to attack. I trust that someone's going to give me something that is legitimate, but you don't know that that's actually true. You should assume that the other side of the trust boundary is malicious, and you have to validate it, and by default make sure that what you're getting is actually trustworthy and valid. So think of an API as just a trust boundary, assume that whatever you receive at that boundary is not legitimate, and validate the contents of whatever you receive. >> You know, I was noticing online that Mai-Lan, who runs S3 at AWS, was commenting about the 10-year anniversary, the birthday of S3, the Amazon Simple Storage Service. A lot of customers are using it in all their applications; S3 is the file repository for their application workflows, ingesting literally trillions of objects. You guys have trillions of objects on S3 today. This is a big part of the application workflow, and data security has come up as a big discussion item. You've got S3, and forget about the misconfigured S3 buckets that have been reported on. Beyond that, as application workflows tap into S3, data becomes the conversation around securing data. How do you talk to customers about that?
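The "assume the other side is malicious and validate" posture Ramsey describes can be illustrated with a small input-validation sketch at an API boundary. The field names and rules (`user_id`, `amount`, `action`) are invented for illustration and are not from any real API.

```python
# Hypothetical request validator at an API trust boundary: everything
# crossing the boundary is untrusted until its type, format, range, and
# allowed values have been checked.

def validate_request(payload):
    """Reject anything that is not provably well-formed; return only
    validated, normalized data."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    user_id = payload.get("user_id")
    if not isinstance(user_id, str) or not user_id.isalnum():
        raise ValueError("user_id must be a non-empty alphanumeric string")
    amount = payload.get("amount")
    if not isinstance(amount, int) or isinstance(amount, bool) or not 0 < amount <= 10_000:
        raise ValueError("amount out of allowed range")
    action = payload.get("action")
    if action not in {"deposit", "withdraw"}:
        raise ValueError("unknown action")
    # Only this validated form ever passes the boundary.
    return {"user_id": user_id, "amount": amount, "action": action}

ok = validate_request({"user_id": "abc123", "amount": 50, "action": "deposit"})
print(ok)
```

The key design choice: the downstream handler never consumes raw input; every field is re-checked even when the caller is nominally "trusted."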
Because that's also now part of the scaling of these modern cloud-native applications: managing data on-prem and across clouds, in flight, at rest, and in motion. What's your view on data security, Jon? >> Yeah. Data security is also a trust boundary. The thing that's going to access the data, you have to validate it. The challenge with data security is that customers don't really know where all their data is, or even where their sensitive data is, and that continues to be a large problem. That's why we have services like Macie, whose job is to find, in S3, the data you need to protect the most because it's sensitive. Least privilege has always been the goal when it comes to data security. The problem is that least privilege is really, really hard to achieve, because there are so many different combinations of roles and accounts and orgs. And so there's also another technology we have, called Access Analyzer, that helps customers figure out: are my intended authorizations the authorizations I actually have? Are they the ones intended for that user? And you have to continuously review that as a means to make sure you're getting as close to least privilege as you possibly can. >> Well, one of the luxuries of having you here on theCUBE keynote for this showcase is that you have the internal view at AWS, but also the external view with customers. So I have to ask you: as you talk to customers, obviously there are a lot of trends. We're seeing more managed services in areas where there are skill gaps, but teams are also overloaded. We're hearing stories about security teams overwhelmed by the solutions they have to deploy quickly, scale up quickly, and run cost-effectively, and about the need for instrumentation. Sometimes it's intrusive; sometimes it's agentless sensors, OT. I mean, it's getting crazy. At re:MARS we saw a bunch of stuff there.
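The intended-versus-actual authorization review Ramsey describes can be sketched as a simple set comparison. The permission strings below are made up for illustration; this is a toy model of the idea, not the Access Analyzer API.

```python
# Toy intended-vs-actual permission review: drift from least privilege
# shows up as excess grants (revocation candidates) or missing grants.

def review_access(intended, granted):
    """Compare the authorizations a user should have against the
    authorizations they actually have."""
    return {
        "excess": sorted(granted - intended),   # granted but never intended
        "missing": sorted(intended - granted),  # intended but not granted
    }

intended = {"s3:GetObject", "s3:ListBucket"}
granted = {"s3:GetObject", "s3:ListBucket", "s3:DeleteObject"}

print(review_access(intended, granted))
# prints: {'excess': ['s3:DeleteObject'], 'missing': []}
```

As Ramsey notes, this comparison has to be run continuously: every new role, account, or org combination can reintroduce excess grants.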
This is a reality, the teams aspect of it. Can you share your experiences and observations on how companies are organizing, how they're thinking about team formation, and how they're thinking about all these new things coming at them: new environments, new scale, new choices? What are you seeing on the customer side relative to security teams, their role, and their relationship to the cloud and the technologies? >> Yeah, absolutely. We have to remember that at the end of the day, on one end of the wire is a black hat, and on the other end of the wire is a white hat. So you need people, and people are a critical component of being able to defend. In the context of security operations, alert fatigue is absolutely a problem. The number of alerts, the volume of alerts, is overwhelming, and so you have to have a means to effectively triage them and get the ones into investigation that you think will be the most significant. Going back to the risk equation: you find those alerts and events that are the ones that could harm you the most. Also, one common theme is threat hunting, and the concept behind threat hunting is that I don't actually wait for an alert; I lean in, and I'm proactive instead of reactive. >> So I find the system that I least want the hacker in, I go to that system, and I look for anomalies. I look for anything that might make me think there is a hacker there, or a compromise, or some unintended consequence. And the reason you do that is because it reduces your dwell time: the time between when you get compromised and when you detect something, which might be months, because there wasn't an alert triggered. So that's also a very important aspect. For AWS and our security services, we have a strategy across all of the security services that we call end to end. How do we move from APIs?
Because they're all API-driven, and security buyers generally do not have a development team; they're security operators, and they want a solution. So we're moving from APIs to outcomes. How do we stitch all the services together so that the time an analyst spends, whether a SOC analyst, someone doing investigation, or someone doing incident response, is the most valuable time? And in the process of stitching this all together and helping our customers with alert fatigue, we'll be doing things that use inference and machine learning to help prioritize the greatest risk for our customers. >> That's a great callout, and that brings up the point of the front line, so to speak, and the back-office, front-office kind of approach here. The threats are out there, and there's a lot of leaning in, which is a great point. I think that's a good comment and insight there. The question I have for you is that everyone always talks about that, but there's also the, I won't say boring, but important compliance aspect of things. This has become huge, right? There's a lot of blocking and tackling needed behind the scenes on the compliance side, as well as prevention. So can you take us through, in your mind, how customers are looking at the best strategies for compliance and security? Because there's a lot of work you've got to get done, and you've got to lay out everything, as you mentioned, but compliance specifically, the reporting, is also a big thing. >> Yeah. Compliance is interesting. I suggest taking a security approach to compliance instead of a compliance approach to security. If you're compliant, you may not be secure, but if you're secure, you'll be compliant.
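Earlier, Ramsey described triaging the overwhelming alert volume by going back to the risk equation: spend analyst time on the alerts that could harm you the most. A toy prioritizer along those lines might look like this; the fields and weights are invented for illustration.

```python
# Toy alert prioritizer based on the risk equation: the most valuable
# assets facing a likely, exploitable threat come first.

def triage(alerts):
    """Sort alerts so analysts spend their scarce time on the highest risk."""
    def risk(a):
        return a["asset_value"] * a["threat_likelihood"] * a["exploitability"]
    return sorted(alerts, key=risk, reverse=True)

alerts = [
    {"id": "A1", "asset_value": 9, "threat_likelihood": 0.2, "exploitability": 0.9},
    {"id": "A2", "asset_value": 3, "threat_likelihood": 0.9, "exploitability": 0.9},
    {"id": "A3", "asset_value": 9, "threat_likelihood": 0.8, "exploitability": 0.7},
]

print([a["id"] for a in triage(alerts)])  # prints: ['A3', 'A2', 'A1']
```

In a real SOC the scoring would come from inference over many signals rather than three hand-set fields, but the ordering principle is the same: triage by potential harm, not arrival order.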
And the really interesting thing about compliance is that as soon as something like a category of control is required in some compliance regime, the effectiveness of that control is reduced, because the threats go: well, I'm going to presume they have this control, because they're compliant, and so now I'm going to change my tactics to evade the control. So if you only ever follow compliance, you're going to miss a whole set of tactics that threats have developed precisely because they presume you're compliant and have those controls in place. So you want to make sure you have something that's outside the realm of compliance, because that's the thing that will trip them up. That's the thing the threat is not expecting, and that's how we'll be able to detect them. >> Yeah. And it almost becomes one of those things where it's his fault, right? So, you know, finger-pointing. With compliance, you get complacent. I can see that. Can you give an example? Because I think that's probably something people are really going to want to know more about, because it's common sense. Can you give an example of security driving compliance? >> Yeah, sure. Just as an example, multifactor authentication was used by banks in real high-risk transactions. That was a security approach to compliance. We said: that's a high-net-worth individual, we're going to give them a token, and that's how they're going to authenticate. At the time, the FFIEC didn't say there needed to be multifactor authentication. And then after a period of time, when account takeover was on the rise, the FFIEC, the Federal Financial Institutions Examination Council, said you need to do multifactor authentication.
Multifactor authentication was now on every account, and then the threat moved on to, okay, we're going to do man-in-the-browser attacks after the user authenticates. That tactic, which didn't exist back when only those high-net-worth individuals had multifactor, became commonplace. So that's an example of the full life cycle, and the important lesson there is that security controls have a diminishing half-life of effectiveness. They need to be continuous and adaptive, or else their value is going to decrease over time. >> Yeah, and I think that's a great callout, because agility and speed are big factors with emerging threats. It's not a stable, mature hacker market; they're evolving too. All right, great stuff. I know your time's very valuable, Jon. I really appreciate you coming on theCUBE. A couple more questions for you. We have 10 amazing startups here in the AWS ecosystem, all private, looking great performance-wise, and they've all got the same kind of vibe: they're onto something new. They're doing something new and clever, different from what was done 10 years ago, and this is where the cloud advantage is coming in, cloud scale. You mentioned some of those things, data, so you start to see new things emerge. How would you talk to CISOs or CXOs who are watching about how to evaluate startups like these? They're still somewhat small relative to some of the bigger players, but they've got unique solutions and they're doing things a little bit differently. How should CISOs evaluate them? How can startups work with the CISOs? What's your advice to both the buyer and the startup on bringing a product to market, and what's the best way to do that? >> Yeah. So the first thing, when you talk to a CISO, is to be respectful of their time. They'll appreciate that.
I remember when I had just started, I went to talk to the CISO at one of the five major banks. He sat me down, and I tried to tell him what I had, and he was like, son, and he went through his book, and he had 10 of every one thing that I had. And I realized that, and I was grateful for him giving me an explanation. I said to him, look, I'm sorry I wasted your time. I will not do that again. I apologize. If I can't bring any value, I won't come back. But if I think I can bring you something of value, now that I know what I know, please will you take the meeting? >> He was like, of course. So be respectful of their time. They know what the problem is; they know what the threat is. Be specific about how you're different. Right now there is so much confusion in the market about what you do. If you really have something that's differentiated, be very, very specific about it, and don't be afraid of it. Lean into it and explain the value. That would save a lot of time and make the meeting more valuable for the CISO. >> And the CISOs, as they evaluate these startups, how should they look at them? What are some markers that you would say are good things to look for: size of the team, reviews, technology? Or does it not matter, because everyone's environment is different? >> Yeah. And, you know, for me, I always look first to the security value, because if there isn't security value, nothing else matters. So there's got to be some security value. Then I tend to look at the management team, quite frankly: what are their experiences, and what do they know that has led them to do something different that is driving security value? And then after that, I tend to look at: is this someone I can have a long-term relationship with?
Is this someone that, you know, if I have a problem and I call them, is going to say: yes, we're in this together, we'll figure it out? And then finally, for AWS, you know, scale is important, so we like to look at scale in terms of: is this a solution I can get to the scale that I need? >> Awesome. Jon Ramsey, Vice President of AWS Security, here on theCUBE keynote. Jon, thank you for your time. I really appreciate it, and I know how busy you are. For the next minute or so, share a little bit of what you're up to. What's on your plate? What are you thinking about as you go out to the marketplace and talk to customers? What's on your agenda? What's your talk track? Put a plug in for what you're up to. >> Yeah. So for the services I have, we are absolutely moving, as I mentioned earlier, from APIs to outcomes. We're moving up the stack to be able to defend both containers and serverless. We're moving out in terms of getting visibility and signal not just from what we see in AWS, but from other places, to inform how we defend AWS. And then also across the NIST Cybersecurity Framework: we have amazing detection capability, and we have this infrastructure where we can respond, doing micro-responses to be able to interdict the threat. So we're moving across the NIST Cybersecurity Framework from detection to response. >> All right, thanks for your insight and your time sharing in this keynote. We've got 10 great, amazing startups. Congratulations on all your success at AWS. You guys are doing a great job. Shared responsibility; the threats are out there, the landscape is changing, the scale's increasing, more data tsunamis are coming every day, more integration, more interconnection. It's getting more complex. So you guys are doing a lot of great work there.
Thanks for your time. Really appreciate it. >> Thank you, John. >> Okay, this is the AWS Startup Showcase, season two, episode four of the ongoing series covering the exciting startups coming out of the AWS ecosystem. This episode is about cybersecurity, and I'm your host, John Furrier. Thanks for watching.
Lie 3, Today’s Modern Data Stack Is Modern | Starburst
(energetic music) >> Okay, we're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud-first technologist from Accenture. We're on to lie number three, and that is the claim that today's "Modern Data Stack" is actually modern. So (chuckles), I guess the lie is that it's modern. Justin, what do you say? >> Yeah, I think new isn't modern, right? It's the new data stack, it's the cloud data stack, but that doesn't necessarily mean it's modern. A lot of the components, actually, are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, y'know, a cloud version of it, and a lot of the challenges that have plagued us for 40 years still remain. >> So, let me come back to you, Justin. Okay, but there are differences, right? You can scale. You can throw resources at the problem. You can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and by Snowflake, which you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >> Well, it is. It's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old enterprise data warehouse model on-prem still exist, just a little bit more elastic now, because the cloud offers that. >> So Teresa, let me go to you, 'cause you have cloud-first in your title.
So, what say you to this conversation? >> Well, even the cloud providers are looking towards more of a cloud continuum, right? The centralized cloud as we know it, maybe a data lake or data warehouse in one central place, that's not even how the cloud providers are looking at it. They each have query services, and every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, it's going to very much follow the same thing. There's going to be more edge, there's going to be more on-premise, because of data sovereignty and data gravity, and because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the modern, I guess the next modern, generation of the data stack needs to be much more federated. >> Okay, so Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure; it's on the books. You don't want to just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? >> Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service, and have the stack respond in a way that keeps costs low?
Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >> Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, derisked experimentation, you know, we talked about the ability to scale up, scale down. But my takeaway is that that's not enough. Based on what Richard just said, the modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >> Of how it should look, or how- >> Yeah. What it should be? >> Yeah. Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth.
And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they grow up, mature out of that ideal state: they go buy a business, now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse: maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >> So thank you. So Teresa, based on what Justin just said, my takeaway there is it's inclusive: whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay, I get that. Does that include, Teresa, on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy there. Maybe it's a data lake and maybe it's using Glue.
You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >> I mean, I think it's a killer case for data mesh. The fact that you have valuable data sources on-prem, and then yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, or something that is cheaper, or maybe just tapping into better analytics to get better insights, right? So you're going to be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >> Okay. Thank you. So Richard, you know, talking about data as product, wonder if you could give us your perspective here: what are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >> So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings.
So information about patients, demographics, about their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight, and in any business that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out, both internally and, through the right governance, externally to researchers. >> So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously got to be governed as well. You mentioned appropriately provided internally, >> Yeah. >> but also, you know, to external folks as well. So you've architected that capability today? >> We have, and because the data is standard it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient?
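The two moves Richard describes, mapping heterogeneous source records onto one agreed schema and redacting sensitive fields before the product leaves the governed boundary, can be sketched minimally like this. The field names, mapping, and redaction rule are hypothetical examples, not EMIS's actual schema.

```python
# Minimal sketch of building a standardized, redacted data product:
# 1) rename source-specific fields to shared, agreed-upon names;
# 2) strip fields that must not be shared externally.

def standardize(record, field_map):
    # Map each standard field name to the value under the source's own name.
    return {standard: record[source] for standard, source in field_map.items()}

def redact(record, sensitive_fields):
    # Remove fields that must not leave the governed boundary.
    return {k: v for k, v in record.items() if k not in sensitive_fields}

# A hypothetical pharmacy record with its own local field names.
pharmacy_row = {"pat_dob": "1970-01-01", "med": "aspirin", "rx_date": "2022-03-01"}
field_map = {"date_of_birth": "pat_dob", "medication": "med",
             "treatment_start": "rx_date"}

standardized = standardize(pharmacy_row, field_map)
external_view = redact(standardized, {"date_of_birth"})
print(sorted(external_view))  # ['medication', 'treatment_start']
```

The value of the product is exactly this agreement: every consumer sees `treatment_start` and knows it is not a date of birth or a contract date.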
These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case. >> Yeah, it makes sense. It's got the context. If the domains own the data, you know, you've got to cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >> Yeah. So I think for us it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems. Ultimately giving them that optionality, you know, and optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >> Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. Big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources. Lots of data mesh conversations over there, and really good stuff in the resource section. So check that out.
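Justin's closing idea, data products that domain owners create and curate, and that consumers discover and consume, implies some kind of catalog behind it. Here is a toy sketch of that discovery experience; the product names, owners, and tags are made up for illustration and are not Starburst's actual product model.

```python
# Toy sketch of a data product catalog: domain owners publish products,
# consumers search and discover them -- the "e-commerce-like" experience.

class Catalog:
    def __init__(self):
        self.products = []

    def publish(self, name, owner, tags):
        # A domain team registers a curated product under its own ownership.
        self.products.append({"name": name, "owner": owner, "tags": set(tags)})

    def search(self, tag):
        # Consumers discover products by topic, not by knowing the source system.
        return [p["name"] for p in self.products if tag in p["tags"]]

catalog = Catalog()
catalog.publish("patient_demographics", owner="clinical-domain",
                tags=["healthcare", "pii"])
catalog.publish("daily_sales", owner="commerce-domain", tags=["finance"])

print(catalog.search("healthcare"))  # ['patient_demographics']
```

The design choice worth noting: the catalog records who owns each product, so governance questions route back to the domain that knows the data best.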
Thanks for watching the "Data Doesn't Lie... or Does It?" made possible by Starburst data. This is Dave Vellante for the Cube, and we'll see you next time. (upbeat music)
Starburst The Data Lies FULL V2b
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. That sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness, we've made progress. But the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to "The Data Doesn't Lie... or Does It?", a series of conversations produced by theCUBE and made possible by Starburst data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud-first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth
inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the west coast, and Justin is in Massachusetts with me; I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data, from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed; otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
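Teresa's "centralized governance, distributed implementation" idea can be sketched as one central policy authority that owns the rules while each domain keeps and serves its own data. The roles, domains, fields, and policy below are illustrative assumptions, not any real deployment.

```python
# Toy sketch of centralized governance over decentralized domains:
# the center owns the access policy and the audit trail; the domains own
# and serve their data, deferring access decisions to the center.

audit_log = []

class CentralGovernance:
    """Central design authority: owns the rules, not the data."""
    def __init__(self, policy):
        self.policy = policy  # role -> set of fields that role may read

    def check(self, role, field):
        allowed = field in self.policy.get(role, set())
        audit_log.append((role, field, "ok" if allowed else "denied"))
        return allowed

class Domain:
    """A domain team: owns its data, defers access decisions to the center."""
    def __init__(self, name, records, governance):
        self.name, self.records, self.governance = name, records, governance

    def read(self, role, field):
        if not self.governance.check(role, field):
            raise PermissionError(f"{role} may not read {field}")
        return [r[field] for r in self.records]

gov = CentralGovernance({"analyst": {"medication"}})
pharmacy = Domain("pharmacy",
                  [{"medication": "aspirin", "dob": "1970-01-01"}], gov)

print(pharmacy.read("analyst", "medication"))  # ['aspirin']
try:
    pharmacy.read("analyst", "dob")  # denied, but still audited
except PermissionError:
    pass
print(len(audit_log))  # 2
```

Every read, allowed or denied, lands in one audit trail, which is what makes the distributed implementation auditable from the center.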
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani, of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything, but at the same time these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess, question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing with, you know, your constituencies, that data model? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team, and that's the most cost-effective way to serve the business. Otherwise I've got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor, who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, they don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized for those laws to be compliant. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients; they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for. But you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those is useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. You take Zhamak's principles: you've got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges: self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so, having done that and invested quite heavily in making that possible, it has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially taking, you know, analytic queries to data that's all dispersed all over; how are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost eCommerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay. Guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and, ultimately, decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured; its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar; you need to store data in a column format. And then, you know, column formats were introduced to data lakes: you have Parquet, ORC, and Avro, which were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new table formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago, where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target; I remember Unix used to be "open systems," and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to do is back yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
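Justin's earlier point, that columnar formats like Parquet and ORC closed the performance gap, rests on a simple layout idea: a query that touches one column should not have to walk every whole record. This pure-Python sketch only contrasts the two layouts; it is not a real file format.

```python
# Row layout vs. column layout, in miniature. The data is identical;
# only the organization differs.

rows = [
    {"id": 1, "region": "eu", "sales": 120},
    {"id": 2, "region": "us", "sales": 75},
    {"id": 3, "region": "eu", "sales": 40},
]

# Row layout: summing one column still walks every whole record.
row_total = sum(r["sales"] for r in rows)

# Column layout: the same data pivoted so each column is contiguous.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_total = sum(columns["sales"])  # touches a single column only

print(row_total, col_total)  # 235 235
```

On disk, the columnar version means reading a fraction of the bytes for analytic scans, which is the property the column formats brought to data lakes.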
And so, by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >>So Justin, let me play devil's advocate here a little bit, and I've talked to Zhamak about this. And, you know, obviously her vision is that the data mesh is open source: open source tooling, and it's not proprietary; you know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today: you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware: not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA. And, you know, those cloud data warehouses that reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, again going back 10 or 12 years, of when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to get the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start: how many people love Oracle today? But they are customers nonetheless. I think lock-in is part of this industry, and I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the gas prices posted: I drive up and then I say, oh, that's the cash price; with a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you? Cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata; by implication, you're saying that's where Snowflake customers are headed.
Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool; they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape where a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it. Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license costs, but we do get value out of it. And so my question to you, Richard, is: do the, let's call it data warehouse systems or proprietary systems, are they gonna deliver a greater ROI sooner? And is that the allure that customers are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back very tight SLAs with them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads, and increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. And so it's not a simple answer; no one option will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah, Richard, stay with you. You mentioned some things before that strike me. The Databricks/Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rockstar data engineers; Databricks is coming at it from a data engineering heritage, Snowflake from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in data engineering; i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open, license-free capabilities, because they're very cheap to experiment with and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and in the world of Oracle, normally it's the staff that's the biggest nut in total cost of ownership. Not with Oracle: the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the lakehouse, the data warehouse, fit in this world? >>Yeah, so our view on how the future ultimately unfolds is that we think data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is basically an impossible endeavor, and so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have. But to have comprehensive analytics and to truly understand your business, and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single-source-of-truth approach is no longer viable, and disconnected data silos are often left untapped. We need a new approach, one that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours, and with Starburst you can perform analytics anywhere in light of your world. >>Okay, we're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud-first technologist from Accenture. We're on to lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say? >>Yeah, I mean, I think new isn't modern, right? I think it's the new data stack, the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that plagued us for 40 years still remain. >>So let me come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist; they're just a little bit more elastic now, because the cloud offers that. >>So Teresa, let me go to you, cuz you have cloud first in your title. What say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake and data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, that's gonna very much follow the same thing. There's gonna be more edge, there's gonna be more on-premise, because of data sovereignty and data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure; it's on the books, you don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, in a way very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. You think about the past five, seven years: cloud obviously has given us a different pricing model, de-risked experimentation, and we talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that; I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah, what it should be. >>Yeah. Well, I think Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element, of the overall data mesh, and I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth, and I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital natives, born-in-the-cloud young companies who had the benefit of idealism; they had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises.
So creating that flexibility to really Futureproof yourself from the inevitable change that you will, you won't encounter over time. >>So thank you. So there, based on what Justin just said, I, my takeaway there is it's inclusive, whether it's a data Mar data hub, data lake data warehouse, it's a, just a node on the mesh. Okay. I get that. Does that include there on Preem data? O obviously it has to, what are you seeing in terms of the ability to, to take that data mesh concept on Preem? I mean, most implementations I've seen in data mesh, frankly really aren't, you know, adhering to the philosophy. They're maybe, maybe it's data lake and maybe it's using glue. You look at what JPMC is doing. Hello, fresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that Theresa? >>I mean, I, I think it's a killer case for data. Me, the fact that you have valuable data sources, OnPrem, and then yet you still wanna modernize and take the best of cloud cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the way ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance. It needs, you know, something that is, is cheaper or, or maybe just tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, that is meaningful by the business. Not everything has to have that one size fits all set a tool. >>Okay. Thank you. 
So Richard, talking about data as a product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data; what are your thoughts on data products? >>So for us, one of the most important data products we've been creating takes healthcare data from a wide variety of different settings: information about patients' demographics, about their treatment, about their medications and so on, and puts it into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way the user is expecting, means that you generate the wrong insight. In any business that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible and discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also to external folks as well.
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams, that are data agnostic; they don't really have that context. All right, Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah, so I think for us it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. Optionality provides the ability to reduce costs, storing more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make it an appropriate complement to the modern data stack that people have today. >>Excellent.
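As an aside, the data product mechanics Richard describes, a canonical schema across silos, explicit semantics for what a date means, and redaction at the point of consumption, can be illustrated with a toy sketch. All field names and rules below are invented for illustration; no real healthcare standard or EMIS system is implied:

```python
# Two silos store the same kind of fact in different shapes.
pharmacy_rows = [{"pat": "p1", "drug": "aspirin", "dispensed_on": "2022-01-05"}]
gp_rows = [{"patient_id": "p2", "medication": "ibuprofen", "issued_on": "2022-02-10"}]

def standardize(row: dict, source: str) -> dict:
    """Map each silo's shape onto one canonical schema, and record
    what the date field in this source actually means."""
    if source == "pharmacy":
        return {"patient": row["pat"], "medication": row["drug"],
                "date": row["dispensed_on"], "date_meaning": "dispensed"}
    if source == "gp":
        return {"patient": row["patient_id"], "medication": row["medication"],
                "date": row["issued_on"], "date_meaning": "prescribed"}
    raise ValueError(f"unknown source: {source}")

def consume(rows: list[dict], external: bool) -> list[dict]:
    """Governed consumption: external researchers get patient IDs redacted."""
    if not external:
        return rows
    return [{**r, "patient": "<redacted>"} for r in rows]

product = ([standardize(r, "pharmacy") for r in pharmacy_rows]
           + [standardize(r, "gp") for r in gp_rows])
external_view = consume(product, external=True)
```

The `date_meaning` field is the key move: the product carries the business context (dispensed versus prescribed) rather than leaving it implicit in the silo the data came from.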
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on theCUBE.net for on-demand viewing. You can also go to starburst.io; they have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time and management and, in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
Starburst The Data Lies FULL V1
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads, and that sucks. Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant, to advance the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives, beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress. But the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie... or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is the cloud-first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth
Inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on those promises in our lifetimes? We're spanning much of the Western world today: Richard is in the UK, Teresa is on the west coast, and Justin is in Massachusetts with me. I'm in theCUBE studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>You're very welcome. Let's get right into it. Now, here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata (of course, Teradata is the pioneer of that central enterprise data warehouse model), one of the things I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems; they had data on prem and data in the cloud. Those companies were acquiring other companies and inheriting their data architectures. So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So, Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, and serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I would echo Justin's experience, really. We as a business have grown up through acquisition, storing data in different places, sometimes doing information governance in different ways, storing data in a platform that's close to the data experts, people who really understand healthcare data, from pharmacies or from doctors. And so although, if you were starting from a greenfield site and building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. It's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, and where does it make sense to decentralize? >>I think you've gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, and certainly for very core data sets: having a centralized set of roles and responsibilities to really QA, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed; otherwise you're not gonna be able to scale, right? So different parts of the business should be able to make the right data investments for their needs. And then, ultimately, you're gonna collaborate with your partners, partners that are not within the company, right? External partners. We're gonna see a lot more data sharing and model creation, and so you're definitely going to be decentralized.
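Teresa's split between centralized governance and distributed implementation can be sketched in a few lines of code: one central policy and audit layer authorizes every read, while each domain keeps and serves its own data. This is a minimal illustration only; the roles, domains, and policy rules are hypothetical, not anything Accenture or EMIS actually runs.

```python
# Sketch: centralized governance over decentralized domains.
# All names (roles, domains, records) are invented for illustration.

AUDIT_LOG = []

# The central team owns the policy: which role may read which domain.
POLICY = {
    "pharmacist": {"medications"},
    "finance_analyst": {"finance"},
    "data_scientist": {"medications", "finance", "claims"},
}

# Each domain owns its own store (the distributed implementation).
DOMAIN_STORES = {
    "medications": [{"drug": "aspirin", "dose_mg": 75}],
    "finance": [{"account": "opex", "total": 1200}],
    "claims": [{"claim_id": 1, "status": "open"}],
}

def query(role: str, domain: str):
    """Route every read through the central policy and audit layer."""
    allowed = domain in POLICY.get(role, set())
    AUDIT_LOG.append({"role": role, "domain": domain, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{role} may not read {domain}")
    return DOMAIN_STORES[domain]
```

Here a pharmacist can read medication data, a read against finance raises `PermissionError`, and either way the attempt lands in the audit log, regardless of where the data physically lives. It is the same "well audited, well understood security layer" idea Richard describes a little later in the conversation.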
>>So, you know, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani, of course; she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through to get anything, and at the same time these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess the question for you, Richard: how do you deal with that? Do you organize so that there are a few rock stars that build cubes and the like? Or have you had any success in sort of decentralizing that data model with your constituencies? >>Yeah, so we absolutely have got rockstar data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, and then people who understand how to build models and how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business that needs to understand the data and generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team, and that's the most cost-effective way to serve the business. Otherwise I've got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. First of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. So you're putting all of your most valuable data in the hands of one vendor, who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that in reality those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: the idea that the domain owners actually know the data the best. >>And so, not only is data generally decentralized; to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is that the data mesh model basically says: data is decentralized, and we're gonna turn that into an asset rather than a liability. We're gonna turn it into an asset by empowering the people who know the data best to participate in the process of curating and creating data products for consumption. So I think when you look at it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. That's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it. But what does that mean? Does that mean the EDW just becomes less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I would still love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say, is a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that emerges that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very valuable, very high-performance tool for what it's there for. But you can have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that might actually span a number of systems, whether those are the source systems for the domains that know the data best, or the consumer-facing systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh should actually allow you to use all of them. >>So, Richard, let me ask you: take Zhamak's principles, you know, domain ownership and data as a product. Okay, great, sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. So what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables, and then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing it in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. Having done that, and having invested quite heavily in making it possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So, Justin, we always talk about data democratization, and up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data best, to create data as a product, ultimately to be consumed. And we try to represent that in our product as an almost eCommerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. So we're really trying to build on that notion of data democratization and self-service, making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it. But there's a problem: your data is stored across multiple systems. It's hard to access, and that delays analytics and, ultimately, decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today, or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available; your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data, with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant, and its stack has evolved and matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, to the advent of the first data lakes, really around Hadoop, it probably was true that you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar, you need to store data in a column format. And then columnar formats, Parquet, ORC, and Avro files, were introduced to data lakes, created to ultimately deliver that performance. So, okay, we got largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago, where he said it takes six or seven years to build a functional database. I think that's right, and now we've had almost a decade go by. So these technologies have matured to deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have a traditional data warehouse at all; they do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so, Richard, talk about your perspective as a practitioner in terms of what open brings you. I mean, closed versus open is a moving target; I remember when Unix used to be "open systems," so it is an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose, for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect. And what you don't want is to end up backed into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage of it; your competitive edge might depend upon it. And so, I suppose, for us, we acknowledge that we don't have perfect vision of what the future might be.
And so, by backing open storage technologies, we can apply a number of different technologies to the processing of that data, and that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. So we don't have the concern that we don't have enough hardware today to process what we want to achieve; we can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >>So, Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source: open source tooling, not proprietary. You're not gonna buy a data mesh; you're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to today: you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then a vendor can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and every new open source thing that comes along; they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA. And those cloud data warehouses, when they reach beyond their own proprietary storage, drop all the performance that they were able to provide.
So it reminds me, again going back 10 or 12 years, of when everybody had a connector to Hadoop and thought that was the solution, right? But the reality was, a connector was not the same as running workloads in Hadoop back then. And I think similarly, by merely connecting to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, those vendors are always going to be predisposed, always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is that the database industry has really been built around vendor lock-in, I mean, from the start. How many people love Oracle today? But they're customers nonetheless. I think lock-in is part of this industry, and that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the gas station price: I drive up and say, oh, that's the cash price; on a credit card I've gotta pay 20 cents more. Okay. But so, let me come back to the argument then, Justin. What's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that, long term, that's gonna come back and bite you? You mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before, at least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And, you know, I loved what Richard said; I actually wrote it down.
Because I thought it was an amazing quote. He said, "It buys us the ability to be unsure of the future." That pretty much says it all. The future is unknowable, and the reality is that, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model, and you want to use Starburst to query via SQL, that's totally cool; they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're locked into that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape that a proprietary approach closes you into and locks you into. >>So I would say this, Richard, and I'd love to get your thoughts on it. I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit: yeah, they're jamming us on price and the license costs, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems, or the proprietary systems, deliver a greater ROI sooner? Is that the allure that customers are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is: it can depend a bit. It depends on your business's skillset. We are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today, we can back very tight SLAs with them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads, and our business is increasingly growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. So it's not a simple answer; neither one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, let me stay with you. You mentioned some things before that strike me. The Databricks versus Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it from a data engineering heritage (you mentioned you have a lot of rockstar data engineers), and Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in the data engineering world; i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was: if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only work with proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is: begin with open, license-free capabilities, because they're very cheap to experiment with, and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself have done a lot of TCO work over the last 20-plus years, and normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle: there, the license cost is by far the biggest component of the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, and the lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there, either. We think that centralization of all your data assets is basically an impossible endeavor, and so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume, in terms of how much data you have. But to have comprehensive analytics, and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data, in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single-source-of-truth approach is no longer viable, and disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? It's the new data stack, the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components are actually exactly the same as what we've had for 40 years: rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it, and I think a lot of the challenges that plagued us for 40 years still remain. >>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, at Snowflake and, you mentioned, its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So they're allowing you to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist; they're just a little bit more elastic now, because the cloud offers that. >>So, Teresa, let me go to you, because you have "cloud first" in your title. What say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? The centralized cloud as we know it, maybe data lake and data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries to reach beyond a single location. And if we look at where the future goes, it's gonna very much follow the same thing. There's gonna be more edge, there's gonna be more on-premise, because of data sovereignty and data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So, Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure; it's on the books, and you don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times means a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, in a way very much aligned to data products and data mesh: how do you enable more people to consume the service, and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. Think about the past five to seven years: cloud obviously has given us a different pricing model and de-risked experimentation, and we talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah, what it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh, and I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism, the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they mature out of that ideal state, they go buy a business; now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical: do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database, or what have you.
So creating that flexibility really future-proofs you from the inevitable change that you will encounter over time. >>So thank you. Based on what Justin just said, my takeaway is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen of data mesh frankly really aren't adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue; you look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, full of great reasons, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already; that's meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, or something that is cheaper, or maybe just tap into better analytics to get better insights, right? So you're going to be able to stretch and really have the best of both worlds in a way that, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
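Justin's point, that the engine should reach data where it lives (warehouse, lake, or operational store) rather than ingest everything into one system first, can be sketched in a few lines. This is a toy illustration with invented tables and names, not how Starburst or Trino actually work internally; a real engine federates over SQL connectors and pushes work down to each source.

```python
# Two "sources" that stay where they are: nothing is copied into a central store.
warehouse_orders = [  # rows imagined to live in a cloud data warehouse
    {"order_id": 1, "customer_id": "c1", "total": 120.0},
    {"order_id": 2, "customer_id": "c2", "total": 80.0},
]
crm_customers = [  # rows imagined to live in an operational database
    {"customer_id": "c1", "region": "EU"},
    {"customer_id": "c2", "region": "US"},
]

def federated_join(left, right, key):
    """Join two row sets inside the query engine, leaving each source in place."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

result = federated_join(warehouse_orders, crm_customers, "customer_id")
print(result[0]["region"])  # -> EU
```

The join happens at query time, so neither source has to be re-ingested or reformatted first, which is the "node on the mesh" idea in miniature.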
So Richard, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data; what are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking healthcare data from across a wide variety of different settings, information about patients' demographics, about their treatment, about their medications and so on, and getting it into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way the user is expecting, means that you generate the wrong insight. And in any business that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out, both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible and discoverable, but it obviously has to be governed as well. You mentioned you appropriately provide it internally, but also to external folks as well.
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case? >>Yeah, that makes sense. It's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams that are data-agnostic; they don't really have that context. All right, let's end with Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah, so I think for us it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse. It provides the fastest time to insight, accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today. >>Excellent.
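Richard's example of a date whose meaning is ambiguous is easy to make concrete. The sketch below uses invented records and field names; the point is only that the data product's schema names the semantics explicitly (treatment start versus date of birth) instead of leaving a bare "date" open to misinterpretation in each silo.

```python
from datetime import date

# Hypothetical silos: each stores "a date" that means something different.
silo_a = {"patient": "p1", "date": "2021-03-01"}   # here "date" is the treatment start
silo_b = {"patient": "p2", "dob": "1980-07-15"}    # here the date is a date of birth

def to_product(record, source):
    """Map a source row onto the product schema with named, unambiguous fields."""
    out = {"patient_id": record["patient"], "treatment_start": None, "date_of_birth": None}
    if source == "a":
        out["treatment_start"] = date.fromisoformat(record["date"])
    elif source == "b":
        out["date_of_birth"] = date.fromisoformat(record["dob"])
    return out

rows = [to_product(silo_a, "a"), to_product(silo_b, "b")]
print(rows[0]["treatment_start"])  # -> 2021-03-01
```

Consumers of the product now read `treatment_start` or `date_of_birth` and can't confuse the two, which is the "product management around the data" Richard describes.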
Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io; they have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It?, made possible by Starburst Data. This is Dave Vellante for the cube, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time and management and, in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster, all this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases, and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
Kevin Warenda and Drew Schlussel Wasabi Secure Storage Hot Takes
>>Drew and I are pleased to welcome Kevin Warenda, who's the director of information technology services at the Hotchkiss School, a very prestigious and well-respected boarding school in the beautiful northwest corner of Connecticut. Hello, Kevin. >>Hello. It's nice to be here. Thanks for having me. >>You bet. Hey, tell us a little bit more about the Hotchkiss School and your role. >>Sure. The Hotchkiss School is an independent boarding school, grades nine through 12, as you said, very prestigious and in an absolutely beautiful location on the deepest freshwater lake in Connecticut. We have a 500-acre main campus and a 200-acre farm down the street. My role, as the director of information technology services, is essentially to oversee all of the technology that supports the school's operations: academics, sports, everything we do on campus. >>Yeah. And you've had a very strong history in the educational field. From that lens, what's the pressing security challenge that's top of mind for you? >>I think it's clear that educational institutions are a target these days, especially for ransomware. We have a lot of data that can be used by threat actors, and schools are often underfunded in the area of IT security, and IT in general sometimes. So I think threat actors often see us as easy targets, or at least worthwhile to try to get into. >>Because specifically you are potentially spread thin and underfunded, and you've got students, you've got teachers. So there really are some concerns there. Are there any specific data privacy concerns as well, around student privacy or regulations that you can speak to? >>Certainly. Because we're an independent boarding school, we operate things like even a health center, so data privacy regulations apply across the board in terms of student data rights and FERPA. Some of our students are under 18, so data privacy laws such as COPPA apply, and HIPAA can apply.
We have PCI regulations with many of our financial transactions, whether it be fundraising through alumni development or even just accepting the revenue for tuition. So it's a unique place to be. Again, we operate very much like a college would, right? We have all the trappings of a private college in terms of all the operations we do, and that's what I love most about working in education: it's all the industries combined in many ways. >>Very cool. So let's talk about some of the defense strategies from a practitioner point of view; then I want to bring Drew into the conversation. What are the best practices and the right strategies, from your standpoint, for defending your data? >>Well, we take a defense-in-depth approach, so we layer multiple technologies on top of each other to make sure that no single failure is the key to getting beyond those defenses. We also keep it simple. You know, I think there are some core things that all organizations need to do these days, including vulnerability scanning, patching, using multifactor authentication, and having really excellent backups in case something does happen. >>Drew, are you seeing any similar patterns across other industries or customers? I mean, I know we're talking about some uniqueness in the education market, but what can we learn from other adjacent industries? >>Yeah, you know, Kevin is spot on, and I love hearing what he's doing, going back to our prior conversation about zero trust, right? That defense-in-depth approach is beautifully aligned with a zero-trust approach, especially things like multifactor authentication; I'm always shocked at how few folks are applying that very simple technology. And across the board, right? I mean, Kevin is referring to the financial industry, the healthcare industry, even security and police, right? They need to make sure that the data they're keeping as evidence is secure and immutable, right? Because that's evidence. >>Well, Kevin, paint a picture for us, if you would. So you were primarily on-prem, looking at potentially using more cloud, and you were a VMware shop. Paint a picture of your environment, the applications that you support, and, I want to get to the before and after Wasabi, but start with where you came from. >>Sure. Well, I came to the Hotchkiss School about seven years ago, and I had come most recently from public K-12 and municipal IT. So again, not a lot of funding for IT in general, security or infrastructure included. Nutanix was a hyperconverged solution that I had implemented at my previous position. So when I came to Hotchkiss and found mostly on-prem workloads, everything from the student information system to the card access system that students would use to the financial systems, they were almost all on premise, but there were some new SaaS solutions coming into play. We had also taken some time to do some business continuity planning, you know, in the event of some kind of issue. I don't think we were thinking about the pandemic at the time, but it certainly helped prepare us for that. So as different workloads were moved off to hosted or cloud-based solutions, we didn't really need as much of the on-premise compute and storage as we had, and it was time to retire that cluster. And so I brought the experience I had with Nutanix with me, and we consolidated all of that onto a hyperconverged platform running Nutanix AHV, which allowed us to get rid of all the cost of the VMware licensing as well. And it is an easier platform to manage, especially for small IT shops like ours. >>Yeah, AHV is the Acropolis hypervisor. And so you migrated off of VMware; the "vTax" avoidance, that's a common theme among Nutanix customers. Now, did you consider moving into AWS?
You know, what was the catalyst to consider Wasabi as part of your defense strategy? >>We were looking at cloud storage options, and they were just all so expensive, especially in egress fees to get data back out. Wasabi came across our desks, and it was such a low barrier to entry: sign up for a trial and get, you know, a terabyte for a month, and then it was about $6 a month per terabyte after that. I said we can try this out in a very low-stakes way to see how it works for us. And there were a couple of things we were trying to solve at the time. It wasn't just a place to put backup; we also needed a place to have some files that might serve, to some degree, as a content delivery network. Some of our software applications that are deployed through our mobile device management needed a place accessible on the internet where they could be stored as well. >>So we were testing it for a couple of different scenarios, and it worked great. Performance-wise, it's fast, and security-wise, it has all the features of S3 compliance. It works with Nutanix, and anyone who's familiar with S3 permissions can apply them very easily. And then there were no egress fees. We can pull data down and put data up at will, and it's not costing us any extra, which is excellent, because especially in education, we need fixed costs. We need to know what we're going to spend over a year before we spend it, and not be hit with bills for egress because our workload or our data storage footprint grew tremendously. We can't have the variability that the cloud providers would give us. >>So Kevin, you explained you're hypersensitive about security and privacy for obvious reasons that we discussed. Were you concerned about doing business with a company with a funny name? Was it the trial that got you through that knothole? How did you address those concerns as an IT practitioner?
>>Yeah, anytime we adopt anything, we go through a risk review. So we did our homework, and we checked; the funny name really means nothing. There are lots of companies with funny names. We don't go based on the name necessarily, but we did go based on the history, understanding who started the company and where it came from, and really looking into the technology, understanding that the value proposition, the ability to provide that lower cost, is based specifically on the technology in which it lays down data. So, having a legitimate, reasonable explanation for why it's less expensive, we weren't thinking, well, you get what you pay for. It may be less expensive than alternatives, but it's not cheap in the sense of unreliable. That was really our concern. So we did our homework for sure before even starting the trial, but then the trial certainly confirmed everything that we had learned. >>Yeah, thank you for that. Drew, explain the whole egress charge. We hear a lot about that. What do people need to know? >>First of all, it's not a funny name, it's a memorable name, just like the cube. Let's be very clear about that. Second of all, egress charges. So, you know, other storage providers charge you for every API call, right? Every GET, every PUT, every LIST, everything. Okay, it's part of their process, it's part of how they make money, it's part of how they cover the cost of all their other services. We don't do that. And I think, as Kevin has pointed out, that's a huge differentiator, because you're talking about a significant amount of money above and beyond the list price. In fact, I would tell you that most of the other storage providers, the hyperscalers, have a list price that, first of all, far exceeds anything else in the industry, especially what we offer. And then, right, their additional costs, the egress cost and the API requests, can be 200, 300, 400% more on top of what you're paying per terabyte. >>So you used the little coffee analogy earlier in our conversation. So here's what I'm imagining: I have a lot of stuff, right? And I had to clear up my bar, and I put some stuff in storage right down the street, and I pay them monthly. I can't imagine having to pay them to go get my stuff. That's kind of the same thing here. >>Oh, that's a great metaphor, right? That storage locker, right? >>Yeah. >>You know, can you imagine every time you want to open the door to that locker and look inside, having to pay a fee? >>No, that would be annoying. >>Or every time you pull into the yard and you want to put something in that storage locker, you have to pay an access fee to get to the yard. You have to pay a door-opening fee, right? And then if you want to get an inventory of everything in there, you have to pay. It's ridiculous. It's your data, it's your storage, it's your locker. You've already paid the annual fee, probably because they gave you a discount on that. So why shouldn't you have unfettered access to your data? That's what Wasabi does, and I think, as Kevin pointed out, that's what sets us completely apart from everybody else. >>Okay, good. That's helpful. It helps us understand how Wasabi's different. Kevin, I'm always interested, when I talk to practitioners like yourself, in learning what you do outside of the technology. What are you doing in terms of educating your community and making them more cyber-aware? Do you have training for students and faculty to learn about security and ransomware protection, for example? >>Yes. Cybersecurity awareness training is definitely one of the required things everyone should be doing in their organizations. And we do have a program that we use, and we try to make it fun and engaging too. Right?
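To put rough numbers on Drew's egress point from a moment ago, here is a purely hypothetical monthly cost model comparing a flat per-terabyte price with a provider that also meters egress and API requests. Every rate below is invented for illustration and is not Wasabi's (or anyone's) actual pricing.

```python
def monthly_cost(stored_tb, egress_tb, api_requests,
                 price_per_tb, egress_per_tb=0.0, price_per_1k_requests=0.0):
    """Simple storage bill: capacity charge plus optional egress and per-request fees."""
    return (stored_tb * price_per_tb
            + egress_tb * egress_per_tb
            + (api_requests / 1000) * price_per_1k_requests)

# Hypothetical rates for illustration only.
flat = monthly_cost(100, 40, 5_000_000, price_per_tb=6.0)
metered = monthly_cost(100, 40, 5_000_000, price_per_tb=23.0,
                       egress_per_tb=90.0, price_per_1k_requests=0.4)

print(round(flat, 2))     # -> 600.0
print(round(metered, 2))  # -> 7900.0
```

With these made-up rates, the metered bill's egress and request charges (5,600) alone exceed its capacity charge (2,300), which is the "several hundred percent more on top" pattern Drew describes, and why a school budgeting a fixed annual spend cares.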
This is often the check-the-box kind of activity; insurance companies require it, but we want to make it something that people want to do and want to engage with. So even last year, I think we did one around the holidays that pointed out the kinds of scams they might expect in their personal lives, around shipping of orders in time for the holidays and things like that. So it wasn't just about protecting our school data; it's about the fact that protecting their information is something you do in all aspects of your life. Especially now that folks are working hybrid, working from home with equipment from the school, the stakes are much higher and people have a lot of our data at home. And so knowing how to protect that is important. So we definitely run those programs in a way that we want to be engaging, fun, and memorable, so that when they do encounter those things, especially email threats, they know how to handle them. >>So when you say fun, it's like you come up with an example that we can laugh at, until of course we click on that bad link. But I'm sure you can come up with a lot of interesting and engaging examples. Is that what you're talking about, about having fun? >>Yeah. I mean, sometimes they are kind of choose-your-own-adventure-type stories. They're telling a story, and they stop, and you have to answer questions along the way to keep going. So you're not just watching a video; you're engaged with the story of the topic. That's what I think is memorable about it, but it's also what makes it fun. You're not just watching some talking head saying, you know, to avoid shortened URLs, or to check to make sure you know the sender of the email.
Now you're engaged in a real-life scenario, a story that you're following and making choices along the way, and finding out whether that was the right choice to make, or maybe not. So that's where I think the learning comes in. >>Excellent. Okay, gentlemen, thanks so much. Appreciate your time. Kevin, Drew, awesome having you in the cube. >>My pleasure. Thank you. >>Yeah, great to be here. Thanks. >>Okay. In a moment, I'll give you some closing thoughts on the changing world of data protection and the evolution of cloud object storage. You're watching the cube, the leader in high-tech enterprise coverage.
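As a closing aside on Kevin's comment that familiar S3-style permissions carry over to an S3-compatible store: below is a minimal sketch of how such bucket-policy statements are evaluated. The rule, as in S3, is that an explicit Deny always wins, an Allow is otherwise required, and everything else is implicitly denied. The statements and principal names here are invented for illustration, not a real bucket's configuration.

```python
# A toy bucket policy: a backup service may write, MDM clients may read but
# never delete. Principals and actions are hypothetical.
policy = [
    {"effect": "Allow", "principal": "backup-svc", "action": "s3:PutObject"},
    {"effect": "Allow", "principal": "mdm-clients", "action": "s3:GetObject"},
    {"effect": "Deny",  "principal": "mdm-clients", "action": "s3:DeleteObject"},
]

def is_allowed(principal, action):
    """Evaluate statements the way S3 does: explicit Deny > Allow > implicit deny."""
    matched = [s for s in policy
               if s["principal"] == principal and s["action"] == action]
    if any(s["effect"] == "Deny" for s in matched):
        return False
    return any(s["effect"] == "Allow" for s in matched)

print(is_allowed("mdm-clients", "s3:GetObject"))     # -> True
print(is_allowed("mdm-clients", "s3:DeleteObject"))  # -> False
```

Because any S3-compatible endpoint honors the same statement model, skills and policies built for one store transfer to another, which is the point Kevin was making.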
Atri Basu & Necati Cehreli | Zebrium Root Cause as a Service
>>Okay. We're back with Atri Basu, who is Cisco's resident philosopher, who also holds a master's in computer science. We're gonna have to unpack that a little bit. And Necati Cehreli, who's a technical lead at Cisco. Welcome guys. Thanks for coming on the cube. >>Happy to be here. Thanks a lot. >>All right, let's get into it. We want you to explain how Cisco validated the Zebrium technology and the proof points that you have that it actually works as advertised. So first, Atri, tell us about Cisco TAC. What does Cisco TAC do? >>TAC is an acronym for Technical Assistance Center. It's Cisco's support arm, the support organization, and, at the risk of sounding like I'm spouting a corporate line, the easiest way to summarize what TAC does is: provide world-class support to Cisco customers. What that means is we have about 8,000 engineers worldwide, and any of our Cisco customers can either go on our web portal or call us to open a support request. And we get about 2.2 million of these support requests a year. What these support requests are is, essentially, the customer will describe something that they need done, some networking goal that they have that they wanna accomplish, and then it's TAC's job to make sure that that goal does get accomplished. Now, it could be something like they're having trouble with an existing network solution and it's not working as expected, or it could be that they're integrating with a new solution. They're, you know, upgrading devices, maybe there's a hardware failure, anything really to do with networking support and the customer's network goals. If they open up a case or a request for help, then TAC's job is to respond and make sure the customer's questions and requirements are met. About 44% of these support requests are usually trivial and can be solved within a call or within a day.
But the rest of TAC's cases really involve getting into the network device, looking at logs. It's a very technical role, a very technical job. You need to be conversant with network solutions, their designs, protocols, et cetera. >>Wow. So 56% non-trivial. And so I would imagine you spend a lot of time digging through logs. Is that true? Can you quantify that? Like, you know, every month, how much time do you spend digging through logs, and is that a pain point? >>Yeah, it's interesting you asked that, because when we started on this journey to augment our support engineers' workflow with Zebrium's solution, one of the things that we did was go out and ask our engineers what their experience was like doing log analysis. And the anecdotal evidence was that, on average, an engineer will spend three out of their eight hours reviewing logs, either online or offline. What that means is either they're with the customer live on a WebEx, going over logs, network state information, et cetera, or they're doing it offline, where the customer sends them the logs, it's attached to a service request, and they review it and try to figure out what's going on and provide the customer with information. So it's a very large chunk of our day. You know, I said 8,000 plus engineers, and so three hours a day, that's 24,000 man-hours a day spent on log analysis. >>Now, the struggle with logs, or analyzing logs, is that, out of necessity, logs are very condensed. They try to pack a lot of information into very little space. This is for performance reasons, storage reasons, et cetera, but the side effect of that is they're very esoteric. So they're hard to read if you're not conversant, if you're not the developer who wrote these logs, or you aren't doing code deep dives.
And if you're not looking at where these logs get printed and things like that, it may not be immediately obvious, or even after a while it may not be obvious, what that log line means or how it correlates to whatever problem you're troubleshooting. So it requires tenure. It requires, like I was saying before, a lot of knowledge about the protocol and what's expected, because when you're doing log analysis, what you're really looking for is a needle in a haystack. You're looking for that one anomalous event, that single thing that tells you this shouldn't have happened and this was a problem. Now, doing that kind of anomaly detection requires you to know what is normal, what the baseline is. And that requires a very in-depth understanding of the state changes for that network solution or product. So it requires time, tenure, and expertise to do well, and it takes a lot of time even when you have that kind of expertise. >>Wow. So thank you, Atri. And Necati, that's almost two days a week for a technical resource. That's not inexpensive. So what was Cisco looking for to sort of help with this, and how'd you stumble upon Zebrium? >>Yeah, so, I mean, we have our internal automation system, which has been running for more than a decade now. What happens is, when a customer attaches a log bundle or diagnostic bundle to the service request, we take that from the SR, we analyze it, and we present some kind of information, you know, it can be an alert, some tables, some graph, to the engineer so they can troubleshoot this particular issue. This is an incredible system, but it comes with its own challenges around maintenance, keeping it up to date and relevant with Cisco's new products, new versions of a product, new defects, new issues, and all kinds of things. And what I mean by those challenges is, let's say Cisco comes up with a product today.
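The needle-in-a-haystack search described above, flagging the one event that deviates from a known baseline, can be approximated in a few lines: mask the variable fields so structurally identical lines collapse into one event type, then flag incident events whose type never occurred during normal operation. This is a toy sketch with invented log lines, not Zebrium's actual algorithm:

```python
import re
from collections import Counter

def event_type(line: str) -> str:
    # Mask hex values and numbers so structurally identical lines
    # collapse into one "event type" (a crude log template).
    return re.sub(r"0x[0-9a-fA-F]+|\d+", "<*>", line)

def find_anomalies(baseline_lines, incident_lines):
    # Baseline = what "normal" looks like; flag incident lines whose
    # event type was never seen during normal operation.
    seen = Counter(event_type(l) for l in baseline_lines)
    return [l for l in incident_lines if seen[event_type(l)] == 0]

baseline = [
    "BGP peer 10.0.0.1 keepalive received",
    "BGP peer 10.0.0.2 keepalive received",
    "interface eth0 stats: 1500 pkts",
]
incident = [
    "BGP peer 10.0.0.1 keepalive received",
    "MALLOC failure at 0xdeadbeef in bgp_update",  # never seen before
]
anomalies = find_anomalies(baseline, incident)
```

The point of the sketch is the structure of the problem: without a baseline of "normal," the rare event is indistinguishable from the routine ones, which is exactly why this work demands so much tenure when done by hand.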
We need to come together with those engineers, figure out how this bundle works, how it's structured. >>We need to select the individual logs that are relevant, then start modeling these logs and getting some values out of them, using parsers or some regexes, to come to a level where we can consume the logs. And then people start writing rules on top of that abstraction, so people can say: in this log I'm seeing this value, together with this other value in another log, so maybe I'm hitting this particular defect. That's how it works. And if you look at it, that abstraction can fail the next time, in the next release, when development or the engineer decides to change the log line you wrote that regex for, or they come up with a new version that completely changes the services or processes; then whatever you wrote needs to be rewritten for the new service. And we see that a lot with products like, for instance, WebEx, where you have a very short release cycle; things can change maybe next week with a new release. >>So whatever you are writing, especially that abstraction and those rules, may not be relevant with the new release. That being said, we have an incredible rule creation process and governance process around it, which starts with maybe a defect and then takes it to a level where we have an automation in place. But if you look at it, this really ties to human bandwidth. Our engineers are really busy working on customer-facing issues daily, and sometimes creating these rules or these parsers is not their biggest priority, so they can be delayed a bit. So we have this delay between a new issue being identified and the point where we have the automation to detect it the next time a customer faces it. So with all these questions and challenges in mind, we started looking into ways of actually automating these automations.
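The kind of hand-maintained rule described above, a regex that abstracts a known log format plus a condition on the extracted values that maps to a defect, might look like this sketch. The syslog shape is modeled on a real Cisco IOS message, but the defect ID and the rule itself are invented for illustration:

```python
import re

# A hand-written parser for one known log format. If a release changes
# this line's wording, the regex silently stops matching -- the
# maintenance burden described above.
LINEPROTO = re.compile(
    r"%LINEPROTO-5-UPDOWN: Line protocol on Interface (?P<ifname>\S+), "
    r"changed state to (?P<state>\w+)"
)

def check_rule(line):
    """Rule: interface going down suggests (hypothetical) defect CSCxx12345."""
    m = LINEPROTO.search(line)
    if m and m.group("state") == "down":
        return f"possible defect CSCxx12345 on {m.group('ifname')}"
    return None

hit = check_rule(
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Gi0/1, changed state to down"
)
miss = check_rule(
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Gi0/1, changed state to up"
)
```

Every product, release, and defect needs its own pattern and condition of this kind, which is why the approach ties so directly to human bandwidth.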
>>So these things that we are doing manually, how can we move them a bit further and automate them? And we actually had a couple of things in mind that we were looking for, this being one of them: it has to be product agnostic. If Cisco comes up with a product tomorrow, I should be able to take its logs without writing, you know, complex regexes, parsers, whatever, and deploy it into this system, so it can embrace our logs and make sense of them. And we wanted this platform to be unsupervised, so none of the engineers need to create rules, label logs ("this is bad, this is good"), or train the system, which requires a lot of computational power. And the other most important thing for us was we wanted this to be not noisy at all, because what happens with noise is, when your level of false positives is really high, your engineers start ignoring the good things buried in that noise. >>So the next time, they start thinking that this thing will not be relevant. So we wanted something with a lot less noise. And ultimately, we wanted this new platform or framework to be easily adaptable to our existing workflows. So this is where we started. We started looking into, first of all, whether we could build this thing internally, and we also started researching, and we came upon Zebrium. Actually, Larry, one of the co-founders of Zebrium, we came upon his presentation, where he clearly explained why this is different and how this works, and it immediately clicked. And we said, okay, this is exactly what we were looking for. We dived deeper. We checked the blog posts, where the Zebrium guys really explain everything very clearly; they are really open about it. And most importantly, there is a button in their system. >>So what happens usually with AI/ML vendors is they have this button where you fill in your details and the sales guys call you back and, you know, explain the system. Here, they were like: this is our trial system.
We believe in the system; you can just sign up and try it yourself. And that's what we did. We took one of our Cisco Live DNA Center wireless platforms and started streaming logs out of it. And then we synthetically introduced errors; like, we broke things. And we realized that Zebrium was really catching the errors perfectly. On top of that, it was really quiet unless you were really breaking something. And the other thing we realized during that first trial was that Zebrium was actually bringing a lot of context on top of the logs. During those failures, we worked with a couple of technical leaders, and they said, okay, if this failure happens, I'm expecting this individual log to be there. And we found that with Zebrium, apart from that individual log, there were a lot of other things that give a bit more context around the root cause, which was great. And that's where we wanted to take it to the next level. Yeah. >>Okay. So, you know, a couple things to unpack there. I mean, you have the dartboard behind you, which is kind of interesting, cuz a lot of times it's like throwing darts at the board to try to figure this stuff out. But to your other point, Cisco actually has some pretty rich tools with AppD and observability, and you've made acquisitions like ThousandEyes. And like you said, I'm presuming you gotta eat your own dog food, or drink your own champagne, and so you've gotta be tools agnostic. And when I first heard about Zebrium, I was like, wait a minute, really? I was kind of skeptical. I've heard this before. You're telling me all I need is plain text and a timestamp, and you've got my problem solved? So, and I understand that you guys said, okay, let's run a POC. Let's see if we can cut that from, let's say, two days a week down to one day a week. In other words, 50%; let's see if we can automate 50% of the root cause analysis. And so you funded a POC. How did you test it?
You put, you know, synthetic errors and problems in there, but how did you test that it actually works, Necati? >>Yeah. So we wanted to take it to the next level, which means we wanted to back-test it with existing SRs. And we decided, you know, we chose four different products from four different verticals: data center, security, collaboration, and enterprise networking. And we found SRs where the engineer had put some kind of log in the resolution summary. So they closed the case, and in the summary of the SR, they put: I identified these log lines and they led me to the root cause. We ingested those log bundles, and we tried to see if Zebrium could surface that exact same log line in its analysis. So we initially did it with Atri and myself, and after 50 tests or so, we were really happy with the results. I mean, in almost all of them, we saw the log line that we were looking for. But that was not enough. >>And we brought it, of course, to our management, and they said, okay, let's try this with real users, because the log being there is one thing, but the engineer reaching that log is another. So we wanted to make sure that when we put it in front of our users, our engineers, they could actually come to that log themselves, because, you know, we know this platform, so we can make searches and find whatever we are looking for, but we wanted to prove it. So we extended our pilot to some selected engineers, and they tested it with their own SRs, and also did some back-testing with SRs that had been closed in the past or recently. And with a sample set of, I guess, close to 200 SRs, we found that the majority of the time, almost 95% of the time, the engineer could find the log they were looking for in the Zebrium analysis. >>Yeah. Okay. So you were looking for 50%; you got to 95%.
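The scoring behind that back-test is simple to state: for each closed SR, check whether the log line the engineer cited in the resolution summary appears among the lines the tool surfaced, and report the hit fraction. A minimal sketch, with invented log lines standing in for real SR data:

```python
def backtest_hit_rate(cases):
    # cases: list of (known_root_cause_line, lines_surfaced_by_tool).
    # A "hit" means the tool surfaced the very line the engineer had
    # cited when closing the SR.
    hits = sum(1 for truth, surfaced in cases if truth in surfaced)
    return hits / len(cases)

cases = [
    ("%OSPF-4-ERRRCV: bad packet", ["cpu hog detected", "%OSPF-4-ERRRCV: bad packet"]),
    ("disk read error on /dev/sda", ["fan speed warning"]),  # a miss
    ("ntp peer 10.0.0.7 unreachable", ["ntp peer 10.0.0.7 unreachable"]),
]
rate = backtest_hit_rate(cases)  # 2 hits out of 3 cases
```

As the interview notes, surfacing the line is only half the test; the pilot with real engineers then checked whether users actually reached that line in the tool's output.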
And my understanding is you actually did it with four pretty well-known Cisco products: the WebEx client, DNA Center, Identity Services Engine (ISE), and then UCS, Unified Computing System. So you used actual real data, and that was kind of your proof point. But Atri, that sounds pretty impressive. Have you put this into production now, and what have you found? >>Well, yes, we've launched this with the four products that you mentioned. We're providing our TAC engineers with the ability, whenever a support bundle for one of those products gets attached to the support request, we process it using Zebrium and then provide that analysis to the TAC engineer for their review. >>So are you seeing the results in production? I mean, are you actually able to reclaim that time that people are spending? I mean, it was literally almost two days a week, down to, you know, part of a day. Is that what you're seeing in production, and what are you able to do with that extra time? Are people getting their weekends back? Are you putting 'em on more strategic tasks? How are you handling that? >>Yeah. So what we're seeing is, and I can tell you from my own personal experience using this tool, that troubleshooting any one of these cases, I don't take more than 15 to 20 minutes to go through the Zebrium report. And within that time, I know either what the root cause is, or that Zebrium doesn't have the information I need to solve this particular case. So we've definitely seen... well, it's been very hard to measure exactly how much time we've saved per engineer, right?
What we've heard from our users, again anecdotally, is that out of those three hours they were spending per day, we're definitely able to reclaim at least one. And even more importantly, in terms of the kind of feedback we've gotten, I think one statement that really summarizes how Zebrium has impacted our workflow came from one of our users. >>And they said, well, you know, until you provided us with this tool, log analysis was a very black and white affair, but now it's become really colorful. And I mean, if you think about it, log analysis is indeed black and white. You're looking at it on a terminal screen where the background is black and the text is white, or you're looking at it as text where the background is white and the text is black. But what they're really trying to say is that there are hardly any visual cues to help you navigate these logs, which are so esoteric, so dense, et cetera. What Zebrium does is provide a lot of color and context to the whole process. So now, using their word cloud, using their interactive histogram, using the summaries of every incident, you're very quickly able to summarize what might be happening and what you need to look into. >>Like, what are the important aspects of this particular log bundle that might be relevant to you? So we've definitely seen that. A really great use case that kind of encapsulates all of this came very early on in our experiment. There was this support request that had been escalated to the business unit, the development team. And the TAC engineer had an intuition about what was going wrong, because of their experience, because of the symptoms they'd seen. They kind of had an idea, but they weren't able to convince the development team, because they weren't able to find any evidence to back up what they thought was happening.
And it was entirely happenstance that I happened to pick up that case and did an analysis using Zebrium. I then sat down with the TAC engineer, and very quickly, within 15 minutes, we were able to get down to the exact sequence of events, evidence of what, not the customer, but the TAC engineer thought was the root cause. And it was the root cause. We were then able to share that evidence with our business unit and redirect their resources so that we could chase down what the problem was. And that really shows you how that color and context helps in log analysis. >>Interesting. You know, we do a fair amount of work in the cube in the RPA space, robotic process automation, and the narrative in the press when RPA first started taking off was, oh, it's machines replacing humans, or we're gonna lose jobs. And what actually happened was people were just eliminating mundane tasks, and the employees were actually very happy about it. But my question to you is, was there ever a reticence amongst your team, like, oh wow, I'm gonna lose my job if the machine's gonna replace me? Or have you found that people were excited about this? What's been the reaction amongst the team? >>Well, I think, you know, every automation and AI project gets that immediate gut reaction of "you're automating away our jobs" and so forth. And initially there's a little bit of reticence, but like you said, once you start using the tool, you realize that it's not your job that's getting automated away. It's just that your job's becoming a little easier to do, and it's faster and more efficient, and you're able to get more done in less time. That's really what we're trying to accomplish here. At the end of the day, Zebrium will identify these incidents; it'll do the correlation, et cetera.
But if you don't understand what you're reading, then that information's useless to you. So you need the human, you need the network expert, to actually look at these incidents. What we are able to skim away, or get rid of, is all of the fat that's involved in our process: having to download the bundle, which, when it's many gigabytes in size, and now that we're working from home with the pandemic and everything, means pulling massive amounts of logs from the corporate network onto your local device, which takes time, and then opening it up and loading it in a text editor, which takes time. >>All of these things, we're trying to get rid of. Instead, we're trying to make it easier and quicker for you to find what you're looking for. So it's like you said: you take away the mundane, you take away the difficulties and the slog, but you don't really take away the work. The work still needs to be done. >>Yeah. Great, guys. Thanks so much. Appreciate you sharing your story. It's quite fascinating, really. Thank you for coming on. >>Thanks for having us. >>You're very welcome. Okay, in a moment, I'll be back to wrap up with some final thoughts. This is Dave Valante, and you're watching the cube. >>So today we talked about the need not only to gain end-to-end visibility, but also to automate the identification of root cause problems, and how doing so with modern technology and machine intelligence can dramatically speed up the process and identify the vast majority of issues right out of the box, if you will. And this technology can work with log bundles in batches or with real-time data. As long as there's plain text and a timestamp, it seems Zebrium's technology will get you the outcome of automating root cause analysis with very high degrees of accuracy. Zebrium is available on-prem or in the cloud.
Now, this is important for some companies: on-prem, because there's some really sensitive data inside logs that, for compliance and governance reasons, companies have to keep inside their four walls. Now, Zebrium has a free trial. Of course they'd better, right? So check it out at zebrium.com. You can book a live demo and sign up for a free trial. Thanks for watching this special presentation on the cube, the leader in enterprise and emerging tech coverage. I'm Dave Valante.