Starburst The Data Lies FULL V2b

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads." And that sucks. Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need, when they need it, in a context that's relevant to advancing the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is that the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to "The Data Doesn't Lie... or Doesn't It?", a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're going to have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today: Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>You're very welcome. Let's get right into it. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem and data in the cloud. Those companies were acquiring other companies and inheriting their data architectures. So despite Teradata being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, and serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I think I would echo Justin's experience, really. We as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, and to store data in a platform that's close to data experts: people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and building something brand new you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. It's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, and where does it make sense to decentralize? >>I think you've got to have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, you have a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not going to be able to scale, right? So different parts of the business need to be able to make the right data investments for their needs. And then ultimately you're going to collaborate with your partners, partners that are not within the company, right, external partners. We're going to see a lot more data sharing and model creation, and so you're definitely going to be decentralized.
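Teresa's split between centralized governance and distributed implementation, which Richard later describes as a single audited security layer over EMIS's data mesh, can be sketched in a few lines of Python. This is a hypothetical illustration only, not how EMIS, Accenture, or Starburst actually implement it: the class `GovernedAccessLayer`, the role names, and the `pharmacy.dispensing` dataset are all invented for the example. The central team owns one column-level policy, each domain only registers a function that fetches its own rows from wherever it keeps them, and every request, allowed or denied, lands in the same audit log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]


@dataclass
class GovernedAccessLayer:
    """Hypothetical single access layer: central policy, domain-owned data."""
    # Owned by the central governance team: role -> dataset -> readable columns.
    policy: Dict[str, Dict[str, Set[str]]]
    # Owned by domain teams: dataset -> function that fetches rows from its own store.
    sources: Dict[str, Callable[[], List[Row]]]
    # One entry per request: (user, role, dataset, columns, decision).
    audit_log: List[Tuple[str, str, str, Tuple[str, ...], str]] = field(default_factory=list)

    def read(self, user: str, role: str, dataset: str, columns: List[str]) -> List[Row]:
        # Default-deny: an unknown role or dataset gets an empty column set.
        allowed = self.policy.get(role, {}).get(dataset, set())
        denied = sorted(set(columns) - allowed)
        # Audit every request the same way, whatever the backing store.
        self.audit_log.append((user, role, dataset, tuple(columns), "DENY" if denied else "ALLOW"))
        if denied:
            raise PermissionError(f"role {role!r} may not read {denied} from {dataset!r}")
        # Return only the requested (and permitted) columns.
        return [{c: row[c] for c in columns} for row in self.sources[dataset]()]


# Usage with made-up data: analysts may see drug codes and dates, not patient IDs.
layer = GovernedAccessLayer(
    policy={"analyst": {"pharmacy.dispensing": {"drug_code", "dispensed_on"}}},
    sources={"pharmacy.dispensing": lambda: [
        {"patient_id": "p1", "drug_code": "D42", "dispensed_on": "2022-03-01"},
    ]},
)
print(layer.read("alice", "analyst", "pharmacy.dispensing", ["drug_code"]))
try:
    layer.read("bob", "analyst", "pharmacy.dispensing", ["patient_id"])
except PermissionError as err:
    print("denied:", err)
```

The design choice worth noting in this sketch is default-deny: data a domain publishes stays invisible until the central policy grants a role access to specific columns, so governance stays central even though storage does not.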
>>So, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani, who of course is the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've got to go through if you want anything, but at the same time these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that build cubes and the like, or have you had any success in decentralizing that data model with your constituencies? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, and then people who understand how to build models and how to process the data effectively. You can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patients' lives. And so that becomes a consulting-type experience, from a set of rock stars helping a more decentralized business that needs to understand the data and generate some valuable output. >>Justin, what do you say to a customer or prospect that says, "Look, Justin, I've got a centralized team, and that's the most cost-effective way to serve the business. Otherwise I've got duplication"? What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. First of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is generally very expensive. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story of Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: the idea that the domain owners actually know the data best. >>And so it's about not only acknowledging that data is generally decentralized... and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue that GDPR and data sovereignty may destroy it, because data has to be decentralized to comply with those laws. But I think the reality is that the data mesh model basically says data's decentralized, and we're going to turn that into an asset rather than a liability. And we're going to turn it into an asset by empowering the people that know the data best to participate in the process of curating and creating data products for consumption. So when you think about it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. That's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure.
Maybe they're going to build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or maybe it's just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. I'd love it mastered, I'd love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say, is a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's going to be a new technology that emerges that we're going to want to tap into. There's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that might actually span a number of systems, whether those are source systems for the domains that know the data best, or consumer-based systems and products that need to be packaged in a way that's really meaningful for the end user, right? Each of those is useful for a different part of the business, and the point is making sure the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles back to those. You've got domain ownership and data as a product. Okay, great, sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated and most sensitive, there's computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on the tooling, and the processes that you put in place around those tools, to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables. And then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And having done that, and having invested quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, we always talk about data democratization, and up until recently there really hasn't been a line of sight as to how to get there. Do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over the place. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data best, to create data as a product that is ultimately consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. So we're really trying to build on that notion of data democratization and self-service, and make data very easy to discover and start to use with whatever BI tool you may like, or even just by running SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and, ultimately, decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available; your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're going to get to lie number two, and that is this: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured; its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, to the advent of the first data lakes, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days people would say you'll never get performance, because you need to be columnar; you need to store data in a columnar format. And then columnar formats were introduced to data lakes: you have Parquet, ORC, and Avro, which were created to ultimately deliver performance out of that. So, okay, we've largely gotten over the performance hurdle. More recently, people would say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
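Justin's point about columnar formats comes down to how much data a query has to touch. The toy Python sketch below illustrates only that idea and is not how Parquet or ORC actually lay data out on disk; the `rows` table, its fields, and the helper functions are invented for the example. Summing one column from a row layout means walking every field of every row, while a columnar layout lets the query read just the column it needs.

```python
from typing import Dict, List, Tuple

# Made-up table in row-oriented form: one record per row.
rows: List[Dict[str, object]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 42.0},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one list per column (a columnar layout)."""
    return {name: [r[name] for r in records] for name in records[0]}


def sum_amount_row_layout(records: List[Dict[str, object]]) -> Tuple[float, int]:
    """Sum `amount` from the row layout; also count how many values were scanned."""
    total, scanned = 0.0, 0
    for r in records:
        scanned += len(r)  # a row store reads the whole row to reach one field
        total += r["amount"]
    return total, scanned


def sum_amount_column_layout(columns: Dict[str, List[object]]) -> Tuple[float, int]:
    """Sum `amount` from the columnar layout; only that one column is scanned."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_layout(rows))                 # (237.5, 9)
print(sum_amount_column_layout(to_columns(rows)))  # (237.5, 3)
```

This models only the I/O argument. Real columnar formats such as Parquet and ORC also compress each column and keep per-block statistics (for example, min/max values) so engines can skip data entirely, which the sketch leaves out.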
>>And now we've got the creation of new data formats, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that it's become a lie, and now we have giant hyperscale internet companies that don't have a traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you. I mean, look, closed versus open is a moving target. I remember Unix used to be "open systems," so it is an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology, and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage of it; your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. That gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. So we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source: it's built with open source tooling, and it's not proprietary. You're not going to buy a data mesh; you're going to build it with open source tooling, and vendors like you are going to support it. But to come back to today, you can get to market with a proprietary solution faster. I'm going to make that statement, and you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're going to support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and every new open source thing that comes along; they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA. And those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, again, of 10 or 12 years ago, when everybody had a connector to Hadoop and thought that was the solution, right? But the reality was that a connector was not the same as running workloads in Hadoop back then. And I think similarly, when you connect to an external table that lives in an open data format, you're not going to get the performance that your customers are accustomed to. And at the end of the day, these vendors are always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today? But they're customers nonetheless. I think lock-in is part of this industry, and I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the teaser gas price, and I drive up and then I say, oh, that's the cash price; with a credit card I've got to pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're going to get better performance if you keep it in our closed system? Are you saying that long term that's going to come back and bite you, because you're going to end up... you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before, at least those of us who've been in the industry long enough to see this movie play out a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down.
Because I thought it was an amazing quote. He said, "It buys us the ability to be unsure of the future." That pretty much says it all. The future is unknowable, and the reality is that by using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the exact same data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it. I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license costs, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems, or the proprietary systems, deliver a greater ROI sooner? And is that the allure that customers are attracted to? Or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back them with very tight SLAs.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, and increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh-type technologies. So it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, staying with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rock star data engineers. You've got Databricks coming at it from a data engineering heritage, and you've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in the data engineering world, i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was that if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can help us innovate beyond our competitors.
So my advice to people who are starting out here, or trying to build teams to capitalize on data assets, is to begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself do a lot of TCO work, and have over the last 20-plus years. Normally it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle: there, the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about data mesh, open versus closed, Snowflake and Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, and the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, and the lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3, or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're going to get everything there either. We think that centralization of all your data assets is basically just an impossible endeavor. And so you want to be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics, and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we want to play: to be a single point of access for our customers, to provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately to make data easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks, guys. Right after this quick break, we're going to be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or whether it's the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data, in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach, one that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in your world. >>Okay, we're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess that's the lie: is it, or is it not, modern? Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? It's the new data stack, it's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and at Snowflake, which you mentioned, and its competitors. So that's different. Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Teresa, let me go to you, because you have cloud first in your title. What say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, right, it's very much gonna follow the same thing. There's gonna be more edge. There's gonna be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure that's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects of the modern data stack are. So you think about the past five, seven years: cloud obviously has given a different pricing model and de-risked experimentation, and we talked about the ability to scale up and scale down, but I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though it's early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how it... >>Yeah, what it should be. >>Yeah. Well, I think Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate, and that does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do you believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So Teresa, based on what Justin just said, my takeaway is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data, Teresa? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most data mesh implementations I've seen frankly really aren't adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. With cloud, like we mentioned, there are a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones where it makes business sense, because it needs better performance, it needs something that is cheaper, or maybe it just needs to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is meaningful for the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, talking about data as a product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and putting that into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally. Yeah. But also to external folks as well.
So you've architected that capability today? >>We have, and because the data is standardized, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams that are data agnostic and don't really have that context. All right, let's end with Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse. It provides the fastest time to insight, by accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make it an appropriate complement to the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It? It's made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time and management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022


Starburst The Data Lies FULL V1


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... or Does It? It's a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is Cloud First technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>You're very welcome. Let's get right into it. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So despite Teradata being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I would echo Justin's experience, really. We as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data, but for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed; otherwise you're not gonna be able to scale, right? So different parts of the business need to be able to really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, partners that are not within the company, right? External partners. We're gonna see a lot more data sharing and model creation, and so you're definitely going to be decentralized.
>>So, you know, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani, who of course is the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything, but at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess the question for you, Richard, is how do you deal with that? Do you organize so that there are a few rock stars that build cubes and the like, or have you had any success in decentralizing that data model with your constituencies? >>Yeah. So we absolutely have got rock-star data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models and how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patients' lives. And so that becomes a consulting-type experience from a set of rock stars, helping a more decentralized business that needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team, and that's the most cost-effective way to serve the business; otherwise I've got duplication? What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor, who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that as much as those central data warehouse teams are experts in the technology, they don't necessarily understand the data itself. And one of the core tenets of data mesh that Zhamak writes about is this idea that the domain owners actually know the data the best. >>And so, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So when you think about it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or maybe it's just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. I would love it mastered; I would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles back to those. You've got domain ownership and data as a product. Okay, great. Sounds good. But they create what I would argue are two challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables. And then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. Do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over the place. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data best, to create data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so we're really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers. Now, with Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open-source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, with the advent of the first data lake, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance, because you need to be columnar; you need to store data in a columnar format. And then columnar formats were introduced to data lakes: you have Parquet, ORC, and Avro, which were created to ultimately deliver performance out of that. So, okay, we got largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so, Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be "open systems," so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source, built with open-source tooling; it's not proprietary. You're not gonna buy a data mesh. You're gonna build it with open-source tooling, and vendors like you are gonna support it. But to come back to today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open-source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and every new open-source thing that comes along: they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, again, of going back 10 or 12 years, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start: how many people love Oracle today? But they're customers nonetheless. I think lock-in is part of this industry, and I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the teaser gas price: I drive up, and then I say, oh, that's the cash price; with a credit card I gotta pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, because you're gonna end up... You mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before, at least those of us who've been in the industry long enough to see this movie play over a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down.
Because I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the exact same datasets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, because I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them, data warehouse systems or the proprietary systems deliver a greater ROI sooner? And is that the allure that customers are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of teams that work in proprietary databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back very tight SLAs with them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, which is increasingly where our business is growing, we can't do better than open data formats with cloud-based, data-mesh-type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it... Richard, you mentioned you have a lot of rock-star data engineers. Databricks is coming at it from a data engineering heritage; you get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in data engineering, i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was if you have open-source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and help us innovate beyond our competitors.
So my advice to people who are starting out here, or trying to build teams to capitalize on data assets, is to begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself do a lot of TCO work, and have over the last 20-plus years. Normally it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle. There, the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about data mesh, open versus closed, Snowflake and Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, and lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics, and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or whether it's the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single-source-of-truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach, one that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Theresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's modern at all. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years: rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and by Snowflake and, as you mentioned, its competitors. So that's different. Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that existed with the old enterprise data warehouse model on-prem still exist, just a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you, because you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, right, it's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty and data gravity, and because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure that's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at, very much aligned to data products and data mesh, is how do you enable more people to consume the service, and have the stack respond in a way that keeps costs low, because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required? And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects of the modern data stack are. If you think about the past five or seven years, cloud has obviously given us a different pricing model and de-risked experimentation. We talked about the ability to scale up and scale down, but I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're in early days. So what are the critical aspects? If you had to think about putting some guardrails and definitions around the modern data stack, what does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how it is? >>Yeah, what it should be. >>Yeah. Well, I think Theresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh, and I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth, and I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate, and that does not reflect the vast majority of enterprises. And even those companies, as they grow up, mature out of that ideal state. They go buy a business; now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So it's creating that flexibility to really future-proof yourself against the inevitable change that you will encounter over time. >>So thank you. So Theresa, based on what Justin just said, my takeaway there is it's inclusive: whether it's a data mart, a data hub, a data lake, or a data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations of data mesh I've seen, frankly, really aren't adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh: a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Theresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. With cloud, like we mentioned, there are a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already; it's meaningful for the business. At the same time, you can modernize the ones where it makes business sense because they need better performance, they need something that is cheaper, or maybe they just need to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds, which, again going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, talking about data as product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications, and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally. Yeah. But also to external folks as well.
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you've gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end with Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Theresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie... or Does It?", made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time and management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this comes with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture. Tomorrow. Starbust enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere and Starburst enterprise was created to give you the enterprise grade performance, connectivity, security management, and support your company needs organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 20 2022



Starburst Panel Q3


 

>>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess that's the lie, or is it that it's not modern? Justin, what do you say? >>Yeah, I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin, but okay. But there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different. Is that not at least an aspect of modern? Dial it up, dial it down. So what do you say to that? >>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that existed with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that.
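Justin's contrast between ingesting data into a proprietary warehouse and analyzing it where it lives can be illustrated with a toy Python sketch. This is not Starburst's or Trino's API: the in-memory SQLite table stands in for an operational system, the CSV text stands in for an open-format file in a data lake, and every name here is hypothetical. A real federated query engine would push filters and joins down to each source instead of looping in Python.

```python
import csv
import io
import sqlite3

# Hypothetical "operational system": an in-memory SQLite database of orders.
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
ops.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 35.5), (1, 4.5)])

# Hypothetical "data lake" file in an open format (CSV here for simplicity).
LAKE_CUSTOMERS_CSV = "customer_id,region\n1,UK\n2,US\n"

def read_lake_customers(text):
    """Read customer rows straight from the open-format file; no ingestion step."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["customer_id"]): row["region"] for row in reader}

def revenue_by_region(conn, lake_text):
    """Join both sources at query time, leaving each one where it lives."""
    regions = read_lake_customers(lake_text)
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        region = regions.get(customer_id, "unknown")
        totals[region] = totals.get(region, 0.0) + amount
    return totals

print(revenue_by_region(ops, LAKE_CUSTOMERS_CSV))
```

Nothing is copied into a central store first; the trade-off is that the join logic now has to run across systems at query time, which is the work a federated engine takes on.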
>>So Teresa, let me go to you, cuz you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud, as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure; it's on the books. You don't wanna just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? >>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh.
How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we had. So I think the stack needs to support a scalable business, not just the technology itself. >>Oh, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, de-risked experimentation, you know, the ability to scale up and scale down that we talked about. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm, you know, a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment, about the data warehouse not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that.
So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate, and that does not reflect the vast majority of enterprises. >>And even those companies, as they grow up, mature out of that ideal state. They go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So it's creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So, based on what Justin just said, what I might take away there is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Teresa, does that include on-prem data?
Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh frankly really aren't, you know, adhering to the philosophy there. Maybe it's a data lake, and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh. The fact is that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones where it makes business sense, because they need better performance, they need, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is needed by the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you. So Richard, you know, you're talking about data as a product. I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings.
So information about patients' demographics, about their treatment, about their medications, and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured and managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned it's appropriately provided internally. Yeah. But also, you know, to external folks as well. So you've architected that capability today? >>We have, and because the data is standardized, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean?
And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end with Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent. Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. We're big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time.
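Richard's point that a data product is more than formatting tables (it pins down what a date means, removes certain rows, and redacts fields) can be sketched in Python. This is a hypothetical illustration, not EMIS Health's implementation: the field names, the consent flag used for row removal, and the two source date formats are all assumptions.

```python
from datetime import datetime

# Hypothetical product contract: each published column has one explicit meaning.
CONTRACT = {
    "patient_ref": "pseudonymous patient reference",
    "treatment_start": "date treatment began (not contract start, not date of birth)",
    "medication": "medication name, upper-cased for consistency",
}

def parse_date(value):
    """Standardize dates from sources using ISO (YYYY-MM-DD) or UK (DD/MM/YYYY) formats."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")

def publish(rows):
    """Build the product: remove rows, standardize values, keep only contract columns."""
    product = []
    for row in rows:
        if not row.get("consented", False):
            continue  # row-level removal: only consented records are shared
        record = {
            "patient_ref": row["patient_ref"],
            "treatment_start": parse_date(row["treatment_start"]),
            "medication": row["medication"].strip().upper(),
        }
        # Fields outside the contract (such as "name") are never copied, so they are redacted.
        assert set(record) == set(CONTRACT), "record must match the contract"
        product.append(record)
    return product

raw = [
    {"patient_ref": "p1", "name": "A. Smith", "treatment_start": "2022-03-01",
     "medication": " aspirin", "consented": True},
    {"patient_ref": "p2", "name": "B. Jones", "treatment_start": "01/03/2022",
     "medication": "Metformin", "consented": True},
    {"patient_ref": "p3", "name": "C. Brown", "treatment_start": "2022-04-10",
     "medication": "statin", "consented": False},
]
for record in publish(raw):
    print(record)
```

Here `01/03/2022` is read as 1 March only because the sketch assumes UK day-first dates, which is exactly the kind of assumption a data product should state in its contract rather than leave implicit.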

Published Date : Aug 2 2022


Starburst Panel Q1


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What is your experience?
>>Yeah, I mean, I think I would echo Justin's experience really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know? Right. You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
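Teresa's split between centralized governance and distributed implementation can be sketched as one central policy that every domain's data product is checked against. This is a minimal illustration under stated assumptions, not Accenture's method: the policy fields, column names, and the `check_compliance` function are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical central policy, owned by the central team acting as design authority.
@dataclass(frozen=True)
class CentralPolicy:
    restricted_columns: frozenset  # e.g. fields covered by regulation
    require_owner: bool = True     # every data product must name a domain owner

# Hypothetical data product, built and run by a business domain.
@dataclass
class DomainDataProduct:
    name: str
    owner: str
    columns: list
    masked_columns: set = field(default_factory=set)

def check_compliance(product, policy):
    """Return policy violations; the domain implements, the centre only checks."""
    problems = []
    if policy.require_owner and not product.owner:
        problems.append(f"{product.name}: no domain owner")
    unmasked = (set(product.columns) & policy.restricted_columns) - product.masked_columns
    for column in sorted(unmasked):
        problems.append(f"{product.name}: restricted column '{column}' is not masked")
    return problems

policy = CentralPolicy(restricted_columns=frozenset({"national_id", "date_of_birth"}))
ledger = DomainDataProduct("ledger", "finance", ["account", "national_id"], {"national_id"})
dispensing = DomainDataProduct("dispensing", "", ["drug", "date_of_birth"])
print(check_compliance(ledger, policy))
print(check_compliance(dispensing, policy))
```

The rules live in one place while each domain keeps its own product definition, so adding a restricted column changes the check everywhere without touching any domain's code.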
>>So, you know, Justin, you guys, last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing with, you know, your constituencies, that data model? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business that needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team, and that's the most cost-effective way to serve the business; otherwise I got duplication? What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being is really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story of Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or maybe it's just isolated to specific use cases? What's your take on that?

>>Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's not gonna be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems. Maybe those are the source systems, the domains that know the data best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and you need to make sure that the mesh actually allows you to use all of them.

>>So, Richard, let me ask you: take Zhamak's principles back to those. You've got, you know, the domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two challenges: self-serve infrastructure, let's park that for a second, and then, in your industry, one of the most highly regulated and most sensitive, computational governance. How do you automate and ensure federated governance in that mesh model that Theresa was just talking about?

>>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables. And then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place.

And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users.

>>Yeah. So Justin, I mean, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over the place. How are you seeing your customers handle this challenge?
>>Yeah, I mean, I think data products are a really interesting aspect of the answer to that. They allow you to, again, leverage the data domain owners, the people who know the data best, to create data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so we're really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself.

>>Okay, guys, grab a sip of water. After the short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there.

Published Date : Aug 2 2022


Zubin Chagpar, AWS | AWSPS Summit Bahrain 2019


 

>> From Bahrain, it's theCUBE, covering AWS Public Sector Bahrain. Brought to you by Amazon Web Services.

>> Welcome back to theCUBE's coverage here in Bahrain, in the Middle East, for AWS Summit, wrapping up the event here with the cloud computing shift that's happening. The Amazon region is live, with a lot of innovation in the Middle East and Africa. We're here with the head executive of AWS who manages the territory, Zubin Chagpar, head of Middle East and Africa for public sector. Thanks for coming on. Appreciate it.

>> Absolutely. John, thank you. Thank you for having me here, and thanks for being here with your team, learning about this story from the Middle East and, of course, all the way to North Africa and Africa as well.

>> Well, it's been a lot of work that you and your team have done, and it's been successful. We're super interested because there's a big story developing here. The Amazon region, which was announced two years ago, is available. Day one, pun intended: a lot of companies shipping their products on the cloud. But it's also a cultural revolution happening for a whole other generation that could change the landscape of the Middle East: culture, technology, innovation, banking, entrepreneurship. This is a grassroots groundswell of change, empowerment, self-esteem, money making, banking. I mean, all this is going on.

>> Absolutely, and we're in the middle of it. Well, hey, we're proud to be part of it, and we're proud to be partners with many different organizations here as well. I think it's actually coming from all different directions, not just the groundswell, but even the leadership, the leadership of the countries, like Bahrain, but also the other countries in the Middle East and across the region. What we're seeing is that these countries are creating these vision plans, and these vision plans are about diversifying the economy, creating opportunity for youth, and creating opportunity for folks who are being displaced as the economy diversifies. So where do you turn? These countries are turning to technology, and in Bahrain in particular, they've set up programs where they help to develop youth. They help develop citizens; they can get free training on technology. All they have to do is seize it and take it. And we've seen that uptake as well. We've seen a lot of Bahrainis taking courses, getting trained, and then building. And that's what you witnessed today at our summit.

>> You know, innovation takes on many forms. You've seen it every way, from the mobile wave. You saw countries that didn't have good infrastructure stand up mobile faster and be more connected than some of the incumbent countries that had last-mile connectivity.

>> And they leapfrogged.

>> The leapfrog, right? So this is happening here. But the trend we're seeing is that the old business school adage of focusing on your core competency, and then it's okay to outsource what's not a core competency, is a trend we're seeing with cloud, where you guys are providing the infrastructure for countries, not just businesses, to focus on their core competency, and it's making things go faster. So speed is table stakes; slow is probably the old way. What are some of the conversations around this? Because this seems to be the hottest topic from things I've seen in the United States, with the CIA and JEDI, which Teresa has been involved in, to education with Educate, and now here with the government modernizing with cloud first. That's trickling throughout the entire country.

>> Yeah, well, I mean, part of that is need. So some of the countries in the region, Bahrain included, want spend controls. They want to reduce their costs. They want to get away from legacy. They want to get away from licenses they're not using or hardware that's sitting on shelves. So then they come and look at what we're doing, and they say, well, wait a second.
You're investing in the region. You're making infrastructure available. I can build on top of that. I can leverage open source. I can create citizen services. It's almost a no-brainer for them. One of the biggest problems they have is that they train people, and then those folks go and do startups, or they go to work for other businesses, which is fantastic. It's actually creating that virtuous cycle that we want to be a part of as well. So we've already started to see that happening in the past couple of years that we've been here. And of course, now we have the region launch, which is only accelerating the journey.

>> Talk about the business, because you guys are active. Your presence early with a region, which we think is a revitalization, creates economic value. That's something we've been reporting on. But there's more than that. It's not just Bahrain. It's the whole region that you cover. What's the business landscape like? What are some of the deals you're doing? What do the startups look like? Talk about some of the landscape dynamics.

>> Absolutely. So first, the startup ecosystem is very, um, I was gonna say robust, but to use the word that you used, it's a groundswell right now. We're seeing a lot of interest, a lot of activity, a lot of folks getting in and experimenting very quickly. But it's not easy. As we know, doing a startup is not easy, so that robustness will come with time. We're seeing that more funding is going into the space now, but it's still very anemic. So in the Middle East, there's not enough funding going in, especially for the early stage. We're seeing interest coming from outside for the later stages. We're talking about the Careems, and we're talking about the [unclear] and some of those companies, but at the earliest ones...

>> The ones that are clearly validated, growing.

>> Exactly, the ones who have already earned market share. The ones that are starting right now, they're finding their way. So what do they need? They need more funding for experimentation. Enter AWS. We provide programs like Activate, which is an early-stage startup program. We work very closely with the VCs and the accelerators, so they're leveraging that now too. So we're starting to see that growth, whether it's in Dubai or Abu Dhabi or Riyadh or Cairo or here. It's just starting to grow, which is great. And we're seeing that interest. Some of that "Should I go into this field? What would my mom say if I'm an entrepreneur versus a doctor?" has started swaying now, to where people say entrepreneurship is actually pretty...

>> Cool. It's a legitimate field.

>> It's a legitimate field, believe it or not, which is exciting, and people are going into it as well. On the enterprise side, from SMEs all the way to large enterprises, we're seeing that folks are saying, hey, I can actually maybe even get into new markets. So if you look at Al Tayyar Group, Al Tayyar Travel Group, which is out of Saudi Arabia, they're using it to learn more about their customers and come up with new solutions and new packages that they can offer, all using AI and ML, which is incredible. Or flydubai, in Dubai, as you probably guessed: they needed to stand up an online booking system very quickly. They did it in four months, leveraging cloud. So they're getting the idea that you can kick it off real quick, get it running, and then launch it. And then there's the government. We're seeing governments across the region saying, hey, first of all, we've got to control our costs, and we've got to make better citizen services. We want to make sure that we improve the lives of citizens, so being able to launch new services in a short amount of time matters. Today we heard about the iGA working on a building permit system that allows people to build quicker, because they're able to get a permit in days instead of months. This is an example where speed matters, to the point where we now have the government challenging business in terms of moving quicker, which is unheard of anywhere else in the world, which is super exciting.

>> Certainly not in the United States, that's for sure. I would just like to point out that from my standpoint, I see many VCs here. I'm seeing them in the hallways: a mix of VCs, entrepreneurs, business, global ISVs and new ISVs. The Amazon Partner Network is here. So all the things are in place now. What's missing? What are your to-do items? Where do you think you can raise the bar for AWS? What's your plan?

>> Well, to your point, all the ingredients are here. So in terms of what's missing, I don't think there's anything missing. It's putting the ingredients together and baking that cake. That's what we've got to work on right now. So that's why we're doing a lot of activity in the schools across the region with our programs Educate and Academy, which are two different programs: one is broad, and one is very directed to the schools. That's how we're helping to make sure that we get the next generation of learners. We're working on cloud degrees, so in four years you can actually get a degree leveraging all the technologies that are enabled by cloud. That's happened at the University of Bahrain, and we hope to get that in other schools. But also just getting training out there, quick training, so you learn a few modules and you go and build something. That's another area where we need to spend more time. And again, encouraging experimentation: try some new things, get it out there, see what the market says. So in terms of what we could do more of, AWS is to continue to push that message and continue developing the people out here so that they're building.

>> I gotta ask you, you've been in Silicon Valley. You know that game. For the folks living in Silicon Valley and the U.S., what should they know about what's going on in your territory out here that they may not know? Maybe it's a fleeting thought, or something that's not being reported. What should people know about what's going on in this region?

>> Absolutely. I think sometimes, for many markets that are outside of Silicon Valley, outside of the U.S., it's about copycat startups, which is fine. You can actually do quite well with that. Usually the returns when you go to an exit event are less than what you might see in the Valley, but again, there's a great arbitrage opportunity, so you want to bring the funds here. But I do see a change now, where folks are building original technologies as well. And I think we're going to see more of the multiples, 10 or 20 times multiples, out here in the region, which would be wonderful. We saw a company today, Unifonic, which is from Saudi Arabia, that is building telecommunications, contact and voice types of services. That's gonna be huge if they get this right, and I think they will, because they've got a really scrappy and hungry team. That's one of those companies that could actually start getting people to notice the Middle East as a place for original innovation.

>> Awesome. Anything else on your to-do list for next year? Give a quick plug for your crew. Are you guys hiring? I see the region here. You've got to fill staff here. You've actually got curriculum in the schools here. What are you looking for?

>> What's on my to-do list? Yeah. You know, it's to continue to develop my team. We've got a wonderful team here, a lot of people who are local to the region, and we want to make sure that they're growing and contributing as well. Our team's mission is to do good and do well at the same time; I think they go hand in hand. So we want to do more around programs that help to develop communities. You know, there are refugee crises around the region, and we want to make sure we can help out there. Women in tech is a big area of focus for us: how can we get more women into technology, and leading in technology as well? My team has a mandate to get more solutions architects who are women; we don't have one yet. We have a lot of great women on our team, but we need more technical women too. That's another key focus area for us. And we'll just continue building and continue to help the communities build solutions on the cloud.

>> Zubin Chagpar, head of the territory here in the Middle East and Africa. That's theCUBE's coverage of AWS Summit in Bahrain, in the Middle East, signing off our second year. We'll be back with more next year, and you'll see us around the territory. Thanks for watching.

Published Date : Sep 15 2019
