Justin Borgman, Starburst & Ashwin Patil, Deloitte | AWS re:Invent 2022

(electronic music) (graphics whoosh) (graphics tinkle) >> Welcome to Las Vegas! It's theCUBE live at AWS re:Invent '22. Lisa Martin here with Dave Vellante. Dave, it is not only great to be back, but this re:Invent seems to be bigger than last year for sure. >> Oh, definitely. I'd say it's double last year. I'd say it's comparable to 2019. Maybe even a little bigger, I've heard it's the largest re:Invent ever. And we're going to talk data, one of our favorite topics. >> We're going to talk data products. We have some great guests. One of them is an alumni who's back with us. Justin Borgman, the CEO of Starburst, and Ashwin Patil also joins us, Principal AI and Data Engineering at Deloitte. Guys, welcome to the program. >> Thank you. >> Together: Thank you. >> Justin, define data products. Give us the scoop, what's goin' on with Starburst. But define data products and the value in it for organizations of productizing data. >> Mm-hmm. So, data products are curated data sets that are able to span across multiple data sets. And I think that's what makes it particularly unique, is you can span across multiple data sources to create federated data products that allow you to really bring together the business value that you're seeking. And I think ultimately, what's driving the interest in data products is a desire to ultimately facilitate self-service consumption within the enterprise. I think that's the holy grail that we've all been building towards. And data products represents a framework for sort of how you would do that. >> So, monetization is not necessarily a criterion? >> Not necessarily. (Dave's voice drowns) >> But it could be. >> It could be. It can be internal data products or external data products. And in either case, it's really intended to facilitate easier discovery and consumption of data. >> Ashwin, bringing you into the conversation, talk about some of the revenue drivers that data products can help organizations to unlock. >> Sure. 
Like Justin said, there are internal and external revenue drivers. So internally, a lot of clients are focused around, hey, how do I make the most out of my modernization platform? So, a lot of them are thinking about what AI, what analytics, what can they run to drive consumption? And when you think about consumption, consumption typically requires data from across the enterprise, right? And data from the enterprise is sometimes fragmented in pieces, in places. So, we've gone from being data in too many places to now, data products, helping bring all of that together, and really aid, drive business decisions faster with more data and more accuracy, right? Externally, a lot of that has got to do with how the ecosystems are evolving for data products that use not only company data, but also the ecosystem data that includes customers, that include suppliers and vendors. >> I mean, conceptually, data products, you could say have been around a long time. When I think of financial services, I think that's always been a data product in a sense. But suddenly, there's a lot more conversation about it. There's data mesh, there's data fabric, we could talk about that too, but why do you think now it's coming to the fore again? >> Yeah, I mean, I think it's because historically, there's always been this disconnect between the people that understand data infrastructure, and the people who know the right questions to ask of the data. Generally, these have been two very distinct groups. And so, the interest in data mesh as you mentioned, and data products as a foundational element of it, is really centered around how do we bring these groups together? How do we get the people who know the data the best to participate in the process of creating data to be consumed? Ultimately, again, trying to facilitate greater self-service consumption. And I think that's the real beauty behind it. 
And I think increasingly, in today's world, people are realizing the data will always be decentralized to some degree. That notion of bringing everything together into one single database has never really been successfully achieved, and is probably even further from the truth at this point in time, given you've got data on-prem and multiple clouds, and multiple different systems. And so, data products and data mesh represents, again, a framework for you to sort of think about data that lives everywhere. >> We did a session this summer with (chuckles) Justin and I, and some others on the data lies. And that was one of the good ol' lies, right? There's a single source of truth. >> Justin: Right. >> And all that is, we've probably never been further from the single source of truth. But actually, you're suggesting that there's maybe multiple truths that the same data can support. Is that a right way to think about it? >> Yeah, exactly. And I think ultimately, you want a single point of access that gives you, at your fingertips, everything that your organization knows about its business today. And that's really what data products aims to do, is sort of curate that for you, and provide high quality data sets that you can trust, that you can now self-service to answer your business question. >> One of the things that, oh, go ahead. >> No, no, I was just going to say, I mean, if you pivot it from the way the usage of data has changed, right? Traditionally, IT has been in the business of providing data to the business users. Today, with more self-service being driven, we want business users to be the drivers of consumption, right? So if you take that backwards one step, it's basically saying, what data do I need to support my business needs, such that IT doesn't always have to get involved in providing that data, or providing the reports on top of that data? 
So, the data products concept, I think supports that thinking of business-led technology-enabled, or IT-enabled really well. >> Business led. One of the things that Adam Selipsky talked with John Furrier about just a week or so ago in their pre re:Invent interview, was talking about the role of the data analyst going away. That everybody in an organization, regardless of function, will be able to eventually be a data analyst, and need to evaluate and analyze data for their roles. Talk about data products as a facilitator of that democratization. >> Yeah. We are seeing more and more the concept of citizen data scientists. We are seeing more and more citizen AI. What we are seeing is a general trend, as we move towards self-service, there is going to be a need for business users to be able to access data when they want, how they want, and merge data across the enterprise in ways that they haven't done before, right? Technology today, through products like data products, right, provides you the access to do that. And that's why we are going to see this movement of people becoming more and more self-service oriented, where you're going to democratize the use of AI and analytics into the business users. >> Do you think, when you talk to a data analyst, by the way, about that, he or she will be like, yeah, mm, maybe, good luck with that. So, do ya think maybe there's a sort of an interim step? Because we've had these highly, Zhamak lays this out very well. We've had these highly-centralized, highly-specialized teams. The premise being, oh, that's less expensive. Perhaps data analysts, like functions, get put into the line of business. Do you see that as a bridge or a stepping stone? Because it feels like it's quite a distance between what a data analyst does today, and this nirvana that we talk about. What are your thoughts on that? >> Yeah, I mean, I think there's possibly a new role around a data product manager. 
Much the way you have product managers in the products you actually build to sell, you might need data product managers to help facilitate and curate the high quality data products that others can consume. And I think that becomes an interesting and important skill set. Much the way that data scientist was created as an occupation, if you will, maybe 10 years ago, when previously, those were statisticians, or other names. >> Right. A big risk that many clients are seeing around data products is, how do you drive governance? And to that, to the point that Justin's making, we are going to see that role evolve where governance in the world, where data products are getting democratized is going to become increasingly important in terms of how are data products being generated, how is the propensity of data products towards a more governed environment being managed? And that's going to continue to play an important role as data products evolve. >> Okay, so how do you guys fit, because you take Zhamak's four principles, domain ownership, data as product. And that creates two problems. Governance. (chuckles) Right? How do you automate, and self-service, infrastructure and automated governance. >> Yep. >> Tell us what role Starburst plays in solving all of those, but the latter two in particular. >> Yeah. Well, we're working on all four of those dimensions to some degree, but I think ultimately, what we're focused on today is the governance piece, providing fine-grained access controls, which is so important, if you're going to have a single point of access, you better have a way of controlling who has access to what. But secondly, data products allows you to really abstract away or decouple where the data is stored from the business meaning of the data. 
And I think that's what's so key here is, if we're going to ultimately democratize data as we've talked about, we need to change the conversation from a very storage-centric world, like, oh, that table lives in this system or that system, or that system. And make it much more about the data, and the value that it represents. And I think that's what data products aims to do. >> What about data fabric? I have to say, I'm confused by data fabric. I read this, I feel like Gartner just threw it in there to muck it up. And say, no, no, we get to make up the terms, but I've read data mesh versus data fabric, is data fabric just more sort of the physical infrastructure? And data mesh is more of an organizational construct, or how do you see it? >> Yeah, I'm happy to take that or. So, I mean, to me, it's a little bit of potato potato. I think there are some subtle differences. Data fabric is a little bit more about data movement. Whereas, I think data mesh is a little bit more about accessing the data where it lies. But they're both trying to solve the similar problem, which is that we have data in a wide variety of different data sets. And for us to actually analyze it, we need to have a single view. >> Because Gartner hype cycle says data mesh is DOA- >> Justin: I know. >> Which I think is complete BS, I think is real. You talk to customers that are doing it, they're doing it on AWS, they're trying to extend it across clouds, I mean, it's a real trend. I mean, anyway, that's how I see it. >> Yeah. I feel the word data fabric many a times gets misused. Because when you think about the digitization movement that happened, started almost a decade ago, many companies tried to digitize or create digital twins of their systems into the data world, right? So, everything has an underlying data fabric that replicates what's happening transactionally, or otherwise in the real world. 
What data mesh does is creates structure that works complementary to the data fabric, that then lends itself to data products, right? So to me, data products becomes a medium, which drives the connection between data mesh and data fabric into the real world for usage and consumption. >> You should write for Gartner. (all laugh) That's the best explanation I've heard. That made sense! >> That really did. That was excellent. So, when we think about any company these days has to be a data company, whether it's your grocery store, a gas station, a car dealer, what can companies do to start productizing their data, so that they can actually unlock new revenue streams, new routes to market? What are some steps and recommendations that you have? Justin, we'll start with you. >> Sure. I would say the first thing is find data that is ultimately valuable to the consumers within your business, and create a product of it. And the way you do that at Starburst is allow you to essentially create a view of your data that can span multiple data sources. So again, we're decoupling where the data lives. That might be a table that lives in a traditional data warehouse, a table that lives in an operational system like Mongo, a table that lives in a data lake. And you can actually join those together, and represent it as a view, and now make it easily consumable. And so, the end user doesn't need to know, did that live in a data warehouse, an operational database, or a data lake? I'm just accessing that. And I think that's a great, easy way to start in your journey. Because I think if you absorb all the elements of data mesh at once, it can feel overwhelming. And I think that's a great way to start. >> Irrespective of physical location. >> Yes. >> Right? >> Precisely. Yep, multicloud, hybrid cloud, you name it. >> And when you think about the broader landscape, right? Traditionally, companies only looked at internal data as a way of driving business decisions. 
More and more, as things evolve into industry clouds, or ecosystem data, and companies start going beyond their four walls in terms of the data that they manage or the data that they use to make decisions, I think data products are going to play more and more an important part in that construct where you don't govern all the data, and other entities within that ecosystem will govern parts of their data, but that data lives together in the form of data products that are governed somewhat centrally. I mean, kind of like a blockchain system, but not really. >> Justin, for our folks here, as we kind of wrap the segment here, what's the bumper sticker for Starburst, and how you're helping organizations to really be able to build data products that add value to their organization? >> I would say analytics anywhere. Our core ethos is, we want to give you the ability to access data wherever it lives, and understand your business holistically. And our query engine allows you to do that from a query perspective, and data products allows you to bring that up a level and make it consumable. >> Make it consumable. Ashwin, last question for you, here we are, day one of re:Invent, loads of people behind us. Tomorrow all the great keynotes kick up. What are you hoping to take away from re:Invent '22? >> Well, I'm hoping to understand how all of these different entities that are represented here connect with each other, right? And to me, Starburst is an important player in terms of how do you drive connectivity. And to me, as we help clients from a Deloitte perspective, drive that business value, connectivity across all of the technology players is an extremely important part. So, integration across those technology players is what I'm trying to get from re:Invent here. >> And so, you guys do, you're dot connectors. (Ashwin chuckles) >> Exactly, excellent. Guys, thank you so much for joining David and me on the program tonight. 
We appreciate your insights, your time, and probably the best explanation of data fabric versus data mesh. (Justin chuckles) And data products that we've maybe ever had on the show! We appreciate your time, thank you. >> Together: Thank you- >> Thanks, guys. >> All right. For our guests and Dave Vellante, I'm Lisa Martin, you're watching theCUBE, the leader in enterprise and emerging tech coverage. (electronic music)
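
Justin's description of a data product as a single view joined across a warehouse table, an operational table, and a data lake table can be illustrated with a small sketch. This is not Starburst's implementation or API; it is a plain Python stand-in, and every name and row in it (`warehouse_customers`, `operational_orders`, `lake_clicks`, `customer_360`) is a hypothetical chosen for illustration. The point is only that the consumer calls one function and never learns which system each field came from.

```python
# Hypothetical stand-ins for three separate systems. All names and rows are
# invented for illustration; nothing here is a real Starburst or Trino API.
warehouse_customers = [          # imagine: a table in a data warehouse
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
operational_orders = [           # imagine: a collection in an operational store
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 75.0},
    {"order_id": 12, "customer_id": 1, "amount": 100.0},
]
lake_clicks = [                  # imagine: files in a data lake
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
    {"customer_id": 2, "page": "/pricing"},
]


def customer_360():
    """A toy 'data product': one joined view over the three sources."""
    # Total spend per customer, computed from the operational source.
    spend = {}
    for order in operational_orders:
        cid = order["customer_id"]
        spend[cid] = spend.get(cid, 0.0) + order["amount"]

    # Page views per customer, computed from the lake source.
    views = {}
    for click in lake_clicks:
        cid = click["customer_id"]
        views[cid] = views.get(cid, 0) + 1

    # Join both onto the warehouse's customer list.
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "total_spend": spend.get(c["customer_id"], 0.0),
            "page_views": views.get(c["customer_id"], 0),
        }
        for c in warehouse_customers
    ]


if __name__ == "__main__":
    for row in customer_360():
        print(row)
```

In a federated query engine the same idea would typically be expressed as a SQL view over several catalogs rather than as application code; the sketch only shows the decoupling of the consumer from where the data is stored.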

Published Date : Nov 29 2022



Justin Borgman, Starburst and Teresa Tung, Accenture | AWS re:Invent 2021

>> Hey, welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm your host, Lisa Martin. This is day two, our first full day of coverage. But day two, we have two live sets here with AWS and its ecosystem partners, two remote sets, over a hundred guests on the program. We're going to be talking about the next decade of cloud innovation, and I'm pleased to welcome back two CUBE alumni to the program. Justin Borgman is here, the co-founder and CEO of Starburst, and Teresa Tung, the Cloud First Chief Technologist at Accenture. Guys, welcome back to theCUBE. >> Thank you. >> Thank you for having me. >> Good to have you back. So, Teresa, I was doing some research on you and I see you are the most prolific inventor at Accenture, with over 220 patents and patent applications. That's huge. Congratulations. >> Thank you. Thank you. >> And I love your title. I think it's intriguing. I'd like to learn a little bit more about your role, Cloud First Chief Technologist. Tell me about that. >> Well, I get to think about the future of cloud. And if you think about cloud, it powers everything, experiences in our everyday lives, in our homes, in our car, in our stores. So pretty much I get to be Q, right, to the rest of Accenture's James Bond. >> And you're Q. I like that. Wow. What a great analogy. Justin, talk to me a little bit. I know Starburst has been on the program before, but give me a little bit of an overview of the company, what you guys do. What were some of the gaps in the market that you saw a few years ago and said, we have an idea to solve this? >> Sure. So Starburst offers a distributed query engine, which essentially means we're able to run SQL queries on data anywhere. It could be in traditional relational databases, data lakes, in the cloud, on-prem. And I think that was the gap that we saw, was basically that people had data everywhere and really had a challenge with how they analyze that data. 
And my co-founders are the creators of an open source project originally called Presto, now called Trino. And it's how Facebook and Netflix and Airbnb and a number of the internet companies run their analytics. And so our idea was basically to take that, commercialize that, and make it enterprise grade for the thousands of other companies that are struggling with data management, data analytics problems. >> And that's one of the things we've seen explode during the last 22 months, among many other things, is data, right? And every company these days has to be a data company. If they're not, there's a competitor in the rearview mirror, ready to come and take that place. We're going to talk about the data mesh. Teresa, we're going to start with you. This is not a new... This is a new concept. Talk to us about what a data mesh is and why organizations need to embrace this approach. >> So there's a canonical definition about data mesh with four attributes, and any data geek or data architect really resonates with them. So number one, it's really rooted in decentralized domain ownership. So data is not within a single line of business, within a single entity, within a single partner; it has to be across different domains. Second is publishing data as products. And so instead of these really, you know, technology solutions, data sets, data tables, really thinking about the product and who's going to use it. The third one is really around self-service infrastructure. So you want everybody to be able to use those products. And finally, number four, it's really about federated and global governance. So even though they're products, you really need to make sure that you're doing the right things with that data. >> We're not talking about a single tool here, right? This is more of an approach, a solution. >> It is a data strategy first and foremost, right? So companies, they are multi-cloud, they have many projects going on, they are on premise. 
So what do you do about it? And so that's the reality of the situation today, and it's first and foremost a business strategy and framework to think about the data. And then there's a new architecture that underlies and supports that. >> Justin, talk to me about when you're having customer conversations. Obviously organizations need to have a core data strategy that runs the business. They need to be able to democratize, really truly democratize, data access across all business units. What are some of your customer conversations like? Are customers really embracing the data strategy, vision and approach? >> Yeah, well, I think as you alluded to, you know, every business is data-driven today, and the pandemic, if anything, has accelerated digital transformation and that move to become data-driven. So it's imperative that every business of every shape and size really put the power of data in the hands of everyone within their organization. And I think part of what's making data mesh resonate so well is that decentralization concept that Teresa spoke about. Like, I think companies acknowledge that data is inherently decentralized. They have a lot of different database systems, different teams, and data mesh is a framework for thinking about that, that not only acknowledges that reality, but also embraces it and basically says there's actually advantages to this decentralized approach. And so I think that's what's driving the interest level in the data mesh paradigm. And it's been exciting to work with customers as they think about that strategy. And I think that, you know, essentially every company in the space is in transition, whether they're moving from on-prem to the cloud, or from one cloud to another cloud, or undergoing that digital transformation. They have left behind data everywhere. And so they're trying to wrestle with how to grasp that. 
>> And we know that there's so much value in data. The need is to be able to get it, to be able to analyze it quickly in real time. I think another thing we learned in the pandemic is that real-time is no longer a nice-to-have. It is essential for businesses in every organization. So Teresa, let's talk about how Accenture and Starburst are working together to take the data mesh from a concept, a framework, and put this into production, into execution. >> Yeah. I mean, many clients are already doing some aspect of the data mesh. As I listed those four attributes, I'm sure everybody thought, like, I'm already doing some of this. And so a lot of that is reviewing your existing data projects and looking at it from a data product landscape. We're at Amazon, right? Amazon, famous for being customer obsessed. So in data, we're not always customer obsessed. We put up tables, we put up data sets, feature stores. Who's actually going to use this data? What's the value from it? And I think that's a big change. And so a lot of what we're doing is helping apply that product lens, a literal product lens, and thinking about the customer. >> So, you know, we often talk about outcomes, everything being outcomes focused, and customers, vendors wanting to help customers deliver big outcomes, you know, cost reduction, et cetera, things like that. What are some of the key outcomes, Teresa, that the data mesh framework unlocks for organizations in any industry to be able to leverage? >> Yeah. I mean, it really depends on the product. Some of it is organizational efficiency and data-driven decisions. So just by being able to see the data, see what's happening now, that's great. But then you have, so beyond the now what, the so what, the analytics, right? Both predictive and prescriptive analytics. So what? So now I have all this data I can analyze and drive and predict. 
And then finally, the what if. If I have this data and my partners have this data in this mesh, and I can use it, I can ask a lot of what-ifs and kind of game out scenarios about what if I did things differently, all of this in a very virtualized data-driven fashion. >> Right. Well, we've been talking about being data-driven for years and years and years, but it's one thing to say that; it's a whole other thing to actually be able to put that into practice and to use it to develop new products and services, delight customers, right? And really achieve the competitive advantage that businesses want to have. Justin, talk to me about how your customer conversations have changed in the last 22 months, as we've seen this massive acceleration of digital transformation, companies initially really trying to survive and figure out how to pivot, not once, but multiple times. How are those customer conversations changing now, as that data strategy becomes core to the survival of every business and its ability to thrive? >> Yeah. I mean, I think it's accelerated everything, and that's been obviously good for companies like us and like Accenture, 'cause there's a lot of work to be done out there. But I think it's a transition from a storage-centric mindset to more of an analytics-centric mindset. You know, I think traditionally data warehousing has been all about moving data into one central place. And once you get it there, then you can analyze it. But I think companies don't have the time to wait for that anymore, right? There's no time to build all the ETL pipelines and maintain them and get all of that data together. We need to shorten that time to insight. And that's really what we've been focusing on with our customers. >> Shorten that time to insight, to get that value out of the data faster. >> Exactly. >> Like I said, you know, real time is no longer a nice-to-have. It's an absolute differentiator for folks in every business. 
And as in our consumer lives, we have this expectation that we can get whatever we want on our phone, on any device, 24 by seven. And of course now in our business lives, we're having the same expectation, but you have to be able to unlock that access to that data, to be able to do the analytics, to make the decisions based on what the data say. Are you finding... Let's talk a little bit about the go-to-market strategy. You guys go in together. Talk to me about how you're working with AWS. Teresa, we'll start with you, and then Justin, we'll head over to you. >> Okay. Well, a lot of this is powered by the cloud, right? So being able to imagine a new data business, to run the analytics on it, and then push it out, all of that is often cloud-based. But then the great thing about data mesh, it gives you a framework to look at and tap into multi-cloud, on-prem, edge data, right? Data that can't be moved because it is private and secure has to be at the edge and on-prem. So you need to have that; that's their data reality. And the cloud really makes this easier to do. And then with data virtualization, especially coming from the digital natives, we know it scales. >> Justin, talk to me about it from your perspective, the GTM. >> Yeah. So, I mean, I think data mesh is really about people, process and technology. I think Teresa alluded to it as a strategy. It's more than just technology. Obviously we bring some of that technology to bear by allowing customers to query the data where it lives. But the people and process side is just as important. Training people to kind of think about how they do data management, data analytics differently is essential. Thinking about how to create data as a product, that's one of the core principles that Teresa mentioned. You know, that's where I think folks like Accenture can be really instrumental in helping people drive that transformational change within their organization. 
>> And that's hard. Transformational change is hard. You know, the last 22 months have been hard on everyone, for every reason. How are you facilitating that? I'm curious to get, Teresa, we'll start with you, your perspectives on how, together, Starburst and Accenture, with the power of AWS, are helping to drive that cultural change within organizations. Because like we talked about, Justin, nobody has extra time to waste on anything these days. >> The good news is there's that imperative, right? Every business is a digital business. We found that our technology leaders, right, the top 10% investors in digital, they are outperforming the laggards. So before the pandemic, it's times two; post-pandemic, times five. So there's a need to change. And so data is really the heart of the company. That's how you unlock your technical debt into technical wealth. And so really using cloud and technologies like Starburst and data virtualization is how we can actually do that. >> And so Justin, how does Starburst help organizations transfer that technical debt or reduce it? How does the data mesh help facilitate that? Because we talk about technical debt, and it can really add up. >> Yeah, well, a lot of people use us, or think about us, as an abstraction layer above the different data sources that they have. So they may have legacy data sources today that maybe they want to move off of over time. Could be classical data warehouses, other classical relational databases; perhaps they're moving to the cloud. And by leveraging Starburst as this abstraction, they can query the data that they have today while, in the background, moving data into the cloud or moving it into the new data stores that they want to utilize. And it sort of hides that complexity. It decouples the end user experience, the business analyst, the data scientist, from where the data lives. 
And I think that gives people a lot of freedom and a lot of optionality. And I think, you know, the only constant is change. And so creating an architecture that can stand the test of time, I think, is really, really important. >> Absolutely. Speaking of change, I just saw the announcement about Starburst Galaxy, the fully managed SaaS platform, now available in all three major clouds. Of course, here we are at AWS. Is this a big directional shift for Starburst? >> It is. You know, I think there's great precedent within open source enterprise software companies like MongoDB or Confluent, who started with a self-managed product, much the way that we did, and then moved in the direction of creating a SaaS product, a cloud-hosted, fully managed product that really, I think, expands the market. And that's really essentially what we're doing with Galaxy. Galaxy is designed to be as easy as possible. You know, Starburst was already powerful. This makes it powerful and easy. And in our view, it can hopefully expand the market to thousands of potential customers that can now leverage this technology in a faster, easier way. >> Justin, sticking with you for a minute, talk to me about kind of where you're going, where Starburst is heading in terms of support for the data mesh architecture across industries. >> Yeah. So a couple of things that we've done recently, and what we're doing as we speak. One is we introduced a new capability we call Stargate. Now Stargate is a connector between Starburst clusters. So you're going to have a Starburst cluster in, let's say, Azure, a Starburst cluster in AWS, a Starburst cluster maybe in AWS West and AWS East. And this basically pushes the processing to where the data lives. So again, living within this construct of decentralized data that a data mesh is all about, this allows you to do that at an even greater level of abstraction. 
So it doesn't even matter what cloud region the data lives in or what cloud entirely it lives in. And there are a lot of important applications for this, not only latency, in terms of giving you a fast ability to join across those different clouds, but also data sovereignty constraints, right? Increasingly important, especially in Europe, but increasingly everywhere. And, you know, if your data is in Switzerland, it needs to stay in Switzerland. So Stargate is a way of pushing the processing to Switzerland, so you're minimizing the data that you need to pull back to complete your analysis. And so we think that's a big deal for, you know, kind of enabling a data mesh on a global scale. Another thing we're working on, back to the point of data products, is how do customers curate and create these data products and share them within their organization. And so we're investing heavily in our product to make that easier as well, because I think, back to one of the things Teresa said, it's really all about making this practical and finding quick wins that customers can deploy in their data mesh journey, right? >> Those quick wins are key. So Teresa, last question to you: where should companies go to get started today? Obviously, we're still in this work-from-anywhere environment. Companies have tons of data, tons of sources of data, data infrastructure that's already in place. How do they go and get started with data? >> I think they should start looking at their data projects and thinking about them as data products. I think it's just that mindset shift about thinking about who's this for, what's the business value. And then underneath that, architecture and support come to bear. And then thinking about what are the products that your product could work better with, just like any other practice partnerships, like what we have with AWS, right? Like, that's a stronger-together sort of thing.
>> Right. So there's that cultural component, that really strategic shift in thinking, and then the architecture. Awesome. Guys, thank you so much for joining me on the program, coming back on theCUBE at re:Invent, talking about data mesh, how you can really help organizations and industries put that together, and what's going on at Starburst. We appreciate your time. Thanks again. All right. For my guests, I'm Lisa Martin, you're watching theCUBE's coverage of AWS re:Invent 2021. theCUBE is the leader in global live tech coverage. We'll be right back.

Published Date : Nov 30 2021


ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Theresa	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Teresa Tung	PERSON	0.99+
Justin Borkman	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Teresa	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Justin	PERSON	0.99+
Europe	LOCATION	0.99+
Switzerland	LOCATION	0.99+
Starburst	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Netflix	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
third one	QUANTITY	0.99+
pandemic	EVENT	0.98+
four attributes	QUANTITY	0.98+
Both	QUANTITY	0.98+
today	DATE	0.98+
24	QUANTITY	0.98+
first	QUANTITY	0.98+
Airbnb	ORGANIZATION	0.98+
over 220 patents	QUANTITY	0.97+
over a hundred guests	QUANTITY	0.97+
2021	DATE	0.97+
one	QUANTITY	0.96+
Starbucks	ORGANIZATION	0.96+
single partner	QUANTITY	0.96+
Presto	ORGANIZATION	0.96+
single line	QUANTITY	0.96+
seven	QUANTITY	0.95+
confluent	ORGANIZATION	0.95+
10%	QUANTITY	0.94+
one central place	QUANTITY	0.94+
one thing	QUANTITY	0.93+
single tool	QUANTITY	0.92+
day two	QUANTITY	0.92+
next decade	DATE	0.92+
single entity	QUANTITY	0.92+
star gate	TITLE	0.92+
Mongo DB	ORGANIZATION	0.91+
last 22 months	DATE	0.91+
two life	QUANTITY	0.91+
Starburst	TITLE	0.88+
last 22 months	DATE	0.87+

Lie 3, Today’s Modern Data Stack Is Modern | Starburst

(energetic music) >> Okay, we're back with Justin Borgman, CEO of Starburst, Richard Jarvis is the CTO of EMIS Health, and Teresa Tung is the cloud-first technologist from Accenture. We're on to lie number three. And that is the claim that today's "Modern Data Stack" is actually modern. So (chuckles), I guess that's the lie. Or, is that it's not modern. Justin, what do you say? >> Yeah, I think new isn't modern. Right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually, are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So, it's the same general stack, just, y'know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >> So, let me come back to you Justin. Okay, but there are differences, right? You can scale. You can throw resources at the problem. You can separate compute from storage. You really, there's a lot of money being thrown at that by venture capitalists, and Snowflake you mentioned, its competitors. So that's different. Is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >> Well, it is. It's certainly taking, y'know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that exist with the old enterprise data warehouse model on-prem still exist. Just yes, a little bit more elastic now because the cloud offers that. >> So Teresa, let me go to you, 'cause you have cloud-first in your title.
So, what say you to this conversation? >> Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right? That's going to very much follow the same thing. There's going to be more edge. There's going to be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So, there's a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >> Okay, so Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure, it's on the books. You don't want to just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the Modern Data Stack? >> Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing in a way very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low?
Because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >> Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, de-risked experimentation, you know, what we talked about, the ability to scale up, scale down. But I'm taking away that that's not enough. Based on what Richard just said, the modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack, what does that look like? What are some of the attributes and principles there? >> Of how it should look like or how >> Yeah. What it should be? >> Yeah. Yeah. Well, I think, you know, Teresa mentioned this in a previous segment, that the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth.
And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >> So thank you. So Teresa, based on what Justin just said, my takeaway there is it's inclusive: whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include, Teresa, on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy there. Maybe it's a data lake and maybe it's using Glue.
You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >> I mean, I think it's a killer case for data mesh. The fact that you have valuable data sources on-prem, and then yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance. It needs, you know, something that is cheaper, or maybe just tapping into better analytics to get better insights, right? So you're going to be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >> Okay. Thank you. So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >> So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings.
So information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >> So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously got to be governed as well. You mentioned it's appropriately provided internally. >> Yeah. >> But also, you know, to external folks as well. So you've architected that capability today? >> We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient?
These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case. >> Yeah, it makes sense. It's got the context. If the domains own the data, you know, you've got to cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >> Yeah. So I think for us it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems. Ultimately giving them that optionality, you know, and optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >> Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. Big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources. Lots of data mesh conversations over there and really good stuff in the resource section. So check that out.
Thanks for watching the "Data Doesn't Lie... or Does It?" made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. (upbeat music)

Published Date : Aug 22 2022


ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Teresa Tung	PERSON	0.99+
Justin	PERSON	0.99+
Teresa	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
40 years	QUANTITY	0.99+
Theresa	PERSON	0.99+
Starburst	ORGANIZATION	0.99+
JPMC	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
both worlds	QUANTITY	0.99+
today	DATE	0.99+
EMIS Health	ORGANIZATION	0.99+
first technologist	QUANTITY	0.98+
one element	QUANTITY	0.98+
both	QUANTITY	0.98+
first thing	QUANTITY	0.98+
five seven years	QUANTITY	0.98+
one	QUANTITY	0.97+
Teradata	ORGANIZATION	0.97+
Oracle	ORGANIZATION	0.97+
cube.net	OTHER	0.96+
Mongo	ORGANIZATION	0.95+
one size	QUANTITY	0.93+
Cube	ORGANIZATION	0.92+
Preem	TITLE	0.92+
both world	QUANTITY	0.91+
one place	QUANTITY	0.91+
Today’s	TITLE	0.89+
Fivetran	ORGANIZATION	0.86+
Data Doesn't Lie... or Does It?	TITLE	0.86+
single location	QUANTITY	0.85+
HelloFresh	ORGANIZATION	0.84+
first place	QUANTITY	0.83+
CEO	PERSON	0.83+
Lie	TITLE	0.82+
single source	QUANTITY	0.79+
first	QUANTITY	0.75+
one node	QUANTITY	0.72+
Snowflake	ORGANIZATION	0.66+
Snowflake	TITLE	0.66+
three	QUANTITY	0.59+
CTO	PERSON	0.53+
Data Stack	TITLE	0.53+
Redshift	TITLE	0.52+
starburst.io	OTHER	0.48+
COVID	TITLE	0.37+

Lie 1, The Most Effective Data Architecture Is Centralized | Starburst

(bright upbeat music) >> In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads, and that sucks!" Let's face it. More than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile and data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it in a context that's relevant to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... Or Does It? A series of conversations produced by theCUBE and made possible by Starburst Data. I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud-first technologist at Accenture. Today, we're going to have a candid discussion that will expose the unfulfilled, and yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >> Thanks for having us. >> Okay, let's get right into it. You're very welcome. Now, here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think Justin? >> Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architecture. So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >> So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show?
>> Yeah, I mean, I think I would echo Justin's experience really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place. The reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >> Teresa, I feel like Sarbanes-Oxley has kind of saved the data warehouse, right? (laughs) You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >> I think you got to have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets having a centralized set of roles, responsibilities to really QA, right? To serve as a design authority for your entire data estate, just like you might with security, but how it's implemented has to be distributed. Otherwise, you're not going to be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately, you're going to collaborate with your partners. So partners that are not within the company, right? External partners. We're going to see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>> So Justin, you guys last, jeez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. One of her fundamental premises is that you've got this hyper specialized team that you've got to go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess, a question for you Richard. How do you deal with that? Do you organize so that there are a few sort of rock stars that build cubes and the like or have you had any success in sort of decentralizing with your constituencies that data model? >> Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like. People who understand what it means to use this data, particularly the data that we use at EMIS is very private, it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >> Justin, what do you say to a customer or prospect that says, "Look, Justin. I got a centralized team and that's the most cost effective way to serve the business. Otherwise, I got duplication." What do you say to that? >> Well, I would argue it's probably not the most cost effective, and the reason being really twofold.
I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams, as much as they are experts in the technology, they don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about is this idea of the domain owners actually know the data the best. And so by not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley, maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it because data has to be decentralized to be compliant with those laws. But I think the reality is the data mesh model basically says data's decentralized and we're going to turn that into an asset rather than a liability. And we're going to turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two models comparing and contrasting. >> So do you think the demise of the data warehouse is inevitable? Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure. Maybe they're going to build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time or it's maybe just isolated to specific use cases? What's your take on that?
>> Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like we've been trying to do it for a long time. Nobody has the budgets and then data changes, right? There's going to be a new technology that's going to emerge that we're going to want to tap into. There's going to be not enough investment to bring all the legacy, but still very useful systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems. Maybe those are either source systems with the domains that know it best, or the consumer-based systems or products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those are useful for a different part of the business and making sure that the mesh actually allows you to use all of them. >> So, Richard, let me ask you. Take Zhamak's principles back to those. You got the domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two challenges: self-serve infrastructure, let's park that for a second, and then in your industry, one of the most regulated, most sensitive, computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >> Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data.
And I think although a data warehouse makes that very simple 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well audited, well understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that and investing quite heavily in making that possible has paid dividends in terms of giving wider access to the platform, and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >> Yeah. So Justin, we always talk about data democratization, and up until recently, there really hasn't been a line of sight as to how to get there, but do you have anything to add to this because you're essentially doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >> Yeah, I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. 
And so really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like or even just running SQL queries yourself. >> Okay guys, grab a sip of water. After the short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. (bright upbeat music)

Published Date : Aug 22 2022


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Richard	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Justin	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Teresa Tung	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
Teresa	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Massachusetts	LOCATION	0.99+
Zhamak Dehghani	PERSON	0.99+
UK	LOCATION	0.99+
2011	DATE	0.99+
two challenges	QUANTITY	0.99+
Hadapt	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
Starburst	ORGANIZATION	0.99+
two models	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Boston	LOCATION	0.99+
Facebook	ORGANIZATION	0.99+
Sarbanes-Oxley	ORGANIZATION	0.99+
Each	QUANTITY	0.99+
first lie	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
Today	DATE	0.98+
today	DATE	0.98+
SQL	TITLE	0.98+
Starburst Data	ORGANIZATION	0.98+
EMIS Health	ORGANIZATION	0.98+
Cloudera	ORGANIZATION	0.98+
one	QUANTITY	0.98+
first startup	QUANTITY	0.98+
one place	QUANTITY	0.98+
about 30 miles	QUANTITY	0.98+
One	QUANTITY	0.97+
More than a decade later	DATE	0.97+
EMIS	ORGANIZATION	0.97+
single bucket	QUANTITY	0.97+
first technologist	QUANTITY	0.96+
three industry experts	QUANTITY	0.96+
single tool	QUANTITY	0.96+
single version	QUANTITY	0.94+
Zhamak	PERSON	0.92+
theCUBE	ORGANIZATION	0.91+
single source	QUANTITY	0.9+
West Coast	LOCATION	0.87+
one vendor	QUANTITY	0.84+
single security layer	QUANTITY	0.81+
about a year ago	DATE	0.75+
IDU	ORGANIZATION	0.68+
Is	TITLE	0.65+
a second	QUANTITY	0.64+
EDW	ORGANIZATION	0.57+
examples	QUANTITY	0.55+
echo	COMMERCIAL_ITEM	0.54+
twofold	QUANTITY	0.5+
Lie	TITLE	0.35+

Starburst The Data Lies FULL V2b

>> In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now, that could mean cutting cost, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress. But the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to "The Data Doesn't Lie, or Does It?", a series of conversations produced by theCUBE and made possible by Starburst Data. >> I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're going to have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. 
We're debating questions like: is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >> Thanks for having us. >> Let's get right into it. You're very welcome. Now, here's the first lie: the most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think, Justin? >> Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architecture. So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >> So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show? 
>> Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >> You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >> I think you've got to have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right? To serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not going to be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're going to collaborate with your partners. So partners that are not within the company, right? External partners. We're going to see a lot more data sharing and model creation. And so you're definitely going to be decentralized. 
>> So, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've got to go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess, question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that build cubes and the like, or have you had any success in sort of decentralizing that data model with your constituencies? >> Yeah. So we absolutely have got rock star data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >> Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team and that's the most cost effective way to serve the business. Otherwise I've got duplication. What do you say to that? 
>> Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. >> And so by not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized for those laws to be compliant. But I think the reality is the data mesh model basically says data's decentralized, and we're going to turn that into an asset rather than a liability. And we're going to turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two models comparing and contrasting. >> So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure. 
Maybe they're going to build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or it's maybe just isolated to specific use cases? What's your take on that? >> Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's going to be a new technology that's going to emerge that we're going to want to tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for. But you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that would be really meaningful for that end user, right? Each of those are useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >> So, Richard, let me ask you. Take Zhamak's principles back to those. You've got domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two challenges: self-serve infrastructure, let's park that for a second. 
And then, in your industry, one of the most regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >> Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well audited, well understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >> Yeah. So Justin, we always talk about data democratization, and up until recently there really hasn't been a line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >> Yeah. 
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself. >> Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >> Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >> Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. 
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >> We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're going to get to lie number two, and that is this: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >> Yeah, well, I think that's become a lie over time. So I think if we go back 10 or 12 years ago, with the advent of the first data lakes, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. 
>> And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >> Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >> I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. 
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >> So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source, with open source tooling, and it's not proprietary. You're not going to buy a data mesh. You're going to build it with open source tooling, and vendors like you are going to support it. But to come back to today, you can get to market with a proprietary solution faster. I'm going to make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're going to support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >> Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. 
So it reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, 'cause that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start, how many people love Oracle today, but are customers nonetheless? I think lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >> Well, that's interesting. It reminds me of when I see the teaser gas price. I drive up and then I say, oh, that's the cash price. Credit card, I've got to pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're going to get better performance if you keep it in our closed system? Are you saying that long term that's going to come back and bite you, 'cause you're going to end up... you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >> Yeah, absolutely. I think this is a movie that we've all seen before. At least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down. 
'Cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >> So I would say this, Richard, I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: are the, let's call it data warehouse systems, or the proprietary systems, going to deliver a greater ROI sooner? And is that an allure that customers are attracted to, or can open platforms deliver ROI as fast? >> I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them. 
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where, increasingly, our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >> Yeah. Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rock star data engineers. You've got Databricks coming at it from a data engineering heritage. You've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >> So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. 
>> So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >> It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle. There, the license cost is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >> Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're going to get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you want to be able to access data that lives outside of the lake as well. 
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle, when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst. Richard Jarvis is the CTO of EMIS Health, and Teresa Tung is the Cloud First technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie. It's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So lemme come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You know, there's a lot of money being thrown at that by venture capitalists and Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Teresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate you more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we had. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model, de-risked experimentation, you know, that we talked about, the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment, about how the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up, mature out of that ideal state. They go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also, you know, to external folks as well.
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right. Let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, and optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time, management, and in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested stable releases and available to support you 24/7.
To unlock the value in all of your data, you need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, 'cause your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022



Starburst The Data Lies FULL V1

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie... or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is Cloud First technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, a.k.a. the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK. Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess, question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing, with, you know, your constituencies, that data model? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team, and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those is useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles, back to those. You got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges: self-serve infrastructure, let's park that for a second. 
And then in your industry, one of the most regulated, most sensitive, computational governance. How do you automate and ensure federated governance in that mesh model that Theresa was just talking about? >>Well, it absolutely depends on some of the tooling and the processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and you know, up until recently, there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially, you know, doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >>Yeah. 
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it. But there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. 
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lakes, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. 
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, look, closed. Open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. 
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain our position at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and you know, obviously her vision is that the data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. 
So it reminds me kind of, again, going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized to get that data ingested into the data warehouse, 'cause that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminded me, you know, when I see the gas price, the teaser gas price, I drive up and then I say, oh, that's the cash price. Credit card, I gotta pay 20 cents more. But okay. So the argument then, so let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down. 
'Cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is, do the, let's call it data warehouse systems or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them. 
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh-type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake, you know, thing is a lot of fun for analysts like me. You've got Databricks coming at it, Richard, you mentioned you have a lot of rock star data engineers, Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. 
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in Oracle. The license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. 
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle, when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data. 
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources, no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Theresa Tung, the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. But okay. But there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? 
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist. Just, yes, a little bit more elastic now, because the cloud offers that. >>So Theresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? 
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we had. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model, de-risked experimentation, you know, that we talked about, the ability to scale up, scale down. But what I'm taking away is that that's not enough. Based on what Richard just said, the modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? 
What are some of the attributes and principles there? >>Of how it should look, or how >>It's, yeah, what it should be. >>Yeah. Yeah. Well, I think, you know, Theresa mentioned this in a previous segment, about the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up, mature out of that ideal state. They go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. 
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Theresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and then yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you. 
So Richard, you know, talking about data as product, wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally. Yeah. But also, you know, to external folks as well. 
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... Or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time, management, and in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their cluster's operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7.
To unlock the value in all of your data, you need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, 'cause your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 20 2022


ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Dave Lanta	PERSON	0.99+
Jess Borgman	PERSON	0.99+
Justin	PERSON	0.99+
Theresa	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Teresa	PERSON	0.99+
Jeff Ocker	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Dave Valante	PERSON	0.99+
Justin Boardman	PERSON	0.99+
six	QUANTITY	0.99+
Dani	PERSON	0.99+
Massachusetts	LOCATION	0.99+
20 cents	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Jamma	PERSON	0.99+
UK	LOCATION	0.99+
FINRA	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
Kurt Monash	PERSON	0.99+
20%	QUANTITY	0.99+
two	QUANTITY	0.99+
five	QUANTITY	0.99+
Jess	PERSON	0.99+
2011	DATE	0.99+
Starburst	ORGANIZATION	0.99+
10	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
seven years	QUANTITY	0.99+
thousands	QUANTITY	0.99+
pythons	TITLE	0.99+
Boston	LOCATION	0.99+
GDPR	TITLE	0.99+
Today	DATE	0.99+
two models	QUANTITY	0.99+
Zolando Comcast	ORGANIZATION	0.99+
Gemma	PERSON	0.99+
Starbust	ORGANIZATION	0.99+
JPMC	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Javas	TITLE	0.99+
today	DATE	0.99+
AWS	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
first lie	QUANTITY	0.99+
10	DATE	0.99+
12 years	QUANTITY	0.99+
one place	QUANTITY	0.99+
Tomorrow	DATE	0.99+

Starburst Panel Q1

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need, when they need it, in a context that's relevant to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness, we've made progress. But the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... Or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is Cloud First technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: Is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me; I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>Let's get right into it. You're very welcome. Now, here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What is your experience?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know? Right. You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story of Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients; they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's gonna be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems, the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles, back to those. You got, you know, the domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. >>And so having done that, investing quite heavily in making that possible has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization and, you know, up until recently, there really hasn't been line of sight as to how to get there. But do you have anything to add to this, because you're essentially doing analytic queries with data that's all dispersed all over the place. How are you seeing your customers handle this challenge?
>>Yeah, I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so we're really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After the short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there.
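
Editor's note: Richard's description of a single security layer sitting on top of the data mesh, where every request is checked against field-level grants and audited in a standard way regardless of the underlying store, can be sketched roughly as follows. This is a minimal illustration only, not EMIS's or Starburst's implementation; the names `Grant`, `AccessLayer`, and the sample users, sources, and tables are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission: one user may read one column of one table in one source."""
    user: str
    source: str
    table: str
    column: str


@dataclass
class AccessLayer:
    """Hypothetical single entry point placed in front of several data sources."""
    grants: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def allow(self, user, source, table, column):
        """Record that a user may read a given column."""
        self.grants.add(Grant(user, source, table, column))

    def check(self, user, source, table, columns):
        """Return True only if every requested column is granted.

        Every request is written to the audit log, whether allowed or denied,
        and independent of which storage technology backs the source.
        """
        columns = tuple(columns)
        allowed = all(Grant(user, source, table, c) in self.grants for c in columns)
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "source": source,
            "table": table,
            "columns": columns,
            "allowed": allowed,
        })
        return allowed


# Example usage with made-up names.
layer = AccessLayer()
layer.allow("researcher_a", "warehouse", "prescriptions", "medication_code")
granted = layer.check("researcher_a", "warehouse", "prescriptions", ["medication_code"])
denied = layer.check("researcher_a", "lake", "patients", ["date_of_birth"])
```

The point of the design is that the grant check and the audit record live in one place, so adding another storage system changes only the `source` value and not the governance logic.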

Published Date : Aug 2 2022


ENTITIES

Entity	Category	Confidence
Dave Lanta	PERSON	0.99+
Dani	PERSON	0.99+
Richard	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Justin	PERSON	0.99+
Jeff Ocker	PERSON	0.99+
Theresa	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Teresa	PERSON	0.99+
Massachusetts	LOCATION	0.99+
Teradata	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
UK	LOCATION	0.99+
two	QUANTITY	0.99+
Joe	PERSON	0.99+
GDPR	TITLE	0.99+
JAK	PERSON	0.99+
2011	DATE	0.99+
Starburst	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
thousands	QUANTITY	0.99+
two models	QUANTITY	0.99+
EMI	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Gemma	PERSON	0.99+
Terada	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
Each	QUANTITY	0.99+
first lie	QUANTITY	0.99+
today	DATE	0.99+
first startup	QUANTITY	0.98+
Cloudera	ORGANIZATION	0.98+
Today	DATE	0.98+
SQL	TITLE	0.98+
first technologist	QUANTITY	0.97+
one place	QUANTITY	0.97+
Democrat	ORGANIZATION	0.97+
single	QUANTITY	0.97+
about 30 miles	QUANTITY	0.97+
one	QUANTITY	0.96+
three industry experts	QUANTITY	0.95+
more than a decade later	DATE	0.94+
One	QUANTITY	0.94+
hit adapt	ORGANIZATION	0.94+
Terra data	ORGANIZATION	0.93+
Greenfield	LOCATION	0.92+
single source	QUANTITY	0.91+
single tool	QUANTITY	0.91+
Oxley	PERSON	0.91+
one vendor	QUANTITY	0.9+
single bucket	QUANTITY	0.9+
single version	QUANTITY	0.88+
about a year ago	DATE	0.85+
Theresa tongue	PERSON	0.83+
emos	ORGANIZATION	0.82+
Mars	ORGANIZATION	0.8+
swans Oxley	PERSON	0.77+
IDU	TITLE	0.69+
first	QUANTITY	0.59+
a second	QUANTITY	0.55+
Sarbanes Oxley	ORGANIZATION	0.53+
Mastered	PERSON	0.45+
Q1	QUANTITY	0.37+


Search Results for Justin Borgman: