

Collibra Data Citizens 22



>>Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from a primarily compliance-driven issue to becoming a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. >>And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum toward rethinking monolithic data architectures. You hear about initiatives like data mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users, and you hear a lot about data democratization. So these decentralization efforts around data are great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. >>In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is gonna use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. >>Hello and welcome to the Cube's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at the Cube we like to say we extract the signal from the noise, and over the next couple of days we're gonna feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's gonna join my colleague Lisa Martin.
I'm gonna also sit down with Laura Sellers, the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. >>He's the vice president of Data Quality at Collibra. He's an amazingly smart dude who founded OwlDQ, a company that he sold to Collibra last year. Now many companies didn't make it through the Hadoop era, you know, they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business. They've leveraged the cloud, expanded their product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. >>Last year, the Cube covered Data Citizens, Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera, businesses still find it too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on the Cube. We're excited to have you, Felix. Good to see you again. >>Likewise Dave. Thanks for having me again. >>You bet. All right, we're gonna get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? >>Yeah, absolutely. And I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that trend has become much more acute.
Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well, which again is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity and fragmentation. >>So it's become much more acute. And, to your earlier point, we do live in a different world. In the past couple of years we could probably just kind of brute-force it, right? We could focus on the top line. There were enough investments to be had. I think nowadays organizations are in a very different environment, where there's much more focus on cost control, productivity, efficiency: how do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at data and how to scale it, not just from a technology and infrastructure perspective, but how do you actually scale data from an organizational perspective, right? You said it: the people and process. How do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment we find ourselves in today is gonna be a catalyst for organizations to really dig in more seriously, if you will, than they maybe have in the past. >>You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was gonna get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >>Yeah, absolutely. We started Collibra in 2008, so in some sense around the last financial crisis, and that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but with data only becoming more important. And our mission, to deliver trusted data for every user, every use case and across every source, frankly, has only become more important. So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it data citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important and more relevant. We definitely have a lot more work ahead of us, because we are still relatively early in that journey. >>Well, that's interesting, because, you know, in my observation it takes seven to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >>Yeah, absolutely.
Again, there's a long tail of organizations that are only now maturing their data practices, and we've seen that transform and influence a lot of the business growth we've seen: broader adoption of the platform. We work with some of the largest organizations in the world, like Adobe, Heineken, Bank of America, and many more. We now have over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that and to continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks and others, right? As those new, modern data infrastructures, modern data architectures, are definitely all moving to the cloud, that's a great opportunity for us, our partners and of course our customers, to help them transition to the cloud even faster. >>And we made an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course, data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again, to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, have become mission critical. They've become real time. And so monitoring and observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we've always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations. And so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >>Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team; it's interesting, you know, data quality used to be this back office function, really confined to highly regulated industries, and it's come to the front office. It's top of mind for chief data officers. Data mesh: you mentioned you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So let's chat a little bit about the products. We're gonna go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's under the covers in security, sort of making data more accessible for people, or just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >>Yeah, absolutely. We're super excited, a ton of innovation.
And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence, that really bold and compelling mission. Some customers are just starting on that journey, and we wanna make it as easy as possible for their organizations to actually get started, because we know that's important. And for the organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further, and again, to make it easier to accomplish that mission and vision around the data citizen: that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving. >>A lot of ease of adoption, ease of use, but also, how do we make sure that Collibra becomes this mission-critical enterprise platform, from a security, performance, architecture, scale and supportability perspective, so that we're truly able to deliver that kind of enterprise mission-critical platform. So that's the big theme from an innovation perspective. From a product perspective, there's a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction: how do we make available a true shopping experience, so that anybody in your organization can, in a very easy, search-first way, find the right data product, find the right dataset they can then consume. Usage analytics: how do we help organizations drive adoption, tell them where things are working really well and where they have opportunities. Homepages, again, to make things easy for anyone in your organization to get started with the platform. And you mentioned workflow designer; again, we have a very powerful enterprise platform. >>One of our key differentiators is the ability to really drive a lot of automation through workflows, and now we've provided a new low-code, no-code workflow designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, which, in partnership with Snowflake, which has been a strategic investor in Collibra, is focused on how we make access governance easier: how are we able to make sure that as you move to the cloud, things like access management and masking around sensitive data, PII data, are managed in a much more effective way. We're really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can? Moving that to the cloud has been a big part of our strategy. >>So we've launched a data quality cloud product, as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon and others. And so we are delivering a capability that we call pushdown: actually pushing down the compute for data quality, the monitoring, into the underlying platform, which again, from a scale, performance and ease-of-use perspective, is gonna make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations: we talk about being able to connect to every source.
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure and Google Cloud Storage as well. So there's a lot coming out. The team has been hard at work and we're really excited about what we're bringing to market. >>Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace, you think about data mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So how do you see the future? Give us your closing thoughts, please. >>Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not gonna fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early; making it as easy as we can is our mission. And so I'm really excited to see how the market's gonna evolve over the next few quarters and years. I think the trend is clearly there. When we talk about data mesh, this kind of federated approach focused on data products, it's just another signal that we believe a lot of organizations now understand the need to go beyond just the technology. >>I really think about how we actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we're in, with much more focus on cost control, productivity and efficiency: now's the time we need to look beyond just the technology and infrastructure to think about how to scale data, how to manage data at scale. >>Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're gonna crush it out there. >>Thank you Dave. >>Yeah, it's a great spot for an in-person event, and of course the content post-event is gonna be available at collibra.com, and you can of course catch the Cube coverage at thecube.net and all the news at siliconangle.com. This is Dave Vellante for the Cube, your leader in enterprise and emerging tech coverage. >>Hi, I'm Jay from Collibra's Data Office. Today I want to talk to you about Collibra's Data Intelligence Cloud. We often say Collibra is a single system of engagement for all of your data. Now, when I say data, I mean data in the broadest sense of the word, including reference data and metadata. Think of metrics, reports, APIs, systems, policies, and even business processes that produce or consume data. Now, the beauty of this platform is that it ensures all of your users have an easy way to find, understand, trust, and access data. But how do you get started? Well, here are seven steps to help you get going. One, start with the data.
What's data intelligence without data? Leverage the Collibra data catalog to automatically profile and classify your enterprise data wherever that data lives: databases, data lakes or data warehouses, whether in the cloud or on premise. >>Two, you'll then wanna organize the data, and you'll do that with data communities. This can be by department, line of business or functional team, however your organization organizes work and accountability. And for that you'll establish community owners. Communities make it easy for people to navigate through the platform and find the data, and will help create a sense of belonging for users. An important and related side note here: we find it's typical in many organizations that data is thought of as just an asset, and IT and data offices are viewed as the owners of it, the central teams performing analytics as a service provider to the enterprise. We believe data is more than an asset: it's a true product that can be converted to value. And that also means establishing business ownership of data, where that strategy and ROI come together with subject matter expertise. >>Okay, three. Next, back to those communities: there, the data owners should explain and define their data, not just the tables and columns, but also the related business terms, metrics and KPIs. These objects, we call them assets, are typically organized into business glossaries and data dictionaries. I definitely recommend starting with the topics that are most important to the business. Four, these steps now enable you and your users to have some fun with it: linking everything together builds your knowledge graph, also known as a metadata graph, by linking or relating these assets together. For example, linking a data set to a KPI to a report now enables your users to see what we call the lineage diagram, which visualizes where the data in your dashboards actually came from, what the data means and who's responsible for it (see the sketch after this segment). Speaking of which, here's five. Leverage the Collibra trusted business reporting solution on the marketplace, which comes with workflows for those owners to certify their reports, KPIs, and data sets. >>This helps them foster trust in their data. Six, easy-to-navigate dashboards or landing pages right in your platform for your company's business processes are the most effective way for everyone to better understand and take action on data. Here's a pro tip: use the dashboard design kit on the marketplace to help you build compelling dashboards. Finally, seven, promote the value of this to your users, and be sure to schedule enablement office hours and new employee onboarding sessions to get folks excited about what you've built and implemented. Better yet, invite all of those community and data owners to these sessions so that they can show off the value that they've created. Those are my seven tips to get going with Collibra. I hope these have been useful. For more information, be sure to visit collibra.com.
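The lineage diagram in step four is, at bottom, a walk over a graph of linked assets. Here is a minimal sketch of that idea in Python; the asset names are hypothetical, and a real metadata graph is far richer, with owners, terms and policies attached to every node:

```python
# Hypothetical asset graph; each edge reads "<source> feeds <target>".
# This sketch assumes the graph is acyclic, which lineage graphs are.
edges = [
    ("crm.customers", "dataset:churn_features"),
    ("billing.invoices", "dataset:churn_features"),
    ("dataset:churn_features", "kpi:monthly_churn_rate"),
    ("kpi:monthly_churn_rate", "report:exec_dashboard"),
]

def upstream(asset, edges):
    """Walk the graph backwards: everything the asset ultimately depends on."""
    parents = {src for src, dst in edges if dst == asset}
    for parent in set(parents):
        parents |= upstream(parent, edges)
    return parents

# Where do the numbers on the executive dashboard actually come from?
for source in sorted(upstream("report:exec_dashboard", edges)):
    print(source)
```

Running this lists the KPI, the feature dataset, and both raw tables: exactly the "where did this number come from" question a lineage diagram answers visually.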
>>Welcome to the Cube's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the vice president of Data Quality at Collibra. Kirk, good to see you. Welcome. >>Thanks for having me, Dave. Excited to be here. >>You bet. Okay, we're gonna discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >>Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? The standard answer is a bit stale: we all know that companies use more data than ever before, and the variety has changed and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play as to why this is becoming so important now. And I guess you could break this down simply and think about it: if, Dave, you and I were gonna build, you know, a new healthcare application and monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy, where the 50-day average crosses the 10-day average. >>And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications in that sense. So, you know, it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are kind of two other things at play. You know, I bought a car not too long ago, and my dad called and said, How many cylinders does it have? And I realized in that moment, you know, I might have failed him, because I didn't know. And I used to ask those types of questions about anti-lock brakes and cylinders and, you know, whether it's manual or automatic, and I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips, I really don't know that much about it. >>And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested: it's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct.
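To see why Kirk's trading example is so sensitive to data quality, here is a minimal, self-contained sketch; the price series is synthetic and the strategy deliberately simplified, so it illustrates the failure mode rather than any real trading system:

```python
import numpy as np
import pandas as pd

# A steadily rising synthetic price series; a real strategy would read
# closing prices from a vetted market-data feed.
dates = pd.date_range("2022-01-03", periods=120, freq="B")
prices = pd.Series(100 + 0.2 * np.arange(120), index=dates)

def crossover_signal(close: pd.Series) -> pd.Series:
    """True while the 10-day moving average sits above the 50-day."""
    return close.rolling(10).mean() > close.rolling(50).mean()

print("clean data, go long?", bool(crossover_signal(prices).iloc[-1]))      # True

# One bad tick -- a feed glitch that records 10.0 instead of ~123 --
# drags the 10-day average down far enough to flip today's signal.
corrupted = prices.copy()
corrupted.iloc[-1] = 10.0
print("corrupted data, go long?", bool(crossover_signal(corrupted).iloc[-1]))  # False
```

A single undetected bad value reverses the trading decision, which is the "major financial ramifications" point in miniature.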
>>You know, the other thing too about data quality: for years we did the MIT CDO IQ event; we didn't do it last year, Covid messed everything up. But the observation I would make is, data quality, it used to be information quality, used to be this back office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're gonna talk about observability. And so the whole quality issue has really become front and center, because data's so fundamental, hasn't it? >>Yeah, absolutely. I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor, at the scale that we've achieved. In the early days, even before Collibra, what's been so exciting is we have these types of observation techniques, these data monitors, that can actually track past performance of every field at scale. And why that's so interesting, and why I think the CDO is listening intently nowadays to this topic, is: maybe we could surface all of these problems with the right solution of data observability, with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world where you must write a condition, and when that condition breaks, that was always known as a break record. But what about breaking trends, and root cause analysis? And is it possible to do that, you know, with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore.
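The "breaking trends" idea Kirk describes can be sketched in a few lines: learn a baseline from each field's history and alert when today's observation falls outside it, with no hand-written break-record condition. The numbers below are made up, and production monitors use far more robust statistics than a plain three-sigma test:

```python
import statistics

# Hypothetical history of daily row counts for one table; a monitor would
# keep a series like this per field, learned automatically at scale.
history = [10_250, 10_410, 10_180, 10_520, 10_300, 10_460, 10_390]

def trend_alert(history, today, sigmas=3.0):
    """Flag today's value if it breaks the trend learned from history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if abs(today - mean) > sigmas * stdev:
        return f"ALERT: {today} outside {mean:.0f} +/- {sigmas:.0f}x{stdev:.0f}"
    return "ok"

print(trend_alert(history, 10_350))  # ok: within normal variation
print(trend_alert(history, 4_100))   # ALERT: looks like a partial or stale load
```

Note that the second case also covers the freshness problem Kirk raises next: a load that only partially arrived breaks the learned trend even though no explicit "row count must exceed X" rule was ever written.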
>>So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >>Yeah, it's super interesting. It's an emerging market, so the language is changing a lot; the topics and areas are changing. The way that I like to break it down, because the lingo is constantly moving, is really breaking records versus breaking trends. I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. You know, everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there is no condition you could write that would show you all the good and the bad. That was your traditional approach of data quality break records. But your modern-day approach is: you lost a significant portion of your data, or it did not arrive on time to make that decision accurately, on time. And that's a hidden concern. Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >>So what's the Collibra angle on all this? You made the acquisition, you've got data quality and observability coming together, you guys have a lot of expertise in this area. But you hear about provenance of data, you just talked about, you know, stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >>Well, I think where we're fortunate is, with our background, myself and the team, we sort of lived this problem for a long time, you know, in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there; when most people evaluate our solution, it's more advanced than some of the observation techniques that currently exist. But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: I have so many things going wrong, just show me the big picture, help me find the thing that, if I were to fix it today, would make the most impact. So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog metadata. And as that grows, you can actually achieve total data governance. At this point, with Collibra's acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >>Well, you mentioned financial services a couple of times, and some examples; remember the flash crash in 2010. Nobody had any idea what that was, you know, they just said, Oh, it's a glitch, so they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens 22 that you're announcing, you gotta announce new products, right? It's your yearly event. What's new? Give us a sense as to what products are coming out, specifically around data quality and observability. >>Absolutely. There's always a next thing on the forefront, and the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery, and Databricks with Delta Lake and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook in to these databases. And while we've always worked with the same databases in the past, and they're supported today, we're now doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress: did my data, that I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? >>And with these native integrations that we're building and about to unveil, here's kind of a sneak peek for next week at Data Citizens: we're now doing all compute and data operations in databases like Snowflake. And what that means is, with no install and no configuration, you could log into the Collibra data quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week.
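As a rough illustration of pushdown, here is what a completeness check looks like when it's expressed as SQL and executed inside Snowflake. The connection parameters, table name and threshold are placeholders, and the actual Collibra product generates this machinery rather than asking you to write it; the point is that only the aggregate result leaves the warehouse:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials; in practice these come from a secrets vault.
conn = snowflake.connector.connect(
    account="my_account", user="dq_service", password="...",
    warehouse="DQ_WH", database="SALES", schema="PUBLIC",
)

# The check is compiled to SQL and runs on Snowflake's own compute:
# no raw rows are pulled out (zero egress), just two counts.
sql = """
    SELECT COUNT(*)                      AS total_rows,
           COUNT_IF(customer_id IS NULL) AS null_ids
    FROM orders
"""
total_rows, null_ids = conn.cursor().execute(sql).fetchone()

null_rate = null_ids / max(total_rows, 1)
print(f"null rate on orders.customer_id: {null_rate:.2%}")
if null_rate > 0.01:  # in a real monitor this threshold is a learned baseline
    print("ALERT: customer_id completeness broke its trend")
```

Contrast this with the older pattern of extracting the table into a separate quality engine: same check, but with network cost, egress exposure and a second copy of the data.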
>>So this is interesting, because what you just described, you know, you mentioned Snowflake, you mentioned Google, and actually you mentioned Databricks. You know, Snowflake has the Data Cloud. If you put everything in the Data Cloud, okay, you're cool. But then Google's got the open data cloud, if you heard, you know, Google Next. And now Databricks, they don't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches, and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across them. You know, it's like Zhamak Dehghani says: you should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >>Yeah, that's right. And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing is basically zero network cost, zero egress cost, zero latency of time. And so when you log into BigQuery tomorrow using our tool, or say Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage, and access privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >>I love this example, because, you know, Barry talks about, wow, the cloud guys are gonna own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens 22. >>Absolutely. Well, I think, you know, one big trend is discovery and classification, and we're seeing that across the board. People used to just want to know whether a field was a zip code, and nowadays, with the amount of data that's out there, they wanna know where everything is, where their sensitive data is, if it's redundant; tell me everything, inside of three to five seconds. And with that, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're gonna see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >>Excellent. All right, Kirk Haslbeck, thanks so much for coming on the Cube and previewing Data Citizens 22. Appreciate it. >>Thanks for having me, Dave. >>You're welcome. Right, and thank you for watching. Keep it right there for more coverage from the Cube.
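Kirk's discovery-and-classification trend boils down to tagging columns by the patterns their values match. A toy sketch follows; the regexes and the 90 percent threshold are illustrative only, and real classifiers combine column names, value distributions and ML models:

```python
import re

# Illustrative patterns only; production classifiers are far richer.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(values, threshold=0.9):
    """Tag a column with the first pattern most of its sampled values match."""
    sample = [v for v in values if v][:1000]  # sample, don't scan everything
    for label, rx in PATTERNS.items():
        hits = sum(bool(rx.match(v)) for v in sample)
        if sample and hits / len(sample) >= threshold:
            return label
    return "unclassified"

print(classify_column(["94103", "10001-1234", "60614"]))        # us_zip
print(classify_column(["a@b.com", "c@d.org", "not-an-email"]))  # unclassified: 2/3 < 0.9
```

Run per column across a catalog, tags like these are what let a platform answer "show me all my sensitive data" in seconds rather than via a manual audit.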
Welcome to the Cube's virtual coverage of Data Citizens 2022. My name is Dave Vellante, and I'm here with Laura Sellers, who's the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >>Thank you. Nice to be here. >>Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, when I think about it, historically, fast access to the right data at the right time, in a form that's really easily consumable, has been kind of challenging, especially for business users. Can you explain to our audience why this matters so much, and what's actually different today in the data ecosystem to make this a reality? >>Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is that we need a new approach to data management. What inspired me to come to Collibra a little over a year ago was really the fact that they're very focused on bringing trusted data to more users, across more sources, for more use cases. And so as we look at what we're announcing with these innovations around ease of use and scale, it's really about making teams more productive in getting started with, and having the ability to manage, data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers the performance, scale and security that our users and teams need and demand. So as we look at... oh, go ahead. >>I was gonna say, you know, when I look back at like the last 10 years, it was all about getting the technology to work, and it was just so complicated. But please, carry on. I'd love to hear more about this. >>Yeah, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. So what we're announcing, from our ease-of-use side of the world, is first our data marketplace. This is the ability for all users to discover and access data quickly and easily, to shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then the two more areas on the ease-of-use side of the world: one is our world of usage analytics. One of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also helping with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform, understanding what's working, who's accessing it, what's not. And then finally, we're also introducing what's called workflow designer. We love our workflows at Collibra; it's a big differentiator to be able to automate business processes. The designer is really about a way for more people to be able to create those workflows, collaborate on those workflows, as well as for people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use, to make it easier for all users to find data. >>Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy, metaphor or analogy, I always get those confused, let's go with analogy. Why is it so important to data consumers? >>I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will.
And so having a consumer-grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, is the idea of being able to shop for it: just making it as simple as possible, and really speeding the time to value for any of the business analysts and data analysts out there. >>Yeah, you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course then you have to have self-service infrastructure and you have to have governance, and those are really challenging. And I think so many organizations are facing adoption challenges, you know, when it comes to enabling teams generally, especially domain experts, to adopt new data technologies. You know, the tech comes fast and furious. You got all these open source projects, and it gets really confusing. Of course it risks security, governance and all that good stuff. You got all this jargon. So where do you see the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >>You're dead on. There's so much technology, and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of, and understand, all the technologies that are coming. You also look at it as there are so many more sources of data, and people are migrating data to the cloud and to new sources. Where the friction comes is really that ability to understand where the data came from and where it's moving to, and then also to be able to put the access controls on top of it, so people are only getting access to the data that they should be getting access to. So one of the other things we're announcing, with all of the innovations that are coming, is what we're doing around performance and scale. With all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. >>It's something that Collibra has been working on for the past year and a half, and we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, one of those challenges: making sure that the data you have stays high quality as you move forward. And so really, again, we're just reducing friction. You already have Snowflake stood up; it's not another machine for you to manage, it's just pushdown capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is the ability for users to ingest metadata, understand where the PII data is, and then set policies up on top of it. So you can very quickly set policies and have them enforced at the data level, so anybody in the organization is only getting access to the data they should have access to.
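A metadata-driven policy of the kind Laura describes can be pictured simply: classification tags decide, per requesting role, whether a value comes back raw or masked. Everything in this sketch, the tags, roles and masking rule, is hypothetical, and a product like Collibra Protect enforces the policy inside the warehouse rather than in application code:

```python
# Hypothetical classification tags produced by discovery, plus a policy:
# only the fraud team sees raw PII; everyone else gets a masked value.
COLUMN_TAGS = {"name": "pii", "email": "pii", "order_total": "public"}

def mask(value: str) -> str:
    return value[0] + "***" if value else value

def apply_policy(row, user_roles):
    visible = {}
    for column, value in row.items():
        if COLUMN_TAGS.get(column) == "pii" and "fraud_team" not in user_roles:
            visible[column] = mask(str(value))
        else:
            visible[column] = value
    return visible

row = {"name": "Ada Lovelace", "email": "ada@example.com", "order_total": 42.0}
print(apply_policy(row, user_roles={"analyst"}))
# {'name': 'A***', 'email': 'a***', 'order_total': 42.0}
print(apply_policy(row, user_roles={"fraud_team"}))  # raw values
```

The key design point is that the policy references tags, not table or column names, so one rule keeps working as discovery classifies new columns.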
>>This topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, and really confined only to highly regulated industries like financial services and healthcare and government. You know, you look back over a decade ago, you didn't have this worry about personal information; GDPR and, you know, the California Consumer Privacy Act have all become so much more important. The cloud has really changed things in terms of performance and scale, and of course partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on, and of course attracting them as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake, and specifically I'm interested in sort of joint engineering or product innovation efforts, you know, beyond the standard go-to-market stuff? >>Definitely. So you mentioned they became a strategic investor in Collibra about a year ago, a little less than that I guess. We've been working with them, though, for over a year, really tightly with their product and engineering teams, to make sure that Collibra is adding real value: all pieces of our unified platform are touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which has always existed; we're able to profile and classify that data; and we're announcing, with Collibra Protect this week, that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value, with our policies for all business users to be able to create. >>We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where the data came from and how it was transformed within Snowflake, as well as the data quality pushdown. As I mentioned, data quality, you brought it up: it is a big industry push, and, you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality. So this pushdown capability for Snowflake really is, again, a big ease-of-use push for us at Collibra: the ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and make sure you have the right quality.
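Deriving lineage from stored procedures, as in the Snowflake Lineage capability Laura mentions, at its simplest means parsing which tables a statement reads and which it writes. The sketch below is deliberately naive, a regex rather than a real SQL parser, and table-level only, where production lineage is column-level:

```python
import re

def table_lineage(sql: str) -> dict:
    """Very naive table-level lineage: which tables feed which target."""
    target = re.search(r"insert\s+into\s+([\w.]+)", sql, re.I)
    sources = re.findall(r"(?:from|join)\s+([\w.]+)", sql, re.I)
    return {"target": target.group(1) if target else None,
            "sources": sorted(set(sources))}

# A hypothetical statement from a stored procedure body.
proc_body = """
    INSERT INTO analytics.daily_revenue
    SELECT o.order_date, SUM(o.amount)
    FROM raw.orders o
    JOIN raw.fx_rates f ON o.currency = f.currency
    GROUP BY o.order_date
"""
print(table_lineage(proc_body))
# {'target': 'analytics.daily_revenue', 'sources': ['raw.fx_rates', 'raw.orders']}
```

Extracted edges like these are what get stitched into the catalog's lineage graph, so an analyst can trace a dashboard number back through the warehouse transformations that produced it.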
>>I mean, the nice thing about Snowflake is, if you play in the Snowflake sandbox, you can get a, you know, high degree of confidence that the data sharing can be done in a safe way. Bringing Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on the Cube that one of the notable differences in cloud this decade versus last decade, I mean, there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. It's a key factor for innovating, accelerating product delivery, filling gaps in the hyperscale offerings, 'cause you got more mature stack capabilities, and, you know, it creates this flywheel momentum, as we often say. So my question is, how do you work with the hyperscalers, like whether it's AWS or Google, whomever, and what do you see as your role, and what's the Collibra sweet spot? >>Yeah, definitely. So, you know, one of the things I mentioned early on is that the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So, similar to what you've seen with our strategic moves around Snowflake, and really covering the broad ecosystem of what Collibra can do on top of that data source, we're extending that to the world of Google as well, and the world of Dataplex. We also have great partners in SIs. Infosys is somebody we spoke with at the conference, who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data-driven culture, with Collibra being the core of it. >>Laura, we're gonna end it there, but I wonder if you could kind of put a bow on this year, the event, your perspectives. So just give us your closing thoughts. >>Yeah, definitely. So I wanna say this is one of the biggest releases Collibra's ever had, definitely the biggest one since I've been with the company, a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and companies everywhere. And so it's all about everybody being able to easily find, understand, trust, and get access to that data going forward. >>Well, congratulations on all the progress. It was great to have you on the Cube, first time I believe, and we really appreciate you taking the time with us. >>Yes, thank you for your time. >>You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on the Cube, your leader in enterprise and emerging tech coverage. >>So data modernization oftentimes means moving some of your storage and compute to the cloud, where you get the benefit of scale and security and so on. But ultimately it doesn't take away the silos that you have. We have more locations, more tools and more processes with which we try to get value from this data. To do that at scale in an organization, the people involved in this process have to understand each other. So you need to unite those people across those tools, processes, and systems with a shared language. When I say customer, do you understand the same thing as when you hear customer? Are we counting them in the same way? That shared language unites us, and that gives the opportunity for the organization as a whole to get the maximum value out of their data assets. And then they can democratize data, so everyone can properly use that shared language to find, understand, and trust the data assets that are available.
And that's where Collibra comes in. We provide a centralized system of engagement that works across all of those locations and combines all of those different user types across the whole business. At Collibra, we say united by data, and that also means that we're united by data with our customers. So here is some data about some of our customers. There was the case of an online do-it-yourself platform who grew their revenue almost three times from a marketing campaign that put the right product in the hands of the right people. Another case that comes to mind is a financial services organization who saved over 800K every year because they were able to reuse the same data in different kinds of reports; before, it was spread out over different tools and processes and silos, and now the platform brought them together, so they realized, oh, we're actually using the same data, let's find a way to make this more efficient. And the last example that comes to mind is that of a large home mortgage loan provider, where they have a very complex landscape, a very complex architecture, legacy and the cloud, et cetera. And they're using our software, our platform, to unite all the people and those processes and tools, to get a common view of data and to manage their compliance at scale. >>Hey everyone, I'm Lisa Martin covering Data Citizens 22, brought to you by Collibra. This next conversation is gonna focus on the importance of data culture. One of our Cube alumni is back: Stan Christiaens is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on the Cube. >>Hey Lisa, nice to be here. >>So we're gonna be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation; it also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >>Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, you have a lot of people, most of the employees in an organization, who are somehow gonna be a data citizen, right? So you need to make sure that these people are aware of it, and that people have the skills and competencies to do with data what's necessary, and so on, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss we need to make this decision, your boss is also open to, and able to interpret, you know, the data presented in the dashboard to actually make that decision and take that action, right? >>And once you have that widely across the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations, because they're always moving; somehow they're hiring new people. And it has to be a continuous effort because we've seen that, on the one hand, organizations are continuously challenged with their data sources and where all the data is flowing, right, which in itself creates a lot of risk. But on the other hand of the equation, you have the benefit. You know, you might look at regulatory drivers like, we have to do this, right? But it's much better right now to consider the competitive drivers, for example. And we did an IDC study earlier this year, quite interesting, I can recommend it to anyone, and one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is about the ones who are higher in maturity:
>>So the organizations that really look at data as an asset, look at data as a product, and actively try to be better at it, have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this, you know, data culture for everyone, waking them up as data citizens; I'm doing this for competitive reasons, I'm doing this for regulatory reasons. You're trying to bring both of those together, and the ones that get data intelligence right are successful and competitive. That's what we're seeing out there in the market. >>Absolutely. We know that just generally, Stan, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know that in theory, more competitive, more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally. >>Of course it's difficult for an organization to adapt, but it's also necessary, as you just said. Imagine that, you know, you're a modern-day organization: laptops, what have you, and you're not using those, right? Or, you know, you're delivering them throughout the organization, but not enabling your colleagues to actually do something with that asset. The same thing is true with data today, right? If you're not properly using the data asset, and competitors are, they're gonna get more advantage. So as to how you get this done, how you establish this, there are angles to look at, Lisa. So one angle is obviously the leadership, whereby whoever is the boss of data in the organization, and you typically have multiple bosses there, like chief data officers, sometimes there are multiple, but they may have a different title, right? So I'm just gonna summarize it as a data leader for a second. >>So whoever that is, they need to make sure that there's a clear vision and a clear strategy for data. And that strategy needs to include the monetization aspect: how are you going to get value from data? Now that's one part, because then you have leadership in the organization, and also the business value. And that's important, 'cause those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that right: it's not enough to just have that leadership out there, but you also have to get the hearts and minds of the data champions across the organization. You really have to win them over. And if you have those two combined, and obviously a good technology to, you know, connect those people and have them execute on their responsibilities, such as a data intelligence platform like ours, then you have the pieces in place to really start upgrading that culture inch by inch, if you will. >>Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about how Collibra is drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022.
Talk to us about how you are building a data culture within Collibra and what some of the specific projects are that Collibra's data office is working on. >>Yes, and it is indeed Data Citizens. There are a ton of speakers here; I'm very excited. You know, we have Barb from MIT speaking about data monetization; we have Dilla joining at the last minute — so a really exciting agenda. Can't wait to get back out there, essentially. So, over the years — we've been doing this since 2008, so a good 14 years, and I think we have another decade of work ahead in the market, just to be very clear: data is here to stick around, as are we. And myself, you know, when you start a company — we were four people, if you will — everybody's wearing all sorts of hats at that time. But over the years I've run presales, sales, partnerships, product, et cetera. And as our company got a bit bigger — we're now a thousand two hundred, something like that, people in the company — >>I believe systems and processes become a lot more important. So we said, you know, Collibra is getting to the size of our customers in terms of organization structure, process, systems, et cetera. So we said it's really time for us to put our money where our mouth is and stand up our own data office, which is what we were seeing at customers' organizations worldwide. Organizations have HR units, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, okay, let's try to set an example that other people can take away from. So we set up a data strategy, we started building data products, we took care of the data infrastructure — all that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our own product and our own practices, and from that use, learn how we can make the product better, learn how we can make the practices better, and share that learning with the market. On Monday mornings, we sometimes refer to that as eating our own dog food; on Friday evenings, we refer to it as drinking our own champagne. >>I like it. >>So we had the driver to do this — you know, there's a clear business reason — so we included that in the data strategy, and that's a little bit of our origin. Now, how do we organize this? We have three pillars, and by no means is this a template that everyone should follow — this is just the organization that works at our company, but it can serve as an inspiration. So we have a pillar which is data science: the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform, to make sure that the data products can run, the data can flow, and the quality can be checked. >>And then we have a data intelligence, or data governance, pillar, where we have those data governance and data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is: well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? From those use cases, we then started to build a roadmap and started executing on the use cases. And the important ones are very simple — we see them with our customers as well — people talking about the catalog, right?
The catalog, for the data scientists to know what's in their data lake, for example; and for the people in privacy, who have their process registry and can see how the data flows. >>So that's a starting place, and that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to look at and see, okay, what data is out there for me as an analyst or a data scientist or whatever to do my job, right? So they can immediately get access to data. Another one that we built is around trusted business reporting. We're seeing that, since self-service BI allowed everyone to make beautiful dashboards — you know, pie charts; my pet peeve is the pie chart, because I love pie and you shouldn't always be using pie charts — there's been a proliferation of those reports. And now executives don't really know, okay, should I trust this report or that report when they're reporting on the same thing but the numbers seem different, right? So that's why we have trusted business reporting: we know that if a dashboard — a data product, essentially — is built, all the right steps have been followed, and whoever is consuming it can be quite confident in the result. >>Right. >>Exactly. Yes. >>Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >>KPIs and measurement are a big topic in the chief data officer profession, I would say, and again, it always varies with your organization, but there are a few that we use that might be of interest. We use those pillars, right? And we have metrics across those pillars. So, for example, a pillar on the data engineering side is gonna be more related to uptime: is the data platform up and running? Are the data products up and running? Is the quality in them good enough — is it going up, is it going down? What's the usage? But also — and especially if you're in the cloud, where consumption is a big thing — you have metrics around cost, for example. So that's one set of examples. Another one is around the data science and products: are people using them? Are they getting value from them? >>Can we calculate that value from a dollar perspective, right? So that we can continue to say to the rest of the business: we're tracking all those numbers, and those numbers indicate that value is generated, and this is how much value we estimate in that region. And then you have some data intelligence and data governance metrics. For example, you have a number of domains in a data mesh — people talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are those owners organized and executing on their responsibilities? How many tickets are open and closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs — there are a lot more, but hopefully those can already inspire the audience. >>Absolutely. So we've talked about the rise of chief data offices — it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade?
>>So we've seen the role indeed sort of grow up. I think in 2010 there may have been like 10 chief data officers or something — Gartner has exact numbers on them — but then the role grew across industries, and the number is estimated to be about 20,000 right now. Wow. And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; then offensive data strategy; then support for the digital program; and now it's all about data products, right? So as a data leader, you now need all of those competencies and need to include them in your strategy. >>How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right? But essentially, I think for the next couple of years there's gonna be a lot of people still moving along those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer role, so you'll see that evolve over the years toward more digital and more data products. So for the next years, my prediction is it's all about data products, because that's an immediate link between data and the business, essentially, right? So that's gonna be important, and quite likely some new things will be added on which nobody can predict yet — we'll see those pop up in a few years. I think there's gonna be a continued challenge for the chief data officer role to become a real executive role, as opposed to somebody who claims that they're an executive but then isn't, right? >>So the real reporting level — into the board, into the CEO, for example — will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get there will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that value clear to all the data citizens in the organization. And in that sense, they'll need to have both technical audiences and non-technical audiences aligned, of course, and they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this; it's really important that you're waking up data citizens across the organization and making everyone in the organization think about data as an asset. >>Absolutely, because there's so much value that can be extracted when organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena; we're definitely gonna keep our eyes on it. It sounds like a lot of evolution and maturation is coming, from the data office perspective and from the data citizen perspective. And as the data show — in that IDC study you mentioned, and you mentioned Gartner as well — organizations are so much more likely to be successful and competitive. So we're gonna watch this space. Stan, thank you so much for joining me on the cube at Data Citizens 22. We appreciate it. >>Thanks for having me over. >>From Data Citizens 22, I'm Lisa Martin. You're watching The Cube, the leader in live tech coverage. >>Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at thecube.net. And don't forget to check out siliconangle.com for all the news, and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research.
If you want more information on the products announced at Data Citizens, go to collibra.com — there are tons of resources there; you'll find analyst reports and product demos, and it's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on the Cube, your leader in enterprise and emerging tech coverage. We'll see you soon.
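Editor's note: Stan's KPI framework — metrics grouped by the data office's three pillars — lends itself to a simple rollup. The sketch below is illustrative only; the pillar names, metric fields, and sample values are assumptions drawn from the conversation, not from any Collibra product or API.

```python
from dataclasses import dataclass, field

# Illustrative only: pillar and metric names are assumptions drawn from the
# interview (engineering uptime/cost, product usage/value, governance
# coverage), not from any Collibra product or API.

@dataclass
class PillarMetrics:
    name: str
    metrics: dict = field(default_factory=dict)

def domain_owner_coverage(domains_total: int, domains_with_owner: int) -> float:
    """Share of data domains that have an accountable owner assigned."""
    return domains_with_owner / domains_total if domains_total else 0.0

pillars = [
    PillarMetrics("data_engineering", {
        "platform_uptime_pct": 99.7,      # is the data platform up and running?
        "monthly_cloud_cost_usd": 42_000, # consumption-based cost tracking
    }),
    PillarMetrics("data_science_products", {
        "active_users": 310,              # are people using the data products?
        "estimated_value_usd": 800_000,   # value attributed, cf. the $800K reuse example
    }),
    PillarMetrics("data_governance", {
        "domain_owner_coverage": domain_owner_coverage(25, 19),
        "open_tickets": 12,               # tickets open vs. closed, per Stan
    }),
]

for p in pillars:
    print(p.name, p.metrics)
```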

Published Date : Nov 2 2022


Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst


 

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant, and it has evolved and matured — its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So, you know, if we go back 10 or 12 years ago, with the advent of the first data lakes really around Hadoop, it probably was true that you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days people would say you'll never get performance because you need to store data in a columnar format. And then columnar formats were introduced to the data lake — you have Parquet, ORC, and Avro file formats that were created to ultimately deliver performance out of it. So, okay, we got largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats — again, like Iceberg and Delta and Hudi — that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right, and now we've had almost a decade go by. So these technologies have matured to deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have a traditional data warehouse at all — they do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so, Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target — I remember Unix used to be "open systems" — so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose, for me, open buys us the ability to be unsure about the future. Because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect, and what you don't want is to have backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage of it — your competitive edge might depend upon it. And so, I suppose, for us, we acknowledge that we don't have perfect vision of what the future might be, and so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage.
And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. So we don't have the concern that we don't have enough hardware today to process what we want to achieve — we can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >>So, Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source — open source tooling. It's not proprietary; you're not gonna buy a data mesh, you're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to today: you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then a vendor can say, okay, we support Apache Iceberg, we're gonna support open source tooling — take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes; every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that reach beyond their own proprietary storage drop all the performance that they were able to provide. It reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution — but the reality was that a connector was not the same as running workloads in Hadoop back then. And I think, similarly, being able to connect to an external table that lives in an open data format — you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is that the database industry has really been built around vendor lock-in. I mean, from the start — how many people love Oracle today, but are customers nonetheless? I think lock-in is part of this industry, and that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me of when I see the gas prices as I drive up: oh, that's the cash price; with a credit card, I gotta pay 20 cents more. But okay. So the argument, then — let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that, long term, that's gonna come back and bite you? You mentioned Oracle, you mentioned Teradata — by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before.
At least those of us who've been in the industry long enough have seen this movie play out a couple of times. So I do think that's the future. And I loved what Richard said — I actually wrote it down, because I thought it was an amazing quote: it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is that using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool — they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're locked into that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach closes you off and locks you in. >>So I would say this, Richard — I'd love to get your thoughts on it. I talk to a lot of Oracle customers — not as many Teradata customers, but a lot of Oracle customers — and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the — let's call them data warehouse systems, or the proprietary systems — deliver a greater ROI sooner? And is that the allure that customers are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is: it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back very tight SLAs to them; we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads — and increasingly our business is growing in that direction — we can't do better than open data formats with cloud-based, data-mesh-type technologies. And so it's not a simple answer; no one option will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah, Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is always a lot of fun for analysts like me: you've got Databricks coming at it from a data engineering heritage, and you get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in data engineering — that is, it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future?
What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was: if you have open source technologies — the Pythons and Javas — on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense: I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and innovate us beyond our competitors. So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is: begin with open, license-free capabilities, because they're very cheap to experiment with, and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors in your analytics journey. >>It's interesting. Again, analysts like myself have done a lot of TCO work over the last 20-plus years, and normally it's the staff that's the biggest nut in total cost of ownership — but not with Oracle: there, the license cost is by far the biggest component of the pie. All right, Justin, help us close out this segment. We've been talking about this data mesh, open versus closed, Snowflake, Databricks. Where does Starburst — sort of this engine for the data lake, the data lakehouse, the data warehouse — fit in this world? >>Yeah. So our view on how the future ultimately unfolds is that we think data lakes will be a natural center of gravity, for a lot of the reasons we described: open data formats, and the lowest total cost of ownership, because you get to choose the cheapest storage available to you — maybe that's S3, or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're gonna get everything there, either. We think that centralization of all your data assets is basically an impossible endeavor, and so you want to be able to access data that lives outside of the lake as well. So we think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to go access other data sources as well. And so the role that we want to play is to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging — the so-called modern data stack — is really modern or the same wine in a new bottle when it comes to data architectures. You're watching the cube, the leader in enterprise and emerging tech coverage.
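Editor's note: the panel's argument — that open table formats now support warehouse-style updates and deletes, and that one engine can federate across the lake and external systems — can be made concrete with a short sketch. This assumes a Trino cluster (the open source engine Starburst builds on) reachable via the `trino` Python client; the host and the catalog, schema, and table names are invented for illustration, not a real deployment.

```python
# A minimal sketch of the open-format pattern discussed above: row-level
# updates/deletes on an Apache Iceberg table, plus a federated join across
# catalogs, via Trino. Catalog/schema/table names ("iceberg.lake.orders",
# "postgresql.crm.customers") and the host are illustrative assumptions.
from trino.dbapi import connect

conn = connect(host="trino.example.com", port=8080, user="analyst")
cur = conn.cursor()

# Iceberg (format v2) tables support DELETE and UPDATE -- the capability the
# panel notes early data lakes lacked.
cur.execute("DELETE FROM iceberg.lake.orders WHERE status = 'cancelled'")
cur.execute("UPDATE iceberg.lake.orders SET status = 'shipped' WHERE id = 42")

# Federation: join lake data with an operational store outside the lake,
# without ingesting it into a warehouse first.
cur.execute("""
    SELECT c.region, count(*) AS orders
    FROM iceberg.lake.orders o
    JOIN postgresql.crm.customers c ON o.customer_id = c.id
    GROUP BY c.region
""")
for row in cur.fetchall():
    print(row)
```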

Published Date : Aug 22 2022



Starburst Panel Q1


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads, and that sucks." Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need, when they need it, in a context that's relevant, to advance the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives — beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness, we've made progress. But the hard truth is that the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie... Or Does It?, a series of conversations produced by the Cube and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts: Justin Borgman is the co-founder and CEO of Starburst; Richard Jarvis is the CTO at EMIS Health; and Teresa Tung is Cloud First Chief Technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled — and yes, broken — promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: Is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud — AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on those promises in our lifetimes? We're spanning much of the Western world today: Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me — I'm in the Cube studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>You're very welcome. Let's get right into it. Now, here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata — of course, Teradata is the pioneer of that central enterprise data warehouse model — one of the things I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos; they all had data in different systems; they had data on-prem and data in the cloud. Those companies were acquiring other companies and inheriting their data architectures.
So, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. I think history has definitely proven that to be a lie. >>So, Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, and serve the business as best as possible from that standpoint. What is your experience? >>Yeah, I would echo Justin's experience, really. We as a business have grown up through acquisition — storing data in different places, sometimes doing information governance in different ways, storing data in a platform that's close to the data experts, people who really understand healthcare data, from pharmacies or from doctors. And so, although if you were starting from a greenfield site and building something brand new you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that, and it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, and where does it make sense to decentralize? >>I think you gotta have centralized governance, right? So, from the central team — for things like Sarbanes-Oxley, for things like security, for certain very core data sets — you need a centralized set of roles and responsibilities to really QA, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed; otherwise you're not gonna be able to scale, right? Different parts of the business have to be able to make the right data investments for their needs. And then, ultimately, you're gonna collaborate with your partners — partners that are not within the company, external partners. We're gonna see a lot more data sharing and model creation, and so you're definitely going to be decentralized. >>So, you know, Justin, you guys — geez, I think it was about a year ago — had a session on data mesh. It was a great program. You invited Zhamak Dehghani — of course, she's the creator of the data mesh — and one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything, and at the same time those individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few rock stars that build cubes and the like, or have you had any success in decentralizing that data model across your constituencies? >>Yeah. So we absolutely have got rock-star data scientists and data guardians, if you like — people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information.
And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, and then people who understand how to build models and how to process the data effectively. You can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. So that becomes a consulting-type experience, from a set of rock stars, to help a more decentralized business that needs to understand the data and generate some valuable output. >>Justin, what do you say to a customer or prospect that says, "Look, Justin, I've got a centralized team, and that's the most cost-effective way to serve the business. Otherwise I've got duplication"? >>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. First of all, when you're deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking, and you're putting all of your most valuable data in the hands of one vendor, who now has tremendous leverage over you for many, many years to come. I think that's the story of Oracle or Teradata or other proprietary database systems. But the other aspect is that those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: the idea that the domain owners actually know the data the best. And so by not only acknowledging that data is generally decentralized (and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized for those laws to be complied with), the data mesh model basically says: data is decentralized, and we're going to turn that into an asset rather than a liability. And we're going to turn it into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think, when you look at it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. That's how I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure; maybe they'll build on top of it. But what does that mean? Does the EDW just become less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality. The investment to actually migrate and keep that up to date, I would say, is a losing battle. We've been trying to do it for a long time.
Nobody has the budgets, and then data changes, right? There's going to be a new technology that emerges that we're going to want to tap into, and there's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very valuable, very high-performance tool for what it's there for. But you can have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that might span a number of systems, whether the source systems, the domains that know them best, or the consumer-facing systems and products that need to be packaged in a way that's really meaningful for that end user. Each of those is useful for a different part of the business, and the mesh actually allows you to use all of them. >>So Richard, let me ask you: take Zhamak's principles back to those. You've got domain ownership and data as a product. Okay, great, sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure, let's park that for a second. And then, in your industry, one of the most regulated and most sensitive, computational governance: how do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. What we've done at EMIS is build a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means we know exactly who has access to which data fields and which data tables, and everything they do is audited in a standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing it in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. >>And having done that, and having invested quite heavily in making it possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, we always talk about data democratization, and up until recently there really hasn't been line of sight as to how to get there. Do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over. How are you seeing your customers handle this challenge? >>Yeah, I think data products are a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product, ultimately to be consumed.
And we try to represent that in our product as, effectively, almost an e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. So we're really trying to build on that notion of data democratization and self-service, making it very easy to discover and start to use data with whatever BI tool you may like, or even just by running SQL queries yourself. >>Okay, guys, grab a sip of water. After the short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there.

Published Date : Aug 2 2022


LIVE Panel: "Easy CI With Docker"


 

>>Hey, welcome to the live panel. My name is Bret. I am your host, and indeed we are live. In fact, if you're curious about that, if you don't believe us, let's just show a little bit of the browser real quick. Yup, there you go: we're live. So how this is going to work is I'm going to bring in some guests in one second, and we're going to basically take your questions on the topic of the day, which is continuous integration and testing. Thank you so much to my guests. Welcome to the panel: I've got Carlos, Nico, and Mandy. Hello, everyone. >>Hello! >>All right, let's go around the room and all pretend we don't know each other, and that the internet didn't read below the video who we are. Hi, my name is Bret. I am a Docker Captain, which means I'm supposed to know something about Docker. I'm streaming here from Virginia Beach, Virginia, and I make videos on the internet and courses on Udemy. Carlos? >>Hey, what's up? I'm Carlos Nunez. I am a solutions architect at VMware. I do solution things with computers. It's fun. I live in Dallas, and I'm moving to Houston in a month, which is where I'm currently streaming. I've been all over the Northeast this whole week, so it's been fun, and I'm excited to meet with all of you and talk about CI and Docker. >>Yeah. Hey, everyone. Nico Khobar here. I'm a solutions engineer at HashiCorp, and I am streaming to you from the beautiful Austin, Texas. Ignore the Golden Gate Bridge here; this is from my old apartment in San Francisco, just keeping that to remember all the good days. But anyway, I work at HashiCorp, and I work on all things automation, cloud, and DevOps. And I'm excited to be here. >>And Mandy? >>Hi, yeah. Mandy Hubbard. I am also streaming from Austin, Texas. I am currently a DX engineer at ShipEngine. I've worked in QA, and that's kind of where I got my Docker experience, and I've moved into DX to try to help developers better understand and use our products and be an advocate for them. >>Nice. Well, thank you all for joining me. I really appreciate you taking the time out of your busy schedules to be here. And for those of you in chat: the reason we're doing this live (because it's always harder to do things live) is that we're here to answer your questions. We didn't come with a bunch of slides and demos or anything like that. We're here to talk amongst ourselves about ideas, and really we're here for you. This is obviously about easy CI, so we're going to try to keep the conversation around testing and continuous integration and all the things that entails with containers. But we may go down rabbit holes. We may veer off and start talking about other things, and that's totally fine if it's in the realm of DevOps, containers, and developer and ops workflows. Hey, it's all fair game. And these people have a wide variety of expertise. They haven't done just testing, right? We live in a world where you all kind of have to wear many hats. So feel free to ask whatever is on the top of your mind, and we'll do our best to answer. It might not be the best answer or the correct answer, but we're going to do our best. Well, let's get started. Let's get a couple of topics to start off with.
Easy CI was one of my three ideas for today, because one of the things I'm most excited about is the innovation we're seeing around easier testing: faster testing, automated testing. Because as much as we've all been doing this stuff for 15 or 20 years, since the sort of Jenkins early days, it seems like it's still really hard and it's still a lot of work. So let's go around the room real quick, and everybody can just talk for a minute about your experience with testing and maybe some of your pain points, what you don't like about our testing world. And we can talk about some pains, because I think that will lead us to talk about what we're seeing now that might be better ideas about how to do this. I know for me, with testing, obviously there's the code part, but it's getting it automated, and mostly getting it in the hands of developers so that they can control their own testing and don't have to go talk to a person to run that test again, or to the mysterious Jenkins platform somewhere. I keep mentioning Jenkins because it is still the dominant player out there. For me, I don't like it when I'm walking into a room and there are only one or two people that know how the testing works, or know how to get new tests into the testing platform, and stuff like that. So I'm always trying to free those things up so that all of the developers are enabled and empowered to do that stuff. So, someone else? Carlos? Anybody? >>Oh, I have a lot of opinions on that, having been a QA engineer for most of my career. The shift we're seeing is that everyone is DevOps and everyone is QA. The issue I see is that no one asked developers if they wanted to be QA. And so, being the former QA on the team, when there's a problem, even though I'm a developer and we're all doing QA, they always tend to come to one of the former QA engineers, and they're not really owning that responsibility and digging in. So that's what I'm seeing: we're all expected to test now, and some people don't know how. For me it was kind of an intuitive skill; it just fit with my personality. But not knowing what to look for, not knowing what to automate, not even understanding how your API endpoints are used by your front end so you know what to test when a change is made: it's really overwhelming for developers. We're going to need to streamline that, and hold their hands a little bit until they get their feet wet with also being QA. >>Right, right. So, Carlos? >>Yeah, testing is one of my favorite subjects to talk about when I'm pairing with developers, and a lot of it is because of what Mandy said, right? A lot of developers who used to write a test and say, "Hey QA, I wrote my unit tests, now write the rest of the tests," are now expected to understand how testing methodologies work in their local environments. They're supposed to understand how to write an integration test, an end-to-end test, a component test, and of course how to write unit tests that aren't just "assert true is true": more comprehensive, higher-touch unit tests, which include things like mocking and stubbing and spying and all that stuff.
And it's not only getting those tests written. I've had a lot of challenges with developers getting those tests to run in Docker, usually because of dependency hell. But getting developers to understand how to write tests that matter and mean something can be difficult, and it's also where I find a lot of the enjoyment of my work comes into play. So yeah, that's the difficulty I've seen around testing. Big subject, though; lots to talk about there. >>Yeah, we've already got so many questions coming in. You've already got an hour's worth of stuff. So, Nico, first thoughts on that? >>Yeah, I definitely agree with the other folks here on the panel about the shift, from a skill-set perspective, that's needed to adopt the new technologies. But aside from the organizational piece, and the new key responsibilities developers have to adapt to and inherit now, there's also a technical perspective: more developers are owning the full stack, including the infrastructure piece, so that adds a lot more to their plate in terms of also testing a component they were not even responsible for before. And the second challenge I'm seeing is the long list of added tooling. There's a new tool every other day, and that requires more customization to the testing that each individual team, and by extension each individual developer, has to learn. So the customization, as well as the scope, which now encompasses the infrastructure piece, both add to the challenges that developers are seeing in the market today for CI and overall testing. >>Yeah, we've got a lot of questions about all the different parts of this, so let me just go straight to them, because that's why we're here: for the people. A lot of people are asking about your favorite tools, and this is one of the challenges with integration, right? There are dominant players, but there is such a variety. Every one of my customers seems to be using a different workflow and a different set of tools. And hey, we're all here to just talk about what we're using. So a lot of the repeated questions are: what are your favorite tools? If you could create it from scratch, what would you use? Pierre's asking about GitHub Actions (sounds like they're a fan), pushing to ECR and Docker Hub, and using a VS Code pipeline; I guess they may be talking about Azure Pipelines there. So, what's your preferred way? Does anyone want to throw out their preferred pipeline of tooling? >>Well, I have to throw out mine, which is Jenkins. I'm kind of an honorary CloudBee at this point, having spoken a couple of times there. All of the plugins just make the functionality. I don't love the UI, but I love that it's been around so long. It has so much community support, and there are so many plugins that if you want to do something, you don't have to write the code; it's already been written and tested.
Unfortunately, I haven't been able to use Jenkins since I joined ShipEngine. Most of our monolithic core application is on TeamCity; it's a .NET application, and TeamCity plays really well with .NET. I didn't love it; I miss Jenkins. But we're just starting some new initiatives that are using GitHub Actions, and I'm really excited to learn those. I think they have a lot of the same functionality that you're looking for, but much more simplified, and it's right there in GitHub, so the integration is a lot more seamless. But I do have to go on record that my favorite CI/CD tool is Jenkins. >>All right, you heard it here first, people. All right, anyone else? You're muted. I'm muted? Chat says the guest has muted themselves. Carlos, you've got to unmute. >>Yes, I did mute myself, because I was typing a lot, trying to answer stuff in the chat; there's a lot of really good stuff in there. So yeah, it's fine. And this is the best way to start a flame war, so I'm just going to go ahead and light it up. For enterprise environments, I actually am a huge fan of Jenkins. It's a tool that people really understand, and it has stood the test of time, right? I mean, people were using Hudson fifteen years ago, maybe longer, and the way it works hasn't really changed very much. Jenkins X is a little different, but the UI and the way it works internally are pretty familiar to a lot of enterprise environments, which is great. And also, to me, the plugin ecosystem is amazing. There are so many plugins for everything, and you can make your own if you know Java or Groovy (I'm sure there's a way to do it in Kotlin too, but I haven't tried it myself). It's also really easy to write your CI as code, which is something I'm a big fan of, so Jenkinsfiles have worked really well for me. I know it can get a little bit more complex as you start to build your own models and such, but for enterprise CI/CD, especially if you want to roll your own or own it yourself, Jenkins is the bellwether, and for very good reason. Now, for my personal projects (and I see a lot in the chat here; I think y'all agree with me): GitHub Actions, 100%, is my favorite tool right now. I love GitHub Actions. It's customizable, it's modular, and there are a lot of plugins already. I started using GitHub Actions maybe a week after it went GA, when there was no documentation or anything, and it was still my favorite CI tool even then. The API is really great. There's a lot to love about GitHub Actions, and I use it as much as I can for my personal projects. I still have a soft spot for Travis CI, though. They got acquired and they're a little different now, but I can't let it go; I just love it. But yeah, when it comes to CI, those are my tools. So light me up in the comments; I will respond. >>Yeah, I feel with you on the Travis one, because I think that was my first time experiencing, in the early days of GitHub open source, a free CI tool that I could describe (I think it was YAML back then; I don't actually remember), and it was kind of an exciting time. From my experience, it was like: oh, this is just there, as a service.
And I could just use it; it's like GitHub, it's free for my open source stuff. So it does have a soft spot in my heart too. >>All right, we've got questions coming in, so I'm going to ask some. We don't have to have the answers, because sometimes they're going to be specific, but I want to call them out, because people in chat may have missed the question, and we have smart people in chat too; there's probably someone out there that knows the answer if it's not us. They're asking about building Docker images in Kubernetes, which to me is always a sore spot, because Kubernetes does not build images by default; it's not meant for that out of the gate. What is the best way to do this without having to use privileged containers? "Privileged" implying the container has more privileges than a container in Kubernetes gets by default. And that is a hard thing, because I don't think Docker lets you do that out of the gate. So I don't know if anyone has an immediate answer to that; it's a pretty technical one, but if you know the answer in chat, call it out. >>I had done this, but I'm pretty sure I had to use a privileged container and install the Docker daemon on the Kubernetes cluster. I can't give you a better solution. >>I've done the same. >>Yeah. Chavonne asks, back to the Jenkins thing: what's the easiest way to integrate Docker into a Jenkins CI/CD pipeline? And that's one of the challenges I find with Jenkins, because I don't claim to be the expert on Jenkins: there are so many plugins, because it's such a huge ecosystem, that when you go searching for Docker, a lot comes back. So I don't actually have a preferred way, because every team I find uses it differently. Do you know if there's a preferred or default Docker plugin for Jenkins? Someone's asking for the easy way to do that, and I don't know the back end of Jenkins that well. >>Well, the new way that they're doing Docker builds with the pipeline, which is declarative versus the scripted Groovy, is really simple, and their documentation is really good. They make it really easy to say: run this in this image. So you can pull down public images and add your own layers. I don't know the name of that plugin, but I can certainly take a minute after this session and go find it. But if you really are overwhelmed by the plugins, you can just write your shell commands in Jenkins. You could do everything in bash, calling the Docker daemon directly, and get it working end to end, and then start browsing for plugins to see if you even want to use them. The plugins will allow more integration from end to end; some of the things you input might be available later on in the process without you having to manage that yourself. But you don't have to use any of the plugins. You can literally just do a block where you write your shell command, get it working, and then decide if the plugins are for you. I think it's always important to understand what's going on under the hood before you adopt the magic of a plugin, because once you have a problem, if it's all a black box to you, it's going to be more difficult to troubleshoot. It's kind of like learning the Git command line versus GitKraken or something: once you get in a bind, if you don't understand the underlying steps, it's really hard to get yourself out, versus if you understand what the plugin or the app is doing, you can get out of situations a lot easier. That's where I'd start.
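For readers following along, here is a minimal sketch of the declarative approach Mandy describes: a Jenkinsfile whose stages all run inside a container. The image and commands are illustrative assumptions, not something the panel showed, and it assumes the Docker Pipeline plugin is installed on the Jenkins instance.

```groovy
// A rough sketch only: image, stage names, and commands are assumptions.
// Requires the Docker Pipeline plugin on the Jenkins controller/agents.
pipeline {
    agent {
        docker { image 'node:16-alpine' }  // every stage runs inside this container
    }
    stages {
        stage('Install') {
            steps {
                sh 'npm ci'                // install dependencies from the lockfile
            }
        }
        stage('Test') {
            steps {
                sh 'npm test'              // unit tests run in the same image
            }
        }
    }
}
```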
Um, I think it's always under important to understand what is going on under the hood before you, before you adopt the magic of a plugin, because, um, once you have a problem, if you're, if it's all a lockbox to you, it's going to be more difficult to troubleshoot. It's kind of like learning, get command line versus like get cracking or something. Once, once you get in a bind, if you don't understand the underlying steps, it's really hard to get yourself out of a bind, versus if you understand what the plugin or the app is doing, then, um, you can get out of situations a lot easier. That's a good place. That's, that's where I'd start. >>Yeah. Thank you. Um, Camden asks better to build test environment images, every commit in CII. So this is like one of those opinions of we're all gonna have some different, uh, or build on build images on every commit, leveraging the cash, or build them once outside the test pile pipeline. Um, what say you people? >>Uh, well, I I've seen both and generally speaking, my preference is, um, I guess the ant, the it's a consultant answer, right? I think it depends on what you're trying to do, right. So if you have a lot of small changes that are being made and you're creating images for each of those commits, you're going to have a lot of images in your, in your registry, right? And on top of that, if you're building those images, uh, through CAI frequently, if you're using Docker hub or something like that, you might run into rate limiting issues because of Docker's new rate, limiting, uh, rate limits that they put in place. Um, but that might be beneficial if the, if being able to roll back between those small changes while you're testing is important to you. Uh, however, if all you care about is being able to use Docker images, um, or being able to correlate versions to your Docker images, or if you're the type of team that doesn't even use him, uh, does he even use, uh, virgins in your image tags? Then I would think that that might be a little, much you might want to just have in your CIO. You might want to have a stage that builds your Docker images and Docker image and pushes it into your registry, being done first particular branches instead of having to be done on every commit regardless of branch. But again, it really depends on the team. It really depends on what you're building. It really depends on your workflow. It can depend on a number of things like a curse sometimes too. Yeah. Yeah. >>Once had two points here, you know, I've seen, you know, the pattern has been at every, with every, uh, uh, commit, assuming that you have the right set of tests that would kind of, uh, you would benefit from actually seeing, um, the, the, the, the testing workflow go through and can detect any issue within, within the build or whatever you're trying to test against. But if you're just a building without the appropriate set of tests, then you're just basically consuming almond, adding time, as well as all the, the image, uh, stories associated with it without treaty reaping the benefit of, of, of this pattern. Uh, and the second point is, again, I think if you're, if you're going to end up doing a per commit, uh, definitely recommend having some type of, uh, uh, image purging, um, uh, and, and, and garbage collection process to ensure that you're not just wasting, um, all the stories needed and also, um, uh, optimizing your, your bill process, because that will end up being the most time-consuming, um, um, you know, within, within your pipeline. 
So that's my two cents on this. >>Yeah, that's good stuff. Both of those are conversations that could lead us down the rabbit hole for the rest of the day, on storage management and CPU minutes for your builds. If you're on a team of any size, more than one or two people, you immediately run into headaches with the cost of CI, because now we have the problem of tools. We have so many tools; we could have the CI system burning CPU cycles all day, every day, if we really wanted to. So very quickly, especially if you're building every commit on every branch, you get into a world of cost mitigation, and you're probably going to have to settle somewhere in the middle between the budget people, who are saying you're spending way too much money on the CI platform because of all these CPU cycles, and the developers, who would love to have everything now, as fast as possible, with the biggest CPUs and the biggest servers, because the builds can never go fast enough. There's no end to optimizing your build workflow. We have another question on that, another topic that we'll all probably have different takes on: version tags. On images. In Git we have a very established workflow for how we make commits: we have commit SHAs, we've got Git tags, all these things are there. And then we go into images, and it's this whole new world that's opened up; there's no real consensus. So what are your thoughts on the strategy for teams in their image tags? Again, another culture thing. Mandy? >>I mean, I'm a fan of calver when we have no other option. It's just clean, and I like the timestamp; you know exactly when it was built. I don't really see much reason to use other, normal incremental numbering, and I love the fact that you can pull any tag and know exactly when it was created. So I'm a big fan of calver, if you can make that work for your organization. >>Yep, people are mentioning that in chat. >>So I like semver as well; I'm a big fan of it. I think it's easy to be able to signify what's a major change versus a minor change versus just a hotfix. The problem I've found with having teams adopt semver becomes answering these questions and being able to really define: what is a major change, what is a minor change, what is a patch, right? And this becomes a bit of an overhead, or not so much an overhead as a large concern, for teams who have never done versioning before, or have never been responsible for their own versioning.
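As a quick illustration of the schemes being debated (semver, calver-style date stamps, and commit SHAs), here is what tagging one hypothetical image all three ways might look like; the image name and version numbers are made up.

```bash
# Hypothetical tag scheme: a semver tag for humans, a date tag for
# "when was this built?", and the commit SHA for traceability.
VERSION="1.4.2"                      # decided by the team (or the business)
DATE="$(date +%Y.%m.%d)"             # calver-style stamp
SHA="$(git rev-parse --short HEAD)"  # ties the image back to the commit

docker build \
  -t myorg/myapp:"$VERSION" \
  -t myorg/myapp:"$DATE" \
  -t myorg/myapp:"$SHA" .

docker push --all-tags myorg/myapp   # --all-tags needs Docker 20.10+
```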
In fact, I'm running into that right now with a client I'm working with, where I'm working with a lot of teams, helping them move their applications from a legacy production environment into a new one. And in doing so, versioning comes up, because Docker images have tags, and usually the tags correlate to versions. But some of the teams I'm working with are only maintaining a script, and others are maintaining a fully fledged three-tier application with lots of dependencies. So telling the team that maintains a script, "Hey, you should use semver, and you should start thinking about what's major, what's minor, what's patch": that might be a lot for them. For a team like that, I might just suggest using commit SHAs as your versions until you figure that out, or maybe using dates as your version. But the team with the larger application probably already knows the answers to those questions, in which case they're either already using semver, or they're using some other versioning strategy and semver might suit them better. So you're going to hear me say "it depends" a lot, and I'm just going to say it here: it depends. Because it really does. >>I think you hit on something interesting, beyond just how to version: when to consider it a major release, and who makes those decisions. If you leave it to engineers to version, you're kind of pushing business decisions down the pipe. Whether it's a minor or a major should be a business decision, and someone closer to the business should be making that call as to when we want to call it major. >>That's a really good point, and I absolutely agree. And again, it really depends on the team and the scope of what they're maintaining, right? If it's a business application, of course you're going to have a product manager who's going to want to make that call, because that version is going to be out in marketing; people are going to use it and refer to it in support calls. They're going to need to make those decisions, and semver works really, really well for that. But for a team that's maintaining scripts, having them say, "Okay, you must tell me what a major version is": it's a lot. >>But if they want to use calver, great too, which is why I think, going back to what you originally said, calver in the absence of other options is a good strategy. >>Yeah. I'm catching up on chat, and I'm not sure I'm ever going to catch up, but there are a lot of people commenting on their favorite CI systems. It just goes to show, for the testing and deployment community, how many tools there are out there, and how many tools there are to support the tools that you're using. It can be a crazy wilderness, and I think that's part of the art of it: these things are allowing us to build our workflows to the team's culture. But I do think, getting into maybe what we hope comes next, I do hope we get to figure out some of these harder problems of consistency. One of the things that led me to Docker in the beginning was the fact that it created a consistent packaging solution for me to get my code off of my local system and into the server.
And that whole workflow meant that the thing I was making at each step was going to be the same thing used, right? And that was huge. It also took us a long time to get there. Docker was one of those once-a-decade kinds of ideas: let's solidify and get the consensus of the community around this one idea. And it's not perfect; the Dockerfile is not the most perfect way to describe how to make your app. But it is there, and we're all using it. And now I'm looking for the next piece, hopefully the next step, where we can all arrive at a consensus, so that once you hop teams: okay, we all know Docker, and now we're all starting to get to know the manifests. But then there's this big gap in the middle where it might be one of a dozen things. >>Yeah, to that, Bret, maybe more of a shameless plug here, I want to talk about one of the things I'm really excited about. I work at HashiCorp, and we tend to focus a lot on workflows versus technologies, because, as you can see even just looking at the chat, there's a ton of opinions on the different tooling, right? And imagine being partnered with one organization (I'm working with clients that have 10,000 developers), so imagine taking the folks in the chat, putting them in one company, and having to make decisions on how to build software. There's no way you can converge on one way, or one tool. That's what we're facing in the industry. So one of the things I'm pretty excited about, and I don't know if it's getting as much traction as we've been hoping, is Waypoint, which is an open source project we released, I believe, last year. It aims to address exactly what Bret described: a common tool that makes it extremely easy and simple to describe how you want to build, deploy, or release your application, in a consistent way, regardless of the tools. Similar to how you can think of Terraform having that pluggability, where you say terraform plan or apply against any cloud infrastructure without really having to know the details of how to do it, this is what Waypoint is doing. And it can be applied within the CI framework, so it has pluggability into, you know, CircleCI, tests, Docker, Helm, Kubernetes. It's a hard problem to solve, but I'm hopeful that's the path we'll eventually get to. You can see some of the information and docs on it on the HashiCorp site. I'm personally excited about it.
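For context, this is roughly the shape of a Waypoint configuration, based on the project's public examples; the project name, registry, and platform choices below are assumptions, not anything Nico showed.

```hcl
# Hypothetical waypoint.hcl: one file describing build, registry, and
# deploy, regardless of which CI system triggers it. Names are made up.
project = "example-project"

app "web" {
  build {
    use "docker" {}                        # build the image with Docker

    registry {
      use "docker" {
        image = "registry.example.com/web" # where the built image is pushed
        tag   = gitrefpretty()             # tag derived from the current git ref
      }
    }
  }

  deploy {
    use "kubernetes" {}                    # deploy the pushed image to Kubernetes
  }
}
```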
>>Yeah, I'm going to have to check that out. And like I told you on my live show, man, we'll talk about it; we could talk about it for a whole hour. So there's another question here. This one is a little bit more detailed, but it's one that I think a lot of people deal with, and I deal with a lot too. Essentially, the question from Cameron is: do you use Compose in your CI or not? Because yes, I do. It solves so many problems, and not every CI can. There are some problems with a CI trying to do it for me. So there are pros and cons, and I feel like I'm still on the fence about it, because I use it all the time, but it's not perfect. It's not always meant for CI, and CI sometimes tries to do things for you, like starting things up before you start other parts, with that whole ordering problem. Anyway: thoughts? >>Yes, I love Compose. It's one of my favorite tools of all time. And the reason why is... actually, let me walk that back, because Jack in the chat asked a really interesting question about what the hardest thing about CI is for a lot of teams. In my experience, the hardest thing is getting teams to build an app that is the same app as what's built in production. A lot of CI does things that are totally different from what you would do in your local dev, and as a result you get this application that either doesn't work locally, or does work but is a completely different animal than what you would get in production, right? So what I've found, in trying to get teams to bridge that gap by basically taking their CI and shifting it left (I hate the "shift left" term, but I'll use it), shifting the CI left into your local development, is trying to say: okay, how do we build the app? How do we build mock dependencies of that app so that we can test it? How do we run tests? How do we get test data? And what I've found is that trying to get teams to do all of this in Docker, which is normally a first for a lot of the teams I work with, means you're running docker build a lot, running docker run a lot, running docker rm a lot. You're running a lot of disparate Docker commands. And then, on top of that, trying to bridge all of those containers together into a single network can be challenging without Compose. So I like using Compose to be able to really easily categorize and compartmentalize a lot of the things that are going to be done in CI anyway, like building a Docker image and running tests, but in the same network where you might have a mock database or a mock S3 instance or something else. It's just easy to take all those Docker Compose commands and move them into your YAML file if you're using GitHub Actions, or your Jenkinsfile if you're using Jenkins, or what have you. It's really portable that way. But it doesn't work for every team. For example, going back to my script example: if it's a really simple script that does one thing on a somewhat routine basis, that might be a lot of overhead. In that case you can get away with just Docker commands; it's not a big deal. The way I look at it is: if I'm building something similar to a Makefile or Rakefile or what have you, then I'm probably going to want to use Docker Compose if I'm working with Docker. That's my philosophy, right?
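Here is a minimal sketch of the kind of Compose file Carlos describes: the app plus mock dependencies on one network, usable both locally and from CI. The service names, images, and settings are made-up assumptions.

```yaml
# Hypothetical docker-compose.yml: the app under test plus mock
# dependencies, all on one Compose-managed network.
version: "3.8"
services:
  app:
    build: .
    environment:
      DATABASE_URL: postgres://postgres:postgres@db:5432/app
      S3_ENDPOINT: http://s3mock:9090
    depends_on:
      - db
      - s3mock

  db:
    image: postgres:14-alpine          # a throwaway database for tests
    environment:
      POSTGRES_PASSWORD: postgres

  s3mock:
    image: adobe/s3mock                # a mock S3, as in Carlos's example
```

Locally, `docker compose up` brings up the same stack a CI job would exercise, which is exactly the parity Mandy picks up on next.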
>>So I'm also a fan of Docker Compose. And to your point, Carlos: I'm also a fan of shifting CI left, and testing left, but if you put all that logic in your CI, it makes the local development experience different from the CI experience. Versus, if you put everything in a Compose file, what you build locally is the same as what you build in CI, and you're going to have a better experience, because you're going to be testing something closer to what you're going to be releasing. It's also very easy to look at a Compose file and understand what the dependencies are and what's happening; it's very readable. Once you move that stuff into CI, I think a lot of developers are going to be intimidated by whatever the CI scripting language is. It's going to be something they have to wrap their heads around, and they're not going to be able to use it locally; you'd have to have another local solution. So I love the idea of a Compose file used locally, especially if you can mount the local workspace so developers can do real-time development and see their changes in the exact same way it's going to be built and tested in CI. It gives developers a high level of confidence, and you're less likely to have issues from discrepancies between how it was built in your local test environment versus how it's built in CI. Docker Compose really lets you do all of that in a way that makes your solution portable between local dev and CI, and reduces the number of CI cycles needed to get the test data you need. So that's why I like it, really, for local dev. >>It'll be interesting. I don't know if you all were able to see the keynote, but there was a little bit (not a whole lot, but a little bit) of talk about Docker Compose v2, which is now built into the Docker command line. So we're shifting from the Python-built Compose, which was a separate package. One of the challenges was getting it into your CI solution, because if you don't have pip, you've got to download the binary, and the binary wasn't available for every platform; it was a PyInstaller build, and it gets a little nerdy in how that works. The team is now getting it unified: now that it's in Golang and plugged right into the Docker command line, it hopefully will be easier to distribute and easier to use. And you won't necessarily have to have dependencies inside of wherever you're running it, because it'll be a statically compiled binary. So I've been playing with that this year, training myself to go from "docker-compose" to "docker space compose". I'm almost to the point of having to write a shell alias for that thing. But I'm excited to see where that's going, because there are already new features in it, and it uses BuildKit by default; there are all these things. And I love BuildKit; we could do a whole session on BuildKit. In fact, maybe right around this time, there's a session from Solomon Hykes, the co-founder of Docker and former CTO, on using some other tool on top of BuildKit.
Um, all right. So another good question was caching. So another one, another area where there is no wrong answers probably, and everyone has a different story. So the question is, what are your thoughts on CII build caching? There's often a debate between security. This is from Quentin. Thank you for this great question. There's often a debate between security reproducibility and build speeds. I haven't found a good answer so far. I will just throw my hat in the ring and say that the more times you want to build, like if you're trying to build every commit or every commit, if you're building many times a day, the more caching you need. So like the more times you're building, the more caching you're gonna likely want. And in most cases caching doesn't bite you in the butt, but that could be, yeah, we, can we get the bit about that? So, yeah. Yeah. >>I'm going to quote Carlos again and say, it depends on, on, you know, how you're talking, you know, what you're trying to build and I'm quoting your colors. Um, yeah, it's, it's got, it's gonna depend because, you know, there are some instances where you definitely want to use, you know, depends on the frequency that you're building and how you're building. Um, it's you would want to actually take advantage of cashing functionalities, um, for the build, uh, itself. Um, but if, um, you know, as you mentioned, there could be some instances where you would want to disable, um, any caching because you actually want to either pull a new packages or, um, you know, there could be some security, um, uh, disadvantages related to security aspects that would, you know, you know, using a cache version of, uh, image layer, for example, could be a problem. And you, you know, if you have a fleet of build, uh, engines, you don't have a good grasp of where they're being cashed. We would have to, um, disable caching in that, in that, um, in those instances. So it, it would depend. >>Yeah, it's, it's funny you have that problem on both sides of cashing. Like there are things that, especially in Docker world, they will cash automatically. And, and then, and then you maybe don't realize that some of that caching could be bad. It's, it's actually using old, uh, old assets, old artifacts, and then there's times where you would expect it to cash, that it doesn't cash. And then you have to do something extra to enable that caching, especially when you're dealing with that cluster of, of CIS servers. Right. And the cloud, the whole clustering problem with caching is even more complex, but yeah, >>But that's, that's when, >>Uh, you know, ever since I asked you to start using build kits and able to build kit, you know, between it's it's it's reader of Boston in, in detecting word, you know, where in, in the bill process needs to cash, as well as, uh, the, the, um, you know, the process. I don't think I've seen any other, uh, approach there that comes close to how efficient, uh, that process can become how much time it can actually save. Uh, but again, I think, I think that's, for me that had been my default approach, unless I actually need something that I would intentionally to disable caching for that purpose, but the benefits, at least for me, the benefits of, um, how bill kit actually been processing my bills, um, from the builds as well as, you know, using the cash up until, you know, how it detects the, the difference in, in, in the assets within the Docker file had been, um, you know, uh, pretty, you know, outweigh the disadvantages that it brings in. 
>>In the absence of a reason not to, I definitely think it's a good approach in terms of speed. Yeah, I say you cache until you have a good reason not to, personally. >>Cache by default. There you go; I think you cache by default. And the trick is, well, one, it's not always enabled by default, especially when you're talking about cross-server caching. So that's a complexity for your sysadmins, or, if you're on the cloud, it's usually just an option. I think it also veers into the question of how much you cache: with Docker, if your FROM images aren't checked every single time, if you're not pinning every single thing, if you're not pinning your app version and your npm versions to the exact lockfile definition... There are a lot of these things where I get very grouchy with teams that sort of just let it all be, like, "Yeah, we'll just build two images," and those images are totally going to have different dependencies, because someone happened to update that thing in apt or npm or whatever. So I get grouchy about that, because I want to lock it all down, but I also know that's going to create administrative burden. Like, the team is now going to have to manage versions in a much more granular way. Do we need to version curl? Do we need to care about all that stuff? So that's kind of tricky. But when you get to certain caching problems, you don't want those caches to happen, because if your FROM image changes and you're not constantly checking for a new image, and you're not pinning that version, then you don't know whether you're getting the latest version of Debian or whatever. So I think there's an art form to it: the more you pin, the less you have to worry about things changing, but the more you pin (all your versions of everything, all the way down the stack), the more administrative work, because you're going to have to manually change every one of those. So I think it's a balancing act for teams. And as they mature, I find teams tend to pin more, until they get to a point of being more comfortable with their testing. The other side of this argument is: if you trust your testing, and you have better testing, then the subtle little differences in versions don't have to be pinned, because you can get away with those minor or patch-level version changes if you're thoroughly testing your app. And this gets us into a whole other rant, but yeah, go ahead. >>Talking about pinning versions: if you've got a lot of dependencies, isn't that when you'd want to use the cache the most, and not have to rebuild all those layers? >>But if you're not pinning to the exact patch version and you are caching, then you're not technically getting the latest versions, because it's not checking for them all the time. It's weird; there's a lot of subtle nuance here that people don't realize until it's a problem.
And that's part of the tricky part of all this stuff: sometimes Docker can be almost so much magic out of the box that it all just works, and then day two happens, you build it a second time, you've got a new version of OpenSSL in there, and suddenly it doesn't work. Anyway, that was a great question. I've got another question, from Heavy in the chat: where do you put testing in your pipeline? Testing the code, that is, since there are lots of types of testing, and this pipeline gets longer and longer, with Docker building images as part of it. Before staging, or after staging but before production: where do you put it? >>Oh man. Okay, so my main thought on this (and of course this is kind of religious flame bait, so people are going to tell me I'm wrong in the comments) is how I like to think about it: in pretty much every stage or environment that you're going to deploy your app into, or that your application is going to touch, there should be a build of a Docker image that has all your application code and its dependencies, there's testing that tests your application, and then there's a deployment into whatever infrastructure there is. The testing can get tricky, though, and the type of testing you do, I think, depends on the environment you're in. So let's say your team has a main branch, and feature branches that merge into the main branch; you don't have a pre-production branch or anything like that. In those feature branches, whenever I'm doing CI that way, I want to know, when I cut my pull request to merge into main, that everything's going to work. So in my feature branches I'm probably going to just run unit tests and maybe some component tests, which are just testing that your app can talk to another component or dependency, like maybe a database: tests that don't take a lot of time and are fast. A lot of that would be done at the feature-branch level, in my opinion. But when you're going to merge that feature branch into main as part of a release, in that activity you're going to want to do integration tests, to make sure your app can actually talk to all the dependencies it talks to. You're going to want an end-to-end test or a smoke test, just to make sure that someone who actually touches the application (if it's a website, say) can use it as intended and it meets the business cases. And you might even have performance testing, load testing, or security and compliance testing, which in my opinion should happen when you're about to go into production with a release, because those take a long time, they're very expensive, you're going to have to cut new infrastructure to run them, and it can become quite arduous. You're not going to want to run those all the time: you won't have the resources, builds will be slower, releases will be slower; it will just become a mess. So I'd want to save those for when I'm about to go into production, instead of doing them every time I make a commit or every time I'm merging a feature branch into a non-main branch. That's the way I look at it, but everyone does it differently; there are other philosophies around it.
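A hypothetical GitHub Actions shape for the split Carlos describes: fast tests on every branch, heavier integration tests only when heading toward a release. The job names, files, and commands are invented for illustration, and it assumes Compose v2 and a `tests` service are available.

```yaml
# Sketch only: fast unit/component tests everywhere, integration on main.
name: ci
on: push

jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: docker build -t app:${{ github.sha }} .
      - run: docker run --rm app:${{ github.sha }} npm test   # unit/component tests

  integration:
    needs: unit
    if: github.ref == 'refs/heads/main'   # only on the release path
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # assumes a 'tests' service is defined in the compose file
      - run: docker compose -f docker-compose.test.yml up --exit-code-from tests
```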
That's the way I look at it, but everybody does it differently; there are other philosophies around it.

>>Well, I don't disagree with your build, test, deploy. I think if you're going to deploy the code, it needs to be tested at some level. I hate the term smoke test, because it gives a false sense of security, but you have some minimal amount of tests, and I would expect the developer on the feature branch to add new tests that exercise that feature. That would be part of the PR, and those tests would need to pass before you can merge it to master. So I agree that there are tests you want to run at different stages, but the earlier you can run the tests before going to production, the fewer issues you have, and the easier they are to troubleshoot. And I kind of agree with what you said, Carlos, about the longer-running tests, like performance tests, waiting until the end.

The only problem is that when you wait until the end to run those performance tests, you kind of end up deploying with whatever performance you have; it's almost just information gathering. I don't want to go down a rabbit hole, but performance tests can be really useless if you don't have a goal. Otherwise it's just information: this is the performance. Well, what did you expect it to be? Is it good? Is it bad? They can get really nebulous. So if performance is really important, you're going to need to come up with some expectations, preferably set at the business level: what are our SLAs, what should our response times be, so you have something to shoot for. Then, before you get to production, if you have targets, you can test before staging, tweak the code before staging, and shift that performance initiative, sorry, Carlos, a little to the left. But if you don't have performance targets, then it's just a checkbox. So those are my thoughts: I like to test before every deployment.

>>Yeah. And you know what, I'm glad you brought up SLAs and performance, because one of the things I've seen when I work with teams is that oftentimes another team runs the performance and load tests, and the development team doesn't really have much insight into what's going on there. Usually when I go to the performance team and ask, hey, how do you run your performance tests, it's a generic solution for every single application they support, which may or may not be applicable to the application team I'm working with. So I'm not going to dig into the SRE rabbit hole, but it is a good bridge into SRE, when you start trying to define what reliability means.

Because the reason you test performance is to test reliability: to make sure that when you cut that release, customers who go to your site or use your application aren't going to see regressions in performance, and aren't going to either go to another website or log an SLA violation or something like that. It bridges really well with defining reliability and what SRE means.
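Picking up the point about targets rather than checkboxes, here is a hedged sketch of what a pre-staging latency gate might look like; the URL and the 500 ms budget are invented stand-ins for whatever response-time SLA the business actually set:

```sh
#!/bin/sh
# Hypothetical latency gate for a CI stage. A real gate would use a
# proper load tool and percentiles; a single request is only a sketch.
TARGET="https://staging.example.com/"   # invented endpoint
BUDGET="0.5"                            # invented 500 ms business target

t=$(curl -o /dev/null -s -w '%{time_total}' "$TARGET")
echo "response time: ${t}s (budget: ${BUDGET}s)"

# awk handles the floating-point comparison; a non-zero exit fails the stage.
if ! awk -v t="$t" -v b="$BUDGET" 'BEGIN { exit !(t <= b) }'; then
  echo "FAIL: latency budget exceeded" >&2
  exit 1
fi
```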
And when you start talking about reliability, that's when you start talking about how often you test it. Do I have nightly tasks in CI that ensure that my main branch, or some other important branch, is meeting SLOs, service level objectives? Do I run tasks that ensure that my SLAs are being met in production?

Do I do things like game days, where I test: hey, if I turn something off, or if I deploy this small piece of broken code to production, what happens to my performance? What happens to my security and compliance? You can go really deep into creating really robust tests that cover a lot of different domains. But I like build, test, deploy as the overall answer, because you're going to have to build your application first, you're going to have to test what you built, and then you're going to want to deploy it after you've tested it. That order generally ensures that you're releasing software that works.

>>Right. Right. I was going to ask one last question, and it's going to have to be a one-sentence answer for each of you: do you lint? And if you lint, do you lint all the things? And if you do, do you fail the linters during your testing? Yes or no?

>>I think it's going to depend on the culture. I really do. Sorry about it.

>>If we have a hook on the git commit, then theoretically the developer can't get code in without running the linter anyway.

>>Right, right. True. Anyone else? Any thoughts on linting?

>>Nice. I also saw an additional question on that in the chat: would you introduce it in a multi-stage build? I was wondering what others think about that too. Typically, the most common use case I've seen for multi-stage is just producing the final image: minimizing the image size and producing a final, thin image. If it's not for that, I haven't seen a lot of teams or individuals actually doing it within a multi-stage build. There's nothing really against that, but I think the number one purpose of multi-stage builds has been producing the most minimal image. So I just wanted to combine those two answers into one, for sure.
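Since multi-stage builds came up mainly as a size optimization, here's a minimal sketch of that common pattern, with hypothetical stage and script names. The lint and build tooling live in the first stage, and only the built output reaches the thin final image, which is also one way to read the chat question about where linting could go:

```dockerfile
# Stage 1: full toolchain; dev dependencies, linting, compilation.
FROM node:16.2.0-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# `lint` and `build` are hypothetical package.json scripts; a failing
# linter fails the image build right here.
RUN npm run lint && npm run build

# Stage 2: thin runtime image; none of the build tooling survives.
FROM node:16.2.0-alpine
WORKDIR /app
COPY --from=builder /app/package.json /app/package-lock.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```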
>>Yeah, for sure. And with that, thank you all for the great questions. We're going to have to wrap this up; we could go for another hour if we all had the time, and if DockerCon were a 24-hour-long event, but sadly it's not, so we've got to make room for the next live panel, which will be Peter coming on to talk about security with some developer security experts. I want to thank all three of you for being here. Real quick, let's go around the room: where can people reach out to you? I am at Bret Fisher on Twitter; you can find me there. Carlos?

>>I'm at dev Mandy, with a Y, D-E-N-D-Y, that's me.

>>Easiest name ever on Twitter: Carlos and DFW on LinkedIn. And I also have a LinkedIn Learning course, so check me out on LinkedIn Learning.

>>Yeah, I'm at Nicola Quebec, one word; I'll put it in the chat as well, on LinkedIn as well as Twitter. Thanks for having us, Brett.

>>Thanks for being here. And you all stay around: if you're in the room with us chatting and you want to see the next live panel, you've got to go back to the beginning and find the next one, because this one will end. We'll still be in chat for a few minutes; I think the chat keeps going, but I don't actually know, I haven't tried it yet, so we'll find out here in a minute. Thanks again, all of you, for being here. I'll be back a little bit later, but coming up next on the livestream is Peter with security. Ciao. Bye.

Published Date : May 28 2021
