Laura Sellers, Collibra | Data Citizens 22
>> Welcome to theCUBE's Virtual Coverage of Data Citizens 2022. My name is Dave Vellante and I'm here with Laura Sellers who is the Chief Product Officer at Collibra, the host of Data Citizens, Laura, welcome. Good to see you. >> Thank you. Nice to be here. >> Yeah, your keynote at Data Citizens this year focused on you know, your mission to drive ease of use and scale. Now, when I think about historically fast access to the right data at the right time in a form that's really easily consumable it's been kind of challenging especially for business users. Can you explain to our audience why this matters so much and what's actually different today in the data ecosystem to make this a reality? >> Yeah, definitely. So I think what we really need and what I hear from customers every single day is that we need a new approach to data management and our product teams. What inspired me to come to Collibra a little bit over a year ago, was really the fact that they're very focused on bringing trusted data to more users across more sources for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale it's really about making teams more productive in getting started with and the ability to manage data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers performance, scale and security that our users and teams need and demand. So as we look at, oh, go ahead. >> I was going to say, you know, when I look back at like the last 10 years it was all about getting the technology to work and it was just so complicated, but, but please carry on. I'd love to hear more about this. >> Yeah, I really, you know, Collibra is a system of engagement for data and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. 
So what we're announcing from our ease of use side of the world is first our data marketplace. This is the ability for all users to discover and access data quickly and easily shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then the two more areas of the ease of use side of the world is our world of usage analytics. And one of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create. And also helping with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform, understanding what's working, who's accessing it, what's not. And then finally we're also introducing what's called Workflow Designer. And we love our workflows at Collibra, it's a big differentiator to be able to automate business processes. The Designer is really about a way for more people to be able to create those workflows, collaborate on those workflows, as well as people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use to make it easier for all users to find data. >> Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy, metaphor or analogy, I always get those confused. Let's go with analogy. Why is it so important to data consumers? >> I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will. And so having a consumer grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from and then be able to quickly access it, is the idea of being able to shop for it. 
Just making it as simple as possible and really speeding the time to value for any of the business analysts, data analysts out there. >> Yeah, I think you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data and of course that's awesome. I love that. But of course then you have to have self-service infrastructure and you have to have governance. And those are really challenging. And I think so many organizations they're facing adoption challenges. You know, when it comes to enabling teams generally, especially domain experts to adopt new data technologies you know, like the tech comes fast and furious. You got all these open source projects and you get really confusing. Of course it risks security, governance and all that good stuff. You got all this jargon. So where do you see, you know, the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >> You're, you're dead on. There's so much technology and there's so much to stay on top of, which is part of the friction, right? Is just being able to stay ahead of and understand all the technologies that are coming. You also look at it as there's so many more sources of data and people are migrating data to the cloud and they're migrating to new sources. Where the friction comes is really that ability to understand where the data came from, where it's moving to and then also to be able to put the access controls on top of it. So people are only getting access to the data that they should be getting access to. So one of the other things we're announcing with, with all of the innovations that are coming is what we're doing around performance and scale. So with all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. 
It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, one of those challenges, making sure that the data that you have is high quality as you move forward. And so really another, we're just reducing friction. You already have Snowflake stood up, it's not another machine for you to manage, it's just push-down capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is that ability for users to be able to ingest metadata, understand where the PII data is and then set policies up on top of it. So very quickly be able to set policies and have them enforced at the data level. So anybody in the organization is only getting access to the data they should have access to. >> This topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, and really confined only to highly regulated industries like financial services and healthcare and government. You know, you look back over a decade ago, you didn't have this worry about personal information, GDPR, and you know, California Consumer Privacy Act all become so much more important. The cloud has really changed things in terms of performance and scale. And of course partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on and of course attracting them as an investor as well was very strong validation. 
What can you tell us about the nature of the relationship with Snowflake and specifically interested in sort of joint engineering and product innovation efforts, you know, beyond the standard go-to-market stuff? >> Definitely. So you mentioned they're a strategic investor in Collibra as of about a year ago. A little less than that I guess. We've been working with them though for over a year really tightly with their product and engineering teams to make sure that Collibra is adding real value. Our unified platform is touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value with our policies for all business users to be able to create. We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where did the data come from, how was it transformed, within Snowflake as well as the data quality push-down, as I mentioned, data quality, you brought it up. It is a new, it is a big industry push and you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality. So this push-down capability for Snowflake really is again a big ease of use push for us at Collibra of that ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and make sure you have the right quality. 
>> I mean the nice thing about Snowflake if you play in the Snowflake sandbox, you, you can get sort of a, you know, high degree of confidence that the data sharing can be done in a safe way. Bringing, you know, Collibra into the, into the story allows me to have that data quality and and that governance that I, that I need. You know, we've said many times on theCUBE that one of the notable differences in cloud this decade versus last decade I mean there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. I mean they're, it's a key factor for innovating, accelerating product delivery, filling gaps in in the hyperscale offerings. Because you got more stack, you know, mature stack capabilities and you know, that creates this flywheel momentum as we often say. But, so my question is, how do you work with the hyperscalers? Like whether it's AWS or Google or whomever, and what do you see as your role and what's the Collibra sweet spot? >> Yeah, definitely. So, you know, one of the things I mentioned early on is the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So similar to what you've seen with our strategic moves around Snowflake, and really covering the broad ecosystem of what Collibra can do on top of that data source. We're extending that to the world of Google as well and the world of Dataplex. We also have great partners in SI's. Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data-driven culture and and Collibra being the core of it. 
>> Hi Laura, we're going to, we're going to end it there but I wonder if you could kind of put a bow on, you know, this year, the event your, your perspectives. So just give us your closing thoughts. >> Yeah, definitely. So I, I want to say this is one of the biggest releases Collibra's ever had. Definitely the biggest one since I've been with the company a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and, and companies everywhere. And so it's all about everybody being able to easily find, understand and trust and get access to that data going forward. >> Well congratulations on all the progress. It was great to have you on theCUBE. First time, I believe. And really appreciate you, you taking the time with us. >> Yes, thank you, for your time. >> You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on theCUBE your leader in enterprise and emerging tech coverage.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Laura | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Laura Sellers | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
California Consumer Privacy Act | TITLE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Infosys | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Dataplex | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
first | QUANTITY | 0.98+ |
Data Citizens | ORGANIZATION | 0.97+ |
this year | DATE | 0.97+ |
this week | DATE | 0.95+ |
Levi's | ORGANIZATION | 0.94+ |
Snowflake | TITLE | 0.94+ |
past year and a half | DATE | 0.94+ |
First time | QUANTITY | 0.94+ |
Gartner | ORGANIZATION | 0.93+ |
last decade | DATE | 0.93+ |
two more areas | QUANTITY | 0.91+ |
today | DATE | 0.91+ |
GCP | ORGANIZATION | 0.86+ |
up to $15 million dollars | QUANTITY | 0.86+ |
a year ago | DATE | 0.85+ |
first thing | QUANTITY | 0.83+ |
Data Citizens 22 | ORGANIZATION | 0.83+ |
about a year ago | DATE | 0.83+ |
over a decade ago | DATE | 0.82+ |
Collibra Protect | ORGANIZATION | 0.82+ |
over a year | QUANTITY | 0.81+ |
theCUBE | ORGANIZATION | 0.81+ |
Snowflake | EVENT | 0.8+ |
Snowf | TITLE | 0.79+ |
Data Citizens 2022 | EVENT | 0.76+ |
over | DATE | 0.72+ |
last 10 years | DATE | 0.7+ |
Data | EVENT | 0.67+ |
Snowflake Lineage 2.0 | TITLE | 0.64+ |
Protect | COMMERCIAL_ITEM | 0.63+ |
decade | DATE | 0.62+ |
single day | QUANTITY | 0.62+ |
Data Citizens 2022 | TITLE | 0.53+ |
Citizens | ORGANIZATION | 0.52+ |
Stijn Christiaens, Collibra, Data Citizens 22
(Inspiring rock music) >> Hey everyone, I'm Lisa Martin covering Data Citizens 22 brought to you by Collibra. This next conversation is going to focus on the importance of data culture. One of our Cube alumni is back; Stijn Christiaens is Collibra's co-founder and its Chief Data Citizen. Stijn, it's great to have you back on theCUBE. >> Hey Lisa, nice to be here. >> So we're going to be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation; it also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >> Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations you have a lot of people, most of the employees in an organization, are somehow going to be a data citizen, right? So you need to make sure that these people are aware of it, you need to make sure that these people have the skills and competencies to do with data what is necessary, and that's on all levels, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss we need to make this decision, that your boss is also open to and able to interpret, you know, the data presented in the dashboard to actually make that decision and take that action. Right? And once you have that "Why" to the organization that's when you have a good data culture. That's a continuous effort for most organizations because they're always moving somehow, they're hiring new people. 
And it has to be a continuous effort because we've seen that, on the one hand, organizations continue to be challenged with controlling their data sources and where all the data is flowing, right? Which in itself creates a lot of risk, but also on the other hand of the equation, you have the benefits, you know, you might look at regulatory drivers like we have to do this, right? But it's much better right now to consider the competitive drivers for example. And we did an IDC study earlier this year, quite interesting, I can recommend anyone to read it, and one of the conclusions they found as they surveyed over a thousand people across organizations worldwide, is that the ones who are higher in maturity, so the organizations that really look at data as an asset, look at data as a product and actively try to be better at it, have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this, you know, data culture for everyone, awakening them up as data citizens. I'm doing this for competitive reasons. I'm doing this for regulatory reasons. You're trying to bring both of those together. And the ones that get data intelligence, right, are just going to be more successful and more competitive. That's our view and that's what we're seeing out there in the market. >> Absolutely. We know that just generally, Stijn, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know that, in theory, they're more competitive, they're more successful, but the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data when it might be challenging for an organization to adapt culturally. 
>> Of course, of course it's difficult for an organization to adapt, but it's also necessary as you just said, imagine that, you know, you're a modern day organization, phones, laptops, what have you. You're not using those IT assets, right? Or you know, you're delivering them throughout the organization, but not enabling your colleagues to actually do something with that asset. Same thing is true with data today, right, if you're not properly using the data asset, and your competitors are, they're going to get more advantage. So as to how you get this done or how you establish this culture there's a few angles to look at, I would say. So one angle is obviously the leadership angle whereby whoever is the boss of data in the organization you typically have multiple bosses there, like a chief Data Officer, sometimes there's multiple, but they may have a different title, right? So I'm just going to summarize it as a data leader for a second. So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect. How are you going to get value from data? >> Lisa: Yes. >> Now, that's one part because then you can clearly see the example of your leadership in the organization, and also the business value, and that's important because those people, their job, in essence, really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that go to right is it's not enough to just have that leadership out there but you also have to get the hearts and minds of the data champions across the organization. You really have to win them over. 
And if you have those two combined, and obviously good technology to, you know, connect those people and have them execute on their responsibilities such as a data intelligence platform like ePlus, then you have the pieces in place to really start upgrading that culture inch by inch, if you will. >> Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about Collibra is drinking its own champagne. I always loved to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what, maybe some of the specific projects are that Collibra's data office is working on. >> Yes. And it is indeed data citizens. There are a ton of speakers here, very excited. You know, we have Barb from MIT speaking about data monetization. We have DJ Patil at the last minute on the agenda so really exciting agenda, can't wait to get back out there. But essentially you're right. So over the years at Collibra, we've been doing this now since 2008, so a good 15 years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we, and myself, you know, when you start a company we were four people in a garage, if you will, so everybody's wearing all sorts of hat at that time. But over the years I've run pre-sales at Collibra, I've run post sales, partnerships, product, et cetera, and as our company got a little bit biggish, we're now 1,200 something like that, people in the company I believe, systems and processes become a lot more important, right? So we said, you know, Collibra isn't the size of our customers yet, but we're getting there in terms of organization, structure, process systems et cetera. 
So we said it's really time for us to put our money where our mouth is, and to set up our own data office, which is what we were seeing that all of our customers are doing, and which is what we're seeing that organizations worldwide are doing and Gartner was predicting as well. They said, okay, organizations have an HR unit, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. >> Lisa: Hm. >> So we said, okay, let's try to set an example with Collibra. Let's set up our own data office in such a way that other people can take away with it, right? Can take away from it? So we set up a data strategy, we started building data products, took care of the data infrastructure, that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our own products and our own practices, right? And from that use, learn how we can make the product better, learn how we can make the practice better and share that learning with all of the market, of course. And on Monday mornings, we sometimes refer to that as eating our own dog food, Friday evenings, we refer to that as drinking our own champagne. >> Lisa: I like it. >> So we, we had a (both chuckle) We had the drive to do this, you know, there's a clear business reason, so we included that in the data strategy and that's a little bit of our origin. Now how do we organize this? We have three pillars, and by no means is this a template that everyone should follow. This is just the organization that works at our company, but it can serve as an inspiration. So we have pillars, one of which is data science, the data product builders, if you will, or the people who help the business build data products. We have the data engineers who help keep the lights on for that data platform to make sure that the products, the data products, can run, the data can flow and, you know, the quality can be checked. 
And then we have a data intelligence or data governance pillar where we have those data governance, data intelligence stakeholders who help the business as a sort of data partners to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is, well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap, and started execution on use case after use case. And a few important ones there are very simple, we see them with all our customers as well, people love talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in legal and privacy, so they have their process registry, and they can see how the data flows. So that's a popular starting place and that turns into a marketplace so that if new analysts and data citizens join Collibra, they immediately have a place to go to to look at what data is out there for me as an analyst or data scientist or whatever, to do my job, right? So they can immediately get access to the data. And another one that we did is around trusted business reporting. We're seeing that, since 2008, you know, self-service BI allowed everyone to make beautiful dashboards, you know, by pie charts. I always, my pet peeve is the pie charts because I love pie, and you shouldn't always be using pie charts, but essentially there's been a proliferation of those reports. And now executives don't really know, okay, should I trust this report or that report? They're reporting on the same thing but the numbers seem different, right? So that's why we have trusted business reporting. 
So we know if the reports, the dashboard, a data product essentially, is built, we know that all the right steps are being followed, and that whoever is consuming that can be quite confident in the result. >> Lisa: Right, and that confidence is absolutely key. >> Exactly. Yes. >> Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >> KPIs and measuring is a big topic in the chief data officer profession I would say, and again, it always varies, with respect to your organization, but there's a few that we use that might be of interest to you. So remember you have those three pillars, right? And we have metrics across those pillars. So, for example, a pillar on the data engineering side is going to be more related to that uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data science and the products. Are people using them? Are they getting value from it? Can we calculate that value in a monetary perspective, right? >> Lisa: Yes. >> So that we can, to the rest of the business, continue to say, "We're tracking all those numbers and those numbers indicate that value is generated" and how much value estimated in that region. And then you have some data intelligence, data governance metrics, which is, for example you have a number of domains in a data mesh [Indistinct] People talk about being the owner of a data domain for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? 
How well are these owners organized, executing on their responsibilities? How many tickets are open? Closed? How many data products are built according to process? And so on and so forth, so these are a set of examples of KPIs. There's a lot more but hopefully those can already inspire the audience. >> Absolutely. So we've talked about the rise of chief data offices, it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see, in terms of the maturation of data offices over the next decade? >> So we've seen, indeed, the role sort of grow up. I think in 2010 there may have been like, 10 chief data officers or something, Gartner has exact numbers on them. But then they grew, you know, into the 400s; they were mostly in financial services, but then they expanded to all industries and the number is estimated to be about 20,000 right now. >> Wow. >> And they evolved in a sort of stack of competencies, defensive data strategy, because the first chief data officers were more regulatory driven, offensive data strategy, support for the digital program and now all about data products, right? So as a data leader, you now need all those competences and need to include them in your strategy. How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right? But essentially, I think for the next couple of years there's going to be a lot of people, you know, still moving along with those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officers. So you'll see, over the years that's going to evolve more digital and more data products. So for the next three, five years, my prediction is it's all going to be about data products because it's an immediate link between the data and the dollar essentially. >> Right. 
>> So that's going to be important and quite likely some new things will be added on, which nobody can predict yet. But we'll see those pop up in a few years. I think there's going to be a continued challenge for the chief data officer role to become a real executive role as opposed to, you know, somebody who claims that they're executive, but then they're not, right? So the real reporting level into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done, will be the ones that are successful, and the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that very clear to all the data citizens in the organization, right? >> Right, really creating that value chain. >> In that sense they'll need to have both, you know, technical audiences and non-technical audiences aligned of course, and they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you are waking up data citizens across the organization and you make everyone in the organization think about data as an asset. >> Absolutely, because there's so much value that can be extracted if organizations really strategically build that data office and democratize access across all those data citizens. Stijn, this is an exciting arena. We're definitely going to keep our eyes on this. Sounds like a lot of evolution and maturation coming from the data office perspective. From the data citizen perspective. And as the data show, in that IDC study that you mentioned, and you mentioned Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're going to watch this space. Stijn, thank you so much for joining me on theCUBE at Data Citizens 22. We appreciate it. >> Thanks for having me over. 
>> From Data Citizens 22, I'm Lisa Martin you're watching theCUBE, the leader in live tech coverage. (inspiring rock music) >> Okay, this concludes our coverage of Data Citizens 2022 brought to you by Collibra. Remember, all these videos are available on demand at theCUBE.net. And don't forget to check out siliconangle.com for all the news and wikibon.com for our weekly breaking analysis series where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to Collibra.com. There are tons of resources there. You'll find analyst reports, product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on theCUBE Your leader in enterprise and emerging tech coverage. We'll see you soon. (inspiring rock music continues)
Felix Van de Maele, Collibra, Data Citizens 22
(upbeat techno music) >> Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance-driven issue to becoming a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum toward rethinking monolithic data architectures. You hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver, like, a self-service infrastructure to business users and domain experts? Now, the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important.
In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that is untrusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cyber security. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now, at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm going to also sit down with Laura Sellers. She's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra.
He's an amazingly smart dude who founded OwlDQ, a company that he sold to Collibra last year. Now, many companies didn't make it through the Hadoop era. You know, they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business. They've leveraged the cloud, expanded the product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So, it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera, businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation. Meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on theCUBE. We're excited to have you, Felix. Good to see you again. >> Likewise, Dave. Thanks for having me again. >> You bet.
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s, that's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s, from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that the trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well. Which again, is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity and fragmentation. So, it's become much more acute. And to your earlier point, we do live in a different world, and in the past couple of years we could probably just kind of brute force it, right? We could focus on the top line. There was enough kind of investment to be had.
I think nowadays organizations are in a very different environment where there's much more focus on cost control, productivity, efficiency, how do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale with data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? You said it, the people and process, how do we do that at scale? And that's only becoming much more important, and we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know, when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So, in some sense, in the last kind of financial crisis, and that was really the start of Collibra, where we found product market fit, working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment, of course, almost 15 years later, but data is only becoming more important. And our mission to deliver trusted data for every user, every use case, and across every source, frankly, has only become more important.
So, it has been an incredible journey over the last 14, 15 years, but I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well, that's interesting, because, you know, in my observation it takes 7 to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwinds. Organizations are only maturing their data practices, and we've seen that kind of transform or influence a lot of the business growth that we've seen, the broader adoption of the platform. We work with some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them kind of transition to the cloud even faster. And so we see a lot of excitement and momentum there.
We did an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again, to drive more automation. And a second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team. It's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for Chief Data Officers. Data mesh, you mentioned, you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So, let's chat a little bit about the products.
We're going to go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's, you know, under the covers in security, sort of making data more accessible for people, just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards kind of that mission of data intelligence, that really bold and compelling mission. Either customers are just starting on that journey, and we want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. Or, for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier to really accomplish that mission and vision around the Data Citizen, that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that, as Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, supportability perspective, we're truly able to deliver that kind of an enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience?
So that anybody in the organization can, in a very easy, search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics, how do we help organizations drive adoption? Tell them where they're working really well and where they have opportunities. Homepages, again, to make things easy for people, for anyone in your organization, to kind of get started with Collibra. You mentioned Workflow Designer. Again, we have a very powerful enterprise platform, and one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low-code, no-code kind of workflow designer experience, so really customers can take it to the next level. There's a lot more new product around Collibra Protect, in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How are we able to make sure that, as you move to the cloud, things like access management, masking around sensitive data, PII data, is managed in a much more effective way? Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, and quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. So, we launch our data quality cloud product, as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are beta-ing a capability that we call pushdown, so we're actually pushing down the compute and data quality monitoring into the underlying platform, which again, from a scale, performance, and ease of use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations, we talk about being able to connect to every source.
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been at work, really hard, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace, you know, you think about Data Mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, which is just getting the technology to work has been so hard. So, how do you see sort of the future and, you know, give us your closing thoughts please? >> Yeah, absolutely. And I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early, making it as easy as we can, as our mission. And so I'm really, really excited to see how the market is going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about Data Mesh, this kind of federated approach, focus on data products, is just another signal that we believe that a lot of organizations are now at the time where they're understanding the need to go beyond just the technology.
And really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now is the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you, Dave. >> Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at theCUBE.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)
Kirk Haslbeck, Collibra, Data Citizens 22
(atmospheric music) >> Welcome to theCUBE Coverage of Data Citizens 2022 Collibra's Customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality of Collibra. Kirk, good to see you, welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality, observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality which you're right, has been around for a long time. So why now? And why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, and the variety has changed and the volume has grown. And while I think that remains true there are a couple other hidden factors at play that everyone's so interested in as to why this is becoming so important now. And I guess you could kind of break this down simply and think about if Dave you and I were going to build a new healthcare application and monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50 day crosses the 10 day average. And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications in that sense. So, kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data we're likely making incorrect business decisions. But I think there's kind of two other things at play. 
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, I might have failed him 'cause I didn't know. And I used to ask those types of questions about anti-lock brakes and cylinders, and if it's manual or automatic. And I realized, I now just buy a car that I hope works. And it's so complicated with all the computer chips. I really don't know that much about it. And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So, we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested. It's because we're becoming less close to the intricacies of the data and we just expect it to always be there and be correct. >> You know, the other thing too about data quality, and for years we did the MIT CDOIQ event. We didn't do it last year, COVID messed everything up. But the observation I would make there, your thoughts, is data quality used to be information quality, used to be this back office function, and then it became sort of front office with financial services, and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened and people were realizing, well, they sort of flipped the bit from sort of data as a risk to data as an asset. And now, as we say, we're going to talk about observability. And so it's really become front and center, just the whole quality issue, because data's so fundamental, hasn't it? >> Yeah, absolutely.
I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app, and I check out the Nasdaq market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There's so many numbers and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor, but with the scale that we've achieved in early days, even before Collibra. And what's been so exciting is, we have these types of observation techniques, these data monitors that can actually track past performance of every field at scale. And why that's so interesting, and why I think the CDO is listening really intently nowadays to this topic is, so maybe we could surface all of these problems with the right solution of data observability and with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world of "must write a condition," and then when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market. So the language is changing, a lot of the topic area is changing. The way that I like to say it or break it down, because the lingo is constantly a moving target in the space, is really break records versus breaking trends.
And I could write a condition: when this thing happens it's wrong, and when it doesn't it's correct. Or I could look for a trend, and I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the goods and the bads. That was kind of your traditional approach of data quality break records. But your modern day approach is, you lost a significant portion of your data, or it did not arrive on time to make that decision accurately. And that's a hidden concern. Some people call this freshness, we call it stale data. But it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >> So what's the Collibra angle on all this stuff? Made the acquisition, you got data quality, observability coming together. You guys have a lot of expertise in this area, but you hear provenance of data. You just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem and what's unique about your approach? >> Well, I think where we're fortunate is with our background. Myself and team, we sort of lived this problem for a long time in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist.
But we've also always covered data quality and we believe that people want to know more, they need more insights. And they want to see break records and breaking trends together, so they can correlate the root cause. And we hear that all the time. "I have so many things going wrong, just show me the big picture. Help me find the thing that if I were to fix it today would make the most impact." So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog metadata. And as that grows you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times and some examples, remember the flash crash in 2010. Nobody had any idea what that was. They would just say, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens 22 that you're announcing, you got to announce new products, right? It is your yearly event. What's new? Give us a sense as to what products are coming out but specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery, and Databricks, Delta Lake and SQL Pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases, and while we've always worked with the same databases in the past, and they're supported today, we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why is that so interesting and powerful now?
Everyone's concerned with something called egress. Did my data that I've spent all this time and money with my security team securing ever leave my hands, did it ever leave my secure VPC as they call it? And with these native integrations that we're building and about to unveil here as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. And what that means is with no install and no configuration you could log into the Collibra data quality app and have all of your data quality running inside the database that you've probably already picked as your go forward team selection secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >> So this is interesting because what you just described, you mentioned Snowflake, you mentioned Google, oh actually you mentioned yeah, Databricks. You know, Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool. But then Google's got the open data cloud, if you heard Google Next. And now Databricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across, it's like Zhamak Dehghani says, you should just be a node on the mesh. I don't care if it's a data warehouse or a data lake, or where it comes from, but it's a point on that mesh and I need tooling to be able to have confidence that my data is governed and has the proper lineage, provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right.
And it's, for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now, we can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time. And so when you were to log into BigQuery tomorrow using our tool, or say Snowflake for example, you have instant data quality metrics, instant profiling, instant lineage and access, privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about less clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >> I love this example because we get all this talk about, well, the cloud guys are going to own the world. And of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or interclouding. Alright, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens 22. >> Absolutely. Well I think one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays with the amount of data that's out there they want to know where everything is, where their sensitive data is, if it's redundant, tell me everything inside of three to five seconds. And with that comes, they want to know in all of these hyperscale databases how fast they can get controls and insights out of their tools. So I think we're going to see more one click solutions, more SaaS based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >> Excellent.
All right, Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens 22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right. And thank you for watching. Keep it right there for more coverage from theCUBE. (atmospheric music)
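Editor's sketch of the "break records versus breaking trends" distinction Kirk draws in this interview. A break record comes from an explicit condition written per field; a breaking trend is flagged by comparing today's value with that field's own history, which is how a partial or late load (stale data) can be caught without anyone writing a rule for it. Everything here is an assumption made for illustration: the function names, the sample rows and the simple z-score threshold are hypothetical and are not Collibra's or OwlDQ's actual method.

```python
from statistics import mean, stdev


def break_records(rows, condition):
    """Rule-based check: return the rows that violate an explicit condition."""
    return [row for row in rows if not condition(row)]


def breaks_trend(history, latest, z_threshold=3.0):
    """Trend-based check: True if `latest` sits more than `z_threshold`
    sample standard deviations away from the mean of `history`.
    (A plain z-score, chosen only to keep the sketch short.)"""
    if len(history) < 2:
        return False  # not enough history to establish a trend
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly flat history: any change is a break
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical daily load: one row fails the hand-written rule "price > 0".
rows = [
    {"symbol": "AAA", "price": 10.5},
    {"symbol": "BBB", "price": -1.0},
]
bad_rows = break_records(rows, lambda row: row["price"] > 0)

# Hypothetical row counts for the last five loads, then a partial load today.
daily_row_counts = [1000, 1010, 990, 1005, 995]
todays_count = 400
looks_stale = breaks_trend(daily_row_counts, todays_count)

print(bad_rows)     # the single rule violation
print(looks_stale)  # True: only part of the data arrived
```

The rule catches the negative price but says nothing about the missing rows, while the trend check catches the short load without any field-specific condition. That is the pairing Kirk describes when he says people want to see break records and breaking trends together.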
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
2010 | DATE | 0.99+ |
Kirk Haslbeck | PERSON | 0.99+ |
one | QUANTITY | 0.99+ |
OwlDQ | ORGANIZATION | 0.99+ |
Kirk | PERSON | 0.99+ |
50 day | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
10 day | QUANTITY | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two sides | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
Collibra Data Quality | ORGANIZATION | 0.99+ |
next week | DATE | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.98+ |
two other things | QUANTITY | 0.98+ |
BigQuery | TITLE | 0.98+ |
five seconds | QUANTITY | 0.98+ |
one click | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
Collibra | TITLE | 0.96+ |
Wall Street | LOCATION | 0.96+ |
SQL Pushdown | TITLE | 0.94+ |
Data Citizens 22 | ORGANIZATION | 0.93+ |
COVID | ORGANIZATION | 0.93+ |
Snowflake | TITLE | 0.91+ |
Nasdaq | ORGANIZATION | 0.9+ |
Data Citizens 22 | ORGANIZATION | 0.89+ |
Delta Lake | TITLE | 0.89+ |
Egress | ORGANIZATION | 0.89+ |
MIT | EVENT | 0.89+ |
more than a couple | QUANTITY | 0.87+ |
a decade ago | DATE | 0.85+ |
zero | QUANTITY | 0.84+ |
Citizens | ORGANIZATION | 0.83+ |
Data Citizens 2022 Collibra | EVENT | 0.83+ |
years | DATE | 0.81+ |
thousands of data | QUANTITY | 0.8+ |
Data Citizens 22 | TITLE | 0.78+ |
two domain experts | QUANTITY | 0.77+ |
Snowflake | ORGANIZATION | 0.76+ |
IQ | EVENT | 0.76+ |
couple | QUANTITY | 0.75+ |
Collibra | PERSON | 0.75+ |
theCUBE | ORGANIZATION | 0.71+ |
many numbers | QUANTITY | 0.7+ |
Vice President | PERSON | 0.68+ |
Lineage | ORGANIZATION | 0.66+ |
Databricks | TITLE | 0.64+ |
too long ago | DATE | 0.62+ |
three | QUANTITY | 0.6+ |
Data | ORGANIZATION | 0.57+ |
CDO | EVENT | 0.53+ |
minute | QUANTITY | 0.53+ |
CDO | TITLE | 0.53+ |
number | QUANTITY | 0.51+ |
AMI | ORGANIZATION | 0.44+ |
Quality | PERSON | 0.43+ |
Collibra Data Citizens 22
>>Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance-driven issue to becoming a lynchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. >>And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to rethinking monolithic data architectures. You hear about initiatives like data mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business unit users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. 
>>In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. >>Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days, we're going to feature some of the themes from the keynote speakers at Data Citizens and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us along with one of the other founders of Collibra, Stijn Christiaens, who's going to join my colleague Lisa Martin. I'm also going to sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. >>
He's an amazingly smart dude who founded Owl dq, a company that he sold to Col to Collibra last year. Now many companies, they didn't make it through the Hado era, you know, they missed the industry waves and they became Driftwood. Collibra, on the other hand, has evolved its business. They've leveraged the cloud, expanded its product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. >>Last year, the Cube Covered Data Citizens Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hado movement, we had data lakes, we'd spark the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of ai, low code, no code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights for data, trust the data, and begin to use that data in new ways, fueling data, products, monetization and insights data citizens 2022 is back and we're pleased to have Felix Van Dema, who is the founder and CEO of Collibra. He's on the cube or excited to have you, Felix. Good to see you again. >>Likewise Dave. Thanks for having me again. >>You bet. 
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? >>Yeah, absolutely. And I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that the trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well, which again is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity and fragmentation. >>So it's become much more acute. And to your earlier point, we do live in a different world, and the past couple of years we could probably just kind of brute force it, right? We could focus on the top line. 
There was enough kind of investment to be had. I think nowadays organizations are in a very different environment where there's much more focus on cost control, productivity, efficiency: how do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale data, not just from a technology and infrastructure perspective, but how do you actually scale data from an organizational perspective, right? You said it, the people and process, how do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really dig in more seriously, if you will, than they maybe have in the past. >>You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission and what are you doing to address these challenges? >>Yeah, absolutely. We started Collibra in 2008, so in some sense in the last kind of financial crisis, and that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but with data only becoming more important. Our mission to deliver trusted data for every user, every use case and across every source, frankly, has only become more important. 
So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens. We truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we are still relatively early in that journey. >>Well, that's interesting because, you know, in my observation it takes seven to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >>Yeah, absolutely. Again, there's a lot of tailwind: organizations are only maturing their data practices, and we've seen that influence a lot of our business growth, with broader adoption of the platform. We work with some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders and in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks and others, right? As those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud, it's a great opportunity for us, our partners and of course our customers, to help them kind of transition to the cloud even faster. 
>>And so we see a lot of excitement and momentum there. We made an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course, data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we've always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations. And so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >>Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team, it's interesting, you know, the whole data quality thing used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for chief data officers. Data mesh: you mentioned you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So let's chat a little bit about the products. 
We're going to go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's under the covers in security, or making data more accessible for people, or just dealing with workflows and processes as you talked about earlier. Tell us a little bit about what you're introducing. >>Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence, that really bold and compelling mission. Customers are still just starting on that journey. We want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further. And again, to make it easier to really accomplish that mission and vision around the data citizen, that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving. >>A lot of ease of adoption, ease of use, but also then how do we make sure that Collibra becomes this kind of mission-critical enterprise platform, from a security, performance, architecture, scale and supportability perspective, so that we're truly able to deliver that kind of enterprise mission-critical platform. And so that's the big theme from an innovation perspective. From a product perspective, there's a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction, how to make it easy. 
How do we make available a true kind of shopping experience, so that anybody in your organization can, in a very easy search-first way, find the right data product, find the right dataset, that they can then consume? Usage analytics: how do we help organizations drive adoption, tell them where they're working really well and where they have opportunities? Homepages, again, to make things easy for anyone in your organization to get started with Collibra. You mentioned Workflow Designer. Again, we have a very powerful enterprise platform. >>One of our key differentiators is the ability to really drive a lot of automation through workflows. And now we provide a new low code, no code kind of Workflow Designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, which is in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How are we able to make sure that, as you move to the cloud, things like access management and masking around sensitive data, PII data, are managed in a much more effective way? We're really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can? Moving that to the cloud has been a big part of our strategy. >>So we launch our data quality cloud product as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are bettering a capability that we call pushdown, so actually pushing down the compute and data quality, the monitoring, into the underlying platform, which again, from a scale, performance and ease of use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. 
Again, integrations: we talk about being able to connect to every source. Integrations are absolutely critical and we're really excited to deliver new integrations with Snowflake, Azure and Google Cloud Storage as well. So there's a lot coming out. The team has been hard at work, and we are really, really excited about what we're bringing to market. >>Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace, you know, you think about data mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So how do you see sort of the future and, you know, give us your closing thoughts please? >>Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early, making it as easy as we can. It's our mission. And so I'm really, really excited to see how the market's going to evolve over the next few quarters and years. I think the trend is clearly there. When we talk about data mesh, this kind of federated approach focused on data products is just another signal that we believe a lot of organizations are now at that point, understanding the need to go beyond just the technology. 
It's to really, really think about how we actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in: much more focus on control, much more focus on productivity and efficiency. Now's the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >>Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much and good luck in San Diego. I know you're going to crush it out there. >>Thank you, Dave. >>Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at thecube.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. >>Hi, I'm Jay from Collibra's Data Office. Today I want to talk to you about Collibra's Data Intelligence Cloud. We often say Collibra is a single system of engagement for all of your data. Now, when I say data, I mean data in the broadest sense of the word, including reference and metadata. Think of metrics, reports, APIs, systems, policies, and even business processes that produce or consume data. Now, the beauty of this platform is that it ensures all of your users have an easy way to find, understand, trust, and access data. But how do you get started? Well, here are seven steps to help you get going. One, start with the data. What's data intelligence without data? Leverage the Collibra Data Catalog to automatically profile and classify your enterprise data wherever that data lives: databases, data lakes or data warehouses, whether on the cloud or on premise. >>Two, you'll then want to organize the data, and you'll do that with data communities. 
This can be by department, line of business or functional team, however your organization organizes work and accountability. And for that you'll establish community owners. Communities make it easy for people to navigate through the platform and find the data, and they will help create a sense of belonging for users. An important and related side note here: we find it's typical in many organizations that data is thought of as just an asset, and IT and data offices are viewed as the owners of it, and are really the central teams performing analytics as a service provider to the enterprise. We believe data is more than an asset, it's a true product that can be converted to value. And that also means establishing business ownership of data, where that strategy and ROI come together with subject matter expertise. >>Okay, three. Next, back to those communities. There, the data owners should explain and define their data, not just the tables and columns, but also the related business terms, metrics and KPIs. These objects, we call them assets, are typically organized into business glossaries and data dictionaries. I definitely recommend starting with the topics that are most important to the business. Four, those steps then enable you and your users to have some fun with it. Linking everything together builds your knowledge graph, also known as a metadata graph. Linking or relating these assets together, for example, a data set to a KPI to a report, now enables your users to see what we call the lineage diagram, which visualizes where the data in your dashboards actually came from, what the data means and who's responsible for it. Speaking of which, here's five. Leverage the Collibra Trusted Business Reporting solution on the marketplace, which comes with workflows for those owners to certify their reports, KPIs, and data sets. 
Six, easy to navigate dashboards or landing pages right in your platform for your company's business processes are the most effective way for everyone to better understand and take action on data. Here's a pro tip, use the dashboard design kit on the marketplace to help you build compelling dashboards. Finally, seven, promote the value of this to your users and be sure to schedule enablement office hours and new employee onboarding sessions to get folks excited about what you've built and implemented. Better yet, invite all of those community and data owners to these sessions so that they can show off the value that they've created. Those are my seven tips to get going with Collibra. I hope these have been useful. For more information, be sure to visit collibra.com. >>Welcome to the Cube's coverage of Data Citizens 2022 Collibra's customer event. My name is Dave Valante. With us is Kirk Hasselbeck, who's the vice president of Data Quality of Collibra Kirk, good to see you. Welcome. >>Thanks for having me, Dave. Excited to be here. >>You bet. Okay, we're gonna discuss data quality observability. It's a hot trend right now. You founded a data quality company, OWL dq, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >>Yeah, absolutely. It's, it's definitely exciting times for data quality, which you're right, has been around for a long time. So why now and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before and the variety has changed and the volume has grown. And, and while I think that remains true, there are a couple other hidden factors at play that everyone's so interested in as, as to why this is becoming so important now. 
And I guess you could break this down simply: if, Dave, you and I were gonna build, you know, a new healthcare application to monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50 day crosses the 10 day average. >>And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications in that sense. So, you know, it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there's kind of two other things at play. You know, I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, you know, I might have failed him, because I didn't know. I used to ask those types of questions about anti-lock brakes and cylinders and, you know, if it's manual or automatic, and I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips, I really don't know that much about it. >>And that's what's happening with data. We're just loading so much of it, and it's so complex, that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested.
It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >>You know, the other thing too about data quality, and for years we did the MIT CDOIQ event, we didn't do it last year, Covid messed everything up. But the observation I would make there is that data quality, it used to be information quality, used to be this back office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened and people were realizing, well, they sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're gonna talk about observability. And so it's really become front and center, just the whole quality issue, because data's so fundamental, hasn't it? >>Yeah, absolutely. I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There's so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved in early days, even before Collibra, what's been so exciting is we have these types of observation techniques, these data monitors, that can actually track past performance of every field at scale.
And why that's so interesting, and why I think the CDO is listening intently nowadays to this topic, is that maybe we could surface all of these problems with the right solution of data observability and with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world of "must write a condition," and then when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that, you know, with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >>So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >>Yeah, it's super interesting. It's an emerging market, so the language is changing, a lot of the topic and areas are changing. The way that I like to say it or break it down, because the lingo is constantly a moving target in this space, is really breaking records versus breaking trends. I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. You know, everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the goods and the bads. That was kind of your traditional approach of data quality break records. But your modern day approach is: you lost a significant portion of your data, or it did not arrive on time to make that decision accurately. And that's a hidden concern.
Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >>So what's the Collibra angle on all this stuff? You made the acquisition, you got data quality and observability coming together, you guys have a lot of expertise in this area, but you hear provenance of data, you just talked about, you know, stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >>Well, I think where we're fortunate is with our background. Myself and team, we sort of lived this problem for a long time, you know, in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there. When most people evaluate our solution, it's more advanced than some of the observation techniques that currently exist. But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: "I have so many things going wrong, just show me the big picture, help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog metadata.
And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >>Well, you mentioned financial services a couple of times and some examples. Remember the flash crash in 2010? Nobody had any idea what that was, you know, they just said, "Oh, it's a glitch," so they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens 22 that you're announcing, you gotta announce new products, right? It's your yearly event. What's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >>Absolutely. You know, there's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery and Databricks' Delta Lake and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases. And while we've always worked with the same databases in the past, and they're supported today, we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress. Did my data, that I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? >>And with these native integrations that we're building and about to unveil, here's kind of a sneak peek for next week at Data Citizens: we're now doing all compute and data operations in databases like Snowflake.
And what that means is, with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your team's go-forward selection, your secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >>So this is interesting because what you just described, you know, you mentioned Snowflake, you mentioned Google, oh, actually you mentioned, yeah, Databricks. You know, Snowflake has the Data Cloud. If you put everything in the Data Cloud, okay, you're cool. But then Google's got the open data cloud, if you heard, you know, Google Next. And now Databricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches, and there's really no way, up until now, I'm hearing, to really understand the relationships between all those and have confidence across them. You know, it's like Zhamak Dehghani says, it should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >>Yeah, that's right. And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time.
And so when you were to log into BigQuery tomorrow using our tool, or say Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage and access and privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about less clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >>I love this example because, you know, Barry talks about, wow, the cloud guys are gonna own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value, connect across clouds. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens 22. >>Absolutely. Well, I think, you know, one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they wanna know where everything is, where their sensitive data is, if it's redundant: "Tell me everything inside of three to five seconds." And with that, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're gonna see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >>Excellent. All right, Kirk Hasselbeck, thanks so much for coming on theCUBE and previewing Data Citizens 22. Appreciate it. >>Thanks for having me, Dave. >>You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE. >>Welcome to theCUBE's virtual coverage of Data Citizens 2022.
My name is Dave Vellante and I'm here with Laura Sellers, who's the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >>Thank you. Nice to be here. >>Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, when I think about historically fast access to the right data at the right time in a form that's really easily consumable, it's been kind of challenging, especially for business users. Can you explain to our audience why this matters so much and what's actually different today in the data ecosystem to make this a reality? >>Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is that we need a new approach to data management and our product teams. What inspired me to come to Collibra a little bit over a year ago was really the fact that they're very focused on bringing trusted data to more users across more sources for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale, it's really about making teams more productive in getting started with and the ability to manage data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers performance, scale and security that our users and teams need and demand. So as we look at, oh, go ahead. >>I was gonna say, you know, when I look back at like the last 10 years, it was all about getting the technology to work and it was just so complicated. But please carry on. I'd love to hear more about this. >>Yeah, I really, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. So what we're announcing from our ease of use side of the world is first our data marketplace.
This is the ability for all users to discover and access data quickly and easily, shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then the two more areas of the ease of use side of the world is our world of usage analytics. And one of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also helping with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform, understanding what's working, who's accessing it, what's not. And then finally we're also introducing what's called Workflow Designer. And we love our workflows at Collibra, it's a big differentiator to be able to automate business processes. The Designer is really about a way for more people to be able to create those workflows, collaborate on those workflows, as well as people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use to make it easier for all users to find data. >>Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy, metaphor or analogy, I always get those confused. Let's go with analogy. Why is it so important to data consumers? >>I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will.
And so having a consumer grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, is the idea of being able to shop for it, just making it as simple as possible and really speeding the time to value for any of the business analysts, data analysts out there. >>Yeah, I think you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course then you have to have self-service infrastructure and you have to have governance. And those are really challenging. And I think so many organizations, they're facing adoption challenges, you know, when it comes to enabling teams generally, especially domain experts, to adopt new data technologies. You know, the tech comes fast and furious. You got all these open source projects and it can get really confusing. Of course there's risk, security, governance and all that good stuff. You got all this jargon. So where do you see, you know, the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >>You're dead on. There's so much technology and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of and understand all the technologies that are coming. You also look at, there's so many more sources of data, and people are migrating data to the cloud and they're migrating to new sources. Where the friction comes is really that ability to understand where the data came from, where it's moving to, and then also to be able to put the access controls on top of it, so people are only getting access to the data that they should be getting access to.
So one of the other things we're announcing with all of the innovations that are coming is what we're doing around performance and scale. So with all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. >>It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, one of those challenges, making sure that the data that you have is high quality as you move forward. And so really, we're just reducing friction. You already have Snowflake stood up. It's not another machine for you to manage, it's just pushdown capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is that ability for users to be able to ingest metadata, understand where the PII data is, and then set policies up on top of it. So very quickly be able to set policies and have them enforced at the data level, so anybody in the organization is only getting access to the data they should have access to. >>The topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, and really confined only to highly regulated industries like financial services and healthcare and government. You know, you look back over a decade ago, you didn't have this worry about personal information. GDPR and, you know, the California Consumer Privacy Act, it all becomes so much more important.
The cloud has really changed things in terms of performance and scale, and of course partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on, and of course attracting them as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake? I'm specifically interested in sort of joint engineering and product innovation efforts, you know, beyond the standard go-to-market stuff. >>Definitely. So you mentioned they were a strategic investor in Collibra about a year ago, a little less than that, I guess. We've been working with them, though, for over a year, really tightly with their product and engineering teams, to make sure that Collibra is adding real value. Our unified platform is touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value, with our policies for all business users to be able to create. >>We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where the data came from and how it was transformed within Snowflake, as well as the data quality pushdown, as I mentioned. Data quality, you brought it up. It is a big industry push, and you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality.
So this pushdown capability for Snowflake really is, again, a big ease of use push for us at Collibra, that ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and make sure you have the right quality. >>I mean, the nice thing about Snowflake, if you play in the Snowflake sandbox, you can get sort of a, you know, high degree of confidence that the data sharing can be done in a safe way. Bringing, you know, Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on theCUBE that one of the notable differences in cloud this decade versus last decade, I mean, there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. I mean, it's a key factor for innovating, accelerating product delivery, filling gaps in the hyperscale offerings, 'cause you got more, you know, mature stack capabilities, and it creates this flywheel momentum, as we often say. So my question is, how do you work with the hyperscalers, whether it's AWS or Google, whomever? And what do you see as your role, and what's the Collibra sweet spot? >>Yeah, definitely. So, you know, one of the things I mentioned early on is the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So similar to what you've seen with our strategic moves around Snowflake, and really covering the broad ecosystem of what Collibra can do on top of that data source, we're extending that to the world of Google as well and the world of Dataplex.
We also have great partners in SIs. Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data-driven culture, and Collibra being the core of it. >>Laura, we're gonna end it there, but I wonder if you could kind of put a bow on, you know, this year, the event, your perspectives. So just give us your closing thoughts. >>Yeah, definitely. So I wanna say this is one of the biggest releases Collibra's ever had. Definitely the biggest one since I've been with the company a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and companies everywhere. And so it's all about everybody being able to easily find, understand, trust and get access to that data going forward. >>Well, congratulations on all the progress. It was great to have you on theCUBE, first time I believe, and really appreciate you taking the time with us. >>Yes, thank you for your time. >>You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. >>So data modernization oftentimes means moving some of your storage and compute to the cloud, where you get the benefit of scale and security and so on. But ultimately it doesn't take away the silos that you have. We have more locations, more tools and more processes with which we try to get value from this data. To do that at scale in an organization, the people involved in this process have to understand each other. So you need to unite those people across those tools, processes and systems with a shared language. When I say customer, do you understand the same thing as you hearing customer?
Are we counting them in the same way? That shared language unites us, and that gives the opportunity for the organization as a whole to get the maximum value out of their data assets. And then they can democratize data, so everyone can properly use that shared language to find, understand and trust the data assets that are available. >>And that's where Collibra comes in. We provide a centralized system of engagement that works across all of those locations and combines all of those different user types across the whole business. At Collibra, we say "United by data," and that also means that we're united by data with our customers. So here is some data about some of our customers. There was the case of an online do-it-yourself platform who grew their revenue almost three times from a marketing campaign that put the right product in the hands of the right people. Another case that comes to mind is from a financial services organization who saved over 800K every year because they were able to reuse the same data in different kinds of reports. Before, it was spread out over different tools and processes and silos, and now the platform brought them together, so they realized, "Oh, we're actually using the same data, let's find a way to make this more efficient." And the last example that comes to mind is that of a large home mortgage loan provider, where they have a very complex landscape, a very complex architecture, legacy, in the cloud, et cetera. And they're using our software, they're using our platform to unite all the people and those processes and tools to get a common view of data to manage their compliance at scale. >>Hey everyone, I'm Lisa Martin covering Data Citizens 22, brought to you by Collibra. This next conversation is gonna focus on the importance of data culture. One of our CUBE alumni is back: Stan Christiaens is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on theCUBE.
>>Hey Lisa, nice to be here. >>So we're gonna be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation. It also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >>Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, you have a lot of people, most of the employees in an organization, who are somehow going to be a data citizen, right? So you need to make sure that these people are aware of it. You need to make sure that people have the skills and competencies to do with data what's necessary, and that's on all levels, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss that we need to make this decision, your boss is also open to and able to interpret, you know, the data presented in the dashboard to actually make that decision and take that action. Right? >>And once you have that across the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations, because they're always moving, somehow they're hiring new people. And it has to be a continuous effort because we've seen that, on the one hand, organizations continue to be challenged by their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But also, on the other hand of the equation, you have the benefit. You know, you might look at regulatory drivers like, we have to do this, right?
But it's much better right now to consider the competitive drivers, for example. And we did an IDC study earlier this year, quite interesting, I can recommend it to anyone. And one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is that the ones who are higher in maturity. >>So the organizations that really look at data as an asset, look at data as a product and actively try to be better at it, have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this, you know, data culture for everyone, awakening them up as data citizens. I'm doing this for competitive reasons, I'm doing this for regulatory reasons. You're trying to bring both of those together, and the ones that get data intelligence right are successful and competitive. And that's what we're seeing out there in the market. >>Absolutely. We know that just generally, Stan, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens, we know that in theory they're more competitive, they're more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally. >>Of course it's difficult for an organization to adapt, but it's also necessary. As you just said, imagine that, you know, you're a modern day organization with laptops, what have you, and you're not using those, right? Or, you know, you're delivering them throughout the organization, but not enabling your colleagues to actually do something with that asset. Same thing is true with data today, right? If you're not properly using the data asset and competitors are, they're going to get more advantage.
So as to how you get this done, establish this. There are angles to look at, Lisa. So one angle is obviously the leadership, whereby whoever is the boss of data in the organization, you typically have multiple bosses there, like chief data officers. Sometimes there's multiple, but they may have a different title, right? So I'm just gonna summarize it as a data leader for a second. >>So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect. How are you going to get value from data? Yes. Now that's one part because then you can leadership in the organization and also the business value. And that's important. Because those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that right, is it's not enough to just have that leadership out there, but you also have to get the hearts and minds of the data champions across the organization. You really have to win them over. And if you have those two combined, and obviously a good technology to, you know, connect those people and have them execute on their responsibilities, such as a data intelligence platform like ours, then you're in a place to really start upgrading that culture inch by inch, if you will. >>Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how before we went live, we were talking about how Collibra is drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what maybe some of the specific projects are that Collibra's data office is working on. >>Yes, and it is indeed Data Citizens. There are a ton of speakers here, we're very excited.
You know, we have Barb from MIT speaking about data monetization. We have Dilla at the last minute. So a really exciting agenda. Can't wait to get back out there, essentially. So over the years at Collibra, we've been doing this since 2008, so a good many years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people in a, if you will, so everybody's wearing all sorts of hats at the time. But over the years I've run, you know, presales, sales, partnerships, product, et cetera. And as our company got a little bit biggish, we're now a thousand two, something like that, people in the company. >>I believe systems and processes become a lot more important. So we said, you know, Collibra isn't the size of our customers, but we're getting there in terms of organization structure, process, systems, et cetera. So we said it's really time for us to put our money where our mouth is and to set up our own data office, which is what we were seeing at customers, organizations worldwide. And organizations have HR units, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, ok, let's try to set an example that other people can take away with it, right? Can take away from it. So we set up a data strategy, we started building data products, took care of the data infrastructure. That sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our product and our own practices and from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning with all the, and on the Monday mornings, we sometimes refer to that as eating our own dog food. On Friday evenings, we refer to that as drinking our own champagne. >>I like it. >>So we, we had a, we had the driver to do this. You know, there's a clear business reason.
So we involved, we included that in the data strategy and that's a little bit of our origin. Now how do we organize this? We have three pillars, and by no means is this a template that everyone should follow, this is just the organization that works at our company, but it can serve as an inspiration. So we have a pillar, which is data science. The data product builders, if you will, or the people who help the business build data products. We have the data engineers who help keep the lights on for that data platform to make sure that the products, the data products can run, the data can flow and, you know, the quality can be checked. >>And then we have a data intelligence or data governance pillar where we have those data governance, data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is, well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap and started executing on the use cases. And a few important ones are very simple. We see them with our customers as well, people talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in and privacy. So they have their process registry and they can see how the data flows. >>So that's a starting place, and that turns into a marketplace so that if new analysts and data citizens join Collibra, they immediately have a place to go to, to look at, see, ok, what data is out there for me as an analyst or a data scientist or whatever to do my job, right? So they can immediately get access to data. And another one that we have is around trusted business reporting.
We're seeing that since, you know, self-service BI allowed everyone to make beautiful dashboards, you know, pie charts. My pet peeve is the pie chart, because I love pie and you shouldn't always be using pie charts. But essentially there's been a proliferation of those reports. And now executives don't really know, okay, should I trust this report or that report? They're reporting on the same thing, but the numbers seem different, right? So that's why we have trusted business reporting. So we know if a dashboard, a data product essentially, is built, we know that all the right steps are being followed and that whoever is consuming that can be quite confident in the result either, Right. And that silver browser, right? Absolutely >>Decay. >>Exactly. Yes, >>Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >>KPIs and measuring is a big topic in the chief data officer profession, I would say, and again, it always varies with respect to your organization, but there's a few that we use that might be of interest. Use those pillars, right? And we have metrics across those pillars. So for example, a pillar on the data engineering side is gonna be more related to that uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data science and products. Are people using them? Are they getting value from it? >>Can we calculate that value in ay perspective, right? Yeah.
So that we can, to the rest of the business, continue to say we're tracking all those numbers, and those numbers indicate that value is generated and how much value is estimated in that region. And then you have some data intelligence, data governance metrics, which is, for example, you have a number of domains in a data mesh. People talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are these owners organized, executing on their responsibilities? How many tickets are open, closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs. There's a lot more, but hopefully those can already inspire the audience. >>Absolutely. So we've talked about the rise of chief data offices, it's only accelerating. You mentioned this is like a 10 year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade? >>So we've seen indeed the role sort of grow up. I think in 2010 there may have been like 10 chief data officers or something. Gartner has exact numbers on them, but then they grew, you know, across industries, and the number is estimated to be about 20,000 right now. Wow. And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven, offensive data strategy, support for the digital program. And now all about data products, right? So as a data leader, you now need all of those competences and need to include them in your strategy. >>How is that going to evolve for the next couple of years? I wish I had one of those balls, right?
But essentially I think for the next couple of years there's gonna be a lot of people, you know, still moving along with those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer. So you'll see over the years that's gonna evolve to more digital and more data products. So for the next years, my prediction is it's all data products, because it's an immediate link between data and, and the essentially, right? Right. So that's gonna be important, and quite likely some new things will be added on, which nobody can predict yet. But we'll see those pop up in a few years. I think there's gonna be a continued challenge for the chief data officer role to become a real executive role as opposed to, you know, somebody who claims that they're executive, but then they're not, right? >>So the real reporting level into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get that will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that value clear to all the data citizens in the organization, right? And in that sense, they'll need to have both, you know, technical audiences and non-technical audiences aligned, of course. And they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you're waking up data citizens across the organization and you make everyone in the organization think about data as an asset. >>Absolutely. Because there's so much value that can be extracted when organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena. We're definitely gonna keep our eyes on this. Sounds like a lot of evolution and maturation coming from the data office perspective. From the data citizen perspective.
And as the data show that you mentioned in that IDC study, you mentioned Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're gonna watch this space. Stan, thank you so much for joining me on theCUBE at Data Citizens 22. We appreciate it. >>Thanks for having me over. >>From Data Citizens 22, I'm Lisa Martin, you're watching theCUBE, the leader in live tech coverage. >>Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at thecube.net. And don't forget to check out siliconangle.com for all the news and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to collibra.com. There are tons of resources there. You'll find analyst reports, product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. We'll see you soon.
SUMMARY :
largely about getting the technology to work. Now the cloud is definitely helping with that, but also how do you automate governance? So you can see how data governance has evolved into to say we extract the signal from the noise, and over the, the next couple of days, we're gonna feature some of the So it's a really interesting story that we're thrilled to be sharing And we said at the time, you know, maybe it's time to rethink data innovation. 2020s from the previous decade, and what challenges does that bring for your customers? as data becomes more impactful than important, the level of scrutiny with respect to privacy, So again, I think it just another incentive for organization to now truly look at data You know, I don't know when you guys founded Collibra, if, if you had a sense as to how complicated the last kind of financial crisis, and that was really the, the start of Colli where we found product market Well, that's interesting because, you know, in my observation it takes seven to 10 years to actually build a again, a lot of momentum in the org in, in the, in the markets with some of the cloud partners And the second is that those data pipelines that are now being created in the cloud, I mean, the acquisition of i l dq, you know, So that's really the theme of a lot of the innovation that we're driving. And so that's the big theme from an innovation perspective, One of our key differentiators is the ability to really drive a lot of automation through workflows. So actually pushing down the computer and data quality, one of the key principles you think about monetization. And I, and I think we we're really at this pivotal moment, and I think you said it well. We need to look beyond just the I know you're gonna crush it out there. 
This is Dave Valante for the cube, your leader in enterprise and Without data leverage the Collibra data catalog to automatically And for that you'll establish community owners, a data set to a KPI to a report now enables your users to see what Finally, seven, promote the value of this to your users and Welcome to the Cube's coverage of Data Citizens 2022 Collibra's customer event. And now you lead data quality at Collibra. imagine if we get that wrong, you know, what the ramifications could be, And I realized in that moment, you know, I might have failed him because, cause I didn't know. And it's so complex that the way companies consume them in the IT function is And so it's really become front and center just the whole quality issue because data's so fundamental, nowadays to this topic is, so maybe we could surface all of these problems with So the language is changing a you know, stale data, you know, the, the whole trend toward real time. we sort of lived this problem for a long time, you know, in, in the Wall Street days about a decade you know, they just said, Oh, it's a glitch, you know, so they didn't understand the root cause of it. And the one right now is these hyperscalers in the cloud. And I think if you look at the whole So this is interesting because what you just described, you know, you mentioned Snowflake, And so when you were to log into Big Query tomorrow using our I love this example because, you know, Barry talks about, wow, the cloud guys are gonna own the world and, Seeing that across the board, people used to know it was a zip code and nowadays Appreciate it. Right, and thank you for watching. Nice to be here. Can can you explain to our audience why the ability to manage data across the entire organization. 
I was gonna say, you know, when I look back at like the last 10 years, it was all about getting the technology to work and it And one of the big pushes and passions we have at Collibra is to help with I I, you know, you mentioned this idea of, and really speeding the time to value for any of the business analysts, So where do you see, you know, the friction in adopting new data technologies? So one of the other things we're announcing with, with all of the innovations that are coming is So anybody in the organization is only getting access to the data they should have access to. So it was kind of smart that you guys were early on and We're able to profile and classify that data we're announcing with Calibra Protect this week that and get the right and make sure you have the right quality. I mean, the nice thing about Snowflake, if you play in the Snowflake sandbox, you, you, you, you can get sort of a, We also are doing more with Google around, you know, GCP and kbra protect there, you know, this year, the event your, your perspectives. And so it's all about everybody being able to easily It was great to have you on the cube first time I believe, cube, your leader in enterprise and emerging tech coverage. the cloud where you get the benefit of scale and security and so on. And the last example that comes to mind is that of a large home loan, home mortgage, Stan, it's great to have you back on the cube. Talk to us about what you mean by data citizenship and the And we believe that today's organizations, you have a lot of people, And one of the conclusions they found as they So you can say, ok, I'm doing this, you know, data culture for everyone, awakening them But the IDC study that you just mentioned demonstrates they're three times So as to how you get this done, establish this. 
part of the equation of getting that right, is it's not enough to just have that leadership out Talk to us about how you are building a data culture within Collibra and But over the years I've run, you know, So we said you the data products can run, the data can flow and you know, the quality can be checked. The catalog for the data scientists to know what's in their data lake, and data citizens join kbra, they immediately have a place to go to, Yes, success of the data office. So for example, a pillar on the data engineering side is gonna be more related So how many of those domains do you have covered? to look into a crystal ball, what do you see in terms of the maturation industries and the number is estimated to be about 20,000 right now. How is that going to evolve for the next couple of years? And in that sense, they'll need to have both, you know, technical audiences and non-technical audiences And as the data show that you mentioned in that IDC study, the leader in live tech coverage. Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Laura | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Heineken | ORGANIZATION | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Laura Sellers | PERSON | 0.99+ |
2008 | DATE | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Felix Von Dala | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Felix Van Dema | PERSON | 0.99+ |
seven | QUANTITY | 0.99+ |
Stan Christians | PERSON | 0.99+ |
2010 | DATE | 0.99+ |
Lisa | PERSON | 0.99+ |
San Diego | LOCATION | 0.99+ |
Jay | PERSON | 0.99+ |
50 day | QUANTITY | 0.99+ |
Felix | PERSON | 0.99+ |
one | QUANTITY | 0.99+ |
Kurt Hasselbeck | PERSON | 0.99+ |
Bank of America | ORGANIZATION | 0.99+ |
10 year | QUANTITY | 0.99+ |
California Consumer Privacy Act | TITLE | 0.99+ |
10 day | QUANTITY | 0.99+ |
Six | QUANTITY | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Dave Ante | PERSON | 0.99+ |
Last year | DATE | 0.99+ |
demand@thecube.net | OTHER | 0.99+ |
ETR Enterprise Technology Research | ORGANIZATION | 0.99+ |
Barry | PERSON | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
one part | QUANTITY | 0.99+ |
Python | TITLE | 0.99+ |
2010s | DATE | 0.99+ |
2020s | DATE | 0.99+ |
Calibra | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
Calibra | ORGANIZATION | 0.99+ |
K Bear Protect | ORGANIZATION | 0.99+ |
two sides | QUANTITY | 0.99+ |
Kirk Hasselbeck | PERSON | 0.99+ |
12 months | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Barb | PERSON | 0.99+ |
Stan | PERSON | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
Data Citizens 22 | Laura Sellers
(light music) >> Welcome to the Cube's virtual coverage of Data Citizens 2022. My name is Dave Vellante, and I'm here with Laura Sellers, who is the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >> Thank you. Nice to be here. >> You know, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, when I think about historically, fast access to the right data at the right time in a form that's really easily consumable, it's been kind of challenging, especially for business users. Can you explain to our audience why this matters so much, and what's actually different today in the data ecosystem to make this a reality? >> Yeah, definitely. So I think what we really need and what I hear from customers every single day is that we need a new approach to data management, and our product team is what inspired me to come to Collibra a little bit over a year ago, was really the fact that they're very focused on bringing trusted data to more users across more sources for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale, it's really about making teams more productive in getting started with and the ability to manage data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers performance, scale, and security that our users and teams need and demand. So as we look at, oh, go ahead. >> I was going to say, you know, when I look back at like the last 10 years, it was all about getting the technology to work, and it was just so complicated, but please carry on. I'd love to hear more about this. >> Yeah, I really, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. 
So what we're announcing from our ease of use side of the world is first our data marketplace. This is the ability for all users to discover and access data quickly and easily, shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then the two more areas of the ease of use side of the world is our world of usage analytics. And one of the big pushes and passions we have at Collibra is to help with this data driven culture that all companies are trying to create, and also helping with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform, understanding what's working, who's accessing it, what's not. And then finally, we're also introducing what's called Workflow Designer. And we love our workflows at Collibra. It's a big differentiator to be able to automate business processes. The designer is really about a way for more people to be able to create those workflows, collaborate on those workflows, as well as people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use to make it easier for all users to find data. >> Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy, metaphor or analogy? I always get those confused. Let's go with analogy. Why is it so important to data consumers? >> I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience, like your Amazon, if you will. 
And so having a consumer grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, is the idea of being able to shop for it, just making it as simple as possible and really speeding the time to value for any of the business analysts, data analysts out there. >> Yeah, I think when you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course, then you have to have self-service infrastructure, and you have to have governance. And those are really challenging. And I think so many organizations, they're facing adoption challenges. You know, when it comes to enabling teams generally, especially domain experts, to adopt new data technologies, you know, like the tech comes fast and furious. You got all these open source projects. It can get really confusing. Of course it risks security, governance, and all that good stuff. You got all this jargon. So where do you see, you know, the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >> You're dead on. There's so much technology, and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of and understand all the technologies that are coming. You also look at as there's so many more sources of data, and people are migrating data to the cloud, and they're migrating to new sources. Where the friction comes is really that ability to understand where the data came from, where it's moving to, and then also to be able to put the access controls on top of it. So people are only getting access to the data that they should be getting access to. 
So one of the other things we're announcing with all of the innovations that are coming is what we're doing around performance and scale. So with all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, one of those challenges, making sure that the data that you have is high quality as you move forward. And so really, we're just reducing friction. You already have Snowflake stood up. It's not another machine for you to manage. It's just push down capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is that ability for users to be able to ingest metadata, understand where the PII data is, and then set policies up on top of it. So very quickly be able to set policies and have them enforced at the data level. So anybody in the organization is only getting access to the data they should have access to. >> This topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, and really confined only to highly regulated industries like financial services and healthcare and government. You know, you look back over a decade ago, you didn't have this worry about personal information. GDPR and, you know, the California Consumer Privacy Act have all become so much more important.
The cloud has really changed things in terms of performance and scale, and of course, partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on and of course, attracting them as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake, and specifically interested in sort of joint engineering and product innovation efforts, you know, beyond the standard go to market stuff? >> Definitely. So you mentioned they were a strategic investor in Collibra about a year ago. A little less than that I guess. We've been working with them though for over a year really tightly with their product and engineering teams to make sure that Collibra is adding real value. Our unified platform is touching, pieces of our unified platform are touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly. As far as time to value with our policies, for all business users to be able to create. We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where did the data come from, how was it transformed within Snowflake, as well as the data quality pushdown, as I mentioned. Data quality, you brought it up, it is a new, it is a big industry push, and you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality. 
So this push down capability for Snowflake really is, again, a big ease of use push for us at Collibra of that ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and get the right and make sure you have the right quality. >> I mean, the nice thing about Snowflake, if you play in the Snowflake sandbox, you can get sort of a high degree of confidence that the data sharing can be done in a safe way. Bringing Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on the Cube that one of the notable differences in cloud this decade versus last decade, I mean there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. I mean they're, it's a key factor for innovating, accelerating product delivery, filling gaps in the hyperscale offerings, 'cause you got more stack, you know, much more stack capabilities, and it creates this flywheel momentum as we often say. But, so my question is, how do you work with the hyperscalers? Like whether it's AWS or Google or whomever, and what do you see as your role, and what's the Collibra sweet spot? >> Yeah, definitely. So, you know, one of the things I mentioned early on is the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So similar to what you've seen with our strategic moves around Snowflake and really covering the broad ecosystem of what Collibra can do on top of that data source, we're extending that to the world of Google as well and the world of Dataplex. We also have great partners in SIs. 
Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data driven culture and Collibra being the core of it. >> All right, Laura, we're going to end it there, but I wonder if you could kind of put a bow on this year, the event, your perspectives. So just give us your closing thoughts. >> Yeah, definitely. So I want to say, this is one of the biggest releases Collibra's ever had, definitely the biggest one since I've been with the company a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and companies everywhere. And so it's all about everybody being able to easily find, understand, and trust, and get access to that data going forward. >> Well congratulations on all the progress. It was great to have you on the Cube, first time I believe, and really appreciate you taking the time with us. >> Yes, thank you for your time. >> You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on the Cube, your leader in enterprise and emerging tech coverage. (light music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Laura | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Laura Sellers | PERSON | 0.99+ |
California Consumer Privacy Act | TITLE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Snowflake | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Infosys | ORGANIZATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Snowflake | TITLE | 0.99+ |
Dataplex | ORGANIZATION | 0.98+ |
Gartner | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
first | QUANTITY | 0.97+ |
this week | DATE | 0.97+ |
Data Citizens | ORGANIZATION | 0.96+ |
first time | QUANTITY | 0.96+ |
Snowflake Lineage 2.0 | TITLE | 0.94+ |
up to $15 million | QUANTITY | 0.93+ |
Cube | COMMERCIAL_ITEM | 0.93+ |
today | DATE | 0.93+ |
Levi's | ORGANIZATION | 0.92+ |
a year ago | DATE | 0.92+ |
this year | DATE | 0.91+ |
a decade ago | DATE | 0.89+ |
first thing | QUANTITY | 0.88+ |
Collibra | TITLE | 0.87+ |
Snowflake | EVENT | 0.86+ |
past year and a half | DATE | 0.83+ |
last decade | DATE | 0.83+ |
GCP | ORGANIZATION | 0.8+ |
over a year | QUANTITY | 0.79+ |
two more areas | QUANTITY | 0.79+ |
last 10 years | DATE | 0.79+ |
Data | EVENT | 0.77+ |
single day | QUANTITY | 0.77+ |
about | DATE | 0.76+ |
decade | DATE | 0.74+ |
Collibra Protect | ORGANIZATION | 0.72+ |
Data Citizens 2022 | TITLE | 0.72+ |
Cube | ORGANIZATION | 0.66+ |
Data Citizens | TITLE | 0.63+ |
Protect | COMMERCIAL_ITEM | 0.63+ |
over | DATE | 0.61+ |
2022 | EVENT | 0.58+ |
22 | ORGANIZATION | 0.44+ |
Citizens | ORGANIZATION | 0.38+ |
Felix Van de Maele, Collibra | Data Citizens '22
(upbeat music) >> Last year, the Cube covered Data Citizens, Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement. We had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back, and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on the Cube. We're excited to have you, Felix. Good to see you again. >> Likewise Dave. Thanks for having me again. >> You bet. All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year, and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear. And that's really true, as well, in the world of data. So what's different in your mind in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? 
>> Yeah, absolutely. And I think you said it well, Dave, in the intro that rising complexity and fragmentation in the broader data landscape that hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten kind of more difficult. So that trend is continuing. I think what is changing is that trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well. Which again, is really difficult in this environment of continuous innovation, continuous change, continuous growing complexity and fragmentation. So it's become much more acute. And to your earlier point, we do live in a different world, and the past couple of years, we could probably just kind of brute force it, right? We could focus on the top line. There was enough kind of investments to be had. I think nowadays organizations are focused, or are in a very different environment where there's much more focus on cost control, productivity, efficiency. How do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at that data and to scale that data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? Like you said, the people and process, how do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously if you will than they maybe have in the past. 
>> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So in some sense in the last kind of financial crisis. And that was really the start of Collibra, where we found product market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis, and kind of here we are again in a very different environment of course, 15 years, almost 15 years later. But data is only becoming more important. But our mission to deliver trusted data for every user, every use case, and across every source, frankly has only become more important. So while it's been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens. We truly believe that everyone in the organization should be able to use trusted data in an easy, easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well, that's interesting because, you know, in my observation, it takes seven to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely.
Again, there's a lot of tailwinds, organizations are only maturing their data practices, and we've seen it kind of transform, or influence a lot of our business growth that we've seen, broader adoption of the platform. We work at some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders and every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? As those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them kind of transition to the cloud even faster. And so we see a lot of excitement and momentum there. We did an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI, machine learning, again to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical. So we're really excited about that as well. And on the organizational side, I'm sure you've heard a term around kind of data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. 
I think that's the way to scale the data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team, it's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for chief data officers, data mesh, you mentioned. You guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So let's chat a little bit about the products. We're going to go deeper into products later on at Data Citizens '22, but we know you're debuting some new innovations, you know, whether it's, you know, the under the covers in security, sort of making data more accessible for people, just dealing with workflows and processes as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, and like I said, we're still relatively early in this journey towards kind of that mission of data intelligence, that really bold and compelling mission. Either customers are just starting on that journey, and we want to make it as easy as possible for the organization to actually get started, because we know that's important that they do. And for our organization and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier for, really to accomplish that mission and vision around that data citizen that everyone has access to trustworthy data in a very easy, easy way. 
So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that as Collibra becomes this kind of mission critical enterprise platform from a security, performance, architecture, scale, supportability perspective, that we're truly able to deliver that kind of an enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience so that anybody in your organization can, in a very easy search first way, find the right data product, find the right data set that they can then consume. Usage analytics: how do we help organizations drive adoption, tell them where they're working really well, and where they have opportunities. Home pages, again, to make things easy for people, for anyone in your organization, to kind of get started with Collibra. You mentioned workflow designer, again, we have a very powerful enterprise platform. One of our key differentiators is the ability to really drive a lot of automation through workflows. And now we provided a new low code, no code, kind of workflow designer experience. So really customers can take it to the next level. There's a lot more new product around Collibra Protect, which in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How do we, how are we able to make sure that as you move to the cloud, things like access management, masking around sensitive data, PII data, is managed in a much more effective way. Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can?
Moving that to the cloud has been a big part of our strategy. So we launched our data quality cloud product as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are bettering a capability that we call push down. So we're actually pushing down the compute and data quality, the monitoring, into the underlying platform, which again, from a scale, performance, and ease of use perspective is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations that we talk about, being able to connect to every source. Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been at work really hard, and we are really, really excited about what we are coming, what we're bringing to markets. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace, you know, you think about data mesh, you think of data as product, one of the key principles. You think about monetization. This is really different than what we've been used to in data, which is just getting the technology to work has been so hard, so how do you see sort of the future? And, you know, give us your closing thoughts please. >> Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraint and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, the data intelligence platform.
We are still early, making it as easy as we can. It's kind of our, as our mission. And so I'm really, really excited to see what we are going to, how the markets are going to evolve over the next few quarters and years. I think the trend is clearly there, when we talk about data mesh, this kind of federated approach, focus on data products is just another signal that we believe that a lot of our organizations are now at the time, they understand the need to go beyond just the technology, how to really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now's the time given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now's the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're going to crush it out there. >> Thank you Dave. >> Yeah, it's a great spot for an in person event, and of course, the content post event is going to be available at collibra.com, and you can of course catch the Cube coverage at thecube.net, and all the news at siliconangle.com. This is Dave Vellante for the Cube, your leader in enterprise and emerging tech coverage. (light music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Adobe | ORGANIZATION | 0.99+ |
Heineken | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Collibra | ORGANIZATION | 0.99+ |
San Diego | LOCATION | 0.99+ |
Dave | PERSON | 0.99+ |
Felix Van de Maele | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
seven | QUANTITY | 0.99+ |
2008 | DATE | 0.99+ |
Felix | PERSON | 0.99+ |
Bank of America | ORGANIZATION | 0.99+ |
2020s | DATE | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
2010s | DATE | 0.99+ |
Last year | DATE | 0.99+ |
thecube.net | OTHER | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
12 months | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
siliconangle.com | OTHER | 0.99+ |
One | QUANTITY | 0.99+ |
10 years | QUANTITY | 0.99+ |
OwlDQ | ORGANIZATION | 0.98+ |
Spark | TITLE | 0.98+ |
TensorFlow | TITLE | 0.97+ |
Data Citizens | EVENT | 0.97+ |
today | DATE | 0.97+ |
Kirk Haslbeck | PERSON | 0.96+ |
over 600 enterprise customers | QUANTITY | 0.96+ |
both | QUANTITY | 0.96+ |
Collibra Protect | ORGANIZATION | 0.96+ |
first way | QUANTITY | 0.94+ |
one | QUANTITY | 0.93+ |
last decade | DATE | 0.93+ |
past couple of years | DATE | 0.93+ |
collibra.com | OTHER | 0.92+ |
15 years | QUANTITY | 0.88+ |
about 18 months ago | DATE | 0.87+ |
last couple of years | DATE | 0.87+ |
last couple of years | DATE | 0.83+ |
almost 15 years later | DATE | 0.82+ |
Data | ORGANIZATION | 0.81+ |
previous decade | DATE | 0.76+ |
Data Citizens 2021 | ORGANIZATION | 0.73+ |
next 10 years | DATE | 0.69+ |
quarters | DATE | 0.67+ |
last | DATE | 0.66+ |
Data Citizens 2022 | ORGANIZATION | 0.63+ |
Google Cloud | ORGANIZATION | 0.63+ |
past year | DATE | 0.62+ |
Storage | TITLE | 0.6+ |
Azure | ORGANIZATION | 0.59+ |
next | DATE | 0.58+ |
case | QUANTITY | 0.58+ |
Cube | ORGANIZATION | 0.53+ |
single vertical | QUANTITY | 0.53+ |
14 | QUANTITY | 0.46+ |
Cube | COMMERCIAL_ITEM | 0.45+ |
Kirk Haslbeck, Collibra | Data Citizens '22
(bright upbeat music) >> Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality at Collibra. Kirk, good to see you. Welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality, observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations! And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before and the variety has changed and the volume has grown. And while I think that remains true, there are a couple other hidden factors at play that everyone's so interested in as to why this is becoming so important now. And I guess you could kind of break this down simply and think about if Dave, you and I were going to build, you know, a new healthcare application and monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be? What those incidents would look like? Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50 day crosses the 10 day average. And imagine if the data underlying the inputs to that is incorrect. We'll probably have major financial ramifications in that sense. So, it kind of starts there where everybody's realizing that we're all data companies and if we are using bad data, we're likely making incorrect business decisions. But I think there's kind of two other things at play.
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, I might have failed him 'cause I didn't know. And I used to ask those types of questions about anti-lock brakes and cylinders and if it's manual or automatic and I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips. I really don't know that much about it. And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume them in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself and that's not even their job anymore. So, we'll talk more about that in a minute but that's really what's setting the foreground for this observability play and why everybody's so interested, it's because we're becoming less close to the intricacies of the data and we just expect it to always be there and be correct. >> You know, the other thing too about data quality, and for years we did the MIT CDOIQ event, we didn't do it last year, COVID messed everything up. But the observation I would make there, and I'd love your thoughts, is that data quality, or information quality, used to be this back office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened and people were realizing, well, they sort of flipped the bit from sort of data as a risk to data as an asset. And now, as we say, we're going to talk about observability. And so it's really become front and center, just the whole quality issue because data's fundamental, hasn't it? >> Yeah, absolutely.
I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There's so many numbers and they're coming from all of these different sources and data providers and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved in early days, even before Collibra. And what's been so exciting is we have these types of observation techniques, these data monitors that can actually track past performance of every field at scale. And why that's so interesting and why I think the CDO is listening really intently nowadays to this topic is so maybe we could surface all of these problems with the right solution of data observability and with the right scale and then just be alerted on breaking trends. So we're sort of shifting away from this world of must write a condition and then when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So, how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market. So the language is changing a lot, and the topic areas are changing. The way that I like to say it or break it down, because the lingo is constantly a moving target in this space, is really break records versus breaking trends.
And I could write a condition: when this thing happens it's wrong and when it doesn't, it's correct. Or I could look for a trend and I'll give you a good example. Everybody's talking about fresh data and stale data and why would that matter? Well, if your data never arrived or only part of it arrived or didn't arrive on time, it's likely stale and there will not be a condition that you could write that would show you all the good and the bads. That was kind of your traditional approach of data quality break records. But your modern day approach is you lost a significant portion of your data, or it did not arrive on time to make that decision accurately on time. And that's a hidden concern. Some people call this freshness, we call it stale data but it all points to the same idea of the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >> So what's the Collibra angle on all this stuff? You made the acquisition, you got data quality and observability coming together, you guys have a lot of expertise in this area, but you hear provenance of data, you just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem and what's unique about your approach? >> Well, I think where we're fortunate is with our background, myself and team we sort of lived this problem for a long time in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with before it was called data observability or reliability was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist.
But we've also always covered data quality and we believe that people want to know more, they need more insights and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time. I have so many things going wrong, just show me the big picture. Help me find the thing that if I were to fix it today would make the most impact. So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog, metadata. And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times and some examples, remember the flash crash in 2010. Nobody had any idea what that was, they just said, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens '22 that you're announcing, you got to announce new products, right? Your yearly event, what's new? Give us a sense as to what products are coming out but specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery and Databricks, Delta Lake and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook in to these databases. And while we've always worked with the same databases in the past, and they're supported today, we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress.
Did my data that I've spent all this time and money with my security team securing ever leave my hands? Did it ever leave my secure VPC as they call it? And with these native integrations that we're building and about to unveil here as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. And what that means is with no install and no configuration you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go forward team selection secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >> So this is interesting because what you just described, you mentioned Snowflake, you mentioned Google, oh actually you mentioned, yeah, Databricks. Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool, but then Google's got the open data cloud. If you heard Google Next, and now Databricks doesn't call it the data cloud but they have like the open source data cloud. So you have all these different approaches and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across, it's like (indistinct) you should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh and I need tooling to be able to have confidence that my data is governed and has the proper lineage, provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right.
And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now, we can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time. And so if you were to log into BigQuery tomorrow using our tool, or, let's say, Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage, and access and privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >> I love this example because everybody talks about, wow, the cloud guys are going to own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value, connect across cloud. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens '22. >> Absolutely. Well, I think one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they want to know where everything is, where their sensitive data is. If it's redundant, tell me everything inside of three to five seconds. And with that comes, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're going to see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >> Excellent, all right.
Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens '22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE.
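The native pushdown idea Kirk describes in this interview (the compute runs inside the database and only the answers come back, so no rows leave the secure environment) can be illustrated with a minimal Python sketch. This is not Collibra's implementation: Python's built-in `sqlite3` module stands in for a cloud database such as Snowflake or BigQuery, and the `trades` table, its columns, and the `build_demo_db` and `pushdown_profile` helpers are hypothetical names chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO trades (symbol, price) VALUES (?, ?)",
        [("AAPL", 150.0), ("MSFT", None), ("AAPL", 151.0), ("GOOG", None)],
    )
    return conn


def pushdown_profile(conn, table, column):
    """Compute profiling metrics inside the database.

    Only one aggregate row crosses the connection; the raw rows never
    leave the database, which is the point of pushdown.
    """
    # Table and column names are interpolated for brevity, so they must
    # come from a trusted source (hypothetical names in this sketch).
    sql = (
        "SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) "
        f"FROM {table}"
    )
    row_count, null_count, distinct_count = conn.execute(sql).fetchone()
    null_count = null_count or 0  # SUM returns NULL on an empty table
    return {
        "row_count": row_count,
        "null_rate": null_count / row_count if row_count else 0.0,
        "distinct_count": distinct_count,
    }


if __name__ == "__main__":
    print(pushdown_profile(build_demo_db(), "trades", "price"))
```

The contrast is with pulling every row to the client (`SELECT * FROM trades`) and counting nulls there, which is where network and egress costs would come from on a real cloud database.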
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Kurt Hasselbeck | PERSON | 0.99+ |
2010 | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
Kirk Hasselbeck | PERSON | 0.99+ |
50 day | QUANTITY | 0.99+ |
Kirk | PERSON | 0.99+ |
10 day | QUANTITY | 0.99+ |
OwlDQ | ORGANIZATION | 0.99+ |
Kirk Haslbeck | PERSON | 0.99+ |
next week | DATE | 0.99+ |
ORGANIZATION | 0.99+ | |
last year | DATE | 0.99+ |
two sides | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
NASDAQ | ORGANIZATION | 0.99+ |
Snowflake | TITLE | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
Data Bricks | ORGANIZATION | 0.99+ |
two other things | QUANTITY | 0.98+ |
one click | QUANTITY | 0.98+ |
tomorrow | DATE | 0.98+ |
today | DATE | 0.98+ |
five seconds | QUANTITY | 0.97+ |
two domain | QUANTITY | 0.94+ |
Collibra Data Quality | TITLE | 0.92+ |
MIT CDOIQ | EVENT | 0.9+ |
Data Citizens '22 | TITLE | 0.9+ |
Egress | ORGANIZATION | 0.89+ |
Delta Lake | TITLE | 0.89+ |
three | QUANTITY | 0.86+ |
zero | QUANTITY | 0.85+ |
Big Query | TITLE | 0.85+ |
about a decade ago | DATE | 0.85+ |
SQL Pushdown | TITLE | 0.83+ |
Data Citizens 2022 Collibra | EVENT | 0.82+ |
Big BigQuery | TITLE | 0.81+ |
more than a couple | QUANTITY | 0.79+ |
couple | QUANTITY | 0.78+ |
one big | QUANTITY | 0.77+ |
Collibra Data Quality | ORGANIZATION | 0.75+ |
Collibra | OTHER | 0.75+ |
Google Nest | ORGANIZATION | 0.75+ |
Data Citizens '22 | ORGANIZATION | 0.74+ |
zero latency | QUANTITY | 0.72+ |
SAS | ORGANIZATION | 0.71+ |
Snowflake | ORGANIZATION | 0.69+ |
COVID | ORGANIZATION | 0.69+ |
years ago | DATE | 0.68+ |
Wall Street | LOCATION | 0.66+ |
theCUBE | ORGANIZATION | 0.66+ |
many numbers | QUANTITY | 0.63+ |
Collibra | PERSON | 0.63+ |
times | QUANTITY | 0.61+ |
Data | ORGANIZATION | 0.61+ |
too long | DATE | 0.6+ |
Vice President | PERSON | 0.57+ |
data | QUANTITY | 0.56+ |
CDO | TITLE | 0.52+ |
Bricks | TITLE | 0.48+ |
Jacklyn Osborne, Bank Of America | Collibra Data Citizens'21
>> From around the globe, it's theCUBE, covering Data Citizens '21. Brought to you by Collibra. >> Well, hello everybody, John Walls here as we continue our coverage here on theCUBE of Data Citizens '21. It is a pleasure of ours to welcome in an award winner here at Data Citizens '21. We're with Jacklyn Osborne, who is the Managing Director and Risk and Finance Technology Executive at Bank of America, and she is also the Data Citizen of the Year, one of the Collibra Excellence Award winners. And Jacklyn, congratulations on the honor. Well deserved, I'm sure. >> Thank you so much. It is a true honor and I am so happy to be here and I'm looking forward to our conversation today. >> Yeah, what is it? It's all about just the concept of being a data citizen. In your mind, what is all that about? What are those pillars in terms of being a good data citizen? And that gets to the point that you are the Data Citizen of the Year. >> I think that's such a good question, and actually it is something that I don't even know if I know everything about, because it's constantly evolving. Being a data citizen yesterday is not what it is today, and it's not what it means tomorrow, because this field is evolving. But with that said, I think to me being a data citizen is having that awareness that data matters, is driving to that data as an asset, and really trying to lay the foundation to ensure it's the right data in the right place at the right time. >> Yeah, let's talk about that high wire act, because it's becoming increasingly more complex, as you know. You've been in this realm, if you will, for what, 15 years now? I believe it has evolved dramatically, right, in terms of capabilities but also complexity. So let's talk about that, about finding the relevance of data and delivering it on time to the right people within your organization. How much more challenging is that now than it was maybe five or just, you know, 10 years ago? >> I mean, it's kind of crazy.
There are some areas that make it so much easier and then, for your question, some areas that make it so much harder. But if I can, let's start with the easier, because I think this is something that really is important. When I started this, I've been in data my entire professional career, I've been a chief data officer since 2013. And when I started, I used to joke that I was a used car salesman. I was selling something, this idea of data quality, data governance, that nobody wanted. But now, so the shift to your question, the good is I am now a luxury car salesman selling a product that everybody wants. But shift to the bad: nobody wants to pay for it. So the complexity of it, as data becomes bigger, as we talk about big data and unstructured data and social media and Facebook feeds, that is hard. It is complex. And the ability to truly manage and govern data to the degree of that perfection is really hard. So the more data we get, the more complexity, the more challenge, the more there is a need to really prioritize, align with business strategy, and ensure that you are embedding into the culture and the DNA of the corporate and not do it in the silos. >> You know, delivering that data in the secure environment is obviously critically important for any enterprise, but even more so, to put a finer point on it, in financial services, in terms of your work in that regard. So let's add that layer into this too: not only internal, all the communication you have to do and the collaboration you have to have, but you have these external stakeholders too, right? You have me, you know, a BofA client if you will, that you've got to be aware of and have to communicate with. So let's talk about that, that kind of merger, if you will, of not only having to work internally but also externally, and making sure that with all the data you've got now that it works. >> Indeed.
And you're kind of moving towards one of the newer dimensions, which is privacy. I mean, GDPR was the first regulation in the UK, but now you have the CCPA in California, and it's coming. And that right to be forgotten or, more importantly, as you said, as a customer of financial institutions, that right to understand where your data is, is very important, because customers do want to know that their information is understood, trusted, protected and going to be taken care of. So that ability to really transform back, that you have a solid basis and that you are taking the measures and the necessary steps to ensure that that data is, air quotes, "governed" is so important. And again, that shift from that used car salesman to a luxury car salesman, your question is another example of how that shift is happening. It's no longer a should do or could do. Data governance is really becoming a must do, and that's why you are seeing so many more chief data officers, chief analytics officers, data management professionals. The profession is growing, I mean, incrementally every single day. >> What about the balancing act that you do? Let's just deal with the internal audiences that you have to contend with. I shouldn't say contend, contend has that pejorative term to it, that you deal with, you collaborate with. You know, governance is also critically important because you want to make data available to the right people at the right time, but only the right people. Right. So what kind of practices or procedures are you putting in place at BofA to make sure that that data is delivered to the right folks, but only to the right people, and trying, I guess, to educate people within your organization as to the need for these strict governance processes? >> Sure. I tend to refer to them as the foundational pillars, and if I was to take a step back and say what they are and how we use them, the first one is metadata management, and it is really around that.
What data do you have? It's that understanding of the information. So I used to refer to it, or I still refer to it, as when we were going to the library and you used to have to look at the card catalog. Metadata management is very similar to the card catalog for books. It tells you all the information. What's the genre? Who's the author, what the section is, where it is in the library. And that is a core piece. If you don't understand your data, you can't govern it. So that's kind of pillar one, metadata management. Pillar two is what's often referred to as data lineage, but I do think the new buzzword is data provenance. It's really that access flow. It's understanding where data comes from, the movement along the journey, and where it's going. If you don't understand that horizontal, front to back, you can't govern the information as well, because it can be changing hands, it can be altering, and so it's that end-to-end look at things. Pillar three is data quality, and that's really that measurement of, is it the right data? And it is made up of a series of data quality dimensions: accuracy, completeness, validity, timeliness, conformity, reasonableness, et cetera. And it's really that fit for use: is the data that I have the right data, as I said earlier. And then last but not least is issue management. At the end of the day there will be problems. There is too much data. It is in too many hands. So we're not trying to remove all data issues, but having a process where you can actually log, prioritize and ultimately remediate is that last and final pillar of the data management, I would call it a circle, because it has to all come back together and it's rinse and repeat. >> Yeah. And so you raise a point, a great point, about things are going to go wrong. You know, eventually something happens. We know nothing is foolproof, nothing is bulletproof.
And we're certainly seeing that in terms of security now, right, with breaches pretty well publicized, with invasions, ransomware, you name it, right, all kinds of flavors of that, unfortunately. So from your perspective, in terms of being this data guardian, if you will, how much of your concerns now have been amplified in terms of security and privacy and that kind of internal communication you have to have, or I guess buy-in, you know, to understand the need to make this data ultra secure and ultra private, especially in this environment where the bad actors, you know, are prolific? So kind of talk about that. It's a struggle, but maybe that challenge that you have in this environment here in 2021. >> Yeah, I think, you know, the way I would do it is the struggle is again that need or the desire to protect everything, and at the end of the day that's hard. And so the struggle right now that I face is the prioritization. How do we differentiate what we call the critical few? Some call it CDEs, critical data elements, some call it KDEs, key data elements, there's a term. But really, as that need and that demand grows, whether it's for security or privacy or even data democratization, which hopefully we do talk about at some point, all these things are reliant on the right data because, like statistics, garbage in, garbage out. So whether it's because you need the right information because of your analytics and your models or, as you talked about, it's prevention and defensive security reasons, that defensive and offensive isn't going away. So the real struggle is not around the driver, but the prioritization. How do you focus to ensure you're spending your time on the right areas and, more importantly, in alignment with the business priorities?
Because one of the things that's critically important for me is ensuring that it's not metadata or data governance or data quality for the sake of it, it is in alignment with that business priority. >> And a big part of that is strategy for the future, right, strategy going forward. You know, where you're going to go in the next 18, 24 months. And so without, you know, revealing state secrets here, how do you see this playing out in terms of this continual digital transformation, if you will, from the BofA side of the fence? You know, what do you see as being important, or in terms of what you would like to accomplish over the next year and a half, two years? >> For me, I think it's that, and I'm glad you asked that question, 'cause I wanted to mention that data democratization, I think. And if we debunk that or look into that, what do I mean by democratization? It's that real-time access, but it's not real-time access to the wrong information or to the wrong people, as we talked about. It really is ensuring, almost like an Amazon model, that I can simply search for the information I need, I can put it in my shopping cart, and I can check out, and I am able to, that's that data-driven, I'm able to use that information knowing it's the right data in the right hands for the right reasons. And that's really my future mind, where I'm getting to, is how do I enable that? How do I democratize it? So data truly does become that enterprise asset that everybody and anybody can access, but they can do so in a way that has all of those defensive controls in place, going back to that right data, right place, right time. Because with the shiny toys of AI, machine learning, all those things, if you're building models off of the wrong data from the wrong place or in the wrong hands, it's going to bite you in the butt, whether it's today, tomorrow, the future. >> Well, exactly. I love that analogy and on that I'm going to thank you for the time.
So I'm gonna call you a luxury data salesperson, not a car salesman. But it certainly has paid off, and we certainly congratulate you as well on the award that you won here from Collibra. >> Thank you so much and thank you for the time. Hopefully you've enjoyed our conversation as much as I have. >> I certainly have. Thank you very much. Jacklyn Osborne joining us from Bank of America, the Data Citizen of the Year, here at Data Citizens 2021. I'm John Walls and you've been watching theCUBE.
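The data quality dimensions Jacklyn lists under her third pillar (completeness, validity, timeliness and so on) each reduce to a measurement over records. The Python sketch below is one possible reading of three of them, not a description of any Bank of America or Collibra tooling; the sample records, field names, patterns and the 30-day freshness window are all hypothetical values chosen for illustration.

```python
import re
from datetime import date

# Hypothetical customer records used only to exercise the metrics below.
SAMPLE_RECORDS = [
    {"account_id": "A-1001", "email": "pat@example.com", "updated": date(2021, 6, 1)},
    {"account_id": "A-1002", "email": "", "updated": date(2021, 5, 20)},
    {"account_id": "1003", "email": "not-an-email", "updated": date(2020, 1, 15)},
    {"account_id": "A-1004", "email": "lee@example.com", "updated": date(2021, 6, 10)},
]


def completeness(records, field):
    """Share of records where the field is populated (not None or empty)."""
    if not records:
        return 0.0
    populated = [r for r in records if r.get(field) not in (None, "")]
    return len(populated) / len(records)


def validity(records, field, pattern):
    """Share of populated values that fully match the expected format."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


def timeliness(records, field, as_of, max_age_days):
    """Share of records refreshed within max_age_days of the as_of date."""
    if not records:
        return 0.0
    fresh = [r for r in records if (as_of - r[field]).days <= max_age_days]
    return len(fresh) / len(records)


if __name__ == "__main__":
    print("email completeness:", completeness(SAMPLE_RECORDS, "email"))
    print("account_id validity:", validity(SAMPLE_RECORDS, "account_id", r"A-\d{4}"))
    print("timeliness:", timeliness(SAMPLE_RECORDS, "updated", date(2021, 6, 15), 30))
```

Scores like these are what would feed the fourth pillar, issue management: a field that falls below an agreed threshold gets logged, prioritized and remediated.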
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jacqueline | PERSON | 0.99+ |
Jacqueline Osborne | PERSON | 0.99+ |
Jacqueline Osborn | PERSON | 0.99+ |
Jacklyn Osborne | PERSON | 0.99+ |
john Wallace | PERSON | 0.99+ |
2021 | DATE | 0.99+ |
amazon | ORGANIZATION | 0.99+ |
UK | LOCATION | 0.99+ |
15 years | QUANTITY | 0.99+ |
Bank of America | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.99+ |
today | DATE | 0.99+ |
2013 | DATE | 0.99+ |
yesterday | DATE | 0.99+ |
ORGANIZATION | 0.98+ | |
john walls | PERSON | 0.98+ |
Bank Of America | ORGANIZATION | 0.98+ |
first one | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
10 years ago | DATE | 0.97+ |
Katie | PERSON | 0.97+ |
18 | QUANTITY | 0.91+ |
24 months | QUANTITY | 0.89+ |
culebra | TITLE | 0.88+ |
California | LOCATION | 0.88+ |
C C P A | TITLE | 0.87+ |
next year and a half | DATE | 0.87+ |
Data Citizens 21 | ORGANIZATION | 0.85+ |
first regulation | QUANTITY | 0.85+ |
Excellence Award | TITLE | 0.78+ |
two years | QUANTITY | 0.78+ |
Collibra Data | ORGANIZATION | 0.76+ |
single day | QUANTITY | 0.73+ |
chris | PERSON | 0.69+ |
pillar three | OTHER | 0.68+ |
five | QUANTITY | 0.59+ |
Pillar two | OTHER | 0.5+ |
hands | QUANTITY | 0.49+ |
21 | QUANTITY | 0.45+ |
Pillar | PERSON | 0.3+ |
Michele Goetz, Forrester Research | Collibra Data Citizens'21
>> From around the globe, it's theCUBE, covering Data Citizens '21. Brought to you by Collibra. >> For the past decade organizations have been effecting very deliberate data strategies and investing quite heavily in people, processes and technology, specifically designed to gain insights from data, better serve customers, drive new revenue streams. We've heard this before. The results, quite frankly, have been mixed, as much of the effort is focused on analytics and technology designed to create a single version of the truth, which in many cases continues to be elusive. Moreover, the world of data is changing. Data is increasingly distributed, making collaboration and governance more challenging, especially where operational use cases are a priority. Hello, everyone. My name is Dave Vellante and you're watching theCUBE coverage of Data Citizens '21. And we're pleased to welcome Michele Goetz, who's the vice president and principal analyst at Forrester Research. Hello, Michele. Welcome to theCUBE. >> Hi, Dave. Thanks for having me today. >> It's our pleasure. So I want to start, you serve a wide range of roles including enterprise architects, CDOs, chief data officers that is, analysts, et cetera, and many data-related functions. And my first question is what are they thinking about today? What's on their minds, these data experts? >> So there's actually two things happening. One is what is the demand that's placed on data for our new intelligent digital systems. So we're seeing a lot of investment and interest in things like edge computing. And then how does that intersect with artificial intelligence to really run your business intelligently and drive new value propositions to be both adaptive to the market as well as resilient to changes that are unforeseen. The second thing is then you create this massive complexity to managing the data, governing the data, orchestrating the data, because it's not just a centralized data warehouse environment anymore.
You have a highly diverse and distributed landscape that you both control internally, as well as taking advantage of third party information. So really what the struggle then becomes is how do you trust the data? How do you govern it, and secure, and protect that data? And then how do you ensure that it's hyper contextualized to the types of value propositions that our intelligence systems are going to serve? >> Well, I think you're hitting on the key issues here. I mean, you're right. The data, and I sort of refer to this as well, is sort of out there, it's distributed at the edge. But generally our data organizations are actually quite centralized, and as well you talk about the need to trust the data, obviously that's crucial. But are you seeing the organization change? I know you're talking about this to clients, your discussion about collaboration. How are you seeing that change? >> Yeah, so as you have to bring data into context of the insights that you're trying to get or the intelligence that's automating and scaling out the value streams and outcomes within your business, we're actually seeing a federated model emerge in organizations. So while there's still a centralized data management and data services organization, typically led by enterprise architects for data, with a data engineering team that's managing warehouses as well as data lakes, they're creating this great platform to access and orchestrate information, but we're also seeing data, and analytics, and governance teams come together under chief data officers or chief data and analytics officers. And this is really where the insights are being generated from either BI and analytics or from data science itself and having dedicated data engineers and stewards that are helping to access and prepare data for analytic efforts. And then lastly, and this is the really interesting part, when you push data into the edge the goal is that you're actually driving an experience and an application.
And so in that case we are seeing data engineering teams starting to be incorporated into the solutions teams that are aligned to lines of business or divisions themselves. And so really what's happening is if there is a solution consultant who is also overseeing value-based portfolio management when you need to instrument the data to these new use cases and keep up with the pace of the business it's this engineering team that is part of the DevOps work bench to execute on that. So really the balances we need the core, we need to get to the insights and build our models for AI. And then the next piece is how do you activate all that? And there's a team over there to help. So it's really spreading the wealth and expertise where it needs to go. >> Yeah, I love that. You took a couple of things that really resonated with me. You talked about context a couple of times and this notion of a federated model, because historically the sort of big data architecture, the team, they didn't have the context, the business context, and my inference is that's changing and I think that's critical. Your talk at Data Citizens is called how obsessive collaboration fuels scalable DataOps. You talk about the data, the DevOps team. What's the premise you put forth to the audience? >> So the point about obsessive collaboration is sort of taking the hubris out of your expertise on the data. Certainly there's a recognition by data professionals that the business understands and owns their data. They know the semantics, they know the context of it and just receiving the requirements on that was assumed to be okay. And then you could provide a data foundation, whether it's just a lake or whether you have a warehouse environment where you're pulling for your analytics. The reality is that as we move into more of AI machine learning type of model, one, more context is necessary. 
And you're kind of balancing between what are the things that you can ascribe to the data globally which is what data engineers can support. And then there's what is unique about the data and the context of the data that is related to the business value and outcome as well as the feature engineering that is being done on the machine learning models. So there has to be a really tight link and collaboration between the data engineers, the data scientists, and analysts, and the business stakeholders themselves. You see a lot of pods starting up that way to build the intelligence within the system. And then lastly, what do you do with that model? What do you do with that data? What do you do with that insight? You now have to shift your collaboration over to the work bench that is going to pull all these components together to create the experiences and the automation that you're looking for. And that requires a different collaboration model around software development. And still incorporating the business expertise from those stakeholders, so that you're satisfying, not only the quality of the code to run the solution, but the quality towards the outcome that meets the expectation and the time to value that your stakeholders have. So data teams aren't just sitting in the basement or in another part of the organization and digitally disconnected anymore. You're finding that they're having to work much more closely and side by side with their colleagues and stakeholders. >> I think it's clear that you understand this space really well. Hubris out context in, I mean, that's kind of what's been lacking. And I'm glad you said you used the word anymore because I think it's a recognition that that's kind of what it was. They were down in the basement or out in some kind of silo. And I think, and I want to ask you this. 
I come back to organization because I think a lot of organizations look at it as, the most cost-effective way for us to serve the business is to have a single data team with hyper specialized roles. That'll be the cheapest way, the most efficient way that we can serve them. And meanwhile, the business, which as you pointed out has the context, is frustrated. They can't get to data. So this notion of a federated governance model is actually quite interesting. Are you seeing actual common use cases where this is being operationalized? >> Absolutely, I think the first place that you were seeing it was within the operational technology use cases. Those are the use cases around a lot of the manufacturing and industrial devices. Any sort of IoT-based use case really recognized that without applying data and intelligence to whatever process was going to be executed, it was really going to be challenging to know that you're creating the right foundation, meeting the SLA requirements, and then ultimately bringing the right quality and integrity to the data, let alone any sort of data protection and regulatory compliance that has to be necessary. So you already started seeing the solution teams coming together with the data engineers, the solution developers, the analysts, and data scientists, and the business stakeholders to drive that. But that is starting to come back down into more of the IT mindset as well. And so DataOps starts to emerge from that paradigm into more of the corporate types of use cases and sort of parrot that, because there are customer experience use cases that have an IoT or edge component too, though. We live on our smart phones, we live on our smart watches, we've got our laptops. All of us have been put into virtual collaboration. And so we really need to take into account not just the insight of analytics but how do you feed that forward.
And so this is really where you're seeing sort of the evolution of DataOps as a competency not only to engineer the data and collaborate but ensure that there sort of an activation and alignment where the value is going to come out, and still being trusted and governed. >> I got kind of a weird question, but I'm going. I was talking to somebody in Israel the other day and they told me masks are off, the economy's booming. And he noted that Israel said, hey, we're going to pay up for the price of a vaccine. The cost per dose out, 28 bucks or whatever it was. And he pointed out that the EU haggled big time and they don't want to pay $19. And as a result they're not as far along. Israel understood that the real value was opening up the economy. And so there's an analogy here which I want to come back to my organization and it relates to the DataOps. Is if the real metric is, hey, I have an idea for a data product. How long does it take to go from idea to monetization? That seems to me to be a better KPI than how much storage I have, or how much geometry petabytes I'm managing. So my question is, and it relates to DataOps. Can that DataOps, should that DataOps individual maybe live, and then maybe even the data engineer live inside of the business and is that even feasible technically with this notion of federated governance? Are you seeing that and maybe talk a little bit more about this DataOps role. Is it. >> Yeah. >> Fungible. >> Yeah, it's definitely fungible. And in fact, when I talked about sort of those three units of there's your core enterprise data services, there's your BI and data, and then there's your line of business. All of those, the engineering and the ops is the DataOps which is living in all of those environments and being as close as possible to where the value proposition is being defined and designed. So absolutely being able to federate that. 
And I think the other piece on DataOps that is really important is recognizing how the practices around continuous integration and continuous deployment using agile methodologies is really reshaping. A lot of the waterfall approaches that were done before where data was lagging 12 to 18 months behind any sort of insights, but a lot of the platforms today assume that you're moving into a standard mature software development life cycle. And you can start seeing returns on investment within a quarter, really, so that you can iterate and then speed that up so that you're delivering new value every two weeks. But it does change the mindset this DataOps team aligned to solution development, aligned to a broader portfolio management of business capabilities and outcomes needs to understand how to appropriately scope the data products that they're delivering to incremental value-based milestones. So the business feels that they're getting improvements over time and not just waiting. So there's an MVP, you move forward on that and optimize, optimize, extend scale. So again, that CICD mindset is helping to not bottleneck and wait for the complete field of dreams to come from your data and your insights. >> Thank you for that, Michelle. I want to come back to this idea of collaboration because over the last decade we've seen attempts, I've seen software come out to try to help the various roles collaborate and some of it's been okay, but you have these hyper specialized roles. You've got data scientists, data engineers, quality engineers, analysts, et cetera. And they tend to be in their own little worlds. But at the end of the day we rely on them all to get answers. So how can these data scientists, all these stewards, how can they collaborate better? What are you seeing there? >> You need to get them onto the same process. That's really what it comes down to. If you're working from different points of view, that's one thing. 
But if you're working from different processes collaborating is really challenging. And I think the one thing that's really come out of this move to machine learning and AI is recognizing that you need processes that reinforce collaboration. So that's number one. So you see agile development in CICD not just for DataOps, not just for DevOps, but also encouraging and propelling these projects and iterations for the data science teams as well or even if there's machine learning engineers incorporated. And then certainly the business stakeholders are inserted within there as appropriate to accept what it is that is going to be developed. So processes is number one. And number two is what is the platform that's going to reinforce those processes and collaboration. And it's really about what's being shared. How do you share? So certainly what we're seeing within the platforms themselves is everybody contributing into some sort of a library where their components and products are being ascribed to and then that's able to help different teams grab those components and build out what those solutions are going to be. And in fact, what gets really cool about that is you don't always need hardcore data scientists anymore as you have this social platform for data product and analytic product development. This is where a lot of the auto ML begins because those who are less data science-oriented but can build an insight pipeline, can grab all the different components from the pipelines to the transformations, to capture mechanisms, to bolting into the model itself and allowing that to be delivered to the application. So really kind of balancing out between process and platforms that enable and encourage, and almost force you to collaborate and manage through sharing. >> Thank you for that. I want to ask you about the role data governance. You've mentioned trust and that's data quality, and you've got teams that are focused on and specialists focused on data quality. 
There's the data catalog. Here's my question. You mentioned edge a couple of times and I can see a lot of that. I mean, today, most AI, or a lot of the value, I would say, is modeling. And in the future, you mentioned edge, it's going to be a lot of inferencing in real time. And people are maybe not going to have the time or be involved in that decision. So what are you seeing in terms of data governance? We talked about federated governance, this notion of a data catalog, and maybe automating data quality without necessarily having it be so labor intensive. What trends are you seeing there? >> Yeah, so I think our new environment, our new normal, is that you have to be composable, interoperable, and portable. Portability is really the key here. So from a cataloging perspective and governance, we would bring everything together into our catalogs and business glossaries. And it would be a reference point, it was like a massive wiki. Well, that's wonderful, but why just house it in a museum? You really want to activate that. And I think what's interesting about the technologies today for governance is that you can turn those rules, and business logic, and policies into services that are composable components and bring those into the solutions that you're defining. And in that way what happens is that creates portability. You can drive them wherever they need to go. But from the composability and the interoperability portion of that, you can put those services in the right place at the right time for what you need for an outcome, so that you start to become behaviorally driven on executing on governance rather than trying to write all of the governance down into transformations and controls where the data lives. You can have quality and observability of that quality and performance right at the edge, in the context of behavior and use of that solution.
You can run those services and governance on gateways that are managing and routing information at those edge solutions, where synchronization between the edge and the cloud comes up. And if it's appropriate during synchronization of the data back into the data lake, you can run those services there. So there's a lot more flexibility and elasticity for today's modern approaches to cataloging, and glossaries, and governance of data than we had before. And that goes back into what we talked about earlier of like, this is the new wave of DataOps. This is how you bring data products to fruition now. Everything is about activation. >> So how do you see the future of DataOps? I mean, I've kind of been pushing you to a more decentralized model where the business has more control, 'cause the business has the context. I mean, I feel as though, hey, we've done a great job of contextualizing our operational systems. The sales team, they know when the data is crap within my CRM, but our data systems are context agnostic generally. And you obviously understand that problem well. But so how do you see the future of DataOps? >> So I think what's kind of interesting about that is we're going to go to governance on read versus governance on write, more so. What do I mean by that? That means that from a business perspective there's two sides of it. There's ensuring that where governance is run is, as we talked about before, executing at the appropriate place at the appropriate time. It's semantically domain-centric driven, not logical and systems centric. So that's number one. Number two is also recognizing that business owners or business operations actually play a role in this, because as you're working within your CRM systems, like a Salesforce, for example, you're using an iPaaS like MuleSoft to connect to other applications, connect to other data sources, connect to other analytics sources.
And what's happening there is that the data is being modeled and personalized to whatever view, insight, or task has to happen within those processes. So even CRM environments, which we think of as sort of traditional technologies that we're used to, are getting a lift, both in terms of intelligence from the data but also your flexibility in how you execute governance and quality services within that environment. And that actually opens up the data foundations a lot more and keeps you from having to do a lot of moving, copying, centralizing data and creating an over-weighted business application, both in terms of the data foundation but also in terms of the types of business services, and status updates, and processes that happen in the application itself. You're drawing those tasks back down to where they should be and where performance can be managed, rather than trying to over-customize your application environment. And that gives you a lot more flexibility later too for any sort of upgrades or migrations that you want to make, because all of the logic is contained back down in a service layer instead. >> Great perspectives, Michele, you obviously know your stuff and it's been a pleasure having you on. My last question is, when you look out there, is there anything that really excites you or any specific research that you're working on that you want to share, that you're super pumped about? >> I think there's two things. One is it's truly incredible the amount of insight and growth that is coming through data profiling and observation. Really understanding and contextualizing data anomalies so that you understand whether data is helping or hurting the business value, and tying it very specifically to processes and metrics, which is fantastic, as well as models themselves, like really understanding how data inputs and outputs are making a difference in whether the model performs or not.
And then I think the second thing is really the emergence of more active data, active insights. And as we talked about before, your ability to package up services for governance and quality in particular, that allows you to scale your data out towards the edge or where it's needed. And doing so not just so that you can run analytics but so that you're also driving overall processes and value. So the research around the operationalization and activation of data is really exciting. And looking at the networks and service mesh to bring those things together is kind of where I'm focusing right now, because what's the point of having data in a database if it's not providing any value? >> Michele Goetz, Forrester Research, thanks so much for coming on theCUBE. Really awesome perspectives. You're in an exciting space, so appreciate your time. >> Absolutely, thank you. >> And thank you for watching Data Citizens '21 on theCUBE. My name is Dave Vellante. (upbeat music)
SUMMARY :
Brought to you by Collibra. of the truth, which in many Thanks for having me today. So I want to start, you serve that you both control internally, the need to trust the data the data to these new use cases What's the premise you and the time to value that And meanwhile, the business, But that is starting to come back down and it relates to the DataOps. and the ops is the DataOps And they tend to be in and allowing that to be And in the future, you mentioned edge of that you can put those services I mean, I kind of been pushing you And that gives you a lot more flexibility on that you want to share, that allow you to scale your so appreciate your time. And thank you for watching
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michele Goetz | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Michele | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Michelle | PERSON | 0.99+ |
$19 | QUANTITY | 0.99+ |
Israel | LOCATION | 0.99+ |
12 | QUANTITY | 0.99+ |
28 bucks | QUANTITY | 0.99+ |
first question | QUANTITY | 0.99+ |
two sides | QUANTITY | 0.99+ |
EU | ORGANIZATION | 0.99+ |
two things | QUANTITY | 0.99+ |
Forrester Research | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
One | QUANTITY | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
second thing | QUANTITY | 0.99+ |
both | QUANTITY | 0.98+ |
Collibra | ORGANIZATION | 0.98+ |
18 months | QUANTITY | 0.98+ |
Forrester Research | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.96+ |
Israel | ORGANIZATION | 0.96+ |
three units | QUANTITY | 0.94+ |
Data Citizens '21 | TITLE | 0.94+ |
DataOps | ORGANIZATION | 0.93+ |
one thing | QUANTITY | 0.9+ |
Hubris | PERSON | 0.89+ |
first place | QUANTITY | 0.85+ |
past decade | DATE | 0.84+ |
agile | TITLE | 0.83+ |
Number two | QUANTITY | 0.82+ |
single data team | QUANTITY | 0.82+ |
DevOps | TITLE | 0.81+ |
last | DATE | 0.8+ |
DataOps | TITLE | 0.8+ |
edge | ORGANIZATION | 0.78+ |
DataOps | OTHER | 0.78+ |
single version | QUANTITY | 0.78+ |
wave | EVENT | 0.74+ |
two weeks | QUANTITY | 0.74+ |
DataOps | EVENT | 0.73+ |
times | QUANTITY | 0.73+ |
SLA | TITLE | 0.72+ |
number two | QUANTITY | 0.71+ |
Salesforce | TITLE | 0.7+ |
CICD | ORGANIZATION | 0.67+ |
number one | QUANTITY | 0.65+ |
CICD | TITLE | 0.6+ |
iPaaS | TITLE | 0.59+ |
Citizens'21 | ORGANIZATION | 0.56+ |
couple | QUANTITY | 0.42+ |
MuleSoft | ORGANIZATION | 0.41+ |
theCUBE | TITLE | 0.34+ |
Kirk Haslbeck, Collibra | Collibra Data Citizens'21
>> Narrator: From around the globe, it's theCUBE covering Data Citizens '21, brought to you by Collibra. >> Hi everybody, John Walls here on theCUBE continuing our coverage of Data Citizens 2021. And I'm now with Kirk Haslbeck, who is the vice president of engineering at Collibra. Kirk joins us from his home. Kirk, good to see you today. Thanks for joining us here on theCUBE. >> Well, thanks for having me, I'm excited to be here. >> Yeah, no, this is all about data quality, right? That's your world, you know, making sure that you're making the most of this great asset, right? That continues to evolve and mature. And yet I'm wondering, from your perspective, from your side of the fence, I assume data quality has always been a concern, right? Making the most of this asset, wherever it is and whenever you can get it. >> Yeah, absolutely. I mean, the challenge hasn't slowed down, right? We're looking at more data coming in all the time, laws of large numbers, but you kind of have to wonder, a lot of the large organizations have been trying to solve this for quite some time, right? So what is going on? Why isn't it just easier to get our arms around it? And there's so many reasons, but if I were to list maybe the top one, it's the diminishing value of static rules. And a good example of that might just be something as simple as starting with a gender column. Back in the day, we might have assumed that it had to be an M or an F, male or female. And over the last couple of years, we've actually seen that column evolve into six or seven different types. So just the very act of assuming that we could go in and write rules about our business and that they're never going to change and that the data's not evolving. And we start to think about zip codes and addresses that are changing, you know, Google Street View, however you want to think of it. Every column and every record is just changing all the time.
And so what, you know, many large organizations have done, they've written maybe forty thousand, fifty thousand rules and they have to continue to manage them. So I think we all try to get our arms around rule creation. And it's not even just about that. It would also be about, if you had all the rules in place, could you even keep up with them on a day-to-day changing basis? And so one of the largest companies in the U.S. sat down with myself and team early on and said, so what am I up against? I'm really either going to continue to hire a mountain of rule writers, you know, as they put it, per department, to get my arms around this, and that'll never end, or I need to think of a better way, which was the solution that we were ultimately providing at that time. And, you know, what that solution really entails is using data mining to learn and observe all the data that's already there and to curate the rules based on the data itself, right? That's where all the information is. And then ultimately we have this concept of adaptive ruling, which means all the variants in that column, all the new values that come in every day, the row counts, the sizes are all being managed. It's an automatic program, so that the rule is recalibrating itself. And I think this is where most chief data officers sit back and say, if I have to protect the franchise, right, if I have to put a trusted data program in place, what are my options and how does it scale? And they have to take a really hard look at something like this. >> You know, the process that you're talking about too, it just kind of reminds me of a diet, in that nobody wants to go through that pain, right? We all want to eat what we want to eat, but you're really happy when you get there at the end of the day. You like the way you look, like the way you feel, like the way you act, all those things. So it'd be almost like that when you're talking about this data, you know, in terms of rule setting, right?
Governance and accessibility and all these things, it's, it can be a tough process. Can be, but it certainly seems well worth it because you make your data all the more valuable and essential to your business, Is that about right? >> Yeah, that's right, that's right. And you know, it's funny you compare it to a diet. Sometimes I think of a patient stress test, you know, almost like a health exam and we're spending so much time testing the analytics or testing the models and looking at accuracy and can anybody achieve 89 to 90% but we're probably not spending enough time testing our data assumptions, right? Running that diet or health check against the data itself. And I would say that every fortune 100 or even fortune 1000 probably considers themselves a data-driven business at this point in time, which means they're going to make decisions quickly based on data. And if we really pull that thread a little bit, what about what's the cost of making decisions on incorrect data? I mean it's terribly scary as we start to unfold that, so you're absolutely right. They're taking it very seriously. And it takes a lot of thought of how to get enough coverage and how to create trust in that type of environment. >> Yeah, it's almost too, it's like, you know the concept of input bias a little bit here where were if you're assuming that certain data sets are accurate and pertinent, relevant, all those things and then you're making decisions based on those data sets but you might be looking at kind of an input bias if I'm hearing you right, that you're maybe you're not keeping your mind open as to what really should be important or influential in your decision-making in terms of data. And then obviously acting on that appropriately. So you have to decide maybe on the front side, you know, what data matters and you help people do that. And then help me make decisions based on good data basically, right? 
>> Right, that's right, and to be fully transparent and candid, we weren't as strong in the what-data-matters piece of it. We were very strong early on in giving you broad coverage, meaning we made no assumptions, right? We wanted to go out and attack the whole surface of the problem and then sort of have a consistent scoring methodology. And as we've partnered and now become acquired by Collibra, which is an exciting path, they are very good at what's called critical data elements and lineage, and doing graph analysis to sort of identify the assets that are most used. And that's where we see a huge benefit in combining those two powers. So you kind of got there quickly, but ultimately we are combining the forces of total coverage at scale with what is most important to you. >> I imagine, coming from OwlDQ, you were the founder of that, which was purchased by Collibra. Tell us a little bit about how that came to be. First off, OwlDQ, what that was all about, and then this marriage, if you will, how this relationship with Collibra evolved and then you were eventually purchased. >> Yeah, absolutely, so, I mean, I had this passion that I couldn't hold back on in the data community. Once you see it this way, where you can use data mining and compute power to curate and manage rules and then take it much beyond there and to predicting and seeing around the corner for tomorrow, you have to go that direction. So that's exactly what myself and team did. And what we started to see with the early adopters of our software was that they were getting a seven figure return on investment per department. And they were able to replicate this across many departments, so we've had a great lifespan with those customers, staying and growing and expanding, but we were getting a little bit of market pressure from the investment community, as well as that same customer community, that they wanted us to integrate with their data catalog and the data catalog of choice.
Every time the conversation was Collibra. And interestingly enough, you know, I ran into the likes of Jim Cushman and in the, you know, the whole thing unfolds from there. I think they were seeing a little bit of a similar story saying doesn't catalog and lineage belong together with quality. And when we sat together it was like three market forces suggesting the same answer. And as we laid out the roadmap and the integration we just can't see it any other way. There's no way I'll be bold and say that it goes back the other way, not just for this company but for the industry, data governance and data intelligence will absolutely combine quality, lineage, catalog and all of the above in the future. It is becoming that clear, I think. >> You know, this has kind of a big picture question, about all of that data quality right now, what's driving this avid interest that organizations showing and it's you know, small, medium enterprise it's everybody but in your mind, you know, you've been involved in this for a number of years now. You know, why now, what is it now? Is it just that we have so much more data available that so much of it's own use that, that, you know, we know what we have. And we're realizing that what we have is pretty valuable but you know, what's the driver, what's the big push here? >> Yeah, it is a tough question. And I have gotten this one before and it's interesting because it's been around since the nineties, right? So it's a very fair question. There's a couple things I think that are driving it. One as we start to see more data in Tableau dashboards and pick your favorite BI tool you start to realize the data's not correct. You know, you look at your house on Zillow or whatever you find out it's mislabeled. It doesn't have the right bedrooms. Maybe humans are entering into the listings and as data's become more available visually we're more critical of it. 
And now businesses are becoming more data-driven where they're humans aren't involved as much and the actions are automatically being taken. And it becomes an embarrassing moment if your data is incorrect and we can really measure that cost at this point. You do see some other factors like cloud migration. Well, that adds a risk to your business. Could you possibly port everything, not just the servers not just the software, but all of your data into another system and think that there would be no errors in that process. So as people are kind of creating their next generation platforms, and then probably even a touch of COVID accelerating that cloud migration adoption and even just technology adoption. So for a multitude of reasons, there's just more data and there's more data quality concerns than ever before. >> So if you're talking to a prospective client right now, which you probably are, you know, what do you want to share with them? Or what would you encourage them to consider in terms of kind of their data venture their data journey if you will, in terms of, you know, refining what they have in terms of mining appropriately in terms of governing it appropriately, all these things that maybe haven't been given a lot of consideration or deep consideration. >> Yeah, I think the two things although if you listen to my other talks I can talk forever about, about all of those items. It probably, you know, maybe just do the napkin math of all the tables, all the files all the Kafka messages, right? All the columns and fields and attributes and kind of just multiply that out and and try to figure out how you would get coverage. And if you could, how you could maintain it. And why shouldn't we be trading compute power for domain knowledge and things at that point I think that's the first place to start. And probably the second is actually the act of traditional data quality rules puts you in a binary situation. 
It basically says you will either have a break record or you will not. So it's a yes, no question, what it never will tell you is what the answer should have been. And if you take a deeper look at the solution that we're providing to the market we're actually predicting to you what the correct value is and it's a complete paradigm shift it obviously is much more scientific, but it's much more powerful to get you to the end answer more quickly instead of just going through break records. >> Right? Tremendous capability that you just described. And on that, I'm going to thank you for the time but just think about it, right? We're we're not only going to help you make more sense of your data. We're also going to help you make better decisions and show you what that path might be or what you probably should be considering. So it certainly opens up a lot of doors for a lot of companies in that respect. Kirk, thanks for the time, sorry we didn't have enough time to hear that guitar in the background, but next time I'm going to hold you to it, okay. >> Yeah, that sounds good, John, I really appreciate it. >> All right very good Kirk Haslbeck joining us from Collibra, we continue our coverage here at Data Citizens 21 on theCUBE and I'm John Walls. (bright music)
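Editor's sketch: Kirk's contrast between a static, hand-written rule and a rule that is curated from the data and recalibrates itself can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not Collibra's or OwlDQ's implementation: the class name `LearnedColumnProfile`, the `min_share` threshold, and the "suggest the most common value" step are all hypothetical stand-ins for the adaptive rules and predicted values described in the interview.

```python
from collections import Counter


def static_rule(value):
    """Hand-written rule from the interview's example: only 'M' or 'F' passes."""
    return value in {"M", "F"}


class LearnedColumnProfile:
    """Hypothetical profile that learns a column's accepted values from the data."""

    def __init__(self, min_share=0.01):
        # Assumed threshold: the share of observed rows a value needs to be accepted.
        self.min_share = min_share
        self.counts = Counter()

    def observe(self, values):
        """Recalibrate the rule by folding in a new batch of observed values."""
        self.counts.update(values)

    def accepted(self):
        """Values frequent enough in the observed data to count as valid."""
        total = sum(self.counts.values())
        if total == 0:
            return set()
        return {v for v, c in self.counts.items() if c / total >= self.min_share}

    def check(self, value):
        """Return (passes, suggestion) instead of a bare yes/no break record."""
        if value in self.accepted():
            return True, None
        most_common = self.counts.most_common(1)
        # Crude stand-in for "predicting the correct value": the most frequent one.
        suggestion = most_common[0][0] if most_common else None
        return False, suggestion


# Illustrative data only: a column that has evolved beyond 'M' and 'F'.
history = ["M"] * 50 + ["F"] * 45 + ["X"] * 5
profile = LearnedColumnProfile()
profile.observe(history)
```

With this made-up history, `static_rule("X")` rejects a value the data itself now treats as normal, while `profile.check("X")` accepts it; an unseen value such as `"Q"` fails and comes back with a suggested replacement rather than only a break record.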
SUMMARY :
brought to you by Collibra. Kirk good to see you today. me, I'm excited to be here. And whenever you can get it. and that the data's not evolving. like the way you feel, And you know, it's funny and you help people do that. of identify the assets that are most used. and then you were eventually purchased. and all of the above in the future. but you know, what's the driver, and the actions are you know, what do you to get you to the end answer I'm going to hold you to it, okay. Yeah, that sounds good, joining us from Collibra, we
ENTITIES
Entity | Category | Confidence |
---|---|---|
Kirk | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
John Walls | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
89 | QUANTITY | 0.99+ |
forty thousand | QUANTITY | 0.99+ |
Kirk Haslbeck | PERSON | 0.99+ |
Jim Cushman | PERSON | 0.99+ |
second | QUANTITY | 0.99+ |
two powers | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
U.S | LOCATION | 0.98+ |
90% | QUANTITY | 0.98+ |
Tableau | TITLE | 0.98+ |
seven figure | QUANTITY | 0.97+ |
tomorrow | DATE | 0.97+ |
OwlDQ | ORGANIZATION | 0.96+ |
today | DATE | 0.95+ |
three market forces | QUANTITY | 0.93+ |
fifty thousand rules | QUANTITY | 0.93+ |
nineties | DATE | 0.93+ |
One | QUANTITY | 0.93+ |
first | QUANTITY | 0.92+ |
theCUBE | ORGANIZATION | 0.91+ |
Kafka | PERSON | 0.88+ |
first place | QUANTITY | 0.85+ |
seven different types | QUANTITY | 0.83+ |
Data Citizens'21 | ORGANIZATION | 0.82+ |
couple things | QUANTITY | 0.73+ |
ORGANIZATION | 0.73+ | |
Data Citizens | ORGANIZATION | 0.72+ |
2021 | DATE | 0.69+ |
COVID | TITLE | 0.69+ |
fortune 1000 | ORGANIZATION | 0.66+ |
Data | EVENT | 0.66+ |
fortune 100 | ORGANIZATION | 0.66+ |
street view | TITLE | 0.65+ |
last couple of years | DATE | 0.63+ |
21 | EVENT | 0.55+ |
Zillow | ORGANIZATION | 0.55+ |
Data Citizens | TITLE | 0.51+ |
Citizens | ORGANIZATION | 0.39+ |
21 | QUANTITY | 0.35+ |
Stijn "Stan" Christiaens | Collibra Data Citizens'21
>>From around the globe. It's the Cube covering data citizens 21 brought to you by culebra. Hello everyone john walls here as we continue our cube conversations here as part of Data citizens 21 the conference ongoing caliber at the heart of that really at the heart of data these days and helping companies and corporations make sense. All of those data chaos that they're dealing with, trying to provide new insights, new analyses being a lot more efficient and effective with your data. That's what culebra is all about and their founder and their Chief data Citizen if you will stand christians joins us today and stan I love that title. Chief Data Citizen. What is that all about? What does that mean? >>Hey john thanks for having me over and hopefully we'll get to the point where the chief data citizen titlists cleaves to you. Thanks by the way for giving us the opportunity to speak a little bit about what we're doing with our Chief Data Citizen. Um we started the community, the company about 13 years ago, uh 2008 and over those years as a founder, I've worn many different hats from product presales to partnerships and a bunch of other things. But ultimately the company reaches a certain point, a certain size where systems and processes become absolutely necessary if you want to scale further for us. This is the moment in time when we said, okay, we probably need a data office right now ourselves, something that we've seen with many of our customers. So he said, okay, let me figure out how to lead our own data office and figure out how we can get value out of data using our own software at Clear Bright Self. And that's where it achieved. That a citizen role comes in on friday evening. We like to call that, drinking our own champagne monday morning, you know, eating our own dog food. But essentially um this is what we help our customers do build out the offices. So we're doing this ourselves now when we're very hands on. 
So there's a lot of things we're learning again, just like our customers do. And for me at culebra, this means that I'm responsible as achieved data citizen for our overall data strategy, which talks a lot about data products as well as our data infrastructure, which is needed to power data problems now because we're doing this in the company and also doing this in a way that is helpful to our customers. Were also figuring out how do we translate the learning that we have ourselves and give them back to our customers, to our partners, to the broader ecosystem as a whole. And that's why uh if you summarize the strategy, I like the sometimes refer to it as Data office 2025, it's 2025. What is the data office looked like by then? And we recommend to our customers also have that forward looking view just as well. So if I summarize the the answer a little bit it's very similar to achieve their officer role but because it has the external evangelization component helping other data leaders we like to refer to it as the chief data scientist. >>Yeah that that kind of uh you talk about evangelizing obviously with that that you're talking about certain kinds of responsibilities and obligations and when I think of citizenship in general I think about privileges and rights and about national citizenship. You're talking about data citizenship. So I assume that with that you're talking about appropriate behaviors and the most uh well defined behaviors and kind of keep it between the lanes basically. Is that is that how you look at being a data citizen. And if not how would you describe that to a client about being a data citizen? >>It's a very good point as a citizen. You have the rights and responsibilities and the same is exactly true for a day to citizens. For us, starting with what it is right for us. The data citizen is somebody who uses data to do their job. 
And we've purposely made that definition very broad because today we believe that everyone in some way uses data, do their job. You know, data universal. It's critical to business processes and its importance is only increasing and we want all the data citizens to have appropriate access to data and and the ability to do stuff with data but also to do that in the right way. And if you think about it, this is not just something that applies to you and your job but also extends beyond the workplace because as a data citizen, you're also a human being. Of course. So the way you do data at home with your friends and family, all of this becomes important as well. Uh and we like to think about it as informed privacy. Us data citizens who think about trust in data all the time because ultimately everybody's talking today about data as an asset and data is the new gold and the new oil and the new soil. And there is a ton of value uh data but it's not just organizations themselves to see this. It's also the bad actors out there were reading a lot more about data breaches for example. So ultimately there is no value without rescue. Uh so as the data citizen you can achieve value but you also have to think about how do I avoid these risks? And as an organization, if you manage to combine both of those, that's when you can get the maximum value out of data in a trusted manner. >>Yeah, I think this is pretty interesting approach that you've taken here because obviously there are processes with regard to data, right? I mean you know that's that's pretty clear but there are there's a culture that you're talking about here that not only are we going to have an operational plan for how we do this certain activity and how we're going to uh analyze here, input here action uh perform action on that whatever. But we're gonna have a mindset or an approach mentally that we want our company to embrace. 
So if you would walk me through that process a little bit in terms of creating that kind of culture which is very different then kind of the X's and oh's and the technical side of things. >>Yeah, that's I think where organizations face the biggest challenge because you know, maybe they're hiring the best, most unique data scientists in the world, but it's not about what that individual can do, right? It's about what the combination of data citizens across the organization can do. And I think there it starts first by thinking as an individual about universal goal Golden rule, treat others as you would want to be treated yourself right the way you would ethically use data at your job. Think about that. There's other people and other companies who you would want to do the same thing. Um now from our experience and our own data office at cordoba as well as what we see with our customers, a lot of that personal responsibility, which is where culture starts, starts with data literacy and you know, we talked a little bit about Planet Rock and small statues in brussels Belgium where I'm from. But essentially um here we speak a couple of languages in Belgium and for organizations for individuals, Data literacy is very similar. You know, you're able to read and write, which are pretty essential for any job today. And so we want all data citizens to also be able to speak and read and write data fluently if I if I can express it this way. And one of the key ways of getting that done and establishing that culture around data uh is lies with the one who leads data in the organization, the Chief Petty Officer or however the roll is called. They play a very important role in this. Um, the comparison maybe that I always make there is think about other assets in your organization. You know, you're you're organized for the money asset for the talent assets with HR and a bunch of other assets. So let's talk about the money asset for a little bit, right? 
You have a finance department, you have a chief financial officer. And obviously their responsibility is around managing that money asset, but it's also around making others in the organization think about that money asset, and they do that through established processes and responsibilities like budgeting and planning, but also ultimately down to the individual, where, you know, through expense sheets that we all love so much, they make you think about money. So if the CFO makes everyone in the company think about money, the data officer or the data lead has to make everyone in the company think about data as an asset just as well. And those rights, those responsibilities in that culture, they also change, right? Today they're set this way and that way because of privacy and policy X and Y and Z. But tomorrow, for example, as with the European Union's new regulation around AI, there's a bunch of new responsibilities you have to think about. >>Mhm. You know, you mentioned security and value and risk, which certainly are part and parcel, right? If I have something important, I gotta protect it, because somebody else might want to create some damage, some harm, and steal my value basically. Well, that's what's happening, as you point out, in the data world these days. So what kind of work are you doing in that regard in terms of reinforcing the importance of security culture, privacy culture, you know, this kind of protective culture within an organization, so that everybody fully understands the risks, but also the huge upsides if you do enforce this responsibility and these good behaviors that obviously the company can gain from and then provide value to their client base? So how do you reinforce that within your clients to spread that culture, if you will, within their organizations? >>Spreading a culture is not always an easy thing.
Especially since a lot of organizations think about the value around data but, to your point, not always about the risks that come associated with it, sometimes just because they don't know about them yet. Right? There are new architectures that come into play, like the cloud, and that comes with a whole bunch of new risks. That's why one of the things that we always recommend to our customers and to the data officers in our customers' organizations is that, next to establishing that data literacy, for example, and working on data products, they also partner strongly with other leaders in their organization. On the one hand, for example, the legal folks, where typically you find the aspects around privacy, and on the other hand the information security folks. Because if you're building up a sort of map of your data, look at it like a castle, right, that you're trying to protect. If you don't have a map of your castle with the strong points and weak points and, you know, where people can dig a hole under your wall or what have you, then it's very hard to defend. So you have to be able to get a map of your data, a data map if you will: know what data is out there, who it's being used by, and why and how. And then you want to prioritize that data: which is the most important, what are the most important uses, and put the appropriate protections and controls in place. And it's fundamental that you do that together with your legal and information security partners, because as a data leader you may have the data knowledge and data expertise, but there's a bunch of other things that come into play when you're trying to protect not just the data but really your company and its data as a whole. >>You know, you were talking about 2025 a little bit ago and I think good for you. That's quite a crystal ball that you have, you know, looking with the headlights that far down the road.
But I know you have to be; you know, that kind of progressive thinking is very important. What do you see in the long term for, number one, your kind of position as a chief data citizen, if you will? And then the role of the chief data officer, which you think is kind of migrating toward that citizenship, if you will. So maybe put on those long-term vision goggles of yours again and tell me what you see as far as these evolving roles and these new responsibilities for people who are CDOs these days? >>Well, 2025 is closer than we think, right? And obviously my crystal ball is as fuzzy as everyone else's, but there's a few trends that you can easily identify and that we've seen by doing this for so long at Collibra. One is the push around data. I think last year, the year 2020, and onwards, Covid sort of became the executive director of digitalization; it forced everyone to think more about digital. And I expect that to continue. Right. So that's an important aspect. The second important aspect that I expect to continue for the next couple of years, easily to 2025, is the whole movement to the cloud. So those cloud-native architectures will become important, as well as, you know, preparing your data around it and preparing your policies around it, et cetera. I also expect that privacy regulations will continue to increase, as well as the need to protect your data assets. And I expect that a lot of chief data officers will also be very busy building out those data products. So if you think about that trend, then okay, data products are getting more important for chief data officers, then data quality is something that's increasingly important today to get right; otherwise it becomes a garbage-in, garbage-out kind of situation where your data products are being fed bad food and ultimately their outcomes are very tricky. So for us, for the chief data officers, I think there was about one of them in 2002.
And then in 2019-ish, let's say, there were around 10,000. So there's plenty of upside to go for the chief data officers; there's plenty of roles like that needed across the world. And they've also evolved in responsibility, and I expect that their position, you know, it is really a C-level position today in most organizations, I expect that that trend will also continue to grow. But ultimately, those chief data officers have to think about the business, right? Not just the defensive and offensive positions around data, like policies and regulations, but also the support for businesses who are today shifting very fast, and will continue to, to digital. So those chief data officers will be seen as heroes, especially when they can build out a factory of data products that really supports the business. But at the same time, they have to figure out how to reach out an olive branch to their technical counterparts, because you cannot build that factory of data products, in my mind at least, without the proper infrastructure. And that's where your technical teams come in. And then obviously the partnerships with your legal and information security folks, of course. >>Well, heroes. Everybody wants to be the hero. And I know that you painted a pretty clear path right now as far as the chief data officer is concerned and their importance and the value to companies down the road. Stan, we thank you very much for the time today and for the insight, and wish you continued success at the conference. Thank you very much. >>Thank you very much. Have a nice day. Stay healthy. >>Thank you very much. Stan Christiaens joining us, talking about chief data citizenship, if you will, as part of Data Citizens '21, the conference being put on by Collibra. I'm John Walls; thanks for joining us here on theCUBE. >>Mhm.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Belgium | LOCATION | 0.99+ |
2002 | DATE | 0.99+ |
2008 | DATE | 0.99+ |
John Wall | PERSON | 0.99+ |
european union | ORGANIZATION | 0.99+ |
john walls | PERSON | 0.99+ |
Clear Bright Self | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
2019 | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
culebra | ORGANIZATION | 0.99+ |
today | DATE | 0.98+ |
john | PERSON | 0.98+ |
first | QUANTITY | 0.98+ |
2025 | DATE | 0.98+ |
Stijn "Stan" Christiaens | PERSON | 0.98+ |
one | QUANTITY | 0.98+ |
2020 | DATE | 0.98+ |
Dan Christians | PERSON | 0.98+ |
monday morning | DATE | 0.97+ |
friday evening | DATE | 0.97+ |
Covid | PERSON | 0.97+ |
Collibra | ORGANIZATION | 0.97+ |
around 10,000 | QUANTITY | 0.97+ |
next couple of years | DATE | 0.92+ |
about 13 years ago | DATE | 0.9+ |
brussels | LOCATION | 0.85+ |
second important aspect | QUANTITY | 0.8+ |
cordoba | ORGANIZATION | 0.78+ |
christians | ORGANIZATION | 0.62+ |
uh | ORGANIZATION | 0.61+ |
Planet Rock | LOCATION | 0.61+ |
Data | PERSON | 0.58+ |
Data citizens 21 | EVENT | 0.56+ |
about | DATE | 0.54+ |
ISH | ORGANIZATION | 0.46+ |
21 | ORGANIZATION | 0.41+ |
Stijn Paul Fireside Chat Accessible Data | Data Citizens'21
>>Really excited about this year's Data Citizens, with so many of you together. I'm going to talk today about accessible data, because what good is the data if you can get it into your hands and shop for it, but you can't understand it? And I'm here today with Paul; really thrilled to be here with Paul. Paul is an award-winning author on all topics data: I think 20 books with the 21st on the way, and over 300 articles. He's been a frequent speaker. He's an expert in future trends. He's a VP at Cognitive Systems over at IBM, teaches data also at the business school, and is a champion of diversity initiatives. Paul, thank you for being here; really looking forward to the session with you. >>Oh, thanks for having me. It's a privilege. >>So let's get started with our origins in data, Paul. And I'll start with a little story of my own. So, I trained as an engineer way back when, and in one of the courses we got as an engineer, it was about databases. So we got this thick book of SQL, and me being in it for the programming, I was like, well, who needs this stuff? And I wanted to do my part in terms of making data accessible, so essentially it was the only book that I sold on. Obviously I learned some hard lessons later on, as I did a master's in AI after that and then joined the database research lab at the university that Collibra spun off from. But hey, we all learn along the way. And Paul, I'm really curious: when did you awaken first to data, if you will? >>You know, it's really interesting, Stan, because I come from the opposite side: an undergrad in economics, with some information systems research at the higher level. And so I think I was always attuned to what data could do, but I didn't understand how to get at it and the kinds of nuances around it.
So then I started this job at a database company, like 27 years ago, and it started there, but I would say the awakening has never stopped, because the data game is always changing. Like, I look at these epochs that I've been through in data. It was relational databases, thinking third normal form, and then NoSQL databases. And then I watched NoSQL be about "no, don't use SQL", then "wait a minute, not only SQL". And today it's really, for the data citizens, about "wait, no, I need SQL". So I think I'm always waking up in data, so I'll call it a continuum if you will. But that was it: it was trying to figure out the technology behind driving analytics, which I took in school. >>Excellent. And I fully agree with you there. Every couple of years they seem to reinvent new stuff, and the NoSQL models, let me see, I saw those come and go. Obviously, and I think that's a challenge for most people, because in a way data is a very abstract concept until you get down in the weeds, and then it starts to become really, really messy, until you, you know, from that mud extract certain insights. And the next thing I want to talk about with you is that challenge in organizations. We're hearing a lot about data being valuable: data being the new oil, data being the new soil, the new gold; data as an asset is being used as a slogan all over. People have been investing a lot in data over multiple decades now, and there's a lot of new data technologies, always, but still it seems that organizations fundamentally struggle with getting people access to data. What do you think are some of the key challenges underlying the struggles, that mud, that organizations seem to face when it comes to data? >>Yeah. Listen, Stan, I'll tell you, a lot of people I think are stuck on what I call their data acumen curves, and, you know, data is like a gym membership: if you don't use it, you're not going to get any value out of it.
And that's what I mean by acumen. And so I like that you use the analogy of mud. There's like three layers that are holding a lot of organizations back. The first is just the amount of data. Now, I'm not going to give you some stat about how many times I can go to the moon and back with the data we generate, but I will give you one stat I found interesting. The average human being in their lifetime will generate a petabyte of data. How much data is that? If that was my Apple Music playlist, it would be about 2000 years of nonstop music. >>So that's some kind of playlist. And I think what's happening for the first layer of mud is, when I first started writing about data warehousing and analytics, I would be like, go find a needle in the haystack. But now it's really finding a needle in a stack of needles. So much data, so little time: that's level one of mud. I think the second thing is people are looking for some kind of magic solution, like Cinderella's glass slipper: you put it on her and she turns into a princess. That's for Disney movies, right? And there's nothing magical about it. It is about skill and acumen and upskilling. And I think if you're familiar with Hadoop, you recall the Hadoop craze; that's exactly what happened, right? Like, people brought all their data together and everyone was going to be able to access it and get insights. >>And IT teams said it was pretty successful, but every line of business I ever talked to said it was a complete failure. And the third layer is governance. That's actually where you're going to find some magic. And the problem in governance is every client I talk to is all about least effort to comply. They don't want to violate GDPR or the California consumer protection act or whatever governance oversees where they do business. But when you treat governance not as least effort to comply and trying not to get fined, but as an accelerant to your analytics, that gets you out of that third layer of mud.
So you start to invoke what I call the wisdom of the crowd. Now imagine taking all these different people with intelligence about the business and giving them access and acumen to hypothesize on thousands of ideas that turn into hundreds we test and maybe dozens that go to production. So those are three layers that I think every organization is facing. >>Well, I definitely follow on all of these, especially the one where people see governance as an "oh, I have to comply with this", which always hurts me a little bit, honestly, because good governance is all about making things easier while also making sure that they're less risky. But I do want to touch on that Hadoop thing a little bit, because in my decade or more over at Collibra, we saw it come as well as go, let's say around 2015 to 2020-ish. And it's still around, obviously; once you put your data in something, it's very hard to make it go away. But I've always felt that Hadoop, you know, it seemed like, oh, now we have a bunch of clusters and a bunch of network engineers. So what? >>Yeah. You know, Stan, I fell for it; I wrote the book Hadoop for Dummies, and it had such great promise. I think the problem is there wasn't enough education on how to extract value out of it. And that's why I say IT thinks it's great. They liked the clusters and engineers that you just mentioned, but it didn't drive line of >>Business. Got it. So do you think that the whole paradigm with the cloud that we're now on is going to fundamentally change that, or is it just an architectural change? >>Yeah. You know, it's a great comment. What you're seeing today now is the movement for the data lake, maybe away from repositories like Hadoop, into cloud object stores, right?
And then you look at SQL or other interfaces over that. That allows me to really scale compute and storage separately, but that's all the technical stuff. At the end of the day, whether you're on premises, hybrid cloud, in the cloud, or software as a service, if you don't have the acumen for your entire organization to know how to work with data, get value from data, this whole data citizen thing, you're not going to get the kind of value that goes into your investment, right? And I think that's the key thing that business leaders need to understand: it's not about analytics for kind of science project's sake. It's about analytics to drive. >>Absolutely. We fully agree with that. And I want to touch on that point you mentioned about the wisdom of the crowds, a concept that I love, right, and your organization is a big crowd full of what we call data citizens. Now, if I remember correctly from the book The Wisdom of Crowds, there's two points that you really have to take care of. One is, for the wisdom of the crowds to work, you have to have all the individuals enabled, for them to have access to the right information and to be able to share that information safely, kept from the bias from others. Otherwise you're just biasing the outcome. And second, you need to be able to somehow aggregate that wisdom up to a certain decision. So, as Felix mentioned earlier, we all are united by data, and it's a data citizen topic. >>I want to touch on with you a little bit, because at Collibra we look at it as anyone who uses data to do their job, right. And 2020 has sort of accelerated digitization. But apart from that, I've always believed that you don't have to have data in your title, like a data analyst or a data scientist, to be a data citizen.
If I take a look at the example inside of Collibra, we have product managers, and they're trying to figure out which features are most important and how are they used and what patterns of behavior are there. You have account managers, and they're always trying to know the most they can about their specific accounts, to be able to serve them best. So for me, the data citizen is really, in its broadest sense, anyone who uses data to do their job. Does that resonate with you? >>Yeah, absolutely. It reminds me of myself. And to be honest, in my eyes that's where I got started from, and I agree, you don't need the word data in your title. What you need to have is curiosity, and that is in your culture and in your being. And I think as we look at organizations to transform and take full advantage of their data investments, they're going to need great governance, I guarantee you that, but then you're going to have to invest in this data citizen concept. And the first thing I'll tell you is, you know, that kind of acumen, if you will, is a team sport; it's not a departmental sport. So you need to think about what are the upskilling programs where we can reach across to the technical and the non-technical. You know, lots and lots of businesses rely on Microsoft Excel. >>You have data citizens right there, but then there's other folks who are just flat out curious about stuff. And so now you have to open this up and invest in those people. Like, why are you paying people to think about your business without giving them the data? It would be like hiring Tom Brady as a quarterback and telling him not to throw a pass. Right. And I see it all the time. So we kind of limit what we define as data citizen. And that's why I love what you said: you don't need the word data in your title. And more so, if you don't build the acumen, you don't know how to bring the data together, maybe how to wrangle it, but where did it come from? And where can you fix things?
One company I worked with had 17 definitions for a sales individual, 17 definitions, and the talent team and HR couldn't drive to a single definition because they didn't have the data acumen. So when you start thinking of the data citizen concept, it's about enabling everybody to shop for data, much like I would look for a USB cable on Amazon, but also to attach to a business glossary for definitions, so we have a common version of what a word means, and the lineage of the data: who owns it, where did it come from, what did it do? So bring that all together. And I will tell you, companies that invest in the data citizen concept outperform companies that don't. >>On all of that, I definitely fully agree; there's enough research out there that shows that the ones who are data-driven are capturing the most market, but also capturing the most growth. So they're capturing the market even faster. And I love what you said, Paul, about the brains, right? You've already paid for the brains; you've already invested in them. So you may as well leverage them. You may as well recognize and enable the data citizens to get access to the assets that they need to really do their job properly. That's what I want to touch on just a little bit, if you're able, because for me, okay, getting access to data is one thing, right? And I think you already touched on a few items there. But say I'm shopping for data. Now I have it; I have a SQL result set in my hands, let's say, but I'm unable to read and write data. Right? I don't know how to analyze it. I don't know maybe about bias. Maybe I don't know how to best visualize it. And maybe if I do, maybe I don't know how to craft a compelling, persuasive narrative around it to change my boss's decisions. So from your viewpoint, do you think that it's wise for companies to continuously invest in data literacy, to continuously upgrade their data citizens, if you will? >>Yeah, absolutely. First,
I'm going to tell you right now, data literacy years are like dog years. They age so fast: new data types, new sources of data, new ways to get data like APIs and microservices. But let me take it away from the technical concept for a bit. I want to talk to you about the movie A Star Is Born. I'm sure most of you have seen it or heard of it: Bradley Cooper, Lady Gaga. So everyone knows the movie. What most people probably don't know is when Lady Gaga teamed up with Bradley Cooper to do this movie, she demanded that he sing everything live; nothing could be auto-tuned, everything live. This is one of the leading actors of Hollywood. They filmed this remake in 42 days, and Bradley Cooper spent 18 months on singing lessons, 18 months on guitar lessons, had a voice coach, and so on and so forth. >>And so I think here's the point. If one of the best actors in the world has to invest three and a half years for 42 days to hit a movie out of the park, why do we think we don't need a continuous investment in data literacy? Even once you've done your initial training, if you will, for the data citizen, things are going to change. If I told you, Stan, that if you go to the gym and work out every day for three months, you'll never have to work out for the rest of your life, you would tell me I was ridiculous. So your data literacy is no different. And I will tell you, I have managed thousands of individuals, some of the most technical people around, distinguished engineers, fellows, and data literacy comes from curiosity and a culture of never-ending learning. That is the number one thing for success. >>And that curiosity: I hire people who are curious. I'll give you one more story. It's about Mozart. This 21-year-old comes to Mozart and he says, Mozart, can you teach me how to compose a symphony? And Mozart looks at this person and says, no, no, you're too young, too young.
The young man says, but you composed your fourth symphony when you were 12. And Mozart looks at him and says, yeah, but I didn't go around asking people how to compose a symphony. Right? And so the notion of that story is curiosity. And those people who show up and always want to learn, they're your home-run individuals. And they will bring data literacy across the organization. >>I love it. And I'm not going to try and be Mozart, but you know, three and a half years, I think you said, two times 18 months; maybe there's hope for me yet in singing. You'll be a good singer. Touching on some of the sports references you've made, Paul: when we first connected, I'm not gonna disclose where you're from, but I saw hockey did come up, and I know in all sorts of sports there's that drive to measure everything they can, right, on the field and off the field. So let's imagine that you've done the best analysis, right? You're the most advanced data scientist, schooled in the classics as well as the most modern methods, with the best tools; you've made a beautiful analysis, beautiful dashboards. And now your coach just wants to put their favorite player in the game, despite what you're telling them. How do you deal with those kinds of coaches? >>Yeah. Listen, this is a great question, I think for your data analytics strategy, but also for anyone listening and watching who wants to just figure out how to drive a career forward. I would give the same advice. So the story you're talking about, indeed hockey, you can figure out where I'm from, but it's around the Ottawa Senators' general manager. And he made a quote in an interview, and he said, sometimes I want to punch my analytics people in the head. Now I'm going to tell you, that's not a good culture for analytics. And he goes on to say, they tell me not to play this one player. This one player is very tough, you know, throws four or five hits a game.
And he goes, I'd love my analytics people to get hit by Borowiecki and tell me how it feels. That's the player. >>Sure. I'm sure he hits hard, but here's the deal. When he's on the ice, the opposing team gets more shots on goal than the Senators do on the opposing team. They score more goals, they lose. And so I think whenever you're trying to convince a movement forward, be it management, be it a project you're trying to fund, I always try to teach something that someone didn't previously know before and make them think, well, I never thought of it that way before. And I think the great opportunity right now, if you're trying to get moving on a data analytics strategy, is around this post-COVID era. You know, we've seen post-COVID now really accelerate, or at least post-COVID in certain parts of the world, the appetite for digital transformation by about half a decade. Okay. And getting the data within your systems as you digitize will give you all kinds of projects to make people think differently than the way they thought before. >>About data. I call this data exhaust. I'll give you a great example: Uber. I think we're all familiar with Uber. We all remember back in the days when Uber would offer you surge pricing. Okay? So basically you put Uber on your phone, and they know everything about you, right? Who are your friends, where you're going, even how much battery is on your phone. Well, in a data science paper I read a long time ago, they recognized that there was a 70% chance that you would accept a surge price if you had less than 10% of your battery. So 10% of battery on your phone is an example of data exhaust: all the logs that you generate on your digital front-end properties. Those are logs. You can take those together and maybe show executive management with data: we can understand why people abandon their cart at the shipping phase, or what is the amount of shipping at which they abandon it.
When is the signal when our systems are about to go down? So I think that's a tremendous way. And if you look back to the sports, I mean, the Atlanta Falcons NFL team, they monitor their athletes' sleep performance; the Toronto Raptors, basketball, they're running AI analytics on people's personalities and everything they tweet and every interview to see if the personality fits. So in sports, I think athletes are the most important commodity, if you will, or asset, and yet all these teams are investing in analytics. So I think that's pretty telling. >>Okay, Paul, it looks like we're almost out of time. So in 30 seconds or less, what would you recommend to the data citizens out there? >>Okay. I'm going to give you four tips in 30 seconds. Number one, remember learning never ends; be curious forever. You'll drive your career. Number two, remember companies that invest in analytics and data citizens outperform those that don't; McKinsey says it's about 1.4 times across many KPIs. Number three, stop just collecting the dots and start connecting them; for that, you need a strong governance strategy, and that's going to help you for the future, because the biggest thing in the future is not going to be about analytics accuracy. It's going to be about analytics explainability. So accuracy is no longer going to be enough. You're going to have to explain your decisions. And finally, stay positive and forever test negative. >>Love it. Thank you very much, Paul. And for all the data citizens out there: when it comes down to access to data, it's more than just getting your hands on the data. It's also knowing what you can do with it, how you can do that, and what you definitely shouldn't be doing with it. Thank you, everyone out there, and enjoy your learning and interaction with the community. Stay healthy. Bye-bye.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Paul | PERSON | 0.99+ |
Toronto Raptors | ORGANIZATION | 0.99+ |
Paula | PERSON | 0.99+ |
Paul McGuire | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
17 definitions | QUANTITY | 0.99+ |
Tom Brady | PERSON | 0.99+ |
Mozart | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Bradley Cooper | PERSON | 0.99+ |
70% | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
18 months | QUANTITY | 0.99+ |
30 seconds | QUANTITY | 0.99+ |
20 books | QUANTITY | 0.99+ |
12 | QUANTITY | 0.99+ |
hundreds | QUANTITY | 0.99+ |
42 days | QUANTITY | 0.99+ |
fourth symphony | QUANTITY | 0.99+ |
two times | QUANTITY | 0.99+ |
three months | QUANTITY | 0.99+ |
Atlanta Falcons | ORGANIZATION | 0.99+ |
Lady Gaga | PERSON | 0.99+ |
Stan | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
10% | QUANTITY | 0.99+ |
21st | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
one player | QUANTITY | 0.99+ |
CQL | TITLE | 0.99+ |
Cinderella | PERSON | 0.99+ |
second thing | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
two points | QUANTITY | 0.99+ |
Felix | PERSON | 0.99+ |
dozens | QUANTITY | 0.99+ |
three and a half years | QUANTITY | 0.99+ |
single definition | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
less than 10% | QUANTITY | 0.98+ |
Collibra | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.98+ |
third layer | QUANTITY | 0.98+ |
three layers | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
2015 | DATE | 0.98+ |
about 2000 years | QUANTITY | 0.98+ |
Canada | LOCATION | 0.98+ |
California consumer protection act | TITLE | 0.98+ |
four tips | QUANTITY | 0.97+ |
Disney | ORGANIZATION | 0.97+ |
third | QUANTITY | 0.97+ |
SQL | TITLE | 0.97+ |
Microsoft | ORGANIZATION | 0.97+ |
this year | DATE | 0.96+ |
Hollywood | ORGANIZATION | 0.96+ |
one more story | QUANTITY | 0.96+ |
over 300 articles | QUANTITY | 0.94+ |
27 years ago | DATE | 0.94+ |
one thing | QUANTITY | 0.94+ |
a decade | QUANTITY | 0.94+ |
Duchy | PERSON | 0.93+ |
level one | QUANTITY | 0.92+ |
Kirk Viktor Fireside Chat Trusted Data | Data Citizens'21
>>Kirk focuses on the approach to modern data quality and how it can enable the continuous delivery of trusted data. Take it away, Kirk. >>Trusted data has been a focus of mine for the last several years, most particularly in the area of machine learning. I spent much of my career on Wall Street, writing models and trying to create a healthy data program, sort of the "run the bank" and "protect the franchise" side, and how to do that at scale for larger organizations. I'm excited to have the opportunity today: sitting with me is Victor, to have a fireside chat. He is an award-winning and best-selling author of Delete, Big Data, and most recently Framers. He's also a professor of governance at Oxford. So Victor, my question for you today is, in an era of data that is always on and always flowing, how do CDOs get comfortable? You know, the "I can sleep at night" factor, when data is coming in from more angles, it's being stored in different formats and varieties, and probably just in larger quantities than ever before. In my opinion, just by the law of large numbers, with that much data, is there really just that much more risk of having bad data or inaccuracy in your business? >>Well, thank you, Kirk, for having me on. Yes, you're absolutely right. The real problem, if I were to simplify it down to one statement, is that incorrect data can lead to wrong decisions that can be incredibly costly: incredibly costly for trust, for the brand, for the franchise, incredibly costly because they can lead to decisions that are fundamentally flawed and therefore lead the business in the wrong direction. And so the real question is, you know, how can you avoid incorrect data producing incorrect insights? And that depends on how you view trust and how you view data and correctness in the first place.
>>Yeah, that's interesting. You know, in my background, we were constantly writing models. We were trying to make the models smarter all the time, and we always wanted to get that accuracy level from 89% to 90%, you know, whatever we could, but there's this popular theme where over time the models can diminish in accuracy. And the only button we really had at our disposal was to retrain the model. Oftentimes I'm focused on, should we be stress testing the data, almost like a patient health exam? And how do we do that, so that we could get more comfortable thinking about the quality of the data before we're running our models and our analytics? >>Yeah, absolutely. When we look at the machine learning landscape, even the big data landscape, what we see is that a lot of focus is now put on getting the models right, getting the kinks worked out, but also getting the ethics right, the values right, that are in the model. And what is really not looked at, what is not focused on enough, is the data. Now, if you're looking at it from a compliance viewpoint, maybe it's okay if you just look at the model, maybe not. But if you understand that actually using the right data with the right model gives you a competitive advantage that your competitors don't have, then it is far more than compliance. And if it is far more than compliance, then actually the aperture for strategy opens up and you should not just look at models. You should actually look at the data, and the quality and correctness of the data, as a huge way by which you can push forward your competitive advantage. >>Well, I have an even trickier one for you. I think, you know, there's so much coming in and there's so much that we know we can measure and there's so much we could replay and do what-if analysis on and kind of backtest, but, you know, do you see organizations doing things to look around the corner?
And maybe an interesting analogy would be something like what Tesla is doing, whether it's sensors or LIDAR: they're trying to bounce off every object they know, and they can make a lot of measurements, but the advancements in computer vision are saying, I might be able to predict what's around the corner. I might be able to be out ahead of the data error I'm about to see tomorrow. You know, do you see any organizations trying to take that futuristic step to sort of know the unknown and be more predictive versus reactive? >>Absolutely. Tesla is doing a bit of that, but so are others in that autonomous driving space. Waymo, the Google company that has been doing autonomous driving for a long period of time: what they have been doing is collecting training data through their cars and then running machine learning on the training data. Now, they hit a wall a couple of years ago because the training data wasn't diverse enough. It didn't have that sort of Moore's law of insight anymore, even though there was more and more training data. And so the delta, the additional learning, was just limited. So what they then decided to do was to build a virtual reality called Carcraft, in which virtual cars would drive around and create predictive training data. Now, what is really interesting about that is that it isn't just a model. It is a model that creates predictive data. And this predictive data is the actual value that is added to the equation here. And with this extra predictive data, they were able to improve their autonomous driving quite significantly. Five years ago, their disengagement rate was every 2,000 miles on average. And last year, five years later, it was every 30,000 miles on average. That's a 15x improvement. And that wasn't driven by a mysterious model. It was driven by predictive data. >>Right, right. You know, that's interesting.
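The disengagement figures quoted in the exchange above can be checked with a line of arithmetic. The sketch below is only an illustration of that calculation; the function name is hypothetical and the two mileage figures are the ones stated in the conversation, not independently verified.

```python
def improvement_factor(miles_before: float, miles_after: float) -> float:
    """Return how many times farther a car drives between disengagements."""
    if miles_before <= 0:
        raise ValueError("miles_before must be positive")
    return miles_after / miles_before


# Figures as quoted in the conversation: one disengagement every
# 2,000 miles five years earlier, every 30,000 miles most recently.
factor = improvement_factor(2_000, 30_000)
print(f"{factor:.0f}x improvement")
```

Going from 2,000 to 30,000 miles between disengagements is a ratio, so the improvement is best read as a multiple rather than a count.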
I'm also a fan of trying to use data points that don't exist in the datasets. So it sounds like they were using more data, data that was derived from other sources. And maybe the simplest format that I usually get started with was, you know, what if I was looking at data from Glassdoor and I wanted to know if it was valid, if it was accurate? Of course there are going to be numbers in the age field, and salary, and years of experience, and different things. But what if the years of experience and age and academic level of someone no longer correlate to the salary? Yet that correlation component is not a piece of data that even lives in the column, the row, the cell. So I do think that there's a huge area for improvement and advancement in the raw data that we see and collect, but also in the data science metrics, something like lift and correlation between the data points, that really help me certify and feel comfortable that this data makes sense. Otherwise it could just be numbers in the field. >>Indeed. And this challenge of finding the data, focusing on the right subset of the data, and manipulating it in a qualitatively right way is really something that has been with us for quite a number of years. There's a fabulous case from a few years back in Japan, when there was the suspicion that in sumo wrestling there was match fixing going on, massive match fixing. And so investigators came in and they took the data from the championship bouts and analyzed them and didn't find anything. And what was really interesting is that later, researchers came in and read the rules and regulations of sumo wrestling and understood that it's not just the championship bouts that matter, but it's also sometimes the relegation matches that matter.
And so then they started looking at those secondary matches that nobody had looked at before, that subset of data, and they discovered there was massive match fixing going on. It's just that nobody had looked at it, because nobody made, as you said, that connection between those various data sources, or saw the causal connectivity there. And so it's really crucial to understand that deriving insight out of data isn't a black box thing where you feed the data in and get insight out. It really requires deep thinking about how to wire it up from the very beginning. >>No, that's an interesting story. I kind of wonder if the model in that case is almost the wrestlers themselves, or the output, but definitely the data that goes into it. So, I mean, do you see a path where organizations will achieve a hundred percent confidence? Because we all know there's an "I can't sleep at night" factor, but there's also a case of "what do I do today?" I'm probably not living in a perfect world. I might be sailing a boat across an ocean that already has a hole in it. So, you know, we can't turn everything off. We have to sort of patch the boat and sail it at the same time. What do you think a good approach is for a large organization to improve its posture? >>You know, if you focus on perfection, you never achieve it. A hundred percent perfection is never achievable. And if you want some radical change, then that's admirable, but a lot of times it's a very risky proposition. So rather than doing that, there is a lot of low-hanging fruit in an incremental, pragmatic, step-by-step approach. If I can use an analogy from history, we talk a lot about the data revolution and, before that, the industrial revolution, and when we think about the industrial revolution, we think about the steam engine, but the reality is that the steam engine wasn't just one radical invention.
In fact, there were a myriad of small incremental innovations over the course of a century that today we call the industrial revolution. And I think it's the very same thing with the data revolution, where we don't have this one silver bullet that radically puts us into data nirvana, but it is this incremental, pragmatic, step-by-step change that will get us closer, pragmatically speaking, to where we want to be, even though there will always be more work left for us. >>Yeah, that's interesting. You know, that one hits home for me because we ultimately at Collibra take an incremental approach. We don't think there's a stop-the-world event. There's a way to learn from the past trends of our data to become incrementally smarter each day. And this kind of stops us from being in a binary project mode, right, where we have to go write something for six months and then reassess it and hope. You know, we kind of ask: if you're at 70% accuracy today, isn't being at 71% tomorrow better? At least there's a measurable amount of improvement there. And it's sort of a philosophical difference. And it reminds me of my banking days, when you say, you know, past performance is no guarantee of future results. It's a nice disclaimer you can put on everything, but I actually find the opposite to be more true in data. >>We have all of these large data assets, whether it's terabytes or petabytes, or even if it's just gigabytes, sitting there in all the datasets to learn from. And what I find in data is that the past historical values actually do tell us a lot about the future, and we can learn from that to become incrementally smarter tomorrow. And there's really a lot of value sitting there in the historical data. And it tells me at least a lot about how to forecast the future.
You know, one that's been sitting on the top of my mind recently, especially with COVID and the housing market: a long time back, I competed in automated valuation modeling, which basically means, how well can you predict the price of a house? And, you know, that's always a fun one to do. And there are some big name brands out there that do that pretty well. >>Back then, when I built those models, I would look at things like the size of the yard, the undulation of the land, you know, whether a pool would award you more or less money for your house. And a lot of those factors were different than they are now. So those models ultimately have already changed. And now we've seen that post-COVID, people look for different things in housing, and the prices have gone up. So we've seen a decline and then a dramatic increase. And then we've also seen things like land and pools become more valuable than they were in the housing model before. You know, what are you seeing here with models and data and how that's going to come together? Is it always going to change, where you're going to have to constantly recalibrate both our understanding of the data and the models themselves? >>Well, indeed, the problem of course is almost eternal. Oftentimes we have developed beautiful models that work really well. And then we're so wedded to this model, or this particular kind of model, that we can't fathom giving it up. I mean, if I think of my students, sometimes, you know, they have a model, they collect the data, then they run the analysis, and it basically tells them that their model was wrong. They go out and they collect more data and more data and more data, just to make sure that their model is right. But the data tells them what the truth is: that the model isn't right anymore, and that as context and goals and circumstances change, the model needs to adapt.
And we have seen it over and over again, not just in the housing market, but post-COVID and in the COVID crisis. You know, a lot of the epidemiologists looked at life expectancy of people, but when you look at people in the intensive care unit, suffering with long COVID and so on, you also need to realize, and many have, that rather than only life expectancy, >>you also need to look at life quality as another kind of dimension. And that means your model needs to change, because you can't just have a model that optimizes on life expectancy anymore. And so what we need to do is to understand that the data, and the changes in the data, the dynamic of the data, really are a thorn in our side, prompting us to revisit the model and think very critically about what we can do in order to adjust the model to the present situation. >>With that, Victor, I've really enjoyed our chat today. Do you have any final thoughts, comments, questions for me? >>You know, Kirk, I enjoyed it tremendously as well. I do think that what is important to understand with data is that, as there is no silver bullet and there are only incremental steps forward, this is not actually something to despair over, but something that can be the source of great hope, because it means that not just tomorrow, but even the day after tomorrow and the day after the day after tomorrow, we still can make headway, can make improvements, and get better. >>Absolutely. I like the hopeful message. I live every day to make data a better place. And it is exciting as we see the advancements in what's possible and what's on the forefront. Well, with that, I really appreciate the chat, and I would encourage anyone
who's interested in this topic to attend a session later today on modern data quality, where I go through maybe five key flaws of the past and some of the pitfalls, and explain a little bit more about how we're using unsupervised learning to solve for future problems. Thanks, Victor. Thank you, Kirk. >>Thanks, Kirk. And Victor, how incredible was that?
SUMMARY :
Kirk focuses on the approach to modern data quality and how it can enable the continuous delivery the franchise and how to do that at scale for larger organizations. And that depends on how you view trust and how you And the only button we really even the big data landscape, what we see is that a lot of focus is now Um, you know, the Delta, the additional learning was just limited. and just advancement in the role data that we see in collect, but also the that matter, but it's also sometimes the relegation matches that matter. Um, what do you think the, a good approaches And if you want some radical Um, you know, that one hits home for me because we ultimately And, you know, that's always a fun one to do. the undulation of the land, uh, you know, whether a pool would not just in the housing market, but post COVID and in the COVID crisis, you know, adjust the model to the present situation. And, uh, do you have any final thoughts, comments, questions for me? Uh, you know, Kirk, I enjoyed it tremendously as well. I like the hopeful message I live every day to, uh, to make data a better place.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Kirk | PERSON | 0.99+ |
Kurt | PERSON | 0.99+ |
Victor | PERSON | 0.99+ |
Google | ORGANIZATION | 0.99+ |
Japan | LOCATION | 0.99+ |
six months | QUANTITY | 0.99+ |
71% | QUANTITY | 0.99+ |
Glassdoor | ORGANIZATION | 0.99+ |
89% | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
15 K | QUANTITY | 0.99+ |
Tesla | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
70% | QUANTITY | 0.99+ |
2000 miles | QUANTITY | 0.99+ |
Waymo | ORGANIZATION | 0.99+ |
five years later | DATE | 0.99+ |
one statement | QUANTITY | 0.99+ |
90% | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
five years ago | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
each day | QUANTITY | 0.98+ |
COVID | OTHER | 0.97+ |
Moore | PERSON | 0.97+ |
five key flaws | QUANTITY | 0.95+ |
Collibra | ORGANIZATION | 0.94+ |
hundred percent | QUANTITY | 0.94+ |
one silver bullet | QUANTITY | 0.92+ |
Kirk Viktor | PERSON | 0.92+ |
first | QUANTITY | 0.91+ |
COVID crisis | EVENT | 0.88+ |
Oxford | ORGANIZATION | 0.88+ |
every 30,000 miles | QUANTITY | 0.86+ |
a couple of years ago | DATE | 0.85+ |
Sumo wrestling | EVENT | 0.84+ |
one radical invention | QUANTITY | 0.8+ |
few years back | DATE | 0.75+ |
secondary matches | QUANTITY | 0.74+ |
last several years | DATE | 0.73+ |
COVID | EVENT | 0.68+ |
Delta | ORGANIZATION | 0.66+ |
NAMIC | ORGANIZATION | 0.53+ |
Kirk | ORGANIZATION | 0.53+ |
Lincoln | ORGANIZATION | 0.45+ |
Jim Cushman Product strategy vision | Data Citizens'21
>>Hi everyone, and welcome to Data Citizens. Thank you for making the time to join me and the over 5,000 data citizens like you that are looking to become united by data. My name is Jim Cushman. I serve as the Chief Product Officer at Collibra. I have the benefit of sharing with you the product vision and strategy of Collibra. There are several sections to this presentation, and I can't wait to share them with you. The first is a story of how we're taking a business user and making it possible for him or her to use data and gain benefit and insight from that data, without relying on anyone in the organization to write code or do the work for them. Next, I'll share with you how Collibra will make it possible to manage metadata at scale, into the billions of assets, and again, load this into our software without writing any code. Third, I will demonstrate to you the integration we have already achieved with our newest product release: data quality that's powered by machine learning. >>Finally, you're going to hear about how Collibra has become the most universally available solution in the market. Now, we all know that data is a critical asset that can make or break an organization. Yet organizations struggle to capture the power of their data, and many remain afraid of how their data could be misused or abused. We also observe that the understanding of and access to data remains in the hands of just a small few. Three out of every four companies continue to struggle to use data to drive meaningful insights. All forward-looking companies are looking for an advantage, a differentiator that will set them apart from their peers and competitors. What if you could improve your organization's productivity by just 5%? Even a modest 5% productivity improvement compounded over a five-year period will make your organization 28% more productive. This will leave you with an overwhelming advantage over your competition.
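The compounding claim in the keynote can be verified with a short calculation. This is a minimal sketch of ordinary compound growth, assuming the 5% gain applies once per year for five years; the function name is illustrative and not from the presentation.

```python
def compounded_gain(annual_gain: float, years: int) -> float:
    """Total fractional gain after compounding annual_gain for the given years."""
    return (1 + annual_gain) ** years - 1


# 5% per year over five years, as stated in the keynote.
gain = compounded_gain(0.05, 5)
print(f"{gain:.1%}")  # 27.6%, which rounds to the quoted 28%
```

The result, about 27.6%, is consistent with the rounded 28% figure quoted on stage.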
>>Uniting your data-literate employees with data is the key to your success. And dare I say, to unlock this potential for increased productivity and huge competitive advantage, organizations need to enable self-service access to data for every data-literate knowledge worker. Our ultimate goal at Collibra has always been to enable this self-service for our customers, to empower every knowledge worker to access the data they need when they need it, but with the peace of mind that your data is governed and secure. Just imagine if you had a single integrated solution that could deliver a seamless, governed, no-code user experience of delivering the right data to the right person at the right time, just as simply as ordering a pair of shoes online. That would be quite a magic trick, and one that would place you and your organization on the fast track for success. Let me introduce you to our character here. >>Cliff. Cliff is that business analyst. He doesn't write code. He doesn't know Julia or R or SQL, but he is data literate. When Cliff is presented with data of high quality, and can actually be helped to find that high-quality data, Cliff knows what to do with it. Well, we're going to expose Cliff to our software and see how he can find the best data to solve his problem of the day, which is customer churn. Cliff is going to go out and find this information, bring it back, and analyze it in his favorite BI reporting tool, Tableau. Of course, that could be Looker, could be Power BI, or any other of your favorites. But let's go ahead and get started and see how Cliff can do this without any help from anyone in the organization. So Cliff is going to log into Collibra, and being a business user, >>the first thing he's going to do is look for a business term. He looks for customer churn rate.
Now, when he brings back churn rate, it shows him the definition of churn rate and various other things that have been attributed to it, such as data domains like product, customer, and order. Now, Cliff says, okay, customer is really important, so let me click on that and see what makes up the customer definition. Cliff will scroll through customer and find the various data concepts and attributes that make up the definition of customer, and Cliff knows that customer identifier is a really important aspect of this. It helps link all the data together. And so Cliff is going to want to make sure that whatever source he brings in actually has customer identifier in it, and that it's of high quality. Cliff is also interested in things such as email address and credit activity and credit card. >>But he's now going to say, okay, what datasets actually have customer as a data domain in them? And by the way, while I'm doing it, what else has product and order information? That's again relevant to the concept of customer churn. Now, as he goes on, he can actually filter down, because there are a lot of different results that could potentially come back. And again, customer identifier was very important to Cliff, so Cliff further filters on customer identifier, and he further filters on customer churn rate as well. This results in two different datasets that are available to Cliff for selection. Which one to use? Well, he's first presented with some data quality information. You can see that the customer analytics dataset has a data quality score of 76. You can see that the sales data enrichment dataset has a data quality score of 68. That's something he can see right at the front of the box of the things he's looking for, but let's dig in deeper, because the contents really matter. >>So we see again the score of 76, but we actually have the chance to find out that this is something that's actually certified. And this is something that has a check mark.
And so he knows someone he trusts has actually certified it. This is a dataset with 91 columns. And rather than sifting through all of that information, Cliff is going to go ahead and say, well, okay, customer identifier is very important to me. Let me search through and see if I can find what its data quality score is. Very quickly he finds it using a fuzzy search and sees, wow, that's a really high data quality score of 98. Well, what's the alternative? The other dataset only has 68 overall, but how about its customer identifier? Quickly, he discovers that the data quality for that is only 70. >>So all things being equal, customer analytics is the better dataset for what Cliff needs to achieve. But now he wants to look and say, other people have used this, so what have they had to say about it? And you can see there are various reviews from peers of his in the organization that have given it five stars. So this increases Cliff's confidence that this is a great dataset to use. Now Cliff wants to look in a little bit more detail before he finally commits to using this dataset. Cliff has the opportunity to look at it in the broader context. What other things can I learn about customer analytics, such as what else is it related to? Who else uses it? Where did it come from? Where does it go, and what actually happens to it? And so within our graph of information, we're able to show you a diagram. >>You can see that customer analytics actually comes from the CRM cloud system. And from there it can inherit some wonderful information. We know exactly what CRM cloud is about as an overall system. It's related to other logical models. And here you're actually seeing that it's related to a policy about PII, or personally identifiable information.
That link to the PII policy gives Cliff almost immediate knowledge that there's going to be some customer information in this dataset, PII, that he's not going to be able to see given his user role in the organization. But Cliff says, hey, that's okay. I actually don't need to see somebody's name and social security number to do my work. I can work with other information in the data file that will help me understand why our customers are churning and what I can actually do about it. If we dig in deeper, we can see what is personally identifiable information that actually could cause issues. >>And as we scroll down, we take a little bit of a focus on what you'll see here as customer phone, because we'll show that to you a little bit later. These show the various fields that, once Cliff actually has the data fulfilled and delivered to him, he will see are masked or redacted from his use. Now Cliff might dive in deeper and see more information. And he says, you know what? Another piece that's important to me in my analysis is something called is churned. This is basically indicating whether a customer has actually churned. It's an important flag, of course, because that's the analysis that he's performing. Cliff sees that the score is a mere 65. That's not exactly a great data quality score, but Cliff is kind of in a hurry. His boss has come back and said, we need to have this information so we can take action. >>So he's not going to wait around to see if they can go through some long data quality project before he proceeds; he is going to go ahead and use it, at the speed of thinking. He's going to create a suggestion, an issue. He's going to submit this as a work queue item that informs others who are responsible for the quality of data.
There's an opportunity for improvement to this dataset: it is highly reviewed, but it has room for improvement. As Cliff is typing in the explanation that he'll pass along, we can also see that the data quality score is made up of multiple components, such as integrity, duplication, accuracy, consistency, and conformity. We see that we can submit this issue and pass it through, and this will go to somebody else who can actually work on it. >>And we'll show that to you a little bit later, but back to Cliff. Cliff says, okay, I'd like to work with this dataset. So he adds it to his data basket. And just like when he's shopping online, Cliff wants that kind of ability to just say, I want to click once and be done with it. Now, it is data, and there's some sensitivity about it. And again, there's an owner of this data who you need to get permission from. So Cliff is going to provide information to the owner to say, here's why I need this data, how long I need this data for, starting on a certain date and ending on a certain date, and ultimately, what purpose I am going to use this data for. Now, there are other things that Cliff can choose. One is, how do you want this data delivered to you? >>Now, you'll see down below, there are three options. One is borrow, another is lease, and another is buy. What does that mean? Well, borrow is this idea of, I don't want to have the data that's currently in this CRM cloud database moved somewhere. I don't want it to be persisted anywhere else. I just want to borrow it very short term to use in my Tableau report and then, poof, be gone, because I don't want to create any problems in my organization. Now you also see lease. Lease is a situation where you actually do need to take possession of the data, but only for a time-boxed period. You don't need it for an indefinite amount of time.
And ultimately, buy is your ability to take possession of the data and have it in perpetuity. So we're going to go forward with our borrow use case, and Cliff is going to submit this, and all the fun starts there. >>So Cliff has submitted the order, and the owner, Joanna, is going to receive the request for the order. Joanna opens up her tasks and sees there's work for her to perform. Now, Joanna has the ability to automate this using the incorporated workflow that we have in Collibra. But for this situation, she's going to manually review that Cliff wants to borrow a specific dataset for a certain period of time, and that he wants to use it in a Tableau context. So she reviews it, makes an approval, and submits it. This in turn flips it back to Cliff, who says, okay, what obligations did I just take on in order to work with this data? And he reviews each of these data sharing agreements that you, as an organization, would set up, to say, what are my restrictions for using this dataset? >>As Cliff accepts his notices, he has now triggered the process of what we would call fulfillment, or a service broker. And in this situation we're doing virtualized access for the borrow use case. Cliff indicates Tableau is his preferred BI and reporting tool, and you can see the various options that are available, from Power BI, Looker, and Sisense to ThoughtSpot. There are others that can be added over time. And from there, Cliff will be alerted the minute this data is available to him. So now we're running out and doing a distributed query to get the information, and you see it return for a raw view. Now, what's really interesting is you'll see that customer phone has a bunch of X's in it. If you remember, that's PII, so it's actually being masked. So Cliff can't actually see the raw data.
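The request, approval, and masked delivery steps described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not how Collibra implements fulfillment: the `AccessMode`, `AccessRequest`, `mask`, and `deliver` names, the set of PII columns, and the sample row and dates are all hypothetical; only the three delivery options and the masking of customer phone with X's come from the demo.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class AccessMode(Enum):
    BORROW = "borrow"  # query in place, nothing persisted elsewhere
    LEASE = "lease"    # take possession for a time-boxed period
    BUY = "buy"        # take possession in perpetuity


@dataclass
class AccessRequest:
    """Hypothetical order a shopper submits to the data owner."""
    requester: str
    dataset: str
    mode: AccessMode
    purpose: str
    start: date
    end: date
    approved: bool = False


# Assumption: the governing policy marks this column as PII.
PII_COLUMNS = {"customer_phone"}


def mask(value: str) -> str:
    """Replace every character with X, as in the raw view shown in the demo."""
    return "X" * len(value)


def deliver(request: AccessRequest, rows: list[dict]) -> list[dict]:
    """Return rows with PII columns masked; refuse unapproved requests."""
    if not request.approved:
        raise PermissionError("request has not been approved by the data owner")
    return [
        {col: mask(str(val)) if col in PII_COLUMNS else val for col, val in row.items()}
        for row in rows
    ]


# Illustrative request and row; the dates and values are made up.
request = AccessRequest(
    requester="Cliff",
    dataset="Customer Analytics",
    mode=AccessMode.BORROW,
    purpose="customer churn analysis",
    start=date(2021, 6, 1),
    end=date(2021, 6, 30),
)
request.approved = True  # the owner's manual approval step
rows = [{"customer_identifier": 1001, "customer_phone": "555-010-0000", "is_churned": True}]
print(deliver(request, rows))
```

Keeping the approval check inside `deliver` means no rows leave the function until the owner has signed off, and masking is applied at delivery time so the source system is never altered.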
Now Cliff also wants to look at it in a Tableau report and can see the visualization layer, but you also see an incorporation of something we call Collibra on the go. >>Not only do we bring the data to the report, but then we tell you, the reader, how to interpret the report. It could be that there's someone else who wants to use the very same report that Cliff helped create, but they don't understand exactly all the things that Cliff went through. So now they have the ability to get a full interpretation: what was this data that was used, where did it come from, and how do I actually interpret some of the fields that I see on this report? Really a clever combination of bringing the data to you and showing you how to use it. Cliff can also see this as a registered asset within Collibra. So the next shopper who comes through, instead of shopping for the dataset, might actually shop for the report itself. And the report is connected with the dataset he used. >>So now they have a full bill of materials to run a customer churn report and schedule it anytime they want. So now we've turned Cliff into a creator of data assets, and this is where intelligence gets more intelligent, and that's really what we call data intelligence. So let's go back through that magic trick that we just did with Cliff. Cliff went into the software not knowing if the source of data that he was looking for, customer product sales, was even available to him. He went in very quickly and searched and found his dataset, used facts and facets to filter down to exactly what was available, compared and contrasted the options that were there, made an observation that there wasn't enough data quality around a certain thing that was important to him, created an idea, or basically a suggestion, for somebody to follow up on, and was able to put that into his shopping basket, check out, and have it delivered to his front door. >>I mean, that's a bit of a magic trick, right?
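The checkout step in that recap, where Cliff states a purpose, a start and end date, and a borrow, lease, or buy delivery choice, can be pictured as a small request record. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, the dates are made up, and the rule that borrow and lease need an end date is inferred from the description of those options rather than stated in the demo.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Delivery(Enum):
    BORROW = "borrow"  # short-term virtualized access; nothing persisted elsewhere
    LEASE = "lease"    # possession of the data for a time-boxed period
    BUY = "buy"        # possession of the data in perpetuity


@dataclass
class AccessRequest:
    requester: str
    dataset: str
    purpose: str
    start: date
    end: Optional[date]
    delivery: Delivery

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request can go to the owner."""
        problems = []
        if not self.purpose.strip():
            problems.append("purpose is required")
        if self.delivery in (Delivery.BORROW, Delivery.LEASE):
            # Assumed rule: time-limited delivery modes must say when they end.
            if self.end is None:
                problems.append("borrow and lease need an end date")
            elif self.end <= self.start:
                problems.append("end date must be after start date")
        return problems


# Hypothetical request matching the borrow use case in the demo.
request = AccessRequest(
    requester="cliff",
    dataset="customer_product_sales",
    purpose="customer churn report",
    start=date(2021, 6, 1),
    end=date(2021, 6, 30),
    delivery=Delivery.BORROW,
)
print(request.validate())
```

An owner such as Joanna could then approve the request manually or let an automated workflow approve anything that validates cleanly.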
So Cliff was successful in finding the data that he wanted and having it delivered to him. And then, in his preferred model, he was able to look at it in Tableau. All right. So let's talk about how we're going to make this vision a reality. Our first section here is about performance and scale, but it's also about codeless database registration. How did we get all that stuff into the data catalog and available for Cliff to find? So allow us to introduce you to what we call the asset life cycle. Some of the largest organizations in the world might have upwards of a billion data assets. These are columns and tables, reports, APIs, algorithms, et cetera. These are very high volume and quite technical, and far more information than a business user like Cliff might want to be engaged with. Those very same really large organizations may have upwards of, say, 20 to 25 million that are critical data sources and data assets, things that they do need to highly curate and make available. >>But through that is a bit of a distillation, a life cycle of different things you might want to do along the way. And so we're going to share with you how you can actually automatically register these sources, deal with these very large volumes at speed and at scale, and make them available with just the level of information you need to govern and protect, but also make them available for opportunistic use cases, such as the one we presented with Cliff. So as you recall, when Cliff was trying to look for his dataset, he identified that the is-churned data attribute was of low quality. So he passed this over to Eliza, who's a data steward, and she receives this in her work queue in a collaborative fashion. And she has to review, what is the request? If you recall, this was the request to improve the data quality for is-churned. >>Now she needs to familiarize herself with what Cliff was observing when he was doing his shopping experience.
So she digs in and wants to look at the quality that he was observing, and sure enough, as she goes down and looks at is-churned, she sees that it was a low 65% and now understands exactly what Cliff was referring to. She says, aha, okay, I need to get help. I need to decide whether I have a data quality project to fix the data, or should I see if there's another dataset in the organization that has better data for this. And so she creates a queue that can go over to one of her colleagues who really focuses on data quality. She submits this request and it goes over to her colleague John, who's really familiar with data quality. So John receives the request from Eliza, and you'll see a task showing up in his queue. >>He opens up the request and finds out that Eliza's asking if there's another source out there that actually has good is-churned data available. Now he actually knows quite a bit about the quality of information sturdiness. So he goes into the data quality console and does a quick look for a dataset that he's familiar with called customer product sales. He quickly scrolls down and finds the one that's actually been published. That's the one he was looking for, and he opens it up to find out more information: what columns are actually in there. And he goes down to find that is-churned is, in fact, one of the attributes in there. It actually does have active rules associated with it to manage the quality. And so he says, well, let's look in more detail and find out what the quality of this dataset is. >>Oh, it's 86. This is a dramatic improvement over what we've seen before. And we can see again, it's trended quite nicely over time. Each day, it hasn't actually degraded in performance. So he responds back to Eliza and says, this dataset is actually the dataset that you want to bring in. It really will improve.
And you'll see that he refers to the refined database within the CRM cloud solution. Once he submits this, it goes back to Eliza and she's able to continue her work. Now when Eliza brings this back open, she's able to very quickly go into the database registration process. She very quickly goes into the CRM cloud and selects the community into which she wants to register this dataset, the schemas community. And the CRM cloud is the system that she wants to load it in. >>And refined is the database that John told her she should bring in. After a quick description, she's able to click register. And this triggers that automatic, codeless process of going out to the dataset and bringing back its metadata. Now, metadata is great, but it's not the be-all and end-all. There are a lot of other values that she really cares about. As she's registering this dataset and synchronizing the metadata, she's also then asked, would you like to bring in quality information? And so she'll go out and say, yes, of course, I want to enable the quality information from CRM refined. I also want to bring back lineage information to associate with this metadata. And I also want to select profiling and classification information. Now when she selects it, she can also say how often she wants to synchronize this. Is this a daily, weekly, or monthly kind of update? >>That's part of the change data capture process. Again, all automated, without the requirement of actually writing code. So she's run this process. Now, after this loads in, she can open up this newly registered dataset and see if it has actually solved the problem that Cliff set her out on, which was improved data quality. So looking into the data quality for the is-churned capability shows her that she has fantastic quality. It's at a hundred. It's exactly what she was looking for.
So she can, with confidence, suggest that it's done. But she did notice something that she wants to tell John, which is that there's a couple of data quality checks that seem to be missing from this dataset. So again, in a collaborative fashion, she can pass that information along, for validity and completeness, to say, you know what, check for nulls and empties, and send that back. >>So she submits this on to John to work on. And John now has a work item in his task queue. But remember, she's been working on this task for Cliff, and because she has added a much better source for his churn information, she's going to update that task that was sent to her to notify Cliff that the work has been done and that she has a really good dataset in there. In fact, if you recall, it was 100% in terms of its data quality. So this will really make life a lot easier for Cliff once he receives that data and processes the churn report analysis next time. So let's talk about these audacious performance goals that we have in mind. Now today, we actually have really strong performance and amazing usability. Our customers continue to tell us how great our usability is, but they keep asking for more. Well, we've decided to present to you >>something you can start to bank on. This is the performance you can expect from us on the highly curated assets that are available for the business users, as well as the technical and lineage assets that are more available for the developer users and for things that are more warehouse based. You'll see in Q1, or Q2, of this year, we're making available 5 million curated assets. Now you might be out there saying, hey, I'm already using the software and I've got over 20 million already. That's fair. We do.
We have customers that are well over 20 million in terms of the assets they're managing, but we wanted to present this to you with zero conditions, no limitations. We wouldn't talk about, well, it depends, et cetera. This is without any conditions. That's what we can offer you without fail. And yes, it can go higher and higher. We're also talking about the speed with which you can ingest the data. Right now, we're ingesting somewhere around 50,000 to a hundred thousand records per, and of course, yes, you've probably seen it go quite a bit faster, but we are assuring you that that's the case. But what's really impressive is, right now, we can also help you manage 250 million technical assets, and we can load them at a speed of 25 million per hour. And you can see how, over the next 18 months, about every two quarters, we show you dramatic improvements, more than doubling >>for most of them. Leading up to the end of 2022, we're handling over a billion technical lineage assets and we're loading at a hundred million per hour. That sets the mark for the industry. Earlier this year, we announced a recent acquisition, OwlDQ. OwlDQ brought to us machine learning based data quality. We're now able to introduce to you Collibra Data Quality, the first integrated approach of OwlDQ and Collibra. We've got a demo to follow. I'm really excited to share it with you. Let's get started. So Eliza submitted a task for John to work on, remember, to add checks for null and for empty. So John picks up this task very quickly and looks and sees, what's the request? And from there he says, ah, yes, we do have a quality check issue when we look at is-churned. So he jumps over to the data quality console and says, I need to create a new data quality test. >>So Cliff is able to go in to the solution and set up quick rules, automated rules.
He could inherit rules from other things, but it starts with first identifying the data source that he needs to connect to to perform this. And so he chooses the CRM refined dataset that was most recently registered by Eliza. You'll see the same score of 86 was the quality score for the dataset. And you'll also see there are four rules that are associated underneath this. Now, there are various checks that John can establish on this, but remember, this is a fairly easy request that he received from Eliza. So he's going to go in and choose the actual field, is-churned, and from there identify quick rules of an empty check, and that quickly sets up the rule for him. >>And also the null check, equally fast. This one's established and analyzes all the data in there. And this sets up the baseline of data quality for this. Now this data, once it's captured, is periodically brought back to the catalog, so it's available not only to Eliza, but also to Cliff next time he were to shop in the environment. As we look through the rules that were created through that very simple user experience, you can see the ones for is-empty and is-null that were set up. Now, these are various styles that can be set up either manually, or you can set them up through machine learning, or you can inherit them. But the key is to track this rule creation and the metrics that are generated from these rules, so that they can be brought back to the catalog and then used in a meaningful context by someone who's shopping. And the confidence that this has neither empty nor null fields, or at least most of them don't, will now give confidence as you go forward. >>And as you can see, those checks have now been entered in, and you can see that it's a hundred percent quality score for the null check. So with confidence now, John can respond back to Eliza and say, I've inserted them, they're up and running.
And you're in good status. So that was pretty amazing integration, right? And four months after our acquisition, we've already brought that level of integration between Collibra Data Intelligence Cloud and Data Quality. Now, it doesn't stop there. We have really impressive and high sights set. Early next year, we're going to introduce a fully immersive experience where customers can work within Collibra and actually bring the data quality information all the way in, as well as start to manipulate the rules and generate the machine learning rules on top of it. All of that will be a deeply immersive experience. >>We also have something really clever coming, which we call continuous data profiling, where we bring the power of data quality all the way into the database, so it's continuously running and always making that data available for you. Now, I'd also like to share with you one of the reasons why we are the most universally available software solution in data intelligence. We've already announced that we're available on AWS and Google Cloud, but today we can announce to you that in Q3 we're going to be available on Microsoft Azure as well. Now, it's not just these three cloud providers that we're available on. We've also become available on each of their marketplaces. So if you are buying our software, you can actually go out and make that same purchase from their marketplace and achieve your financial objectives as well. We're very excited about this. These are very important partners for us. >>Now, I'd also like to introduce you to our system integrators. Without them, there's no way we could achieve our objectives of growing so rapidly and dealing with the demand that you, our customers, have had. Accenture, Deloitte, Infosys, and others have been instrumental in making sure that we can serve your needs when you need them. And so it's been a big part of our growth and will be a continued part of our growth as well.
And finally, I'd like to introduce you to our product showcases, where we can go into absolute detail on many of the topics I talked about today, such as data governance with Arco, or data privacy with Sergio, or data quality with Brian, and finally, catalog with Peter. Again, I'd like to thank you all for joining us, and we really look forward to hearing your feedback. Thank you.
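As a rough illustration of the null and empty checks that John sets up on the churn field in the demo, the sketch below scores a column as the percentage of values that pass each check. It is a minimal sketch under stated assumptions, not the Collibra Data Quality rule engine: the function names and the sample column are hypothetical, and the scoring formula (share of passing rows) is an assumption about how a per-check score might be computed.

```python
def check_score(values, failed) -> float:
    """Percentage of values that pass, i.e. values for which failed(value) is False."""
    if not values:
        return 100.0
    passed = sum(1 for value in values if not failed(value))
    return 100.0 * passed / len(values)


def is_null(value) -> bool:
    """A value fails the null check when it is missing entirely."""
    return value is None


def is_empty(value) -> bool:
    """A value fails the empty check when it is a blank or whitespace-only string."""
    return isinstance(value, str) and value.strip() == ""


# Hypothetical sample of an is-churned column with one null and one empty value.
is_churned = ["Y", "N", None, "", "N", "Y", "N", "Y", "N", "N"]
null_score = check_score(is_churned, is_null)
empty_score = check_score(is_churned, is_empty)
print(null_score, empty_score)
```

Tracking these per-check scores over time is what lets a catalog show a shopper like Cliff whether a column's quality is holding steady or degrading.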
SUMMARY :
I have the benefit of sharing with you, We also observe that the understanding of and access to data remains in the hands of to imagine if you had a single integrated solution that could deliver a seamless governed, And he's going to analyze it in his favorite BI reporting tool. And so cliff is going to want to make sure that are available to cliff for selection, which one to use? And rather than sifting through all of that information, cliff is going to go ahead and say, well, okay, Cliff has the opportunity to look at it in the broader set. knowledge that there's going to be some customer information in this PII information that he's not going to be And as we scroll down and take a little bit of a focus on what we call or what you'll see here is customer phone, We can also see that the data quality is made up of multiple components, So cliff is going to provide information to the owner to say, case and cliff is going to submit this and all the fun starts there. So cliff has actually submitted the order and the owner, Joanna is actually going to receive the request for the order. in a Tablo report and can see the visualization layer, but you also see an incorporation of something we call Collibra Really a clever combination of bringing the data to you and showing you how to So now they have a full bill of materials to run a customer Shern report and schedule it anytime they want. So allow us to introduce you to what we call the asset life cycle and And so we're going to share with you how you can actually automatically register these sources, And so she creates a queue that can go over to one of her colleagues who really focuses on data quality. And he goes down to find So we actually responds back to realize and say, this data set, uh, is actually the data set that you want And the refined is the database that John told her that she should bring in. So again, in a collaborative fashion, she can pass that information, uh, So she submits this onto John to work on. 
We're also talking about the speed with which you can ingest the data right We're now able to introduce to you Collibra data quality, the first integrated approach to Al So cliff is able to go in, uh, to the solution and, uh, set up quick rules, So it's available to not only Eliza, but also to cliff next time he, uh, And as you can see, those checks have now been entered in and you can see that it's a hundred percent quality Now, I'd also like to share with you one of the reasons why we are the most And finally, I'd like to actually introduce you to our product showcases where we can go into
ENTITIES
Entity | Category | Confidence |
---|---|---|
Joanna | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Brian | PERSON | 0.99+ |
Jim Cushman | PERSON | 0.99+ |
Deloitte | ORGANIZATION | 0.99+ |
Peter | PERSON | 0.99+ |
Eliza | PERSON | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
cliff | PERSON | 0.99+ |
Arco | ORGANIZATION | 0.99+ |
100% | QUANTITY | 0.99+ |
5 million | QUANTITY | 0.99+ |
250 million | QUANTITY | 0.99+ |
20 | QUANTITY | 0.99+ |
65 | QUANTITY | 0.99+ |
28% | QUANTITY | 0.99+ |
25 million | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
98 | QUANTITY | 0.99+ |
Cliff | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
5% | QUANTITY | 0.99+ |
first section | QUANTITY | 0.99+ |
68 | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
76 | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
five stars | QUANTITY | 0.99+ |
Culebra | ORGANIZATION | 0.99+ |
LDQ | ORGANIZATION | 0.99+ |
91 columns | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Al DQ | ORGANIZATION | 0.99+ |
Cleaver | ORGANIZATION | 0.99+ |
86 | QUANTITY | 0.99+ |
one | QUANTITY | 0.98+ |
three | QUANTITY | 0.98+ |
end of 2022 | DATE | 0.98+ |
each day | QUANTITY | 0.98+ |
each | QUANTITY | 0.98+ |
over 20 million | QUANTITY | 0.98+ |
Cliff cliff | PERSON | 0.98+ |
next year | DATE | 0.98+ |
Q1 | DATE | 0.98+ |
70 | QUANTITY | 0.98+ |
ORGANIZATION | 0.98+ | |
Tableau | TITLE | 0.98+ |
Welcome to Data Citizens '21
>>Welcome to Data Citizens '21. I'm thrilled that so many of you are joining us this year for what I think will be our best conference yet. This is always my favorite moment of the year. And what makes it especially meaningful for me this time is that we've all faced so much uncertainty over the last year. Being able to bring together our community of data citizens, data professionals, customers, and partners gives me so much energy. We all share the same passion to use data to create positive change in our work and in our lives. 2021 has been called a year of transitions, and rightfully so. The pandemic has changed our lives, our businesses, and our society. It has changed our world. There's been a number of notable shifts over the last 18 months, and I'd like to bring up three shifts that I personally connect to. >>And these will likely resonate with many of you too. First is the shift to remote work. At the start of the pandemic, tens of millions of people across many industries transitioned to working from home. This transition happened really fast, and in many cases it happened overnight. For me, not being able to meet our customers and our fellow Collibrians in person, especially during such turbulent times, and having welcomed over 200 new colleagues, new Collibrians, was especially hard. The second is, of course, a shift towards online retail. In the US, e-commerce was forecasted to reach 24% of total retail sales by 2024, but by July 2020, so four years earlier, it had already reached 33%. That has translated into an enormous boost for delivery companies. And finally, the supply chain reinvention. The pandemic revealed the complexity and vulnerabilities in the supply chains of many different companies, from raw materials to freight disruptions to labor shortages. >>The damage from the pandemic was felt everywhere.
For example, my wife and I have been waiting for over six months for our four-year-old daughter's first bike. Now, many companies are oriented towards data and analytics to reduce costs and better understand, manage, and optimize their entire value chain. Now, the one thing that all of these shifts have in common is that they accelerated the massive growth of digitization. This transition to digital isn't new, but how much it has accelerated hasn't been easy for organizations. In many cases it has happened under enormous pressure. And that digitization has resulted in two related trends. First, an explosion of digital channels, which has created unprecedented amounts of new data. There's more volume and more variety of data than ever before, and it's been distributed broadly across organizations. Again, this is not a new trend, but one that has also accelerated. Imagine just the amount of data that is now on TikTok. >>It's also a great example of the responsibilities and risks that come with all of that data. This brings me to the second trend and risk that we had started seeing even before the pandemic: the creation of ever more data silos. These silos result in disjointed and often ineffective data teams. And what is more concerning is that there's often a lack of confidence in the outcome. This leads to an overall lack of trust in the information. We need to solve this. Every day, maybe every hour, every minute, we rely on data to make both transactional and transformative business decisions. Every organization today depends on mission-critical insights and data-critical processes. What happens if suddenly there's a data problem? This could impact our resourcing, our customers, our back office, our entire ecosystem. The integrity and the reliability of data has real immediate and long-term implications for our businesses and our reputations. >>And this will determine the trajectory of our success.
We all feel the weight of data, the immense opportunities and potential implications associated with it. And this is a lot of pressure to bear, but I believe that we have the ability to take control of our data, to become more effective in how we work, to be more productive, and to ultimately generate faster and better outcomes. I believe this is a pivotal moment. As organizations transition from reacting to the pandemic to building a healthy new normal, we have an extraordinary opportunity to make good use of our data, and by doing so, I believe we can achieve extraordinary things. By making trusted data more accessible and more usable, we can do even more. We can get more out of our work. We can put more work into it. We can help our organizations serve more customers and enrich more communities. With trusted data, we have the power to change things for good, and with it, there's no limit to what people, businesses, or society can achieve when we are United by Data. >>The world doesn't just run on information. It runs on people living their passion, dreaming big ideas. But without information, without the data, those ideas won't become innovations. That's why at Collibra we're changing how organizations use data, so our customers can change the world. We make data easier to access by making it usable, manageable, and practical. We make it make sense, so people have a common language to share and shape their ideas. And no matter how far and wide that data is scattered, we make sure it's all within reach, connecting the disconnected, joining the disjointed, so people can collaborate and trust that their data won't slow them down, so they can prove that data has the power to change things for good. Doing more, enriching more, helping more. Together with Collibra, you can be United by Data. >>United by Data. All of us here are united by our passion for data. We are all data citizens, and there's so much power in this community.
Uniting is also what the Collibra Data Intelligence Cloud, our product, does. It unites your entire organization to deliver accurate, trusted data for every use, for every user, and across every source. Managed, trusted, and accessible: these are the crucial elements that will give your teams the ability to easily collaborate and make every data workflow more productive. There's also the experience and the impact of our customers. Take Freddie Mac, for example. It's driving their data ecosystem transformation. With 5.5 billion data points and over two trillion dollars in assets under management, Freddie Mac leveraged Collibra to support the digital transformation and management of its data ecosystem. They eliminated duplicate data spending, improved data lake productivity, and drove enhanced data quality while delivering increased value for their customers. >>It's also at the heart of what Yelp is doing to connect its engineers to trusted data, unleash product innovation, and instill a data-driven culture. And it's why companies like Audi and BT are promoting the importance of data culture and making data easily accessible to the data citizens throughout their organizations. Over the next couple of days, the knowledge shared by our partners, our customers, guest speakers, and Collibrians will inspire and energize you to keep moving forward as change agents United by Data. Again, I'm so glad to kick off Data Citizens, and thank you for being here with us.
SUMMARY :
We all share the same passion to use data, to create positive change in the supply chains of many different companies from raw materials to freight disruptions, imagine just the amount of data that is now on tick-tock. It's also a great example of the responsibilities and risks that come with all of that data. We have the power to change things for good and with it, We make data easier to access by making it These are the crucial elements that will give your teams the ability It's also at the heart of what Yelp is doing to connect its engineers, to trusted data unleashed
ENTITIES
Entity | Category | Confidence |
---|---|---|
33% | QUANTITY | 0.99+ |
24% | QUANTITY | 0.99+ |
July, 2020 | DATE | 0.99+ |
Yelp | ORGANIZATION | 0.99+ |
United | ORGANIZATION | 0.99+ |
first bike | QUANTITY | 0.99+ |
2021 | DATE | 0.99+ |
First | QUANTITY | 0.99+ |
2024 | DATE | 0.99+ |
last year | DATE | 0.99+ |
Colibra | ORGANIZATION | 0.99+ |
Berson | LOCATION | 0.99+ |
second | QUANTITY | 0.99+ |
over six months | QUANTITY | 0.99+ |
Freddie Mac | ORGANIZATION | 0.99+ |
pandemic | EVENT | 0.99+ |
5.5 billion data points | QUANTITY | 0.98+ |
four year old | QUANTITY | 0.98+ |
this year | DATE | 0.98+ |
BT | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
over 200 new colleagues | QUANTITY | 0.98+ |
four years earlier | DATE | 0.97+ |
second trend | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
U S | LOCATION | 0.96+ |
tens of millions of people | QUANTITY | 0.96+ |
Collibra | ORGANIZATION | 0.95+ |
one thing | QUANTITY | 0.94+ |
last 18 months | DATE | 0.94+ |
trillion dollars | QUANTITY | 0.94+ |
Columbia | ORGANIZATION | 0.93+ |
two related | QUANTITY | 0.85+ |
three shifts | QUANTITY | 0.66+ |
days | DATE | 0.59+ |
United | LOCATION | 0.59+ |
a year | QUANTITY | 0.58+ |
one | QUANTITY | 0.57+ |
Colombians | PERSON | 0.52+ |
hour | QUANTITY | 0.5+ |
Data | ORGANIZATION | 0.45+ |
couple | DATE | 0.36+ |
Data Citizens '21 Preview with Felix Van de Maele, CEO, Collibra
>>At the beginning of the last decade, the technology industry was abuzz because we were on the cusp of a new era of data. The promise of so-called big data was that it would enable data-driven organizations to tap a new form of competitive advantage, namely insights from data at a much lower cost. The problem was, data became plentiful, but insights remained scarce. A rash of technical complexity, combined with a lack of trust due to conflicting data sources and inconsistent definitions, led to the same story that we've heard for decades: we spent a ton of time and money to create a single version of the truth, and we're further away than we've ever been before. Maybe, as an industry, we should be approaching this problem differently. Perhaps it should start with the idea that we have to change the way we serve business users, i.e., those who understand data context. And with me to discuss the evolving data space, his company, and the upcoming Data Citizens conference is Felix Van de Maele, the CEO and founder of Collibra. Felix, welcome. Great to see you. >>Great to see you. Great to be here. >>So tell us a little bit about Collibra and the problem that you're solving. Maybe you could double-click on my upfront narrative. >>Yeah, I think you said it really well. We've seen so much innovation over the last couple of years in data, the exploding volume and complexity of data. We've seen a lot of innovation in how to store and process that volume of data more effectively or more cost-effectively. But fundamentally, the source of the problem, being able to really derive insights from that data effectively, whether it's for an AI model or for reporting, is still as difficult as it was, let's say, 10 years ago. And in a way, it's only become more difficult. And so what we fundamentally believe is that next to that innovation on the infrastructure side of data, you really need to look at the people and process side of data.
There are so many more people today who consume and produce data to do their job. >>That's why we talk about data citizens. We have to make it easier for them to find the right data in a way that they can trust, so that there's confidence in that data to be able to make decisions and to be able to trust the output of that model. And that's really what Collibra focused on initially, around governance. How do you make sure companies know what data they have, and make sure they can trust it and use it in a compliant way? And now we've extended that into the only data intelligence platform today in the industry, where we make it easier for organizations to truly unite around the data across the whole organization, wherever that data is stored, on premises or in the cloud, and whoever is actually using or consuming the data. That's why we talk about data citizens. >>I think you're right. I think it is more complex. There's just more of it. And there's more pressure on individuals to get advantage from it. But I have to ask you what sets Collibra apart, because I'd like you to explain why you're not just another data company chasing a problem with what's going to be an incremental solution that's really not going to change anything. What sets Collibra apart? >>Yeah, that's a really good question. And I think what fundamentally sets us apart, what makes us unique, is that we look at data, or the problem around data, as truly a business problem and a business function. So we fundamentally believe that if you believe that data is an asset, you really have to run it as a strategic business function, just like your HR function, your people function, your IT function, your sales and marketing function. You have a system to run that function. You have Salesforce to run sales and marketing. You have ServiceNow to run your IT function.
You have Workday to run your people function, but you need the same kind of system to really run your data function. And that's really how we think about Collibra. So we're not another kind of faster, better database, and we're not another data management tool that makes the life of a single individual easier. We're really a business application that focuses on how we bring people together in an effective way so that they can collaborate around the data. It creates efficiency, so you don't have to do things ad hoc: you can easily find the right information, and you can collaborate effectively. And it creates the confidence to actually be able to do something with the outcomes, the results of all of that work. And so fundamentally, looking at the problem as a business function that needs a business system, what we call the system of record or system of engagement for the data function, I think is absolutely critical and really unique in our approach. >>So Data Citizens, your big user conference, Data Citizens '21, is coming up June 16th and 17th. theCUBE's stoked because we love talking about data. This is the first time we're bringing theCUBE to that event, so we're really gearing up for it. I wonder if you could tell us a little bit about the history and the evolution of the Data Citizens conference? >>Absolutely. I think the first one was six years ago, where we had a small event at a hotel in downtown New York. It was mostly customers, as our user conference, a lot of them banks, which at the time were our main customers, with 60 people. So a very small event, and it's exploded ever since. This year we expect over 5,000 people. So it's really expanded beyond just the user conference to become more of a community conference and an industry conference. So we're really excited. A big part of what we do, and why we care so much about the conference, is that it's an opportunity to build that data citizens community.
That's what we hear from our customers, from all the attendees that come to the conference: bringing together people who all care about the same topic and are passionate about doing more with data, and being able to connect those people, is a big part of that. So we're always looking forward to the event from that perspective. >>There's a lot of competition, of course, for virtual events these days, with the "what's in it for me?" question. Who should attend, and what can attendees expect from Data Citizens '21? >>Yeah, absolutely. The good thing about a virtual event is that everybody can attend. It's free, and it's open to people from across the world, of course. But what we want people to take away as attendees is that you learn something pragmatic, so that the next day on the job you can do something; you've learned something very specific. We also want people to be excited and to look at what is possible from an innovation perspective. And so that's how we look at the event. We bring in a lot of customers and organizations that are going to share their best practices, very specifically: how they're handling data governance, how they're doing data cataloging, how they're doing data privacy. So very specific best practices and tips on how to be successful, but then also industry experts that can paint the picture of where we're going as an industry. What are the best practices? What do we need to think about today to be ready for what's going to come tomorrow? So that's a big focus. Of course, we're going to talk about our product: what do we have in store from a product roadmap and innovation perspective? How are we helping these organizations get there faster? And another aspect is that we're bringing in a lot of partners as well, so that's a big part of that broader ecosystem, which is really interesting. And finally, like I said, it's really around the community, right?
And that's what we hear continuously from the attendees: just being able to make these connections, meet new people, learn what they're doing and how they've solved certain challenges. We hear that's a really big part of the value proposition. So as an attendee, the good thing is you can join from anywhere. All of the content is going to be available on demand, so later it's going to be available for you to look at as well. Plus, you're going to become part of that data citizens community, which is a really thriving and growing community where you're going to find a lot of like-minded people with the same passion and the same interests that you can learn the most from. >>Well, I rather like the term data citizen. I consider myself a data citizen, and it has implications just in terms of putting data in the hands of business users. So it's sort of central to this event, obviously. What is a data citizen to Collibra? >>Yeah, it's a really core part of our mission and our vision. We believe that today everyone needs data to do their job. Everyone in that sense has become a data citizen, in the sense that they need to be able to easily access trustworthy data. We have to make it easy for people to find the right data that they can trust, that they can understand, and that they can do something with to make their job easier. On the other hand, like a citizen, you have rights and you have responsibilities. As a data citizen, you also have the responsibility to treat that data in the right way, to make sure, from a privacy and security perspective, that data is, again, like I said, treated in the right way. And so that combination of making it easy, making it accessible, democratizing it, but also making sure we treat data in the right way, is really important. And that's a core part of what we believe: that everyone is going to become a data citizen.
And so that's a big part of our mission. >>I like that. We enter into a contract: I'll do my part, and you'll give me access to that data. I think that's a great philosophy. So the call to action here: June 16th and 17th, go register at citizens.collibra.com. Go register, because it's not just the normal mumbo jumbo. You're going to get some really interesting data. Felix, I'll give you the last word. >>No, like you said, go register. It's a great event. It's a great community to be part of. June 16th and 17th, you can block it in your calendar. So go to citizens.collibra.com. It's going to be a great event. Thanks for helping us preview it. >>This is going to be a great event that we're really excited about. Felix, great to see you, and we'll see you on June 16th and 17th. >>Absolutely. >>All right. Thanks for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Felix van de Mala | PERSON | 0.99+ |
Felix Van de Maele | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Felix | PERSON | 0.99+ |
June 16 | DATE | 0.99+ |
June 16th | DATE | 0.99+ |
17th | DATE | 0.99+ |
60 people | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
register@citizensdotcollibra.com | OTHER | 0.99+ |
today | DATE | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
McConnell | PERSON | 0.99+ |
six years ago | DATE | 0.99+ |
over 5,000 people | QUANTITY | 0.98+ |
single | QUANTITY | 0.97+ |
Culebra | ORGANIZATION | 0.96+ |
this year | DATE | 0.96+ |
GDPR | TITLE | 0.95+ |
first one | QUANTITY | 0.95+ |
10 years ago | DATE | 0.95+ |
first time | QUANTITY | 0.95+ |
last decade | DATE | 0.93+ |
17 | DATE | 0.92+ |
New York | LOCATION | 0.89+ |
21 | DATE | 0.88+ |
single version | QUANTITY | 0.88+ |
decades | QUANTITY | 0.86+ |
data citizens | EVENT | 0.75+ |
next day | DATE | 0.72+ |
double | QUANTITY | 0.66+ |
last couple of years | DATE | 0.64+ |
Data Citizens | TITLE | 0.63+ |
more people | QUANTITY | 0.61+ |
ton | QUANTITY | 0.53+ |
Salesforce | ORGANIZATION | 0.51+ |
'21 | DATE | 0.44+ |
Stijn Christiaens | Data Citizen 22
>>Hey everyone. I'm Lisa Martin covering Data Citizens 22, brought to you by Collibra. This next conversation is gonna focus on the importance of data culture. One of our CUBE alumni is back: Stijn Christiaens is Collibra's co-founder and its Chief Data Citizen. Stijn, it's great to have you back on theCUBE. >>Hey, Lisa, nice to be here. >>So we're gonna be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, it's so much more than technology innovation. It also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >>Right. So as you know, our event is called Data Citizens because we believe that in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, you have a lot of people, most of the employees in an organization, who are somehow going to be a data citizen, right? So you need to make sure that these people are aware of it. You need to make sure that these people have the skills and competencies to do with data what is necessary. And that's on all levels, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss that we need to make this decision, your boss is also open to, and able to interpret, the data presented in that dashboard to actually make that decision and take that action, right? And once you have that throughout the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations because they're always moving somehow; they're hiring new people.
And it has to be a continuous effort because we've seen that on the one hand, organizations continue to be challenged with controlling their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But on the other side of the equation, you have the benefits. You might look at regulatory drivers, like, we have to do this, right? But it's much better right now to consider the competitive drivers, for example. And we did an IDC study earlier this year, quite interesting. I can recommend it to anyone. And one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is that the ones who are higher in maturity, so the organizations that really look at data as an asset, look at data as a product, and actively try to be better at it, have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, OK, I'm doing this data culture for everyone, awakening them as data citizens. I'm doing this for competitive reasons, I'm doing this for regulatory reasons. You're trying to bring both of those together, and the ones that get data intelligence right are just going to be more successful and more competitive. That's our view, and that's what we're seeing out there in the market. >>Absolutely. We know that just generally, Stijn, right? The organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know, in theory more competitive and more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally.
>>Of course, of course it's difficult for an organization to adapt, but it's also necessary. As you just said, imagine that you're a modern-day organization with phones, laptops, what have you, and you're not using those IT assets, right? Or you're delivering them throughout the organization, but not enabling your colleagues to actually do something with those assets. The same thing is true with data today, right? If you are not properly using the data assets and your competitors are, they're going to get more advantage. So as to how you get this done, or how you establish this culture, there are a few angles to look at, I would say, Lisa. So one angle is obviously the leadership angle, whereby whoever is the boss of data in the organization, and you typically have multiple bosses there, like chief data officers, sometimes there are multiple, and they may have a different title, right? So I'm just gonna summarize it as a data leader for a second. So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect: how are you going to get value from data? Yes. Now that's one part, because then you can clearly see the example of your leadership in the organization and also the business value. And that's important because those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that culture right: it's not enough to just have that leadership out there, but you also have to get the hearts and minds of the data champions across the organization. You really have to win them over.
And if you have those two combined, and obviously a good technology to connect those people and have them execute on their responsibilities, such as a data intelligence platform like Collibra, then you have the pieces in place to really start upgrading that culture inch by inch, if you will. >>Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how before we went live, we were talking about Collibra drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what maybe some of the specific projects are that Collibra's data office is working on. >>Yes, and it is indeed Data Citizens. There are a ton of speakers here, very excited. You know, we have Barb from MIT speaking about data monetization. We have dig pat at the last minute on the agenda. So really exciting agenda. Can't wait to get back out there. But essentially you're right. So over the years at Collibra, we've been doing this now since 2008, so a good 15 years. And I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people in a garage, if you will, so everybody's wearing all sorts of hats at that time. But over the years I've run pre-sales at Collibra, I've run post-sales, partnerships, product, et cetera. And as our company got a little bit bigger, we're now 1,200, something like that, people in the company, I believe systems and processes become a lot more important, right? So we said, you know, Collibra isn't the size of our customers yet, but we're getting there in terms of organization, structure, process, systems, et cetera.
So we said, it's really time for us to put our money where our mouth is and to set up our own data office, which is what we were seeing all of our customers doing, and which is what we're seeing that organizations worldwide are doing. And Gartner was predicting this as well. They said, OK, organizations have an HR unit, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, OK, let's try to set an example at Collibra. Let's try to set up our own data office in such a way that other people can take away from it, right? So we set up a data strategy, we started building data products, took care of the data infrastructure, that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, OK, we need to also use our own product and our own practices, right? And from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning with all of the market, of course. And on the Monday mornings, we sometimes refer to that as eating our own dog food; on Friday evenings we refer to that as drinking our own champagne. I like it. So we had the driver to do this, you know, there's a clear business reason. So we included that in the data strategy, and that's a little bit of our origin. Now how do we organize this? We have three pillars, and by no means is this a template that everyone should follow. This is just the organization that works at our company, but it can serve as an inspiration. So we have a pillar which is data science: the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform to make sure the data products can run, the data can flow and, you know, the quality can be checked.
And then we have a data intelligence or data governance pillar, where we have those data governance, data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is, well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap and started execution on use case after use case. And a few important ones there are very simple; we see them with all our customers as well. People love talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in legal and privacy, so they have their process registry and they can see how the data flows. So that's a popular starting place. And that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to, to look and see, OK, what data is out there for me as an analyst or a data scientist or whatever to do my job, right? So they can immediately get access to the data. And another one that we did is around trusted business reporting. We're seeing that since 2008, you know, self-service BI allowed everyone to make beautiful dashboards, you know, pie charts. My pet peeve is the pie charts, because I love pie and you shouldn't always be using pie charts. But essentially there's been a proliferation of those reports. And now executives don't really know, OK, should I trust this report or that report? They're reporting on the same thing, but the numbers seem different, right? So that's why we have trusted business reporting.
So we know that if a report, a dashboard, a data product essentially, is built, all the right steps are being followed, and that whoever is consuming it can be quite confident in the result. Absolutely key. Exactly. Yes, absolutely. >>Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >>KPIs and measuring is a big topic in the chief data officer profession, I would say, and again, it always varies with respect to your organization, but there's a few that we use that might be of interest to you. So remember we have those three pillars, right? And we have metrics across those pillars. So for example, a pillar on the data engineering side is gonna be more related to uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption is a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data science and the products. Are people using them? Are they getting value from it? Can we calculate that value from a monetary perspective, right? So that we can continue to say to the rest of the business, we're tracking those numbers, and those numbers indicate that value is generated, and how much value, estimated, in that region. And then you have some data intelligence, data governance metrics, which is, for example, you have a number of domains in a data mesh. People talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are these owners organized and executing on their responsibilities?
How many tickets are open or closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs. There's a lot more, but hopefully those can already inspire the audience. >>Absolutely. So we've talked about the rise of chief data offices; it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade? >>So we've seen indeed the role sort of grow up. I think in 2010 there may have been like 10 chief data officers or something. Gartner has exact numbers on them. But then they grew, you know, to 400, and they were mostly in financial services, but they expanded then to all industries and then to all regions. The number is estimated to be about 20,000 right now. Wow. And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; offensive data strategy; support for the digital program; and now all about data products, right? So as a data leader, you now need all of those competences and need to include them in your strategy. How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right? But essentially I think for the next couple of years there's gonna be a lot of people still moving along with those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer. So you'll see over the years that's going to evolve to more digital and more data products. So for the next three, five years, my prediction is it's all going to be about data products, because it's an immediate link between the data and the dollar, essentially, right? So that's gonna be important, and quite likely some new things will be added on, which nobody can predict yet.
But we'll see those pop up in a few years. I think there's gonna be a continued challenge for the chief data officer role to become a real executive role as opposed to, you know, somebody who claims that they're executive, but then they're not, right? So the real reporting level into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful. Yeah. And the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that very clear to all the data citizens in the organization, right? Really, in that sense, across the value chain, they'll need to have both technical audiences and non-technical audiences aligned, of course. And they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you're waking up data citizens across the organization and you make everyone in the organization think about data as an asset. >>Absolutely. Because there's so much value that can be extracted if organizations really strategically build that data office and democratize access across all those data citizens. Stijn, this is an exciting arena. We're definitely gonna keep our eyes on this. Sounds like a lot of evolution and maturation coming from the data office perspective and from the data citizen perspective. And as the data show that you mentioned in that IDC study, and you mentioned Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're gonna watch this space. Stijn, thank you so much for joining me on theCUBE at Data Citizens 22. We appreciate it. >>Thanks for having me over. >>From Data Citizens 22, I'm Lisa Martin. You're watching theCUBE, the leader in live tech coverage.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lisa | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Barb | PERSON | 0.99+ |
2010 | DATE | 0.99+ |
Stijn Christiaens | PERSON | 0.99+ |
10 year | QUANTITY | 0.99+ |
Stan | PERSON | 0.99+ |
Stan Christians | PERSON | 0.99+ |
one part | QUANTITY | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
one angle | QUANTITY | 0.99+ |
2008 | DATE | 0.99+ |
1,200 | QUANTITY | 0.99+ |
15 years | QUANTITY | 0.99+ |
400 | QUANTITY | 0.99+ |
10 chief data officers | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
five years | QUANTITY | 0.99+ |
MIT | ORGANIZATION | 0.99+ |
The Cube | TITLE | 0.99+ |
both | QUANTITY | 0.99+ |
IDC | ORGANIZATION | 0.98+ |
over a thousand people | QUANTITY | 0.98+ |
three pillars | QUANTITY | 0.98+ |
three times | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
about 20,000 | QUANTITY | 0.98+ |
second part | QUANTITY | 0.97+ |
cbra | ORGANIZATION | 0.96+ |
Colibra | ORGANIZATION | 0.95+ |
next couple of years | DATE | 0.94+ |
Data Citizens | EVENT | 0.93+ |
Data Citizens 22 | EVENT | 0.93+ |
Monday mornings | DATE | 0.92+ |
earlier this year | DATE | 0.92+ |
next decade | DATE | 0.91+ |
one set | QUANTITY | 0.9+ |
version two | OTHER | 0.89+ |
colibra | ORGANIZATION | 0.89+ |
Friday | DATE | 0.86+ |
Data Citizens 22 | ORGANIZATION | 0.85+ |
version one | OTHER | 0.82+ |
Data | EVENT | 0.81+ |
Data Citizen 22 | ORGANIZATION | 0.81+ |
first chief data | QUANTITY | 0.8+ |
four levels | QUANTITY | 0.77+ |
three | QUANTITY | 0.76+ |
second | QUANTITY | 0.73+ |
Citizens | ORGANIZATION | 0.68+ |
Data | ORGANIZATION | 0.65+ |
Cube | ORGANIZATION | 0.6+ |
2022 | EVENT | 0.48+ |
Michael Kuzma, Lockheed Martin
>> Announcer: From around the globe, it's theCUBE, covering Data Citizens '21, brought to you by Collibra. >> Everybody, John Walls here on theCUBE, continuing our coverage of Data Citizens '21 with Michael Kuzma, who is a Senior Data Engineer at Lockheed Martin, but he is not just any Senior Data Engineer. He is the Collibra Ranger of the Year, an outstanding award that certainly honors Michael's dedication to training, evaluation, and development. He is the top dog. And so it is our real pleasure to welcome Michael in this morning. Michael, first off, congratulations on the recognition. I know it is well deserved, and I'm certain it's been a long time in the making for you. So congratulations on that. >> Thanks, John, thanks so much. >> Yeah, let's talk about the award a little bit here, because you're the top Collibra Ranger. The fact that you've undergone this intensive training and evaluation process, what is that doing for you in terms of your professional development and what you're able to provide Lockheed Martin? >> Well, I think the ranger program definitely has helped with my understanding of the tool. First of all, we're standing up Collibra as sort of the key pillar of data governance within Lockheed Martin. So it's important to have people who are subject-matter experts on the tool that can help the different business areas to be able to stand up and just extract as much value as they can from it. >> Yeah, why did this matter to you? I mean, a lot of work went into this, and to reach the pinnacle required, I know, sacrifice and commitment on your part and on your team's part for that matter. But why was this of paramount importance to you? >> Well, I think it was partially because I was early on in my Collibra journey when I took the ranger certification and went through it. So it definitely helped to solidify my understanding of the tool and get more into it.
That way I can just provide that value to the customers. We also wanted to see what it would look like for other people at Lockheed Martin to become rangers and get proficient in the tool. So I was kind of the guinea pig for Lockheed, and we were evaluating just how it would help us with standing it up. >> Yeah, I mean, talk about the process a little bit, if you will, and share with us just what you went through in terms of how many hours this required, what kind of work you had to do, what kind of training, and the evaluation process. So kind of take us through it from A to Z, if you will, on your journey. >> Yeah, well, it started off, we had to get a virtual environment stood up just so that we could do some of the exercises that the ranger certification requires. So that was an intensive process of just making sure we had all the infrastructure in place to run the sandbox environment. And then once we got that up, it was mainly doing the exercises: you're provided with a data landscape, so how are you going to represent it in the tool? That way your users, both business and technical users, could go in and see the data that's in there and be able to get value, be able to get insights from it. And I think it was challenging for sure to just figure out what all is required for standing up the Collibra environment, 'cause that was a piece of the ranger program: not only how to work the tool, but how to stand it up, how to administrate it in an effective way, and how to get the metamodel set up in an effective way so that you have that long-term sustainability. So it was good seeing all of those different pieces come together. And then after you put it all together, I had the interviews with the Collibra team where you go over everything you did. So it definitely helps when you have to explain it to somebody and they're asking questions.
It sort of provides you with that dry run for when people in your business area and your company are going to be trying to use the tool and they might not understand it or what value it can provide. So having that interview is almost like a dry run, so that you can then help customers when they have questions and come to you. >> Yeah, how helpful was that? I mean, you raised an interesting point, and I hadn't really thought about that. You're basically going before the board, if you will, and answering a lot of hows and whys about your process, your thinking process, and what you put into place and how you implemented the tools, what have you. What did you find interesting about that? Or what did you find out about yourself perhaps in your knowledge base through that process? >> I definitely think it stretched my knowledge base for it. It was definitely nerve-racking having to go in and explain your rationale to people but it turned out well. And I feel like if you can explain something, like if you do your prep work and you're able to explain it to somebody else, it sort of proves that you have the true understanding on your side of it. So it was definitely a lot of prep work to just anticipate all the different questions, figure it out on my side first and then be able to answer it effectively. >> Yeah, we all like softballs, but what about curve balls? Were there any curve balls that perhaps came up in that evaluation process? They're like, "Oh, no, I hadn't thought of that. Or I didn't anticipate that." You know, sometimes it's those curve balls that really keep us on our toes. >> Yeah, I can't remember any specific questions. I do remember getting thrown some of those curve balls where you give the answer you think is sufficient and then there's the build-on, follow-on questions to that where you're like, "Okay, well, I didn't think of that." And so you're trying to think through it on the spot.
So I definitely got some of those. I don't remember the exact questions but it definitely helps to be prepared. >> Yeah, it keeps you on your toes for sure. You mentioned the value of this, perhaps within Lockheed Martin, and being, I think, a great example for others within your organization. What about just kind of in the data community at large or your colleagues at other enterprises? What would you say to them in terms of the value in pursuing this kind of honor, this kind of recognition and how it could be put into good use in their work on the day-to-day side of operations? >> Well, I think for people who are early on and trying to stand it up, the video curriculum definitely helped me out for sure. Learning about both the administrative side, as well as how to use the tool as an end user. If you can put yourself in the mindset of an end user, that's where you can really figure out where the most value is going to be coming from. And it was also good just getting that hands-on experience in a sandbox environment, that you could build it out and not have to worry about it breaking anything for your organization, but also figuring out how are you going to set up the metamodel and get it working before people populate the tool? 'Cause it's a lot harder to make updates when people are using it. It's good to try to get that as well established upfront as possible. So it's definitely good to get that hands-on experience with standing that up. And I think it helps you sort of think through all the different intricacies and nuances for standing up your own environment and getting the most value for your company. >> You know, let's talk about Lockheed Martin a little bit and obviously I'm going to take it that everybody's pretty well familiar with your work. I mean, 110,000 employees, a worldwide footprint, and obviously security and data security are of critical importance.
What does Collibra do for you in that respect in terms of whatever peace of mind you might get in terms of data privacy and data security and reliability, all these things that really factor, I would assume, in Lockheed Martin's operations? >> Yeah, it does and we're still thinking through all of the things especially with classified information, but it being metadata helps a lot. People are a lot less apprehensive knowing that it's just metadata in the tool. You're not actually keeping the data itself in the tool. So that way we can still have our security pieces on the underlying data. It's more for that discovery piece for us and we're able to see what shared reports are out there, to be able to get lineage for different systems and help people's business understanding of the things that are out there, and the technical users as well, getting value from the lineage and system setups. So I think being able to lock down the view permissions, that helps too, you know, puts people's minds at ease if you're able to say, "Okay, well, we can make sure only certain people are able to see this." You know, we have some of those built-in as well. >> Yeah, I mean, that's something I know you've done a lot at Lockheed in terms of working on the tech side and the non-tech side. And trying to explain policies, governance, and determining accessibility and putting the right governance controls in place. From a data perspective, again, sharing your insights, what you have learned in that regard at Lockheed Martin, what would you say to your fellow data colleagues, if you will, again, at other enterprises in terms of getting that kind of collaboration and feedback and input from not just the tech side but also the non-tech side of your house?
>> Yeah, it's definitely important to get that business side as well because the technical users, while they work with it so much, might not understand that business users are not going to know what all of these things mean and that they're going to need some sort of human-readable version of it. So we have people from the different business areas, both business representatives and technical representatives, who we work with on a consistent basis to get that continual feedback. And that way we're getting what are the priorities from both sides and seeing sort of where the synergies are across the different business areas as well. That way we're not duplicating effort, but we're trying to make it a comprehensive tool that everybody can use. >> Now I know that your relationship at Lockheed Martin with Collibra goes back some four years now. So you have a maturing relationship for sure. And the value there seems to be pretty well-documented. What would you say to others in your space, again, not just about Collibra, but about the data, evolution of data in general in terms of giving advice to somebody who's looking at this as a career, or maybe somebody who is just now getting into a more sophisticated look at their data footprint? >> Yeah, it's definitely a large field. There's always new things to learn. It's always evolving too. So I think that first step for an individual is to be willing to learn those new things, to learn those new systems, processes, ways of thinking and take on tasks that sort of stretch you in your career. Things that you might not have said yes to before but saying yes could give you more of a comprehensive view of the business or give you a better data view as well. And from the company, it's just trying to figure out where the most value lies. Trying to get everybody sort of on the same page; when it's the wild west it becomes a lot harder to extract value and move towards value.
So trying to get everybody standardized but also give them the flexibility for their individual program or business needs but try to keep people to where there's a common understanding of the data. >> Now, spoken by someone who's been there and is doing that, Michael, we appreciate the insights. And once again, congratulations on the honor. It is well-deserved. >> Thank you. Thank you. >> You bet. Michael Kuzma joining us from Lockheed Martin as the Collibra Ranger of the Year. We continue our discussion here, Data Citizens '21 on theCUBE. (upbeat music)
SUMMARY :
brought to you by Collibra. He is the Collibra Ranger of the Year, is that doing for you that can help the different business areas and on your team's part for that matter. and get proficient in the tool. and the evaluation process. and see the data that's in there the board, if you will, it sort of proves that you that came up in that evaluation process? but it definitely helps to be prepared. You mentioned that the value of this, and getting the most and obviously I'm going to take, and the technical users as well, what would you say to your that they're going to need And the value there seems to of the business or give you congratulations on the honor. Thank you. as the Collibra Ranger of the Year.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michael | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Michael Kuzma | PERSON | 0.99+ |
Lockheed | ORGANIZATION | 0.99+ |
Lockheed Martin | ORGANIZATION | 0.99+ |
Lockheed Martin | ORGANIZATION | 0.99+ |
John Walls | PERSON | 0.99+ |
110,000 employees | QUANTITY | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
both sides | QUANTITY | 0.98+ |
four years | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
Clipper | ORGANIZATION | 0.97+ |
First | QUANTITY | 0.96+ |
first | QUANTITY | 0.96+ |
first step | QUANTITY | 0.95+ |
Data Citizens '21 | TITLE | 0.95+ |
this morning | DATE | 0.82+ |
both business | QUANTITY | 0.71+ |
Collibra | LOCATION | 0.67+ |
Data | TITLE | 0.62+ |
'21 | TITLE | 0.57+ |
Citizens | ORGANIZATION | 0.54+ |
theCUBE | ORGANIZATION | 0.48+ |
'21 | DATE | 0.42+ |
Collibra | TITLE | 0.39+ |
Jim Cushman, CPO, Collibra
>> From around the globe, it's theCUBE, covering Data Citizens '21. Brought to you by Collibra. >> We're back talking all things data at Data Citizens '21. My name is Dave Vellante and you're watching theCUBE's continuous coverage, virtual coverage #DataCitizens21. I'm here with Jim Cushman who is Collibra's Chief Product Officer who shared the company's product vision at the event. Jim, welcome, good to see you. >> Thanks Dave, glad to be here. >> Now one of the themes of your session was all around self-service and access to data. This is a big, big point of discussion amongst organizations that we talk to. I wonder if you could speak a little more toward what that means for Collibra and your customers and maybe some of the challenges of getting there. >> So Dave, our ultimate goal at Collibra has always been to enable self-service access for all customers. Now, one of the challenges is that they're limited in how they can access information, these knowledge workers. So our goal is to totally liberate them and so, why is this important? Well, in and of itself, self-service liberates tens of millions of data-literate knowledge workers. This will drive more rapid, insightful decision-making, it'll drive productivity and competitiveness. And to make this level of adoption possible, the user experience has to be as intuitive as say, retail shopping, like I mentioned in my previous bit, like you're buying shoes online. But this is a little bit of foreshadowing and there's an even more profound future than just enabling self-service: we believe that a new class of shopper is coming online and she may not be as data-literate as our knowledge worker of today. Think of her as an algorithm developer, she builds machine learning or AI. The engagement model for this user will be to kind of build automation, personalized experiences for people to engage with data. But in order to build that automation, she too needs data.
Because she's not data literate, she needs the equivalent of a personal shopper. Someone that can guide her through the experience without actually having her know all the answers to the questions that would be asked. So this level of self-service goes one step further and becomes an automated service, one to really help find the best unbiased and labeled training data to help train an algorithm in the future. >> That's, okay, please continue. >> No, please, and so all of this self and automated service needs to be complemented with kind of a peace of mind that you're letting the right people gain access to it. So when you automate it, it's like, well, geez, are the right people getting access to this? So it has to be governed and secured. This can't become like the Wild Wild West or like a data, what we call a data flea market or, you know, data's everywhere. So, you know, history does quickly forget the companies that do not adjust to remain relevant. And I think we're in the midst of an exponential differentiation, and Collibra Data Intelligence Cloud is really kind of established to be the key catalyst for companies that will be on the winning side. >> Well, that's big because I mean, I'm a big believer in putting data in the hands of those folks in the line of business. And of course the big question that always comes up is, well, what about governance? What about security? So to the extent that you can federate that, that's huge. Because data is distributed by its very nature, it's going to stay that way. It's complex. You have to make the technology work in that complex environment, which brings me to this idea of low code or no code. It's gaining a lot of momentum in the industry. Everybody's talking about it, but there are a lot of questions, you know, what can you actually expect from no code and low code? Who are the right, you know, potential users of that? Is there a difference between low and no?
And so from your standpoint, why is this getting so much attention and why now, Jim? >> You know, if we go back even 25 years ago, we were talking about fourth- and fifth-generation languages that people were building. And it really didn't reach the total value that folks were looking for because it always fell short. And you'd say, listen, if you didn't do all the work it took to get to a certain point how are you possibly going to finish it? And that's where the 4GLs and 5GLs fell short as capability. With our stuff, if you really want great self-service, how are you going to be self-service if it still requires somebody to write code? Well, I guess you could do it if the only self-service people are people who write code, well, that's not bad factor. So if you truly want the ability to have something show up at your front door, without you having to call somebody or make any efforts to get it, then it needs to generate itself. The beauty of doing a catalog, new governance, understanding all the data that is available for choice, giving someone the selection that is using objective criteria, like, this is the best, objectively, because of its quality for what you want, or it's labeled, or it's unbiased, and it has that level of deterministic value to it versus guessing or subjectivity or what my neighbor used or what I used on my last job. Now that we've given people the power with confidence to say, this is the one that I want, the next step is okay, can you deliver it to them without them having to write any code? So imagine being able to generate those instructions from everything that we have in our metadata repository to say this is exactly the data I need you to go get and perform what we call a distributed query against those data sets and bring it back to them. No code written.
And here's the real beauty, Dave: pipeline development, data pipeline development is a relatively expensive thing today and that's why people spend a lot of money maintaining these pipelines, but imagine if there was zero cost to building your pipeline, would you spend any money to maintain it? Probably not. So if we can build it for no cost, then why maintain it? Just build it every time you need it. And then again, it's done on a self-service basis. >> I really like the way you're thinking about this 'cause you're right. A lot of times when you hear self-service it's about making the hardcore developers, you know, be able to do self-service. But the reality is, and you talk about that data pipeline, it's complex: a business person is sitting there waiting for data or wants to put in new data and it turns out that the smallest unit is actually that entire team. And so you sit back and wait. And so to the extent that you can actually enable self-serve for the business by simplification, that's been the holy grail for a while, isn't it? >> I agree. >> Let's dig a little bit into where you're placing your bets. I mean, you're head of products, you've got to make bets, you know, certainly many, many months if not years in advance. What are your big focus areas of investment right now? >> Yeah, certainly. So one of the things we've done very successfully since our origin over a decade ago was building business user-friendly software, and it was predominantly kind of a plumbing or infrastructure area. So, business users love working with our software. They can find what they're looking for and they don't need to have some cryptic key of how to work with it. They can think about things in their terms and use our business glossary and they can navigate through what we call our data intelligence graph and find just what they're looking for. And we don't require a business to change everything just to make it happen.
We give them kind of a universal translator to talk to the data. But with all that wonderful usability, the common compromise that you make is, well, it's only good up to a certain amount of information, kind of like Excel. You know, you can do almost anything with Excel, right? But when you get into large volumes, it becomes problematic and now you need to, you know, go with a hardcore database and application on top. So what the industry is pulling us towards is far greater amounts of data, not just millions or even tens of millions but into the hundreds of millions and billions of things that we need to manage. So we have a huge focus on scale and performance on a global basis and that's a mouthful, right? Not only are you dealing with large amounts at performance but you have to do it in a global fashion and make it possible for somebody who might be operating in Southeast Asia to have the same experience with the environment as they would in Los Angeles. And the data needs to therefore go to the user as opposed to having the user come to the data as much as possible. So it really does put a lot of emphasis on some of what you call the non-functional requirements, also known as the ilities, and so our ability to bring the data and handle those large enterprise-grade capabilities at scale and performance globally is what's really driving a good number of our investments today. >> I want to talk about data quality. This is a hard topic, but it's one that's so important. And I think it's been really challenging and somewhat misunderstood when you think about the chief data officer role itself, it kind of emerged from these highly regulated industries. And it came out of the data quality, kind of a back office role that's kind of gone front and center and now is, you know, pretty strategic. Having said that, you know, the prevailing philosophy is okay, we've got to have this centralized data quality approach and that it's going to be imposed throughout.
And it really is a hard problem and I think about, you know, these hyper-specialized roles, like, you know, the quality engineer and so forth. And again, the prevailing wisdom is, if I could centralize that it can be lower cost and I can service these lines of business when in reality, the real value is, you know, speed. And so how are you thinking about data quality? You hear so much about it. Why is it such a big deal and why is it so high a priority in the marketplace? Your thoughts? >> Thanks for that. So we of course acquired a data quality company, not burying the lede, earlier this year, OwlDQ, and the big question is, okay, so why, why them and why now, not before? Well, at least a decade ago you started hearing people talk about big data. It was probably around 2009, it was becoming the big talk and what we don't really talk about when we talk about this ever-expanding data, the byproduct is, this velocity of data is increasing dramatically. So the speed at which new data is being presented, the way in which data is changing is dramatic. And why is that important to data quality? 'Cause data quality historically for the last 30 years or so has been a rules-based business where you analyze the data at a certain point in time and you write a rule for it. Now there's already room for error there 'cause humans are involved in writing those rules, but now with the increased velocity, the likelihood that it's going to atrophy and become no longer a valid or useful rule to you increases exponentially. So we were looking for a technology that was doing it in a new way similar to the way that we do auto classification when we're cataloging attributes, which is, how do we look at millions of pieces of information around metadata and decide what it is to put it into context? The ability to automatically generate these rules and then continuously adapt as data changes to adjust these rules is really a game changer for the industry itself. So we chose OwlDQ for that very reason.
It's not only where they had this really kind of modern architecture to automatically generate rules but then to continuously monitor the data and adjust those rules, cutting out the huge amounts of costs, clearly having rules that aren't helping you save, and frankly, you know how this works: no one really complains about it until there's the squeaky wheel, you know, you get a fine or exposure, and that's what is causing a lot of issues with data quality. And then why now? Well, I think, and this is my speculation, but there's so much movement of data moving to the cloud right now. And so anyone who's made big investments in data quality historically for their on-premise data warehouses, Netezzas, Teradatas, Oracles, et cetera, or even their data lakes, are now moving to the cloud. And they're saying, hmm, what investments are we going to carry forward that we had on premise? And which ones are we going to start anew from? And data quality seems to be ripe for something new, and so these new investments in data in the cloud are now looking up: let's look at a new next-generation method of doing data quality. And that's where we're really fitting in nicely. And of course, finally, you can't really do data governance and cataloging without data quality, and data quality without data governance and cataloging is kind of a hollow long-term story. So the three working together is a very powerful story. >> I've got to ask you some Columbo questions about this 'cause, you know, you're right. It's rules-based and so my, you know, immediate reaction is like, okay, what are the rules around COVID or hybrid work, right? If there's static rules, there's so much unknown and so what you're saying is you've got a dynamic process to do that. So, one of my gripes about the whole big data thing, and you know, you referenced that 2009, 2010, I loved it, because there was a lot of profound things about Hadoop and a lot of failings.
And one of the challenges is really that there's no context in the big data system. You know, the data, the folks in the data pipeline, they don't have the business context. So my question is, and it sounds like you've got this awesome magic to automate, who adjudicates the dynamic rules? Do humans play a role? What role do they play there? >> Absolutely. There's the notion of sampling. So you can only trust a machine to a certain point before you want to have some type of a steward or an assisted or supervised learning that goes on. So, you know, suspect maybe one out of 10, one out of 20 rules that are generated, you might want to have somebody look at it. Like there's ways to do the equivalent of supervised learning without actually paying the cost of the supervisor. Let's suppose that you've written a thousand rules for your system that are five years old. And we come in with our ability and we analyze the same data and we generate rules ourselves. We compare the two themselves and there's absolutely going to be some exact matching, some overlap that validates one another. And that gives you confidence that the machine learning did exactly what you did, and what's the likelihood that you guessed wrong and machine learning guessed wrong exactly the same way? That seems a pretty, pretty small concern. So now you're really saying, well, why are they different? And now you start to study the samples. And what we learned is that with our ability to generate between 60 and 70% of these rules, anytime we were different, we were right. Almost every single time, like almost every, like only one out of a hundred was it proven that the handwritten rule was a more profound outcome. And of course, it's machine learning. So it learned, and it caught up the next time.
So that's the true power of this innovation: it learns from the data as well as the stewards and it gives you confidence that you're not missing things and you start to trust it, but you should never completely walk away. You should constantly do your periodic sampling. >> And the secret sauce is math. I mean, I remember back in the mid-2000s, it was like the 2006 timeframe. You mentioned, you know, auto classification. That was a big problem with the Federal Rules of Civil Procedure, trying to figure out, okay, you know, you had humans classifying, and humans don't scale, until you had, you know, all kinds of support vector machines and probabilistic latent semantic indexing, but you didn't have the compute power or the data corpus to really do it well. So it sounds like a combination of, you know, cheaper compute, a lot more data and machine intelligence have really changed the game there. Is that a fair assumption? >> That's absolutely fair. I think the other aspect to keep in mind is that it's an innovative technology that actually brings all that compute as close to the data as possible. One of the greatest expenses of doing data quality was, of course, the profiling concept, bringing up the statistics of what the data represents. And in most traditional senses that data is completely pulled out of the database itself into a separate area, and now you start talking about terabytes or petabytes of data; it takes a long time to extract that much information from a database and then to process through it all. Imagine bringing that profiling closer into the database, so it's happening in the same space as the data; that cuts out like 90% of the unnecessary processing. It also gives you the ability to do it incrementally. So you're not doing a full analysis each time; you have kind of an expensive play when you're first looking at a full database and then maybe over the course of a day, an hour, 15 minutes you've only seen a small segment of change.
So now it feels more like a transactional analysis process. >> Yeah, and that's, you know, again, we talked about the old days of big data, you know, the Hadoop days, and what was profound was it was all about bringing five megabytes of code to a petabyte of data, but that didn't happen. We shoved it all into a central data lake. I'm really excited for Collibra. It sounds like you guys are really on the cutting edge and doing some really interesting things. I'll give you the last word, Jim, please bring us home. >> Yeah, thanks Dave. So one of the really exciting things about our solution is, it's trying to be a combination of best-of-breed capabilities but also integrated. So to actually create a full and complete story that customers are looking for, you don't want to have them worry about a complex integration in trying to manage multiple vendors and the times of their releases, et cetera. If you can find one customer that you don't have to say well, that's good enough, but every single component is in fact best of breed that you can find and it's integrated and they'll manage it as a service, you truly unlock the power of your data-literate individuals in your organization. And again, that goes back to our overall goal. How do we empower the hundreds of millions of people around the world who are just looking for insightful decisions? They feel completely locked; it's as if they're looking for information before the internet and they're kind of limited to whatever their local library has, and if we can truly become somewhat like the internet of data, we make it possible for anyone to access it without controls but we still govern it and secure it for privacy laws, I think we do have a chance to change the world for the better. >> Great. Thank you so much, Jim. Great conversation, really appreciate your time and your insights. >> Yeah, thank you, Dave. Appreciate it. >> All right and thank you for watching theCUBE's continuous coverage of Data Citizens '21.
My name is Dave Vellante. Keep it right there for more great content. (upbeat music)
SUMMARY :
Brought to you by Collibra. and you're watching theCUBE's and maybe some of the And to make this level So it has to be governed and secured. And of course the big question and it has that level of And so to the extent that you you got to make bets, you know, And the data needs to and that it's going to and frankly, you know how this works is, So and one of the my gripes and it gives you confidence or the data corpus to really do it well. of data that takes a long time to extract Yeah and that's, you know, again, is in fact best of breed that you can find Thank you so much, Jim. you for watching theCUBE's
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Cushman | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Jim | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
90% | QUANTITY | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
2009 | DATE | 0.99+ |
Oracles | ORGANIZATION | 0.99+ |
Netezzas | ORGANIZATION | 0.99+ |
LGQ | ORGANIZATION | 0.99+ |
Los Angeles | LOCATION | 0.99+ |
Excel | TITLE | 0.99+ |
Teradatas | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
2010 | DATE | 0.99+ |
15 minutes | QUANTITY | 0.99+ |
2006 | DATE | 0.99+ |
millions of pieces | QUANTITY | 0.99+ |
millions | QUANTITY | 0.99+ |
tens of millions | QUANTITY | 0.99+ |
an hour | QUANTITY | 0.99+ |
five GLs | QUANTITY | 0.99+ |
Southeast Asia | LOCATION | 0.99+ |
one | QUANTITY | 0.99+ |
four GLs | QUANTITY | 0.99+ |
billions | QUANTITY | 0.99+ |
Hadoop | TITLE | 0.99+ |
hundreds of millions | QUANTITY | 0.98+ |
20 rules | QUANTITY | 0.98+ |
three | QUANTITY | 0.98+ |
70% | QUANTITY | 0.98+ |
each time | QUANTITY | 0.98+ |
one customer | QUANTITY | 0.98+ |
earlier this year | DATE | 0.97+ |
10 | QUANTITY | 0.97+ |
today | DATE | 0.95+ |
a decade ago | DATE | 0.95+ |
first | QUANTITY | 0.95+ |
a day | QUANTITY | 0.95+ |
25 years ago | DATE | 0.94+ |
Collibra | PERSON | 0.94+ |
hundreds of millions of people | QUANTITY | 0.94+ |
four | QUANTITY | 0.94+ |
petabytes | QUANTITY | 0.91+ |
over a decade ago | DATE | 0.9+ |
terabytes | QUANTITY | 0.9+ |
theCUBE | ORGANIZATION | 0.9+ |
five years old | QUANTITY | 0.88+ |
CPO | PERSON | 0.87+ |
Wild Wild West | LOCATION | 0.86+ |
tens of millions of data | QUANTITY | 0.86+ |
One | QUANTITY | 0.84+ |
five generational languages | QUANTITY | 0.83+ |
a thousand rules | QUANTITY | 0.81+ |
single component | QUANTITY | 0.8+ |
60 | QUANTITY | 0.8+ |
last 30 years | DATE | 0.79+ |
Data Citizens'21 | TITLE | 0.78+ |
zero cost | QUANTITY | 0.77+ |
five megabytes of code | QUANTITY | 0.76+ |
OwlDQ | ORGANIZATION | 0.7+ |
single time | QUANTITY | 0.69+ |
Data Citizens '21 | EVENT | 0.67+ |
Chief Product Officer | PERSON | 0.64+ |
hundred | QUANTITY | 0.63+ |
two thousands | QUANTITY | 0.63+ |
Data | EVENT | 0.58+ |
#DataCitizens21 | EVENT | 0.58+ |
petabyte | QUANTITY | 0.49+ |
COVID | OTHER | 0.48+ |
Michele Goetz, VP, Principal Analyst, Forrester Research
>> From around the globe, it's theCube covering Data Citizens '21, brought to you by Collibra. >> For the past decade, organizations have been effecting very deliberate data strategies, investing quite heavily in people, processes, and technology specifically designed to gain insights from data, better serve customers, and drive new revenue streams. We've heard this before. The results, quite frankly, have been mixed, as much of the effort is focused on analytics and technology designed to create a single version of the truth, which in many cases continues to be elusive. Moreover, the world of data is changing. Data is increasingly distributed, making collaboration and governance more challenging, especially where operational use cases are a priority. Hello, everyone, my name is Dave Vellante and you're watching theCube's coverage of Data Citizens '21. And we're pleased to welcome Michele Goetz, who's the Vice President and Principal Analyst at Forrester Research. Hello, Michele, welcome to theCube. >> Hi, Dave, thanks for having me today. >> It's our pleasure. So I want to start: you serve a wide range of roles, including enterprise architects, CDOs (chief data officers, that is), analysts, et cetera, and many data-related functions. And my first question is, what are they thinking about today? What's on their minds, these data experts? >> So there's actually two things happening. One is what is the demand that's placed on data for our new intelligent digital systems. So we're seeing a lot of investment and interest in things like edge computing. And then how does that intersect with artificial intelligence to really run your business intelligently and drive new value propositions, to be both adaptive to the market as well as resilient to changes that are unforeseen. The second thing is then you create this massive complexity to managing the data, governing the data, orchestrating the data, because it's not just a centralized data warehouse environment anymore.
You have a highly diverse and distributed landscape that you both control internally, as well as taking advantage of third party information. So really what the struggle then becomes is how do you trust the data? How do you govern it and secure or protect that data? And then how do you ensure that it's hyper-contextualized to the types of value propositions that our intelligent systems are going to serve? >> Well, I think you're hitting on the key issues here. I mean, you're right, the data, and I sort of refer to this as well, is sort of out there. It's distributed, it's at the edge, but generally our data organizations are actually quite centralized. And as well, you talk about the need to trust the data, obviously that's crucial. But are you seeing the organization change? I know you're talking about this to clients in your discussion about collaboration. How are you seeing that change? >> Yeah, so as you have to bring data into context of the insights that you're trying to get or the intelligence that's automating and scaling out the value streams and outcomes within your business, we're actually seeing a federated model emerge in organizations. So while there's still a centralized data management and data services organization, led typically by enterprise architects for data, and a data engineering team that's managing warehouses and data lakes, they're creating this great platform to access and orchestrate information. But we're also seeing data and analytics and governance teams come together under chief data officers or chief data and analytics officers. And this is really where the insights are being generated from either BI and analytics or from data science itself, with dedicated data engineers and stewards that are helping to access and prepare data for analytic efforts. And then lastly, and this is the really interesting part, when you push data into the edge, the goal is that you're actually driving an experience and an application.
And so in that case, we are seeing data engineering teams starting to be incorporated into the solutions teams that are aligned to lines of business or divisions themselves. And so really what's happening is, if there is a solution consultant who is also overseeing value-based portfolio management, when you need to instrument the data to these new use cases and keep up with the pace of the business, it's this engineering team that is part of the DevOps workbench to execute on that. So really the balance is: we need the core, we need to get to the insights and build our models for AI, and then the next piece is how do you activate all that, and there's a team over there to help. So it's really spreading the wealth and expertise where it needs to go. >> Yeah, I love that. A couple of things really resonated with me. You talked about context a couple of times and this notion of a federated model, because historically the sort of big data architecture, the team, they didn't have the context, the business context, and my inference is that's changing. And I think that's critical. Your talk at Data Citizens is called "How Obsessive Collaboration Fuels Scalable DataOps." You talk about the data, the DevOps team. What's the premise you put forth to the audience? >> So the point about obsessive collaboration is sort of taking the hubris out of your expertise on the data. Certainly, there's a recognition by data professionals that the business understands and owns their data. They know the semantics, they know the context of it, and just receiving the requirements on that was assumed to be okay. And then you could provide a data foundation, whether it's just a lake or whether you have a warehouse environment where you're pulling for your analytics.
The reality is that as we move into more of an AI and machine learning type of model, one, more context is necessary, and you're kind of balancing between what are the things that you can ascribe to the data globally, which is what data engineers can support. And then there's what is unique about the data and the context about the data that is related to the business value and outcome, as well as the feature engineering that is being done on the machine learning models. So there has to be a really tight link and collaboration between the data engineers, the data scientists and analysts, and the business stakeholders themselves. You see a lot of pods starting up that way to build the intelligence within the system. And then lastly, what do you do with that model? What do you do with that data? What do you do with that insight? You now have to shift your collaboration over to the workbench that is going to pull all these components together to create the experiences and the automation that you're looking for. And that requires a different collaboration model around software development, still incorporating the business expertise from those stakeholders, so that you're satisfying not only the quality of the code to run the solution, but the quality towards the outcome that meets the expectation and the time to value that your stakeholders have. So data teams aren't just sitting in the basement or in another part of the organization, digitally disconnected, anymore. You're finding that they're having to work much more closely and side by side with their colleagues and stakeholders. >> I think it's clear that you understand this space really well. Hubris out, context in. I mean, that's kind of what's been lacking. And I'm glad you used the word "anymore," because I think it's a recognition that that's kind of what it was. They were down in the basement or out in some kind of silo.
And I think, and I want to ask you this, I'll come back to organization, because I think a lot of organizations say, look, the most cost-effective way for us to serve the business is to have a single data team with hyper-specialized roles. That'll be the cheapest way, the most efficient way that we can serve them. And meanwhile, the business, which as you pointed out has the context, is frustrated. They can't get to data. So this notion of a federated governance model is actually quite interesting. Are you seeing actual common use cases where this is being operationalized? >> Absolutely. I think the first place that you were seeing it was within the operational technology use cases. A lot of the manufacturing, industrial device, any sort of IoT-based use cases really recognized that without applying data and intelligence to whatever process was going to be executed, it was really going to be challenging to know that you're creating the right foundation, meeting the SLA requirements, and then ultimately bringing the right quality and integrity to the data, let alone any sort of data protection and regulatory compliance that is necessary. So you already started seeing the solution teams coming together with the data engineers, the solution developers, the analysts and data scientists, and the business stakeholders to drive that. But that is starting to come back down into more of the IT mindset as well. And so DataOps starts to emerge from that paradigm into more of the corporate types of use cases and sort of parrot that, because there are customer experience use cases that have an IoT or edge component to them. We live on our smartphones, we live on our smart watches, we've got our laptops, and all of us have been put into virtual collaboration. And so we really need to take into account not just the insight of analytics, but how do you feed that forward.
And so this is really where you're seeing sort of the evolution of DataOps as a competency, not only to engineer the data and collaborate, but to ensure that there's sort of an activation and alignment where the value is going to come out, while still being trusted and governed. >> I've got kind of a weird question, but I'm going to (indistinct). I was talking to somebody in Israel the other day and they told me masks are off, the economy's booming. And he noted that Israel said, "Hey, we're going to pay up for the price of a vaccine, the cost per dose around 28 bucks," or whatever it was. And he pointed out that the EU haggled big time and they go, "We're going to pay $19." And as a result, they're not, you know, as far along. Israel understood that the real value was opening up the economy. And so there's an analogy here, which I want to bring back to the organization, and it relates to DataOps. If the real metric is, "Hey, I have an idea for a data product," how long does it take to go from idea to monetization? That seems to me to be a better KPI than, you know, how much storage I have or how many petabytes I'm managing. So my question is, and it relates to DataOps, can that DataOps individual, should that DataOps individual, and then maybe even the data engineer, live inside of the business? And is that even feasible technically with this notion of federated governance? Are you seeing that? And maybe talk a little bit more about this DataOps role. Is it-- >> Yeah. >> Fungible? >> Yeah, it's definitely fungible. And in fact, when I talked about sort of those three units, there's your core enterprise data services, there's your BI and data, and then there's your line of business. In all of those, the engineering and the ops is the DataOps, which is living in all of those environments and being as close as possible to where the value proposition is being defined and designed. So absolutely being able to federate that.
And I think the other piece on DataOps that is really important is recognizing how the practices around continuous integration and continuous deployment using agile methodologies are really reshaping a lot of the waterfall approaches that were done before, where data was lagging 12 to 18 months behind any sort of insights. A lot of the platforms today assume that you're moving into a standard, mature software development life cycle. And you can start seeing returns on investment within a quarter, really, so that you can iterate and then speed that up so that you're delivering new value every two weeks. But it does change the mindset. This DataOps team, aligned to solution development, aligned to a broader portfolio management of business capabilities and outcomes, needs to understand how to appropriately scope the data products that they're delivering to incremental, value-based milestones, so the business feels that they're getting improvements over time and not just waiting. So there's an MVP, you move forward on that and optimize, optimize, extend, scale. So again, that CICD mindset is helping to not bottleneck and wait for the complete field of dreams to come from your data and your insights. >> Thank you for that, Michele. I want to come back to this idea of collaboration, 'cause over the last decade, we've seen attempts. I've seen software come out to try to help the various roles collaborate, and some of it's been okay, but you have these hyper-specialized roles. You've got data scientists, data engineers, quality engineers, analysts, et cetera. And they tend to be in their own little worlds. But at the end of the day, we rely on them all to get answers. So how can these data scientists, all these stewards, how can they collaborate better? What are you seeing there? >> You need to get them onto the same process, that's really what it comes down to. If you're working from different points of view, that's one thing.
But if you're working from different processes, collaborating is really challenging. And I think the one thing that's really come out of this move to machine learning and AI is recognizing that you need processes that reinforce collaboration. So that's number one. So you see agile development and CICD not just for DataOps, not just for DevOps, but also encouraging and propelling these projects and iterations for the data science teams as well, or even if there are machine learning engineers incorporated. And then, certainly the business stakeholders are inserted within there as appropriate to accept what it is that is going to be developed. So process is number one. Number two is what is the platform that's going to reinforce those processes and collaboration. And it's really about what's being shared. How do you share? So certainly what we're seeing within the platforms themselves is everybody contributing into some sort of a library where their components and products are being ascribed to, and then that's able to help different teams grab those components and build out what those solutions are going to be. And in fact, what gets really cool about that is you don't always need hardcore data scientists anymore, as you have this social platform for data product and analytic product development. This is where a lot of the AutoML begins, because those who are less data science oriented but can build an insight pipeline can grab all the different components, from the pipelines to the transformations, to capture mechanisms, to bolting into the model itself, and allow that to be delivered to the application. So really kind of balancing out between process and platforms that enable and encourage and almost force you to collaborate and manage through sharing. >> Thank you for that. I want to ask you about the role of data governance. You've mentioned trust, and that's data quality, and you've got teams and specialists that are focused on data quality.
There's the data catalog, and here's my question. You mentioned edge a couple of times and I can see a lot of that. I mean, today, a lot of the AI, I would say most, is modeling. And in the future, you mentioned edge, it's going to be a lot of inferencing in real time. And, you know, people are maybe not going to have the time or be involved in that decision. So what are you seeing in terms of data governance? We talked about federated governance, this notion of a data catalog, and maybe automating data quality without necessarily having it be so labor-intensive. What trends are you seeing there? >> Yeah, so I think our new environment, our new normal, is that you have to be composable, interoperable, and portable. Portability is really the key here. So from a cataloging perspective, in governance we would bring everything together into our catalogs and business glossaries. And it would be a reference point. It was like a massive wiki. Well, that's wonderful, but why just house it in a museum? You really want to activate that. And I think what's interesting about the technologies today for governance is that you can turn those rules and business logic and policies into services that are composable components and bring those into the solutions that you're defining. And in that way, what happens is that creates portability. You can drive them wherever they need to go. But from the composability and the interoperability portion of that, you can put those services in the right place at the right time for what you need for an outcome, so that you start to become behaviorally driven on executing on governance, rather than trying to write all of the governance down into transformations and controls where the data lives. You can have quality and observability of that quality and performance right at the edge, in context of behavior and use of that solution.
You can run those services and governance on gateways that are managing and routing information at those edge solutions and where synchronization between the edge and the cloud comes up. And if it's appropriate during synchronization of the data back into the data lake, you can run those services there. So there's a lot more flexibility and elasticity for today's modern approaches to cataloging and glossaries and governance of data than we had before. And that goes back into what we talked about earlier, like, this is the new wave of DataOps. This is how you bring data products to fruition now. Everything is about activation. >> So how do you see the future of DataOps? I mean, I've kind of been pushing you to a more decentralized model where the business has more control, 'cause the business has the context. I mean, I feel as though, hey, we've done a great job of contextualizing our operational systems. The sales team, they know when the data is crap within my CRM, but our data systems are context agnostic, you know, generally, and you obviously understand that problem well. So how do you see the future of DataOps? >> So I think what's kind of interesting about that is we're going to go to governance on read versus governance on write, more so. What do I mean by that? That means that from a business perspective there's two sides of it. There's ensuring that where governance is run, as we talked about before, it's executing at the appropriate place at the appropriate time. It's semantically, domain-centric driven, not logical and systems centric. So that's number one.
Number two is also recognizing that business owners or business operations actually play a role in this, because as you're working within your CRM systems like a Salesforce, for example, you're using an iPaaS environment like MuleSoft to connect to other applications, connect to other data sources, connect to other analytics sources, and what's happening there is that the data is being modeled and personalized to whatever view, insight, or task has to happen within those processes. So even CRM environments, which we think of as sort of traditional technologies that we're used to, are getting a lift too, both in terms of intelligence from the data but also your flexibility in how you execute governance and quality services within that environment. And that actually opens up the data foundations a lot more and keeps you from having to do a lot of moving, copying, and centralizing data, and creating an over-weighted business application, you know, both in terms of the data foundation but also in terms of the types of business services and status updates and processes that happen in the application itself. You're drawing those tasks back down to where they should be and where performance can be managed, rather than trying to over-customize your application environment. And that gives you a lot more flexibility later too for any sort of upgrades or migrations that you want to make, because all of the logic is contained back down in a service layer instead. >> Great perspectives, Michele. You obviously know your stuff and it's been a pleasure having you on. My last question is, when you look out there, is there anything that really excites you, or any specific research that you're working on that you want to share, that you're super-pumped about? >> I think there's two things.
One is it's truly incredible the amount of insight and growth that is coming through data profiling and observation, really understanding and contextualizing data anomalies so that you understand: is data helping or hurting the business value? And, you know, tying it very specifically to processes and metrics, which is fantastic, as well as models themselves, like really understanding how data inputs and outputs are making a difference in whether the model performs or not. And then I think the second thing is really the emergence of more active data, active insights, as we talked about before: your ability to package up services for governance and quality in particular that allow you to scale your data out towards the edge or where it's needed, and doing so, you know, not just so that you can run analytics but so that you're also driving overall processes and value. So the research around the operationalization and activation of data is really exciting. And looking at the networks and service mesh to bring those things together is kind of where I'm focusing right now, because what's the point of having data in a database if it's not providing any value? >> Michele Goetz, Forrester Research, thanks so much for coming on theCube. Really awesome perspectives. You're in an exciting space. So appreciate your time. >> Absolutely, thank you. >> And thank you for watching Data Citizens '21 on theCube. My name is Dave Vellante. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michele | PERSON | 0.99+ |
Michele Goetz | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
$19 | QUANTITY | 0.99+ |
Israel | LOCATION | 0.99+ |
12 | QUANTITY | 0.99+ |
first question | QUANTITY | 0.99+ |
EU | ORGANIZATION | 0.99+ |
two sides | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
Forrester Research | ORGANIZATION | 0.99+ |
two things | QUANTITY | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
18 months | QUANTITY | 0.99+ |
second thing | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
Data Citizens '21 | TITLE | 0.97+ |
around 28 bucks | QUANTITY | 0.96+ |
DataOps | ORGANIZATION | 0.94+ |
Israel | ORGANIZATION | 0.93+ |
one | QUANTITY | 0.93+ |
three units | QUANTITY | 0.9+ |
one thing | QUANTITY | 0.88+ |
Salesforce | TITLE | 0.86+ |
CICD | ORGANIZATION | 0.84+ |
single version | QUANTITY | 0.83+ |
past decade | DATE | 0.83+ |
two weeks | QUANTITY | 0.82+ |
theCube | ORGANIZATION | 0.81+ |
Number two | QUANTITY | 0.79+ |
agile | TITLE | 0.77+ |
a quarter | QUANTITY | 0.75+ |
edge | ORGANIZATION | 0.7+ |
wave of | EVENT | 0.69+ |
times | QUANTITY | 0.69+ |
single data team | QUANTITY | 0.67+ |
DataOps | TITLE | 0.65+ |
last decade | DATE | 0.64+ |
DevOps | TITLE | 0.64+ |
MuleSoft | ORGANIZATION | 0.64+ |
Vice President | PERSON | 0.56+ |
DataOps | EVENT | 0.53+ |
couple | QUANTITY | 0.4+ |
Felix Van de Maele, CEO, Collibra
(upbeat music) >> At the beginning of last decade, the technology industry was abuzz because we were on the cusp of a new era of data. The promise of so-called big data was that it would enable data-driven organizations to tap a new form of competitive advantage, namely insights from data at a much lower cost. The problem was data became plentiful, but insights, they remained scarce. A rash of technical complexity combined with a lack of trust due to conflicting data sources and inconsistent definitions led to the same story that we've heard for decades: we spent a ton of time and money to create a single version of the truth, and we're further away than we've ever been before. Maybe as an industry, we should be approaching this problem differently. Perhaps it should start with the idea that we have to change the way we serve business users, i.e., those who understand data context. And with me to discuss the evolving data space, his company, and the upcoming Data Citizens conference is Felix Van de Maele, the CEO and founder of Collibra. Felix, welcome. Great to see you. >> Great to see you. Great to be here. >> So tell us a little bit about Collibra and the problem that you're solving. Maybe you could double click on my upfront narrative. >> Yeah, I think you said it really well. We've seen so much innovation over the last couple of years in data, the exploding volume and complexity of data. We've seen a lot of innovation in how to store and process that data, that volume of data, more effectively, more cost-effectively. But fundamentally the source of the problem, being able to really derive insights from that data effectively, whether it's for an AI model or for reporting, is still as difficult as it was, let's say, 10 years ago. And in a way it's only become more difficult. And so what we fundamentally believe is that next to that innovation on the infrastructure side of data, you really need to look at the people and process side of data.
There are so many more people that today consume and produce data to do their job. That's why we talk about data citizens. We have to make it easier for them to find the right data in a way that they can trust, so that there's confidence in that data to be able to make decisions and to be able to trust the algorithm of that model. And that's really what Collibra is focused on. Initially, around governance: how do you make sure companies actually know what data they have, and make sure they can trust it and they can use it in a compliant way? And now we've extended that into the only data intelligence platform today in the industry, where we just make it easier for organizations to truly unite around the data across the whole organization, wherever that data is stored, on premises or in the cloud, and whoever is actually using or consuming that data. That's why we talk about data citizens. >> I think you're right. I think it's more complex. There's more of it. And there's more pressure on individuals to get advantage from it. But I want to ask you what sets Collibra apart, because I'd like you to explain why you're not just another data company chasing a problem where it's going to be an incremental solution that's really not going to change anything. What sets Collibra apart? >> Yeah, that's a really good question. And what fundamentally sets us apart, or makes us unique, is that we look at data, or the problem around data, as truly a business owner and a business function. So we fundamentally believe that if you believe that data is an asset, you really have to run it as a strategic business function. Just like you run your HR function, your people function, your IT function, your sales and marketing function. You have a system to run that function. You have Salesforce to run sales and marketing. You have ServiceNow to run your IT function. You have Workday to run your people function. You need the same kind of system to really run your data function.
And that's really how we think about Collibra. So we're not another kind of faster, better database. We're not another data management tool that makes the life of a single individual easier. We're truly a business application that focuses on how we bring people together effectively so that they can collaborate around the data. It creates efficiency, so you don't have to do things ad hoc. You can easily find the right information. You can collaborate effectively. And it creates the confidence to actually be able to do something with the outcomes or with the results of all of that work. And so fundamentally, looking at the problem as a business function that needs a business system, which we call the system of record or system of engagement for the data function, I think is absolutely critical and really unique in our approach. >> So Data Citizens is your big user conference. Data Citizens '21 is coming up June 16th and 17th. theCUBE is stoked because we love talking about data. This is the first time we're bringing theCUBE to that event. And so we're really gearing up for it. And I wonder if you can tell us a little bit about the history and the evolution of the Data Citizens conference? >> Absolutely. I think the first one started six years ago, when we had a small event at a hotel in downtown New York, mostly customers, as a user conference, a lot of the banks, which at the time were the main customers, 60 people. So a very small event. And it's exploded ever since. This year, we expect over 5,000 people. So it's really expanded beyond just a user conference to become more of a community conference and an industry conference. So we're really excited. A big part of why we care so much about the conference is that it's an opportunity to build that data citizens community.
That's where we hear from our customers and from all the attendees that come to the conference. Bringing those people together that all care about the same topic and are passionate about doing more with data, being able to connect people together, is a big part of that. So we're always looking forward to the event from that perspective. >> Well, there's a lot of competition, of course, for virtual events these days. So what's in it for me? Who should attend? And what can attendees expect from Data Citizens '21? >> Yeah, absolutely. The good thing about the virtual event is that everybody can attend. It's free, it's open from across the world, of course. But what we want people to take away as attendees is that you learn something pragmatic. So the next day on the job, you can do something. You've learned something very specific. We're also excited to look at what is possible from an innovation perspective. And so that's how we look at the event. We bring a lot of customers and organizations that are going to share their best practices, very specifically: how they're handling data governance, how they're doing data cataloging, how they're doing data privacy. So very specific best practices and tips on how to be successful, but then also industry experts that can paint a picture of where we're going as an industry. What are the best practices? What do we need to think about today to be ready for what's going to come tomorrow? So that's a big focus. Of course, we're going to talk about Collibra and our product: what we have in store from a product roadmap and innovation perspective, and how we're helping these organizations get there faster. And we bring in a lot of partners as well, so that's a big part of that broader ecosystem, which is really interesting. And finally, like I said, it's really around the community. That's what we hear continuously from the attendees.
Just being able to make these connections, meet new people, learn what they're doing, how they've kind of solved certain challenges. We hear that's a really big part of the value proposition. So as an attendee, the good thing is you can join from anywhere. All of the content is going to be available on demand. So later it's going to be available for you to look at as well. Plus you're going to become part of that data citizens community, which is a really thriving and growing community where you're going to find a lot of like-minded people with the same passion, the same interest, that we can all learn a lot from. >> I rather like the term data citizen. I consider myself a data citizen and it has implications just in terms of putting data in the hands of business users. So it's just sort of central to this event, obviously. What is a data citizen to Collibra? >> Yeah. It's a really core part of our mission and our vision that we believe that today everyone needs data to do their job. Everyone in that sense has become a data citizen in the sense that they need to be able to easily access trustworthy data. We have to make it easy for people to find the right data that they can trust, that they can understand and they can do something with and make their job easier. On the other hand, like a citizen, you have rights and you have responsibilities. As a data citizen, you also have the responsibility to treat that data in the right way, to make sure from a privacy and security perspective that data is, again like I said, treated in the right way. And so that combination of making it easy, making it accessible, democratizing it but also making sure we treat data in the right way is really important. And it's a core part of what we believe, that everyone is going to become a data citizen. And so that's a big part of our mission. >> I like that. We're going to enter into a contract.
I'll do my part and you'll give me access to that data. I think that's a great philosophy. So the call to action here: June 16th and 17th, go register at citizens.collibra.com. Go register because it's not just the normal mumbo jumbo. You're going to get some really interesting data. Felix, I'll give you the last word. >> No, like you said, go register. It's a great event. It's a great community to be part of. June 16th and 17th, you can block it in your calendar. So go to citizens.collibra.com. It's going to be a great event. >> Well, thanks for helping us preview this event. It's going to be a great event that we're really excited about. Felix, great to see you. And we'll see you on June 16th and 17th. >> Absolutely. >> All right. Thanks for watching everyone. This is Dave Vellante for theCUBE. We'll see you next time. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Felix Van De Maele | PERSON | 0.99+ |
Felix Van de Maele | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Felix | PERSON | 0.99+ |
June 16th | DATE | 0.99+ |
citizens.collibra.com | OTHER | 0.99+ |
17th | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
60 people | QUANTITY | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
this year | DATE | 0.99+ |
six years ago | DATE | 0.99+ |
Data Citizens | EVENT | 0.98+ |
over 5,000 people | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
first one | QUANTITY | 0.98+ |
last decade | DATE | 0.97+ |
first time | QUANTITY | 0.97+ |
Data Citizens '21 | EVENT | 0.96+ |
New York | LOCATION | 0.96+ |
10 years ago | DATE | 0.92+ |
next day | DATE | 0.9+ |
decades | QUANTITY | 0.88+ |
single version | QUANTITY | 0.87+ |
Data Citizens Conference | EVENT | 0.86+ |
single individual | QUANTITY | 0.69+ |
more people | QUANTITY | 0.69+ |
years | DATE | 0.61+ |
ton | QUANTITY | 0.6+ |
double | QUANTITY | 0.58+ |
last couple | DATE | 0.52+ |
Salesforce | ORGANIZATION | 0.52+ |