
Jim Cushman, CPO, Collibra


 

>> From around the globe, it's theCUBE, covering Data Citizens '21. Brought to you by Collibra.

>> We're back talking all things data at Data Citizens '21. My name is Dave Vellante and you're watching theCUBE's continuous coverage, virtual coverage, #DataCitizens21. I'm here with Jim Cushman, who is Collibra's Chief Product Officer and who shared the company's product vision at the event. Jim, welcome, good to see you.

>> Thanks Dave, glad to be here.

>> Now, one of the themes of your session was all around self-service and access to data. This is a big, big point of discussion amongst organizations that we talk to. I wonder if you could speak a little more toward what that means for Collibra and your customers, and maybe some of the challenges of getting there.

>> So Dave, our ultimate goal at Collibra has always been to enable self-service access for all customers. Now, one of the challenges is that these knowledge workers are limited in how they can access information. So our goal is to totally liberate them. And so, why is this important? Well, in and of itself, self-service liberates tens of millions of data-literate knowledge workers. This will drive more rapid, insightful decision-making, and it'll drive productivity and competitiveness. And to make this level of adoption possible, the user experience has to be as intuitive as, say, retail shopping, like I mentioned previously, like you're buying shoes online. But this is a little bit of foreshadowing, and there's an even more profound future than just enabling self-service. We believe that a new class of shopper is coming online, and she may not be as data-literate as our knowledge worker of today. Think of her as an algorithm developer; she builds machine learning or AI. The engagement model for this user will be to build automation, personalized experiences for people to engage with data. But in order to build that automation, she too needs data. Because she's not data literate, she needs the equivalent of a personal shopper, someone that can guide her through the experience without having her know all the answers to the questions that would be asked. So this level of self-service goes one step further and becomes an automated service, one to really help find the best unbiased and labeled training data to help train an algorithm in the future.

>> That's, okay please continue.

>> No please, and so all of this self and automated service needs to be complemented with kind of a peace of mind that you're letting the right people gain access to it. So when you automate it, it's like, well, geez, are the right people getting access to this? So it has to be governed and secured. This can't become like the Wild Wild West, or what we call a data flea market, where data's everywhere. So, you know, history quickly forgets the companies that do not adjust to remain relevant. And I think we're in the midst of an exponential differentiation, and the Collibra Data Intelligence Cloud is really established to be the key catalyst for companies that will be on the winning side.

>> Well, that's big because I mean, I'm a big believer in putting data in the hands of those folks in the line of business. And of course the big question that always comes up is, well, what about governance? What about security? So to the extent that you can federate that, that's huge. Because data is distributed by its very nature, it's going to stay that way. It's complex.
>> You have to make the technology work in that complex environment, which brings me to this idea of low code or no code. It's gaining a lot of momentum in the industry. Everybody's talking about it, but there are a lot of questions. What can you actually expect from no code and low code? Who are the right, you know, potential users of that? Is there a difference between low and no? And so from your standpoint, why is this getting so much attention, and why now, Jim?

>> You know, if we go back even 25 years ago, we were talking about the fourth- and fifth-generation languages that people were building. And it really didn't reach the total value that folks were looking for, because it always fell short. And you'd say, listen, if you didn't do all the work it took to get to a certain point, how are you possibly going to finish it? And that's where the 4GLs and 5GLs fell short in capability. With our stuff, if you really want great self-service, how are you going to be self-service if it still requires somebody to write code? Well, I guess you could do it if the only self-service people are people who write code, but that's a pretty small population. So if you truly want the ability to have something show up at your front door, without you having to call somebody or make any effort to get it, then it needs to generate itself. The beauty of doing a catalog, new governance, understanding all the data that is available for choice, is giving someone a selection that uses objective criteria: this is the best, objectively, because of its quality for what you want, or it's labeled, or it's unbiased. It has that level of deterministic value to it, versus guessing, or subjectivity, or what my neighbor used, or what I used on my last job. Now that we've given people the power, with confidence, to say, this is the one that I want, the next step is, okay, can you deliver it to them without them having to write any code? So imagine being able to generate those instructions from everything that we have in our metadata repository, to say, this is exactly the data I need you to go get, and perform what we call a distributed query against those data sets and bring it back to them. No code written. And here's the real beauty, Dave: data pipeline development is a relatively expensive thing today, and that's why people spend a lot of money maintaining these pipelines. But imagine if there was zero cost to building your pipeline. Would you spend any money to maintain it? Probably not. So if we can build it for no cost, then why maintain it? Just build it every time you need it. And again, it's done on a self-service basis.

>> I really like the way you're thinking about this, because you're right. A lot of times when you hear self-service, it's about making the hardcore developers, you know, able to do self-service. But the reality is, and you talk about that data pipeline, it's complex. A business person is sitting there waiting for data, or wants to put in new data, and it turns out that the smallest unit is actually that entire team. And so you sit back and wait. And so to the extent that you can actually enable self-serve for the business by simplification, that's been the holy grail for a while, isn't it?

>> I agree.

>> Let's dig into where you're placing your bets. I mean, as head of products, you've got to make bets, you know, certainly many, many months if not years in advance. What are your big focus areas of investment right now?

>> Yeah, certainly.
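A quick aside before Jim continues, to make the no-code idea above concrete: the sketch below illustrates how data-retrieval instructions might be generated purely from catalog metadata — including applying a masking policy to governed columns — rather than hand-written as pipeline code. It is only an illustrative assumption; the metadata shapes, names, and masking behavior are hypothetical and not Collibra's actual model or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnMeta:
    """Catalog metadata for one column (hypothetical shape)."""
    name: str
    is_pii: bool = False          # governed: mask on delivery
    quality_score: float = 100.0  # 0-100, as surfaced in the catalog


@dataclass
class DatasetMeta:
    system: str       # e.g. "crm_cloud"
    table: str        # e.g. "customer_analytics"
    columns: List[ColumnMeta]


def generate_query(ds: DatasetMeta, requested: List[str]) -> str:
    """Build a SELECT from metadata alone; PII columns are masked, not exposed."""
    select_parts = []
    for col in ds.columns:
        if col.name not in requested:
            continue
        if col.is_pii:
            # Governance policy applied automatically at generation time.
            select_parts.append(f"'XXXX' AS {col.name}")
        else:
            select_parts.append(col.name)
    return f"SELECT {', '.join(select_parts)} FROM {ds.system}.{ds.table}"


# Example: the "pipeline" is simply regenerated on demand from the catalog.
dataset = DatasetMeta(
    system="crm_cloud",
    table="customer_analytics",
    columns=[
        ColumnMeta("customer_id", quality_score=98.0),
        ColumnMeta("customer_phone", is_pii=True),
        ColumnMeta("is_churned", quality_score=65.0),
    ],
)
print(generate_query(dataset, ["customer_id", "customer_phone", "is_churned"]))
# SELECT customer_id, 'XXXX' AS customer_phone, is_churned FROM crm_cloud.customer_analytics
```

The point of a design like this is that the generated "pipeline" is cheap enough to regenerate on demand, which is exactly why, as Jim argues, there may be little left to maintain.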
So one of the things we've done very successfully since our origin over a decade ago was building business user-friendly software in what was predominantly kind of a plumbing or infrastructure area. So business users love working with our software. They can find what they're looking for, and they don't need some cryptic key for how to work with it. They can think about things in their own terms, use our business glossary, and navigate through what we call our data intelligence graph to find just what they're looking for. And we don't require a business to change everything just to make it happen. We give them kind of a universal translator to talk to the data. But with all that wonderful usability, the common compromise you make is that it's only good up to a certain amount of information, kind of like Excel. You know, you can do almost anything with Excel, right? But when you get into large volumes, it becomes problematic, and now you need to, you know, go with a hardcore database and an application on top. So what the industry is pulling us towards is far greater amounts of data, not just millions or even tens of millions, but into the hundreds of millions and billions of things that we need to manage. So we have a huge focus on scale and performance on a global basis, and that's a mouthful, right? Not only are you dealing with large amounts at performance, but you have to do it in a global fashion and make it possible for somebody who might be operating in Southeast Asia to have the same experience with the environment as they would in Los Angeles. And the data therefore needs to go to the user, as opposed to having the user come to the data, as much as possible. So it really does put a lot of emphasis on some of what you'd call the non-functional requirements, also known as the -ilities, and our ability to bring the data and handle those large, enterprise-grade capabilities at scale and performance, globally, is what's really driving a good number of our investments today.

>> I want to talk about data quality. This is a hard topic, but it's one that's so important. And I think it's been really challenging and somewhat misunderstood. When you think about the chief data officer role itself, it kind of emerged from these highly regulated industries. And it came out of data quality, kind of a back-office role that's now gone front and center and is, you know, pretty strategic. Having said that, the, you know, the prevailing philosophy is, okay, we've got to have this centralized data quality approach that's going to be imposed throughout. And it really is a hard problem, and I think about, you know, these hyper-specialized roles, like, you know, the quality engineer and so forth. And again, the prevailing wisdom is, if I can centralize that, it can be lower cost and I can service these lines of business, when in reality the real value is, you know, speed. And so how are you thinking about data quality? You hear so much about it. Why is it such a big deal, and why is it so hard, and a priority in the marketplace? Your thoughts.

>> Thanks for that. So we of course acquired a data quality company not very long ago, earlier this year: OwlDQ. And the big question is, okay, so why, why them, and why now, not before? Well, at least a decade ago you started hearing people talk about big data.
It was probably around 2009 that it was becoming the big talk, and what we don't really talk about when we talk about this ever-expanding data is the byproduct: the velocity of data is increasing dramatically. The speed at which new data is being presented, the way in which data is changing, is dramatic. And why is that important to data quality? Because data quality, historically, for the last 30 years or so, has been a rules-based business, where you analyze the data at a certain point in time and you write a rule for it. Now, there's already room for error there, because humans are involved in writing those rules. But now, with the increased velocity, the likelihood that a rule is going to atrophy and no longer be valid or useful to you increases exponentially. So we were looking for a technology that was doing it in a new way, similar to the way that we do auto-classification when we're cataloging attributes: how do we look at millions of pieces of information around metadata and decide what each thing is, to put it into context? The ability to automatically generate these rules, and then continuously adapt as data changes to adjust these rules, is really a game changer for the industry itself. So we chose OwlDQ for that very reason. It's not only that they had this really kind of modern architecture to automatically generate rules, but then to continuously monitor the data and adjust those rules, cutting out the huge cost of maintaining rules that aren't actually helping you. And frankly, you know how this works: no one really complains about it until there's the squeaky wheel, you know, you get a fine or an exposé, and that's what is causing a lot of issues with data quality. And then why now? Well, I think, and this is my speculation, but there's so much movement of data to the cloud right now. And so anyone who's made big investments in data quality historically, for their on-premise data warehouses — the Netezzas, Teradatas, Oracles, et cetera — or even their data lakes, is now moving to the cloud. And they're saying, hmm, what investments are we going to carry forward that we had on premise, and which ones are we going to start anew? And data quality seems to be ripe for something new, so these new investments in data in the cloud are now saying, let's look at a new, next-generation method of doing data quality. And that's where we're really fitting in nicely. And of course, finally, you can't really do data governance and cataloging without data quality, and data quality without data governance and cataloging is kind of a hollow long-term story. So the three working together is a very powerful story.

>> I've got to ask you some Columbo questions about this, because you know, you're right. It's rules-based, and so my, you know, immediate reaction is, okay, what are the rules around COVID or hybrid work, right? If those are static rules, there's so much unknown, and so what you're saying is you've got a dynamic process to do that. So, and one of my gripes about the whole big data thing — and you know, you referenced that 2009, 2010 era, I loved it, because there were a lot of profound things about Hadoop and a lot of failings — one of the challenges is really that there's no context in the big data system. You know, the folks in the data pipeline, they don't have the business context. So my question is, and it sounds like you've got this awesome magic to automate, who adjudicates the dynamic rules? Do humans play a role? What role do they play there?
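A rough way to picture the rule generation Jim describes — mining the data itself to propose quality rules, then re-fitting them as the data changes — is sketched below. This is a simplified assumption about the general approach, not OwlDQ's actual algorithm; the thresholds and function names are invented for illustration.

```python
from typing import Dict, Any

import pandas as pd


def propose_rules(series: pd.Series, max_categories: int = 20) -> Dict[str, Any]:
    """Derive candidate quality rules for one column from observed data."""
    rules: Dict[str, Any] = {
        # Allow a little slack beyond the observed null rate.
        "max_null_rate": float(series.isna().mean()) + 0.01,
    }
    non_null = series.dropna()
    if pd.api.types.is_numeric_dtype(non_null):
        rules["min"], rules["max"] = float(non_null.min()), float(non_null.max())
    elif non_null.nunique() <= max_categories:
        rules["allowed_values"] = set(non_null.unique())
    return rules


def evaluate(series: pd.Series, rules: Dict[str, Any]) -> Dict[str, bool]:
    """Check a new batch of data against previously learned rules."""
    results = {"null_rate_ok": series.isna().mean() <= rules["max_null_rate"]}
    non_null = series.dropna()
    if "allowed_values" in rules:
        results["values_ok"] = bool(non_null.isin(rules["allowed_values"]).all())
    if "min" in rules:
        results["range_ok"] = bool(non_null.between(rules["min"], rules["max"]).all())
    return results


# Learn rules from last month's data, score this month's, then re-fit.
history = pd.Series(["M", "F", "F", "M", None])
learned = propose_rules(history)
this_month = pd.Series(["M", "F", "X"])          # a new, legitimate value appears
print(evaluate(this_month, learned))              # values_ok flips to False -> review
learned = propose_rules(pd.concat([history, this_month]))   # recalibrate the rule
```

The interesting part is the recalibration step at the end: rules derived from the observed data can be re-learned on every batch, which is what keeps them from atrophying the way hand-written rules do.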
>> Absolutely. There's the notion of sampling. You can only trust a machine to a certain point before you want some type of steward, or assisted or supervised learning, involved. So, you know, for suspect cases, maybe one out of 10 or one out of 20 rules that are generated, you might want to have somebody look at it. There are ways to do the equivalent of supervised learning without actually paying the cost of the supervisor. Let's suppose that you've written a thousand rules for your system that are five years old. We come in with our capability, analyze the same data, and generate rules ourselves. We compare the two sets, and there's absolutely going to be some exact matching, some overlap, that validates one another. That gives you confidence that the machine learning did exactly what you did — and what's the likelihood that you guessed wrong and machine learning guessed wrong in exactly the same way? That seems a pretty small concern. So now you're really asking, well, why are they different? And now you start to study the samples. And what we learned is that our ability to generate these rules covers between 60 and 70% of them, and anytime we were different, we were right almost every single time — only about one out of a hundred times was it proven that the handwritten rule was the more profound outcome. And of course, it's machine learning, so it learned and caught up the next time. That's the true power of this innovation: it learns from the data as well as the stewards, it gives you confidence that you're not missing things, and you start to trust it. But you should never completely walk away. You should constantly do your periodic sampling.

>> And the secret sauce is math. I mean, I remember back in the mid-2000s, it was like the 2006 timeframe. You mentioned, you know, auto-classification. That was a big problem with the federal rules of civil procedure, trying to figure out, okay, you know, you had humans classifying, and humans don't scale, until you had, you know, all kinds of support vector machines and probabilistic latent semantic indexing. But you didn't have the compute power or the data corpus to really do it well. So it sounds like a combination of, you know, cheaper compute, a lot more data, and machine intelligence have really changed the game there. Is that a fair assumption?

>> That's absolutely fair. I think the other aspect to keep in mind is that it's an innovative technology that actually brings all that compute as close to the data as possible. One of the greatest expenses of doing data quality was of course the profiling concept, bringing up the statistics of what the data represents. In most traditional approaches, that data is completely pulled out of the database itself into a separate area, and now you start talking about terabytes or petabytes of data. It takes a long time to extract that much information from a database and then process through it all. Imagine bringing that profiling closer into the database, happening natively in the same space as the data; that cuts out something like 90% of the unnecessary processing. It also gives you the ability to do it incrementally. So you're not doing a full analysis each time. You have kind of an expensive pass when you're first looking at a full database, and then maybe over the course of a day, an hour, 15 minutes, you've only seen a small segment of change. So now it feels more like a transactional analysis process.
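Jim's last point — pushing profiling next to the data and updating it incrementally instead of re-extracting and rescanning everything — can be illustrated with a toy running-statistics profile. Again, this is an assumption-laden sketch of the general pattern, not the actual product mechanism.

```python
from dataclasses import dataclass


@dataclass
class RunningProfile:
    """Incrementally maintained column statistics (toy example)."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0

    def update(self, new_values) -> None:
        # Only the new or changed rows are touched; the full table is never rescanned.
        for v in new_values:
            self.count += 1
            if v is None:
                self.nulls += 1
            else:
                self.total += v

    @property
    def null_rate(self) -> float:
        return self.nulls / self.count if self.count else 0.0

    @property
    def mean(self) -> float:
        non_null = self.count - self.nulls
        return self.total / non_null if non_null else 0.0


profile = RunningProfile()
profile.update([10.0, 12.0, None, 11.0])   # expensive first pass over the full table
profile.update([13.0, None])               # later: only the day's delta is processed
print(f"rows={profile.count} null_rate={profile.null_rate:.2f} mean={profile.mean:.2f}")
```

Because only the delta is folded in, the profile behaves like the "transactional analysis process" Jim describes rather than a full rescan.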
>> Yeah, and that's, you know, again, we talked about the old days of big data, you know, the Hadoop days, and what was profound was that it was all about bringing five megabytes of code to a petabyte of data. But that didn't happen; we shoved it all into a central data lake. I'm really excited for Collibra. It sounds like you guys are really on the cutting edge and doing some really interesting things. I'll give you the last word, Jim, please bring us home.

>> Yeah, thanks Dave. So one of the really exciting things about our solution is that it's trying to be a combination of best-of-breed capabilities that are also integrated, to actually create the full and complete story that customers are looking for. You don't want to have them worry about a complex integration, or trying to manage multiple vendors and the timing of their releases, et cetera. If you can find one solution where you don't have to say, well, that's good enough — where every single component is in fact the best of breed you can find, and it's integrated, and they'll manage it as a service — you truly unlock the power of the data-literate individuals in your organization. And again, that goes back to our overall goal. How do we empower the hundreds of millions of people around the world who are just looking for an insightful decision? Today they feel completely locked out; it's as if they're looking for information before the internet, and they're limited to whatever their local library has. If we can truly become somewhat like the internet of data, we make it possible for anyone to access it while we still govern it and secure it for privacy laws, and I think we do have a chance to change the world for the better.

>> Great. Thank you so much, Jim. Great conversation, really appreciate your time and your insights.

>> Yeah, thank you, Dave. Appreciate it.

>> All right, and thank you for watching theCUBE's continuous coverage of Data Citizens '21. My name is Dave Vellante. Keep it right there for more great content. (upbeat music)

Published Date : Jun 17 2021


Jim Cushman Product strategy vision | Data Citizens'21


 

Hi everyone, and welcome to Data Citizens. Thank you for making the time to join me and the over 5,000 data citizens like you that are looking to become united by data. My name is Jim Cushman. I serve as the chief product officer at Collibra, and I have the benefit of sharing with you the product vision and strategy of Collibra. There are several sections to this presentation, and I can't wait to share them with you. The first is a story of how we're taking a business user and making it possible for him or her to find data, use data, and gain benefit and insight from that data, without relying on anyone in the organization to write code or do the work for them. Next, I'll share with you how Collibra will make it possible to manage metadata at scales into the billions of assets, and again, load this into our software without writing any code. Third, I will demonstrate to you the integration we have already achieved with our newest product release: data quality that's powered by machine learning. Finally, you're going to hear about how Collibra has become the most universally available solution in the market.

Now, we all know that data is a critical asset that can make or break an organization. Yet organizations struggle to capture the power of their data, and many remain afraid of how their data could be misused and/or abused. We also observe that the understanding of, and access to, data remains in the hands of just a small few; three out of every four companies continue to struggle to use data to drive meaningful insights. All forward-looking companies are looking for an advantage, a differentiator that will set them apart from their peers and competitors. What if you could improve your organization's productivity by just 5%? Even a modest 5% productivity improvement, compounded over a five-year period, will make your organization 28% more productive. This will leave you with an overwhelming advantage over your competition, and uniting your data-literate employees with data is the key to your success.

And dare I say, to unlock this potential for increased productivity and huge competitive advantage, organizations need to enable self-service access to data for every data-literate knowledge worker. Our ultimate goal at Collibra has always been to enable this self-service for our customers, to empower every knowledge worker to access the data they need, when they need it, but with the peace of mind that your data is governed and secure. Just imagine if you had a single integrated solution that could deliver a seamless, governed, no-code user experience of delivering the right data to the right person at the right time, just as simply as ordering a pair of shoes online. That would be quite a magic trick, and one that would place you and your organization on the fast track for success. Let me introduce you to our character here.

Cliff is that business analyst. He doesn't write code. He doesn't know Julia or R or SQL, but he is data literate. When Cliff is presented with data of high quality, and can actually find that high-quality data, he knows what to do with it. Well, we're going to expose Cliff to our software and see how he can find the best data to solve his problem of the day, which is customer churn. Cliff is going to go out and find this information, bring it back, and analyze it in his favorite BI reporting tool — Tableau, of course, though it could be Looker, Power BI, or any other of your favorites. But let's go ahead and get started and see how Cliff can do this without any help from anyone in the organization. So Cliff is going to log into Collibra, and being a business user, the first thing he's going to do is look for a business term.

He looks for customer churn rate. Now, when he brings back churn rate, it shows him the definition of churn rate and various other things that have been attributed to it, such as data domains like product, customer, and order. Now, Cliff says, okay, customer is really important, so let me click on that and see what makes up the customer definition. Cliff will scroll through customer and find the various data concepts and attributes that make up the definition of customer, and Cliff knows that customer identifier is a really important aspect of this. It helps link all the data together. And so Cliff is going to want to make sure that whatever source he brings in actually has customer identifier in it, and that it's of high quality. Cliff is also interested in things such as email address, credit activity, and credit card.

But he's now going to say, okay, what data sets actually have customer as a data domain in them — and by the way, while I'm doing it, which ones also have product and order information? That's, again, relevant to the concept of customer churn. Now, as he goes on, he can filter down, because there are a lot of different results that could potentially come back. And again, customer identifier was very important to Cliff, so Cliff further filters on customer identifier, and he filters on customer churn rate as well. This results in two different datasets that are available to Cliff for selection. Which one to use? Well, he's first presented with some data quality information. You can see that customer analytics has a data quality score of 76, and the sales data enrichment dataset has a data quality score of 68 — something he can see right on the front of the box. But let's dig in deeper, because the contents really matter.

So we see again the score of 76, but we also have the chance to find out that this dataset has been certified — it has a check mark, so he knows someone he trusts has actually certified it. You'll see that there are 91 columns that make up this data set, and rather than sifting through all of that information, Cliff is going to say, well, okay, customer identifier is very important to me, let me search and see if I can find its data quality score. Very quickly, using a fuzzy search, he finds it and sees, wow, that's a really high data quality score of 98. Well, what's the alternative? Well, the other data set only has 68 overall, but how about its customer identifier? Quickly, he discovers that the data quality for that is only 70.

So, all things being equal, customer analytics is the better data set for what Cliff needs to achieve. But now he wants to look and say, other people have used this — what have they had to say about it? And you can see there are various reviews from peers of his in the organization who have given it five stars. So this encourages Cliff's confidence that this is a great data set to use. Now Cliff wants to look in a little bit more detail before he finally commits to using this dataset.

Cliff has the opportunity to look at it in a broader context. What else can I learn about customer analytics — what else is it related to? Who else uses it? Where did it come from? Where does it go, and what actually happens to it? And so, within our graph of information, we're able to show you a diagram.

You can see that customer analytics actually comes from the CRM cloud system, and from there you can inherit some wonderful information. We know exactly what CRM cloud is about as an overall system. It's related to other logical models. And here you're actually seeing that it's related to a policy about PII, or personally identifiable information. This gives Cliff almost immediate knowledge that there's going to be some customer information in this dataset that he's not going to be able to see, given his user role in the organization. But Cliff says, hey, that's okay. I don't need to see somebody's name and social security number to do my work. I can work with other information in the data file that will help me understand why our customers are churning and what I can do about it. If we dig in deeper, we can see which personally identifiable information could actually cause issues.

As we scroll down, we focus a bit on what you'll see here as customer phone, because we'll show that to you a little bit later. These are the fields that, once Cliff has the data fulfilled and delivered to him, he will see are masked and/or redacted from his use. Now, Cliff might drill in deeper and see more information. And he says, you know what, another piece that's important to me in my analysis is something called is churned. This basically indicates whether a customer has actually churned. It's an important flag, of course, because that's the analysis he's performing. Cliff sees that the score for it is a mere 65. That's not exactly a great data quality score, but Cliff is kind of in a hurry; his boss has come back and said, we need this information so we can take action.

So he's not going to wait around for some long data quality project before he proceeds, but he is going to act at the speed of thinking. He's going to create a suggestion, an issue. He's going to submit this as a work queue item that informs others who are responsible for the quality of data that there's an opportunity for improvement to this dataset — it's highly reviewed, but maybe it has room for improvement. As Cliff is typing in the explanation that he'll pass along, we can also see that the data quality score is made up of multiple components, such as integrity, duplication, accuracy, consistency, and conformity. We see that we can submit this issue and pass it through, and it will go to somebody else who can actually work on it.

We'll show that to you a little bit later, but back to Cliff. Cliff says, okay, I'd like to work with this dataset, so he adds it to his data basket. And just like when he's shopping online, Cliff wants the ability to just click once and be done with it. Now, it is data, and there's some sensitivity about it, and there's an owner of this data who you need to get permission from. So Cliff is going to provide information to the owner to say, here's why I need this data, how long I need this data for — starting on a certain date and ending on a certain date — and ultimately, what purpose I am going to have for this data. Now, there are other choices Cliff can make, and this one is: how do you want this data delivered to you?

Now, you'll see down below there are three options. One is borrow, another is lease, and another is buy. What does that mean? Well, borrow is this idea of: I don't want to have the data that's currently in this CRM cloud database moved anywhere. I don't want it to be persisted anywhere else. I just want to borrow it very short term, to use in my Tableau report, and then, poof, be gone, because I don't want to create any problems in my organization. Now, you also see lease. Lease is a situation where you actually do need to take possession of the data, but only for a time-boxed period; you don't need it for an indefinite amount of time. And ultimately, buy is your ability to take possession of the data and have it in perpetuity. So we're going to go forward with our borrow use case, and Cliff is going to submit this, and all the fun starts there.

So Cliff has actually submitted the order, and the owner, Joanna, is going to receive the request for the order. Joanna opens up her task queue and sees there's work to perform. It says, oh, okay, here's work for me to perform. Now, Joanna has the ability to automate this using the workflow that's incorporated in Collibra, but for this situation, she's going to review it manually. Cliff wants to borrow a specific data set for a certain period of time, and he actually wants to use it in a Tableau context. So she reviews it, makes an approval, and submits it. This in turn flips it back to Cliff, who says, okay, what obligations did I just take on in order to work with this data? And he reviews each of these data sharing agreements that you, as an organization, would set up, saying, what are my restrictions for using this data set?

As Cliff accepts these notices, he has now triggered the process of what we would call fulfillment, or a service broker. And in this situation, we're doing virtualized access for the borrow use case. Cliff selects Tableau as his preferred BI and reporting tool, and you can see the various options that are available, from Power BI to Looker to ThoughtSpot; there are others that can be added over time. And from there, Cliff will be alerted the minute this data is available to him. So now we're running out and doing a distributed query to get the information, and you see it returns a raw view. Now, what's really interesting is that you'll see the customer phone has a bunch of X's in it. If you remember, that's PII, so it's actually being masked; Cliff can't see the raw data. Now, Cliff also wants to look at it in a Tableau report and can see the visualization layer, but you also see an incorporation of something we call Collibra on the go.

Not only do we bring the data to the report, but we also tell you, the reader, how to interpret the report. It could be that someone else wants to use the very same report that Cliff helped create, but they don't understand exactly all the things that Cliff went through. So now they have the ability to get a full interpretation of what data was used, where it came from, and how to interpret some of the fields they see on this report. It's really a clever combination of bringing the data to you and showing you how to use it.

Cliff can also see this as a registered asset within Collibra. So the next shopper who comes through might, instead of shopping for the dataset, actually shop for the report itself. And the report is connected with the data set he used, so now they have a full bill of materials to run a customer churn report and schedule it anytime they want. So now we've turned Cliff into a creator of data assets, and this is where intelligence gets more intelligent — and that's really what we call data intelligence. So let's go back through that magic trick that we just did with Cliff. Cliff went into the software not knowing whether the source of data that he was looking for, for customer product sales, was even available to him. He went in, very quickly searched and found his dataset, used facts and facets to filter down to exactly what was available, compared and contrasted the options that were there, made an observation that there actually wasn't enough data quality around a certain thing that was important to him, created an idea — basically a suggestion for somebody to follow up on — and was able to put the dataset into his shopping basket, check out, and have it delivered to his front door.

I mean, that's a bit of a magic trick, right? Cliff was successful in finding the data that he wanted and having it delivered to him, and then, in his preferred model, he was able to look at it in Tableau. All right, so let's talk about how we're going to make this vision a reality. Our first section here is about performance and scale, but it's also about codeless database registration. How did we get all that stuff into the data catalog and available for Cliff to find? So allow us to introduce you to what we call the asset life cycle. Some of the largest organizations in the world might have upwards of a billion data assets. These are columns and tables, reports, APIs, algorithms, et cetera. They are very high volume and quite technical — far more information than a business user like Cliff might want to be engaged with. Those very same really large organizations may have upwards of, say, 20 to 25 million critical data sources and data assets, things that they do need to highly curate and make available.

But through that there's a bit of a distillation, a lifecycle of different things you might want to do along the way. And so we're going to share with you how you can automatically register these sources, deal with these very large volumes at speed and at scale, and make them available with just the level of information you need to govern and protect, but also make them available for opportunistic use cases, such as the one we presented with Cliff. So, as you recall, when Cliff was trying to look for his dataset, he identified that the is churned data attribute was of low quality. So he passed this over to Eliza, who's a data steward, and she receives this work queue item in a collaborative fashion. She has to review: what is the request? If you recall, this was the request to improve the data quality for is churned.

Now she needs to familiarize herself with what Cliff was observing when he was doing his shopping experience. So she digs in and wants to look at the quality that he was observing, and sure enough, as she goes down and looks at is churned, she sees that it was a low 65 and now understands exactly what Cliff was referring to. She says, aha, okay.
I need to get help. I need to decide whether I have a data quality project to fix the data, or whether I should see if there's another data set in the organization that has better data for this. And so she creates a work item that can go over to one of her colleagues who really focuses on data quality. She submits this request, and it goes over to her colleague John, who's really familiar with data quality. So John receives the request from Eliza, and you'll see a task showing up in his queue.

He opens up the request and finds out that Eliza is asking whether there's another source out there that actually has good is churned data available. Now, he actually knows quite a bit about the quality of the is churned information. So he goes into the data quality console and does a quick look for a dataset that he's familiar with, called customer product sales. He quickly scrolls down and finds the one that's actually been published. That's the one he was looking for, and he opens it up to find out more — what columns are actually in there. And he goes down and finds that is churned is in fact one of the attributes in there. It actually does have active rules associated with it to manage the quality. And so he says, well, let's look in more detail and find out what the quality of this dataset is.

Oh, it's 86. This is a dramatic improvement over what we've seen before. We can see that it's trended quite nicely over time; day by day, it hasn't degraded in performance. So he responds back to Eliza and says, this data set is actually the data set that you want to bring in; it really will improve things. And you'll see that he refers to the refined database within the CRM cloud solution. Once he submits this, it goes back to Eliza and she's able to continue her work. Now, when Eliza opens this back up, she's able to very quickly go into the database registration process. She goes into the CRM cloud, selects the community to which she wants to register this data set — the schemas community — with CRM cloud as the system that she wants to load it from, and refined as the database that John told her she should bring in.

After a quick description, she's able to click register, and this triggers that automatic, codeless process of going out to the dataset and bringing back its metadata. Now, metadata is great, but it's not the end-all, be-all; there are a lot of other values that she really cares about. As she's registering this dataset and synchronizing the metadata, she's also asked, would you like to bring in quality information? And so she'll say, yes, of course, I want to enable the quality information from CRM refined. I also want to bring back lineage information to associate with this metadata, and I also want to select profiling and classification information. Now, when she selects it, she can also say how often she wants to synchronize this — a daily, weekly, or monthly kind of update. That's part of the change data capture process. Again, it's all automated, without the requirement of actually writing code. So she runs this process. Now, after this loads in, she can open up this newly registered dataset and look to see if it actually solves the problem that Cliff set her out on, which was improved data quality. Looking into the data quality for the is churned attribute shows her that she has fantastic quality.

It's at a hundred; it's exactly what she was looking for. So she can, with confidence, suggest that it's done. But she did notice something, something that she wants to tell John: there are a couple of data quality checks that seem to be missing from this dataset. So again, in a collaborative fashion, she can pass that information along, for validity and completeness, to say, you know what, check for nulls and empties, and send that back.

So she submits this on to John to work on, and John now has a work item in his task list. But remember, she's been working in this task flow, and because she has added a much better source for is churned information, she's going to update the task that was sent to her, to notify Cliff that the work has been done and that there's a really good data set in there. In fact, if you recall, it was 100% in terms of its data quality. So this will really make life a lot easier for Cliff once he receives that data and runs the churn report analysis next time. So let's talk about these audacious performance goals that we have in mind. Today, we actually have really strong performance and amazing usability. Our customers continue to tell us how great our usability is, but they keep asking for more. Well, we've decided to present to you something you can start to bank on.

This is the performance you can expect from us, on the highly curated assets that are available for business users as well as the technical and lineage assets that are more for developer uses and for things that are more warehouse-based. You'll see that in Q1, or rather Q2, of this year, we're making available 5 million curated assets. Now, you might be out there saying, hey, I'm already using the software and I've got over 20 million already. That's fair — we do have customers that are actually well over 20 million in terms of the assets they're managing — but we wanted to present this to you with zero conditions, no limitations where we would have to say, well, it depends, et cetera. This is without any conditions; that's what we can offer you without fail. And yes, it can go higher and higher. We're also talking about the speed with which you can ingest the data. Right now we're ingesting somewhere around 50,000 to a hundred thousand records, and of course, yes, you've probably seen it go quite a bit faster, but this is what we are assuring you. What's really impressive is that right now we can also help you manage 250 million technical assets, and we can load them at a speed of 25 million per hour. And you can see how, over the next 18 months, about every two quarters, we show you dramatic improvements, more than a doubling of these numbers.

For most of them, leading up to the end of 2022, we're actually handling over a billion technical lineage assets, and we're loading at a hundred million per hour. That sets the mark for the industry. Earlier this year, we announced a recent acquisition, OwlDQ. OwlDQ brought us machine-learning-based data quality, and we're now able to introduce to you Collibra Data Quality, the first integrated approach to OwlDQ and Collibra. We've got a demo to follow; I'm really excited to share it with you. Let's get started. So Eliza submitted a task for John to work on — remember, to add checks for null and for empty. So John picks up this task very quickly and looks and sees what the request is. And from there he says, ah, yes, we do have a quality check issue when we look at is churned.

So he jumps over to the data quality console and says, I need to create a new data quality test. So John is able to go into the solution and set up quick rules, automated rules. He could inherit rules from other things, but it starts with first identifying the data source that he needs to connect to to perform this. And so he chooses the CRM refined data set that was most recently registered by Eliza. You'll see the same quality score of 86 for the dataset, and you'll also see that there are four rules underneath it. Now, there are various checks that John can establish on this, but remember, this is a fairly simple request that he received from Eliza. So he's going to go in and choose the actual field, is churned, and from there identify quick rules — an empty check — and that quickly sets up the rule for him. And also the null check, equally fast. Once this is established, it analyzes all the data in there, and this sets up the baseline of data quality.

Now, this data, once it's captured, is periodically brought back to the catalog, so it's available not only to Eliza but also to Cliff the next time he shops in the environment. As we look through the rules that were created through that very simple user experience, you can see the ones for is empty and is null that were set up. Now, these are various styles of rules that can be set up manually, or through machine learning, or by inheritance. But the key is to track the creation of these rules and the metrics generated from them, so they can be brought back to the catalog and used in meaningful context by someone who's shopping. Confidence that this field has neither empty nor null values — or at least that most of them aren't — now gives confidence as you go forward.

And as you can see, those checks have now been entered in, and you can see that it's a hundred percent quality score for the null check. So with confidence, John can respond back to Eliza and say, I've inserted them, they're up and running, and you're in good status. So that was pretty amazing integration, right? Just four months after our acquisition, we've already brought that level of integration between Collibra Data Intelligence Cloud and data quality. And it doesn't stop there — we have really impressive and high sights set. Early next year, we're going to introduce a fully immersive experience, where customers can work within Collibra and bring the data quality information all the way in, as well as start to manipulate the rules and generate the machine learning rules on top of it. All of that will be a deeply immersive experience.

We also have something really clever coming, which we call continuous data profiling, where we bring the power of data quality all the way into the database, so it's continuously running and always making that data available for you. Now, I'd also like to share with you one of the reasons why we are the most universally available software solution in data intelligence. We've already announced that we're available on AWS and Google Cloud, and today we can announce that in Q3 we're going to be available on Microsoft Azure as well. Now, it's not just these three cloud providers that we're available on; we've also become available on each of their marketplaces. So if you are buying our software, you can make that same purchase from their marketplace and achieve your financial objectives as well. We're very excited about this; these are very important partners for us.

Now, I'd also like to introduce you to our system integrators. Without them, there's no way we could achieve our objectives of growing so rapidly and dealing with the demand that you, our customers, have had. Accenture, Deloitte, Infosys, and others have been instrumental in making sure that we can serve your needs when you need them. They've been a big part of our growth and will be a continued part of our growth as well. And finally, I'd like to introduce you to our product showcases, where we can go into absolute detail on many of the topics I talked about today, such as data governance with Arco, data privacy with Sergio, data quality with Brian, and finally catalog with Peter. Again, I'd like to thank you all for joining us, and we really look forward to hearing your feedback. Thank you.
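As a coda to the demo above: the null and empty checks John sets up amount to very simple completeness rules. The sketch below shows roughly what such checks compute — purely illustrative, with hypothetical class and function names rather than Collibra's or OwlDQ's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

import pandas as pd


@dataclass
class Rule:
    """A simple column-level data quality rule (hypothetical structure)."""
    name: str
    column: str
    predicate: Callable[[pd.Series], pd.Series]  # True where the row passes

    def score(self, df: pd.DataFrame) -> float:
        """Return the passing percentage (0-100) for this rule on a dataframe."""
        passed = self.predicate(df[self.column])
        return 100.0 * passed.mean()


def null_check(column: str) -> Rule:
    # Rows pass when the value is not null.
    return Rule(f"{column}_is_not_null", column, lambda s: s.notna())


def empty_check(column: str) -> Rule:
    # Rows pass when the value is not an empty or blank string.
    return Rule(f"{column}_is_not_empty", column,
                lambda s: s.astype(str).str.strip().ne(""))


if __name__ == "__main__":
    # Toy stand-in for the "is churned" attribute in the demo.
    df = pd.DataFrame({"is_churned": ["true", "false", "false", "true"]})
    rules: List[Rule] = [null_check("is_churned"), empty_check("is_churned")]
    for rule in rules:
        print(rule.name, f"{rule.score(df):.0f}%")   # both print 100%
```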

Published Date : Jun 17 2021


Kirk Haslbeck, Collibra | Collibra Data Citizens'21


 

>> Narrator: From around the globe. It's theCUBE covering Data Citizens, 21 brought to you by Collibra. >> Hi everybody, John Walls here on theCUBE continuing our coverage of Data Citizens 2021. And I'm with now Kirk Haslbeck was the vice president of engineering at Collibra. Kirk joins us from his home, Kirk good to see you today. Thanks for joining us here on theCUBE. >> Well, thanks for having me, I'm excited to be here. >> Yeah, no, this is all about data quality, right? That's your world, you know, making sure that you're making the most of this great asset, right? That continues to evolve and mature. And yet I'm wondering from your perspective from your side of the fence, I assume data quality has always been a concern, right? Making the most of this asset, wherever it is. And whenever you can get it. >> Yeah, absolutely. I mean, the challenge hasn't slowed down, right? We're looking at more data coming in all the time laws of large numbers, but you kind of have to wonder a lot of the large organizations have been trying to solve this for quite some time, right? So what is going on? Why isn't it just easier to get our arms around it? And there's so many reasons, but if I were to list maybe the top one it's the diminishing value of static rules and a good example of that might just be something as simple as starting with a gender column. And back in the day, we might have assumed that it had to be an M or an F male or female. And over the last couple of years, we've actually seen that column evolve into six or seven different types. So just the very act of assuming that we could go in and write rules about our business and that they're never going to change and that the data's not evolving. And we start to think about zip codes and addresses that are changing, you know, Google street view. However you want to think of it. Every column and every record is just changing all the time. And so what, you know, many large organizations have done they've written maybe forty thousand, fifty thousand rules and they have to continue to manage them. So I think we all try to get our arms around rule creation. And it's not even just about that. It would also be about if you had all the rules in place could you even keep up with them on a day-to-day changing basis? And so one of the largest companies in the U.S sat down with myself and team early on and said, so what am I up against? I'm really either going to continue to hire a mountain of rule writers, you know, as they put it per department to get my arms around this and that'll never end, or I need to think of a better way which was the solution that we were ultimately providing at that time. And, you know, and what that solution really entails is using data mining to learn and observe all the data that's already there and to curate the rules based on the data itself, right? That's where all the information is. And then ultimately we have this concept of adaptive ruling which means all the variants in that column all the new values that come in every day, the roll counts, the sizes are all being managed. It's an automatic program, so that the rule is recalibrating itself and I think this is where most most chief data officers sit back and say if I have to protect the franchise, right? If I have to put a trusted data program in place what are my options and how does it scale? And they have to take a really hard look at something like this. 
>> You know, the process that you're talking about too, it just kind of reminds me of a diet, in that nobody wants to go through that pain, right? We all want to eat what we want to eat, but you're really happy when you get there at the end of the day, you like the way you look, the way you feel, the way you act, all those things. So it'd be almost like what you're talking about in terms of this data, you know, in terms of rule setting, right? Governance and accessibility and all these things, it can be a tough process. It can be, but it certainly seems well worth it because you make your data all the more valuable and essential to your business. Is that about right? >> Yeah, that's right, that's right. And you know, it's funny you compare it to a diet. Sometimes I think of a patient stress test, you know, almost like a health exam, and we're spending so much time testing the analytics or testing the models and looking at accuracy, and can anybody achieve 89 to 90%, but we're probably not spending enough time testing our data assumptions, right? Running that diet or health check against the data itself. And I would say that every Fortune 100 or even Fortune 1000 probably considers themselves a data-driven business at this point in time, which means they're going to make decisions quickly based on data. And if we really pull that thread a little bit, what's the cost of making decisions on incorrect data? I mean, it's terribly scary as we start to unfold that, so you're absolutely right. They're taking it very seriously. And it takes a lot of thought about how to get enough coverage and how to create trust in that type of environment. >> Yeah, it's almost, too, it's like, you know, the concept of input bias a little bit here, where, if you're assuming that certain data sets are accurate and pertinent, relevant, all those things, and then you're making decisions based on those data sets, you might be looking at kind of an input bias, if I'm hearing you right, that maybe you're not keeping your mind open as to what really should be important or influential in your decision-making in terms of data, and then obviously acting on that appropriately. So you have to decide maybe on the front side, you know, what data matters, and you help people do that, and then help me make decisions based on good data, basically, right? >> Right, that's right, and to be fully transparent and candid, we weren't as strong in the what-data-matters piece of it. We were very strong early on in giving you broad coverage, meaning we made no assumptions, right? We wanted to go out and attack the whole surface of the problem and then sort of have a consistent scoring methodology. And as we've partnered and now become acquired by Collibra, which is an exciting path, they are very good at what's called critical data elements and lineage and doing graph analysis to sort of identify the assets that are most used. And that's where we see a huge benefit in combining those two powers. So you kind of got there quickly, but ultimately we are combining the forces of total coverage at scale with what is most important to you. >> That brings us to OwlDQ, you were the founder of that, and it was purchased by Collibra. Tell us a little bit about how that came to be: first off, what OwlDQ was all about, and then how this marriage, if you will, how this relationship with Collibra evolved and then you were eventually purchased.
>> Yeah, absolutely. So, I mean, I had this passion that I couldn't hold back on in the data community. Once you see it this way, where you can use data mining and compute power to curate and manage rules, and then take it much beyond there into predicting and seeing around the corner for tomorrow, you have to go that direction. So that's exactly what myself and team did. And what we started to see with the early adopters of our software was that they were getting a seven-figure return on investment per department. And they were able to replicate this across many departments, so we've had a great lifespan with those customers, staying and growing and expanding. But we were getting a little bit of market pressure from the investment community, as well as that same customer community, that they wanted us to integrate with their data catalog, and the data catalog of choice, every time the conversation came up, was Collibra. And interestingly enough, you know, I ran into the likes of Jim Cushman and, you know, the whole thing unfolds from there. I think they were seeing a little bit of a similar story, saying, doesn't catalog and lineage belong together with quality? And when we sat together, it was like three market forces suggesting the same answer. And as we laid out the roadmap and the integration, we just can't see it any other way. I'll be bold and say there's no way it goes back the other way, not just for this company but for the industry: data governance and data intelligence will absolutely combine quality, lineage, catalog and all of the above in the future. It is becoming that clear, I think. >> You know, this is kind of a big-picture question about all of that, data quality right now: what's driving this avid interest that organizations are showing? And it's, you know, small, medium, enterprise, it's everybody. But in your mind, you know, you've been involved in this for a number of years now. You know, why now, what is it now? Is it just that we have so much more data available, that so much of it's in use, that, you know, we know what we have and we're realizing that what we have is pretty valuable? But you know, what's the driver, what's the big push here? >> Yeah, it is a tough question. And I have gotten this one before, and it's interesting because it's been around since the nineties, right? So it's a very fair question. There's a couple of things I think that are driving it. One, as we start to see more data in Tableau dashboards, and pick your favorite BI tool, you start to realize the data's not correct. You know, you look at your house on Zillow or whatever, you find out it's mislabeled, it doesn't have the right number of bedrooms. Maybe humans are entering the listings, and as data's become more available visually, we're more critical of it. And now businesses are becoming more data-driven, where humans aren't involved as much and the actions are automatically being taken. And it becomes an embarrassing moment if your data is incorrect, and we can really measure that cost at this point. You do see some other factors like cloud migration. Well, that adds a risk to your business. Could you possibly port everything, not just the servers, not just the software, but all of your data into another system and think that there would be no errors in that process? So as people are kind of creating their next-generation platforms, and then probably even a touch of COVID accelerating that cloud migration adoption and even just technology adoption.
So for a multitude of reasons, there's just more data and there are more data quality concerns than ever before. >> So if you're talking to a prospective client right now, which you probably are, you know, what do you want to share with them? Or what would you encourage them to consider in terms of kind of their data venture, their data journey if you will, in terms of, you know, refining what they have, mining it appropriately, governing it appropriately, all these things that maybe haven't been given a lot of consideration, or deep consideration. >> Yeah, I think there are two things, although if you listen to my other talks I can talk forever about all of those items. The first is probably, you know, just do the napkin math of all the tables, all the files, all the Kafka messages, right? All the columns and fields and attributes, and kind of just multiply that out and try to figure out how you would get coverage, and if you could, how you could maintain it. And why shouldn't we be trading compute power for domain knowledge at that point? I think that's the first place to start. And probably the second is that the act of writing traditional data quality rules puts you in a binary situation. It basically says you will either have a break record or you will not. So it's a yes/no question; what it never will tell you is what the answer should have been. And if you take a deeper look at the solution that we're providing to the market, we're actually predicting for you what the correct value is, and it's a complete paradigm shift. It obviously is much more scientific, but it's much more powerful to get you to the end answer more quickly instead of just going through break records. >> Right? Tremendous capability that you just described. And on that, I'm going to thank you for the time, but just think about it, right? We're not only going to help you make more sense of your data, we're also going to help you make better decisions and show you what that path might be, or what you probably should be considering. So it certainly opens up a lot of doors for a lot of companies in that respect. Kirk, thanks for the time. Sorry we didn't have enough time to hear that guitar in the background, but next time I'm going to hold you to it, okay? >> Yeah, that sounds good, John, I really appreciate it. >> All right, very good. Kirk Haslbeck joining us from Collibra. We continue our coverage here at Data Citizens '21 on theCUBE, and I'm John Walls. (bright music)
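To make Kirk's closing point a bit more concrete, the contrast between a binary break record and a check that also predicts what the value should have been, here is a small, hypothetical Python sketch. The city/state example, the most-frequent-value model, and every name in it are illustrative assumptions, not a description of Collibra's actual algorithm.

```python
# Illustrative only: a static rule yields a binary break record ("state code
# invalid: yes/no"), whereas a prediction-oriented check also proposes what the
# value likely should have been, here via a trivially simple
# most-frequent-value-per-key model learned from historical records.
from collections import Counter, defaultdict

def train_correction_model(records, key_field, target_field):
    """Learn, for each key (e.g. city), the most common target value (e.g. state)."""
    seen = defaultdict(Counter)
    for r in records:
        seen[r[key_field]][r[target_field]] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in seen.items()}

def check(records, key_field, target_field, model, valid_values):
    for r in records:
        value = r[target_field]
        if value not in valid_values:               # the old, binary answer: break record
            predicted = model.get(r[key_field])     # the added answer: likely correct value
            yield {"record": r, "break": True, "suggested_value": predicted}

history = [
    {"city": "Denver", "state": "CO"},
    {"city": "Denver", "state": "CO"},
    {"city": "Austin", "state": "TX"},
]
model = train_correction_model(history, "city", "state")
incoming = [{"city": "Denver", "state": "C0"}]      # typo: zero instead of the letter O
print(list(check(incoming, "city", "state", model, valid_values={"CO", "TX"})))
# -> flags the break AND suggests "CO" as the likely intended value
```

Run against the incoming record, the check does not just say the value is wrong; it also surfaces the probable correction, which is the shift from yes/no break records to predicted values described in the interview.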

Published Date : Jun 17 2021
