Kirk Haslbeck, Collibra | Collibra Data Citizens'21
>> Narrator: From around the globe, it's theCUBE, covering Data Citizens '21, brought to you by Collibra. >> Hi everybody, John Walls here on theCUBE, continuing our coverage of Data Citizens 2021. And I'm with Kirk Haslbeck, who's the vice president of engineering at Collibra. Kirk joins us from his home. Kirk, good to see you today. Thanks for joining us here on theCUBE. >> Well, thanks for having me, I'm excited to be here. >> Yeah, no, this is all about data quality, right? That's your world, you know, making sure that you're making the most of this great asset, right? That continues to evolve and mature. And yet I'm wondering, from your perspective, from your side of the fence, I assume data quality has always been a concern, right? Making the most of this asset, wherever it is and whenever you can get it. >> Yeah, absolutely. I mean, the challenge hasn't slowed down, right? We're looking at more data coming in all the time, laws of large numbers, but you kind of have to wonder: a lot of the large organizations have been trying to solve this for quite some time, right? So what is going on? Why isn't it just easier to get our arms around it? And there are so many reasons, but if I were to list maybe the top one, it's the diminishing value of static rules. A good example of that might be something as simple as a gender column. Back in the day, we might have assumed that it had to be an M or an F, male or female. And over the last couple of years, we've actually seen that column evolve into six or seven different types. So just the very act of assuming that we could go in and write rules about our business, and that they're never going to change and that the data's not evolving. And we start to think about zip codes and addresses that are changing, you know, Google Street View, however you want to think of it. Every column and every record is just changing all the time. 
And so what, you know, many large organizations have done is they've written maybe forty thousand, fifty thousand rules, and they have to continue to manage them. So I think we all try to get our arms around rule creation. And it's not even just about that. It would also be about, if you had all the rules in place, could you even keep up with them on a day-to-day changing basis? And so one of the largest companies in the U.S. sat down with myself and team early on and said, so what am I up against? I'm really either going to continue to hire a mountain of rule writers, you know, as they put it, per department, to get my arms around this, and that'll never end, or I need to think of a better way, which was the solution that we were ultimately providing at that time. And, you know, what that solution really entails is using data mining to learn and observe all the data that's already there and to curate the rules based on the data itself, right? That's where all the information is. And then ultimately we have this concept of adaptive ruling, which means all the variants in that column, all the new values that come in every day, the row counts, the sizes are all being managed. It's an automatic program, so that the rule is recalibrating itself. And I think this is where most chief data officers sit back and say, if I have to protect the franchise, right? If I have to put a trusted data program in place, what are my options and how does it scale? And they have to take a really hard look at something like this. >> You know, the process that you're talking about too, it just kind of reminds me of, like, a diet, in that nobody wants to go through that pain, right? We all want to eat what we want to eat, but you're really happy when you get there at the end of the day. You like the way you look, like the way you feel, like the way you act, all those things. So it'd be almost like what you're talking about in terms of this data, you know, in terms of rule setting, right? 
Governance and accessibility and all these things, it can be a tough process. Can be, but it certainly seems well worth it, because you make your data all the more valuable and essential to your business. Is that about right? >> Yeah, that's right, that's right. And you know, it's funny you compare it to a diet. Sometimes I think of a patient stress test, you know, almost like a health exam. We're spending so much time testing the analytics or testing the models and looking at accuracy, and can anybody achieve 89 to 90%, but we're probably not spending enough time testing our data assumptions, right? Running that diet or health check against the data itself. And I would say that every Fortune 100 or even Fortune 1000 company probably considers itself a data-driven business at this point in time, which means they're going to make decisions quickly based on data. And if we really pull that thread a little bit, what's the cost of making decisions on incorrect data? I mean, it's terribly scary as we start to unfold that, so you're absolutely right. They're taking it very seriously. And it takes a lot of thought of how to get enough coverage and how to create trust in that type of environment. >> Yeah, it's almost, too, like, you know, the concept of input bias a little bit here, where if you're assuming that certain data sets are accurate and pertinent, relevant, all those things, and then you're making decisions based on those data sets, you might be looking at kind of an input bias, if I'm hearing you right, that maybe you're not keeping your mind open as to what really should be important or influential in your decision-making in terms of data. And then obviously acting on that appropriately. So you have to decide maybe on the front side, you know, what data matters, and you help people do that. And then help them make decisions based on good data basically, right? 
>> Right, that's right, and to be fully transparent and candid, we weren't as strong in the what-data-matters piece of it. We were very strong early on in giving you broad coverage, meaning we made no assumptions, right? We wanted to go out and attack the whole surface of the problem and then sort of have a consistent scoring methodology. And as we've partnered with, and now been acquired by, Collibra, which is an exciting path, they are very good at what's called critical data elements and lineage, and at doing graph analysis to sort of identify the assets that are most used. And that's where we see a huge benefit in combining those two powers. So you kind of got there quickly, but ultimately we are combining the forces of total coverage at scale with what is most important to you. >> I mean, you're coming from OwlDQ, you were the founder of that, and that was purchased by Collibra. Tell us a little bit about how that came to be, and first off, what OwlDQ was all about, and then how this marriage, if you will, how this relationship with Collibra evolved and then you were eventually purchased. >> Yeah, absolutely. So, I mean, I had this passion that I couldn't hold back on in the data community. Once you see it this way, where you can use data mining and compute power to curate and manage rules, and then take it much beyond there, to predicting and seeing around the corner for tomorrow, you have to go that direction. So that's exactly what myself and team did. And what we started to see with the early adopters of our software was that they were getting a seven-figure return on investment per department. And they were able to replicate this across many departments, so we've had a great lifespan with those customers, staying and growing and expanding. But we were getting a little bit of market pressure from the investment community, as well as that same customer community, that they wanted us to integrate with their data catalog, and the data catalog of choice. 
Every time the conversation was Collibra. And interestingly enough, you know, I ran into the likes of Jim Cushman, and, you know, the whole thing unfolds from there. I think they were seeing a little bit of a similar story, saying, doesn't catalog and lineage belong together with quality? And when we sat together it was like three market forces suggesting the same answer. And as we laid out the roadmap and the integration, we just can't see it any other way. There's no way, I'll be bold and say, that it goes back the other way, not just for this company but for the industry. Data governance and data intelligence will absolutely combine quality, lineage, catalog and all of the above in the future. It is becoming that clear, I think. >> You know, this is kind of a big-picture question about all of that. Data quality right now, what's driving this avid interest that organizations are showing? And it's, you know, small, medium, enterprise, it's everybody. But in your mind, you know, you've been involved in this for a number of years now. You know, why now? What is it now? Is it just that we have so much more data available, that so much of it's in use, that, you know, we know what we have and we're realizing that what we have is pretty valuable? But you know, what's the driver, what's the big push here? >> Yeah, it is a tough question. And I have gotten this one before, and it's interesting because it's been around since the nineties, right? So it's a very fair question. There's a couple things I think that are driving it. One, as we start to see more data in Tableau dashboards, and pick your favorite BI tool, you start to realize the data's not correct. You know, you look at your house on Zillow or whatever, you find out it's mislabeled. It doesn't have the right bedrooms. Maybe humans are entering into the listings, and as data's become more available visually, we're more critical of it. 
And now businesses are becoming more data-driven, where humans aren't involved as much and the actions are automatically being taken. And it becomes an embarrassing moment if your data is incorrect, and we can really measure that cost at this point. You do see some other factors, like cloud migration. Well, that adds a risk to your business. Could you possibly port everything, not just the servers, not just the software, but all of your data into another system and think that there would be no errors in that process? So as people are kind of creating their next-generation platforms, and then probably even a touch of COVID accelerating that cloud migration adoption and even just technology adoption. So for a multitude of reasons, there's just more data and there's more data quality concerns than ever before. >> So if you're talking to a prospective client right now, which you probably are, you know, what do you want to share with them? Or what would you encourage them to consider in terms of kind of their data venture, their data journey if you will, in terms of, you know, refining what they have, in terms of mining appropriately, in terms of governing it appropriately, all these things that maybe haven't been given a lot of consideration or deep consideration? >> Yeah, I think there are two things, although if you listen to my other talks, I can talk forever about all of those items. Probably, you know, maybe just do the napkin math of all the tables, all the files, all the Kafka messages, right? All the columns and fields and attributes, and kind of just multiply that out and try to figure out how you would get coverage. And if you could, how you could maintain it. And why shouldn't we be trading compute power for domain knowledge and things at that point? I think that's the first place to start. And probably the second is that the act of traditional data quality rules puts you in a binary situation. 
It basically says you will either have a break record or you will not. So it's a yes-no question. What it never will tell you is what the answer should have been. And if you take a deeper look at the solution that we're providing to the market, we're actually predicting to you what the correct value is, and it's a complete paradigm shift. It obviously is much more scientific, but it's much more powerful to get you to the end answer more quickly instead of just going through break records. >> Right? Tremendous capability that you just described. And on that, I'm going to thank you for the time, but just think about it, right? We're not only going to help you make more sense of your data. We're also going to help you make better decisions and show you what that path might be or what you probably should be considering. So it certainly opens up a lot of doors for a lot of companies in that respect. Kirk, thanks for the time. Sorry we didn't have enough time to hear that guitar in the background, but next time I'm going to hold you to it, okay? >> Yeah, that sounds good, John, I really appreciate it. >> All right, very good. Kirk Haslbeck joining us from Collibra. We continue our coverage here at Data Citizens '21 on theCUBE, and I'm John Walls. (bright music)
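The napkin math and the contrast between a static rule and a rule learned from the data, both mentioned in the conversation above, can be sketched roughly as follows. This is an illustrative sketch only, not OwlDQ's or Collibra's method: the function names, the `min_share` threshold, and all numbers are assumptions made up for the example.

```python
from collections import Counter


def checks_needed(tables: int, avg_columns: int, rules_per_column: int) -> int:
    """Napkin math: multiply out how many column-level checks full coverage needs."""
    return tables * avg_columns * rules_per_column


# A static rule: a fixed allowed set that a person wrote once (hypothetical).
STATIC_ALLOWED = {"M", "F"}


def static_rule_breaks(values):
    """Return the values a hand-written rule flags as break records."""
    return [v for v in values if v not in STATIC_ALLOWED]


def learn_allowed(history, min_share=0.01):
    """Learn the allowed values from history: keep any value seen often enough.

    min_share is an assumed tuning knob, not something from the interview.
    """
    counts = Counter(history)
    total = len(history)
    return {v for v, c in counts.items() if c / total >= min_share}


def learned_rule_breaks(history, new_batch, min_share=0.01):
    """Flag only values that the historical data does not support."""
    allowed = learn_allowed(history, min_share)
    return [v for v in new_batch if v not in allowed]
```

With made-up inputs, `checks_needed(500, 40, 3)` multiplies out to 60,000 checks, which is in the range of the rule counts mentioned earlier. Re-running `learn_allowed` on each day's history is what lets the learned rule recalibrate as a column's values evolve, whereas the static set has to be edited by hand.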
ENTITIES
Entity | Category | Confidence |
---|---|---|
Kirk | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
John Walls | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
89 | QUANTITY | 0.99+ |
forty thousand | QUANTITY | 0.99+ |
Kirk Haslbeck | PERSON | 0.99+ |
Jim Cushman | PERSON | 0.99+ |
second | QUANTITY | 0.99+ |
two powers | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
U.S | LOCATION | 0.98+ |
90% | QUANTITY | 0.98+ |
Tableau | TITLE | 0.98+ |
seven figure | QUANTITY | 0.97+ |
tomorrow | DATE | 0.97+ |
OwlDQ | ORGANIZATION | 0.96+ |
today | DATE | 0.95+ |
three market forces | QUANTITY | 0.93+ |
fifty thousand rules | QUANTITY | 0.93+ |
nineties | DATE | 0.93+ |
One | QUANTITY | 0.93+ |
first | QUANTITY | 0.92+ |
theCUBE | ORGANIZATION | 0.91+ |
Kafka | PERSON | 0.88+ |
first place | QUANTITY | 0.85+ |
seven different types | QUANTITY | 0.83+ |
Data Citizens'21 | ORGANIZATION | 0.82+ |
couple things | QUANTITY | 0.73+ |
ORGANIZATION | 0.73+ | |
Data Citizens | ORGANIZATION | 0.72+ |
2021 | DATE | 0.69+ |
COVID | TITLE | 0.69+ |
fortune 1000 | ORGANIZATION | 0.66+ |
Data | EVENT | 0.66+ |
fortune 100 | ORGANIZATION | 0.66+ |
street view | TITLE | 0.65+ |
last couple of years | DATE | 0.63+ |
21 | EVENT | 0.55+ |
Zillow | ORGANIZATION | 0.55+ |
Data Citizens | TITLE | 0.51+ |
Citizens | ORGANIZATION | 0.39+ |
21 | QUANTITY | 0.35+ |
Prince Kohli, Automation Anywhere | Imagine 2019
>> From New York City, it's theCUBE! Covering Automation Anywhere Imagine, brought to you by Automation Anywhere. >> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're in Midtown Manhattan at Automation Anywhere Imagine 2019. We were here last year; it was about 1,500 people. And really, Automation Anywhere is really hot in the RPA space, Robotic Process Automation, but it's really a lot more than that. It's not just automating some processes, it's really about new ways to work, personal digital assistants, and really changing the game. We're excited to have our next guest, first time on theCUBE. He's Prince Kohli, the CTO of Automation Anywhere. Prince, great to see you. >> Thank you, Jeff, good to be here. >> Yeah, so you weren't here last year, so I'm curious to get your general impressions of the event and kind of the scene here with the Automation Anywhere ecosystem. >> Of course, I wasn't here last year. I heard a lot about it, but the sense of excitement, the sense of growth, and the sense of opportunity that is there in everyone. The number of customers who were here and were excited to be here, partners who were here and were really happy to be here, and of course, the team, my own team. It's just the sense of excitement, and the fact that we are on a hockey stick, in terms of growth, is just palpable. >> Right, so I'm curious to get your take. You've been in the Valley for a long time, and really the RPA theme is about digital workers. In fact, they get roles, they get names, they talk about 'em on stage like they're people. And the idea is that we all have our own assistant, which has been talked about forever, but maybe you kind of had an offshore person who could help dial in your laundry, nothing like what we're talking about today. So, as you look back at the evolution as to how we got here, what's your take on the role of a personal digital assistant? >> That's a great question. 
The way, in my view, it evolved was similar to cloud computing. I think the idea that these things could happen, I mean, you know, Star Trek had it, right? >> Right. >> So I think those things, as an idea, have existed, but usually it was in fantasy. But what has happened in the last five or ten years is that computing, the need for automation across applications, the need for work to be less mundane, the need for creativity in our human jobs, those have become really important. And therefore the definition of work is evolving. What can be automated therefore must be automated. And it is not automation within an application, it is automation across applications, across processes, across whichever applications, from whichever vendors there may be, without changing the application itself. And that, with the maturity of AI and acceptance of AI, I think has allowed people to start accepting the notion of a digital worker. >> It's pretty interesting. One of the topics of the keynote was that the people were the integration point between (laughs) a lot of these systems, super inefficient. And what I think is interesting on the AI front and the automation, the place I see it just a little bit every day is on Google, or an app that most people are familiar with, whether it's Google Maps, and suddenly it's got restaurants on it, and suddenly it's got reviews on it, and suddenly it's got Street View, or whether it's now on the email, where suddenly it's guessing my response, it's auto-filling even before I start to complete my email. And it really shows that it's this ongoing continuous innovation, empowered by AI and a boatload of data, that lets these applications do, as you said, things that before would be considered magical. >> Absolutely, and if you look at the digital worker paradigm, right? If you look at a great example of a digital worker, for example an AP clerk, an accounts payable clerk. 
Think of an invoicing function. An invoice comes in, someone has to read it, interpret it. The formats of invoices are very different across vendors. Reading, interpreting, tying it to a PO, making sure the PO is correct, making sure the PO is valid, was issued at the right time, the item is not late, someone has signed off, there are so many things one has to do. And a person has to do all that today. But it is really very boring work. You just follow a set of steps; there is no judgment involved, really. What an AP digital worker allows you to do is to be able to read the document, interpret it, take all the steps that are necessary, and then be able to do that job 24 hours a day, and allow the offloading of this mundane, boring work, right, from a human. So they can be more creative, they can actually make the process better, as opposed to just following a set of simple rules. >> Right, finishing one of the earlier conversations too, and then defining that process so that you can automate it, you're going to unwind inefficiency, you're going to unwind biases, you're going to unwind a whole bunch of stuff to get it to the automated process. So there's all kinds of secondary benefits beyond simply freeing up your time to do more creative work. >> That is correct, and I think, as you said, there are biases, there are also things that must work together in an enterprise and today don't. And you know, the vendors, the application vendors are not going to do that, it is not in their own interest. So someone has to, and we are the fabric that brings it together. >> Right, and just people as an integration point, I thought that was classic, that's like the worst place you want to be. And then the other concept that I think doesn't come out enough is a lot of people can be thinking about RPA as a rip and replace for the people. 
It's not rip and replace at all, it's really augment, just like you augment with your laptop, your phone, other software applications that you're working with every day. >> It's a great point. We have never seen any customer even talking about ripping and replacing people. What they're trying to do is give people the tools and the augmentation necessary for them to make their own life better. And that improves the morale of the employees, that improves the company's productivity, of course, right, and probably the best output, the best validation, is that it improves their customer satisfaction. Because customers are able to create cards faster, are able to get responses faster, claims get adjusted faster, all these things work very well. >> Right, it's interesting, when you sit back and look at the whole technology stack, some really fundamental changes in microprocessor power, networking speed, storage, now the cloud that puts all this access together, and then you add the AI and the machine learning on top of it, it's really kind of this crazy perfect storm of technologies that are coming together, that are enabling this, which we really couldn't do before; all those pieces weren't there. So if you look forward, as CTO, what are some of the things you're excited about? How do you see this evolving over the short term and midterm? I never go long term; long term is forever, in the future we don't even guess. >> Long term, I can predict one thing for sure about the long term: that whatever we say today will be wrong in the long term. Short and medium term, I think we probably will be right. 
I think short and medium term, what I see happening is AI becoming a part of pretty much every layer of every product. For us, for example, as an intelligent RPA platform, AI is embedded in the interaction with the application, interaction with the screen, interaction with the person, interaction with the document, so whichever way we interact with the outside world, as well as how we get better ourselves, AI is embedded in that. And then we use many third-party AIs as part of our own to add AI-enabled skills, for example understanding if an insurance claim should be denied or not, a credit card should be issued or not. So all these things become part of how AI helps us day to day. So I think that will be the biggest change. I think people, the example that you brought up, right, Google email. I don't think that people predicted that as the first use of AI in Google, but it is very useful. I use it all the time, because it happens to get better all the time, it knows all my phrases, it knows how I respond. I think that'll happen again and again. >> Right, right, it's just like spell-check, the great unwashed AI that we've all been using for years, and years, and years. Alright Prince, so, the final word is really, I think that's important, is you're talking about the intelligence. It's not just a process that we apply software to, but this ongoing iterative intelligence applied, whether it's machine learning or AI, to make it better, and better, and better. It's not just going to be static. >> Not at all, not at all. I think it understands what it needs to be doing, and it then provides ideas on how it could be doing better, and then it integrates those ideas back. Everything gets better over time, and everything that a human finds repetitive, high volume, boring, will eventually get farmed off to an augmentation, a digital worker, a digital system. >> And oh, by the way, the number of open reqs is still not going to go down, right? 
>> Because, you know, if you remember the ATM world, as ATMs started coming in, people started worrying tellers will go away and the number of jobs will go down. Actually, banks are doing really well, right, and they started hiring more people. The nature of the job changes, the value that humans provide goes higher and higher, but that's what happens, eventually. >> Alright Prince, congratulations to you for jumping on a rocket ship, I'm sure it's going to be (laughs) a really fun ride, and thanks for having us here at the show. >> Excellent, thank you Jeff, thank you so much. >> Alright, he's Prince, I'm Jeff, you're watching theCUBE. We're at Automation Anywhere Imagine 2019 in Midtown Manhattan. Thanks for watching, we'll see you next time. (energetic music)
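The invoice checks listed in the accounts payable example earlier in this conversation (tie the invoice to a PO, confirm the PO is correct, issued at the right time, not late, and signed off) can be sketched as a fixed sequence of rule checks. This is a minimal sketch under stated assumptions, not Automation Anywhere's product or API: the `Invoice` and `PurchaseOrder` classes, their field names, and `check_invoice` are all hypothetical, and the reading and interpreting of the document itself is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PurchaseOrder:  # hypothetical shape of a PO record
    number: str
    amount: float
    issued_on: date
    delivery_due: date
    approved_by: Optional[str]


@dataclass
class Invoice:  # hypothetical shape of an already-extracted invoice
    po_number: str
    amount: float
    invoice_date: date
    delivered_on: date


def check_invoice(invoice: Invoice, purchase_orders: Dict[str, PurchaseOrder]) -> List[str]:
    """Return the list of failed checks; an empty list means the invoice passes."""
    po = purchase_orders.get(invoice.po_number)
    if po is None:
        return ["no matching PO"]

    problems = []
    if abs(po.amount - invoice.amount) > 0.005:  # tolerate sub-cent float noise
        problems.append("amount does not match PO")
    if po.issued_on > invoice.invoice_date:
        problems.append("PO issued after invoice date")
    if invoice.delivered_on > po.delivery_due:
        problems.append("item delivered late")
    if not po.approved_by:
        problems.append("PO not signed off")
    return problems
```

Because each step is a plain comparison with no judgment involved, the same function can run on every incoming invoice around the clock, and only invoices that return a non-empty list need a person's attention.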
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeff | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
New York City | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
Automation Anywhere | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
Star Trek | TITLE | 0.99+ |
today | DATE | 0.98+ |
Midtown Manhattan | LOCATION | 0.98+ |
ORGANIZATION | 0.98+ | |
24 hours a day | QUANTITY | 0.98+ |
one | QUANTITY | 0.97+ |
2019 | DATE | 0.97+ |
first time | QUANTITY | 0.97+ |
Google Maps | TITLE | 0.97+ |
theCUBE | ORGANIZATION | 0.97+ |
Street View | TITLE | 0.97+ |
Prince | PERSON | 0.95+ |
Automation Anywhere Imagine | ORGANIZATION | 0.95+ |
Prince Kohli | PERSON | 0.93+ |
one thing | QUANTITY | 0.92+ |
about 1,500 people | QUANTITY | 0.91+ |
Automation Anywhere | TITLE | 0.87+ |
midtown Manhattan | LOCATION | 0.84+ |
Imagine | TITLE | 0.83+ |
ten years | QUANTITY | 0.82+ |
years | QUANTITY | 0.8+ |
TITLE | 0.76+ | |
last | DATE | 0.58+ |
five | QUANTITY | 0.53+ |
CTO | PERSON | 0.52+ |
theCube | ORGANIZATION | 0.4+ |