Joe DosSantos, Qlik | CUBE Conversation, April 2019
>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE! Now here's your host, Stu Miniman!
>> I'm Stu Miniman and this is a CUBE Conversation from our Boston area studio. We're going to dig in to discuss the data catalog, and to help me do that I want to welcome to the program first-time guest Joe DosSantos, who is the global Head of Data Management Strategy at Qlik. Joe, thank you so much for joining us.
>> Good to be here, Stu.
>> All right, so the data catalog, let's start there. People in general know what a catalog is, well, maybe some of the millennials might not know it as well as those of us who have been in the industry a little bit longer. So start there and help level-set us.
>> So our thinking is that there are lots of data assets around and people can't get at them. Just like you might go to Amazon and shop for something through a catalog, or go to the library and see what's available, we're trying to approximate that same kind of shopping experience for data. You should be able to see what you have, you should be able to look for things that you need, you should be able to find things you didn't even know were available to you. And then you should be able to put them into your cart in a secure way.
>> So Joe, step one is I've gathered my data lake, or whatever oil or water analogy we want to use for gathering the data, and then we've usually got analytic tools and lots of things there, but this is a piece of that overall puzzle, do I have that right?
>> That's exactly right. If you think about the obstacles to analytics, there are studies out there that say less than one percent of analytics data is actually being analyzed. We're having trouble with the pipelines to get data into the hands of people who can do something meaningful with it. So what is meaningful? It could be data science, it could be natural language, where maybe you have an Alexa at home and you just ask a question and that information is provided right back to you. So somebody wants to do something meaningful with data but they can't get it. Step one is to go retrieve it, so our Attunity solution is really about how we start to effectively build pipelines to retrieve data from the source. The next step, though, is how do I understand that data? Cataloging isn't just about having a whole bunch of boxes on a shelf, it's being able to describe the contents of those boxes, it's being able to know that I need that thing. If you were to go into an Amazon.com experience and say I'm going on a fishing trip and you're looking for a canoe, it'll offer you a paddle, it'll offer you lifejackets. It guides you through that experience. We want data to be the same way, this guided trip through the data that's available to you in that environment.
>> Yes, it seems like metadata is something we often talk about, but this seems like even more than that.
>> It really is. Metadata is a broad term. If you want to know about your data, you want to know where it came from. I often joke that there are three things you want to know about data: what is it, where did it come from, and who can have access to it under what circumstances. Now those are really simple concepts, but they're really complex under the covers. What is the data? Well, is it private information, is it personally identifiable information, is it a tax ID, is it a credit card? I come from TD Bank and we were very preoccupied with the idea of someone getting data that they shouldn't.
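As a concrete aside, Joe's three questions, what is it, where did it come from, and who can have access under what circumstances, map naturally onto a minimal catalog-entry record. The sketch below is purely illustrative and assumes nothing about Qlik's actual data model; the class, field names, classifications, and roles are all hypothetical.

```python
# Minimal, hypothetical sketch of a catalog entry answering the three questions:
# what is it, where did it come from, and who can access it under what circumstances.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    name: str                    # what is it
    description: str
    classification: str          # e.g. "PII" or "PUBLIC" (assumed labels)
    source_system: str           # where did it come from (lineage)
    allowed_roles: List[str] = field(default_factory=list)  # who may shop for it

    def is_accessible_by(self, role: str) -> bool:
        """True if the given role is allowed to put this asset in its cart."""
        return role in self.allowed_roles


# Example: a credit card column is catalogued, but only visible to fraud analysts.
card_number = CatalogEntry(
    name="customer.card_number",
    description="Primary credit card number",
    classification="PII",
    source_system="core_banking",      # hypothetical source name
    allowed_roles=["fraud_analyst"],
)
print(card_number.is_accessible_by("marketing"))      # False
print(card_number.is_accessible_by("fraud_analyst"))  # True
```

In practice a catalog carries much richer metadata, such as profiling statistics and lineage graphs, but even a small record like this is enough to drive the "shopping" checks Joe describes.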
>> You don't want everyone running around with credit cards, so how do I recognize a credit card, how do I protect a credit card? So the idea of cataloging is not just making everything available, it's security. Let me give you an example of what happens when you walk into a pharmacy. If you walk into a pharmacy and you want a pack of gum or shampoo, you walk up to the shelf and you grab it; it's carefully marked in the aisles, it's described, but it's public, it's easy to get, there aren't any restrictions. If you wanted chewing tobacco or cigarettes, you would need to present somebody with an ID, who would need to say that you are of age, who would need to validate that you are authorized to have it. And if you wanted Oxycontin, you'd best have a prescription. Why isn't data like that? Why don't we have rules that stipulate what kind of data belongs in what kind of category and who can have access to it? We believe that you can. A lot of the impediments to that are about availability and visibility, but also about security, and we believe that once you've provisioned that data to a place, the next step is understanding clearly what it is and who can have access to it, so that you can provision it downstream to all of these different analytic consumers that need it.
>> Yeah, data security is absolutely front and center, it's the conversation at board levels today. So the catalog, is it a security tool, or does it work with your overall policies and procedures?
>> So you need to have a policy. One of the fascinating things in a lot of companies is that if you ask people to please give you the names of the columns that constitute personally identifiable information, you'll get blank stares. If you don't have a policy, you don't have a construct, you're hopelessly lost. But as soon as you write that down, now you can start building rules around it. You can know who can have access to what under what circumstances. When I was at TD we took care to try and figure out what the circumstances were that allowed people to do their jobs. If you're in marketing you need to understand the demographic information, you need to be able to distribute a marketing list that actually has people's names and addresses on it. Do you need their credit card number? Probably not. We started to work through these scenarios of understanding what the nature of the data was on a must-have basis, and then you don't have to ask for approval every single time. If you go to Amazon you don't ask for approval to buy the canoe, you just know whether it's in stock and available in your area. Same thing with data: we want to remove all of the friction associated with that because the rules are already in place.
>> Okay, so now that I have the data, what do I do with it?
>> Well, this is actually a really important part of our Qlik story. Qlik is not trying to lock people into a Qlik visualization scenario. Once you have data, what we're trying to say is that discovery might happen across lots of different platforms. Maybe you're a Tableau user, I don't know why, but there are Tableau users, in fact we did use Tableau at TD, but if you want to provision data and discover things in comparable BI tools, no problem.
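Joe's pharmacy analogy, open shelf, ID check, prescription, can be read as a small set of tiered access rules. The sketch below is a hypothetical illustration of that idea, not a Qlik feature; the tier names, roles, and approval flag are invented for the example.

```python
# Hypothetical tiered access rules mirroring the pharmacy analogy:
# open shelf (public), ID check (restricted), prescription (controlled).
PUBLIC, RESTRICTED, CONTROLLED = "public", "restricted", "controlled"

ACCESS_RULES = {
    PUBLIC:     {"roles": None,                            "needs_approval": False},  # anyone, like gum or shampoo
    RESTRICTED: {"roles": {"marketing", "analytics"},      "needs_approval": False},  # role check, like showing an ID
    CONTROLLED: {"roles": {"fraud_analyst", "compliance"}, "needs_approval": True},   # role plus sign-off, like a prescription
}


def can_access(tier: str, role: str, has_approval: bool = False) -> bool:
    """Apply the tier's rule: open shelf, role check, or role check plus approval."""
    rule = ACCESS_RULES[tier]
    if rule["roles"] is not None and role not in rule["roles"]:
        return False
    return has_approval or not rule["needs_approval"]


print(can_access(PUBLIC, "anyone"))                   # True: no restrictions
print(can_access(RESTRICTED, "marketing"))            # True: names and addresses for a campaign list
print(can_access(RESTRICTED, "engineering"))          # False: not an approved role
print(can_access(CONTROLLED, "fraud_analyst"))        # False: still needs the "prescription"
print(can_access(CONTROLLED, "fraud_analyst", True))  # True
```

The point of writing the policy down, as Joe notes, is that once the rules exist the catalog can apply them automatically instead of requiring an approval for every single request.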
Maybe you want to move that data into a machine learning type of environment: you have TensorFlow, you have H2O libraries doing predictive modeling, you have R and Python. All of those are things that you might want to do. In fact, these days a lot of the time people don't want analytics and visualizations, they want to ask questions. Do you have an Amazon Alexa in your house?
>> I have an Alexa and a Google Home.
>> That's right, so you don't want a fancy visualization, you want the answer to a question, and a catalog enables that; a catalog helps you figure out where the data is that answers the question. So when you ask Alexa what's the capital of Kansas, it's going through the databases that it has, which are neatly tagged and cataloged and organized, and it comes back with Topeka.
>> Yeah.
>> I didn't want to stump you there.
>> Thank you Joe. Boy, I think back to when there were people doing ontological studies on how to put these things together. As a user, I'm guessing that with a tool like this I don't have to figure out how to set all of that up; there have got to be better tools, and just like in the discussion of metadata, most systems today do a lot of that for me. But how much do I as a customer customize, and how much does it do for me?
>> So when you and I have a conversation we share a language, and if I say where do you live, you know that living implies a house, implies an address, and you've made that connection. Effectively, every business has its own terminology and ontology for how it speaks, and what we do is, if we have that ontology described to us, we will enforce those rules, so we are then able to discover the data that fits that categorization. So we need the business to define that for us, and again, a lot of this is about process and procedure. Anyone who works in technology knows that very few technological problems are actually about technology; they're about process and people and psychology. What we're doing is, if someone says I care deeply and passionately about customers, and customers have addresses, and these are the rules around them, we can then apply those rules. Imagine the governance tools are there to make the laws; we're like the police, we enforce those laws at the time of shopping, in that catalog metaphor.
>> Wow Joe, my mind is spinning a little bit, because one of the problems you have if you work for a big company is that different parts of the company all want the answer to the same question, but they ask it in very different ways and they don't speak the same language. Does a catalog help with that?
>> Well, it does and it doesn't. I think we are moving to a world in which, for a lot of questions, truth is in the eye of the beholder. If you think about a business that wants to close the books, you can't have revenue that was maybe three million, maybe four million. But if you want to ask, what was the effectiveness of the campaign that we ran last night? Was it more effective with women or men, and why? Anytime someone asks a question like why, or I wonder if, these are questions that invite investigation and analysis, and we can come to the table with different representations of that data; it's not about truth, it's about how we interpret it. So one of the peculiar and difficult things for people to wrap their arms around is that in the modern data world, with data democratization, two people can go in search of an answer to the same question and get wildly different answers. That's not bad, that's life, right?
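Picking up Joe's "governance makes the laws, we're the police" framing from a moment ago, here is a hypothetical sketch of a business glossary being applied to physical column names at catalog time. The glossary terms, regex patterns, and classifications are all invented for illustration.

```python
# Hypothetical business glossary ("the laws") applied when data is catalogued ("the police").
import re

GLOSSARY = {
    "Credit Card Number": {"pattern": r"(card|cc)_?(num|number)", "classification": "PII"},
    "Address":            {"pattern": r"addr(ess)?(_line_\d)?",   "classification": "PERSONAL"},
    "Campaign Code":      {"pattern": r"campaign_(id|code)",      "classification": "PUBLIC"},
}


def classify_column(column_name: str) -> str:
    """Map a physical column name to a glossary term and classification, if any pattern matches."""
    for term, rule in GLOSSARY.items():
        if re.search(rule["pattern"], column_name, flags=re.IGNORECASE):
            return f"{column_name} -> {term} ({rule['classification']})"
    return f"{column_name} -> UNCLASSIFIED (route to a data steward)"


for col in ["cc_number", "ADDR_LINE_1", "campaign_id", "shoe_size"]:
    print(classify_column(col))
```

Real catalogs typically combine name matching with profiling of the values themselves, but the division of labor is the same: the business defines the ontology and the tooling enforces it.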
So what's the best movie that's out right now? There's no truth, it's a question of your tastes, and what you need to be able to do, as we move to a democratized world, is understand what criteria were used and what data was used. We need those things to be cited, and the catalog is effectively the thing that puts you in touch with the data that's available. Think about your college research projects. You wrote a thesis or a paper, you were meant to draw a conclusion, and you had to go to the library and get the books that you needed. And maybe, hopefully, no one had ever combined all of those ideas from those books to reach the conclusion that you did. That's what we're trying to do every single day in the businesses of the world in 2019.
>> Yeah, it's a little scary; in the world of science most things don't come down to a binary answer, there's the data to support it, and what we understand today could change if we look again and add new data. Bring in some customer examples: what are they doing, how does this impact them, and hopefully it brings more certainty into our world.
>> Absolutely. So I come from TD Bank, where I was the Vice President of Information Management Technology, and we used Data Catalyst to catalog a very large data lake: a Hadoop data lake that was six petabytes, with about 200 different applications in it. What we were able to do was allow self-service to the data assets in that lake. So imagine that instead of having to call somebody, or get a pipeline built and spend the next six months getting data, you go to a portal and you grab that data. We made that very simple. We usually think that about 50% of your time in an analysis context goes to finding the data and making it useful; what if that was all done for you? So we created a shopping experience for that at an enterprise level. What was the goal? Well, at TD we were all about legendary customer experience, so what we found very important were customer interactions and experiences: their transactions, their web clicks, their behavioral patterns. If you think about it, what any company is looking to do is catch a customer in the act of deciding, and what are those critical things that people decide? In a bank it might be when to buy a house, when you need mortgages and potentially loans and insurance. For a healthcare company it might be when people change jobs; for a hospital it might be when the weather changes. Everybody's looking for an advantage, and you can only get that advantage if you're creative about recognizing those moments through analytics and then acting in real time, with streaming, to do something about that moment.
>> All right, so Joe, one of the questions I have is: is there an aspect of time when you go into this? I understand asking questions based on the data I have available today, but if I'd asked two weeks earlier there would have been different data, and if I kept watching it would keep changing. I've got certain apps I use, like when's the best time to buy a ticket. How does that play in?
>> So there are two different dimensions to this. The first is what we call algorithmic decay.
If you're going to try and develop an algorithm, you don't want the data shifting under your feet as you work, because all of a sudden your results will change. And the sad reality is that most humans are not very original, so if I look at your behavior for the past ten years, or the past twenty, it won't necessarily be different from somebody else's. What we're looking to do is catch mass patterns, that's the power of big data: to look at a lot of patterns and figure out the repeatability in them. At that point you're not really looking for the data to change. Then you go to score it, and this is where the data changes all the time. So think about big data as looking at a billion rows and figuring out what's going on. The next thing would be what's traditionally called fast data, which is now based on an algorithm: this event just happened, what should I do? That data is changing under your feet regularly; you're looking to stream that data, maybe with a change data capture tool like Attunity, and get it into the hands of people and applications to make decisions really quickly. Now, what happens over time is that people's behaviors change, only old people are on Facebook now, right, you know this, so demographics change and the things that used to be very predictive fail to be, and there has to be a capability in an enterprise to deal with those algorithms as they start to decay and replace them with something fresher.
>> All right Joe, how do things like government compliance fit into this?
>> So governance is really at the core of the catalog. You really need to understand what the rules are if you want to have an effective catalog. We don't believe that every single person in a data-democratized world should have access to every single data element. So you need to understand what this data is, how you should protect it, and how you should think about its use. It's a really important governance principle to figure out who can have access to these data sets under what circumstances. Again, it has nothing to do with technology, but a really good catalog should help to enforce the policies you're coming up with about who should have access to that data under what circumstances.
>> Okay, so Joe, this is a pretty powerful tool. How do customers measure that they're getting adoption, that they're getting the results they were hoping for when they roll this out?
>> No one ever woke up one day and said, boy, would it be great if I stockpiled petabytes of data. At the end of the day...
>> I know some storage companies that say that.
>> They wish the customers would say that, but at the end of the day you have data for analytics value, and so what is analytics value? Maybe it's about a predictive algorithm. Maybe it's about a visualization, maybe it's about a KPI for your executive suite. If you don't know, you shouldn't start. What we want to do is think about use cases that make a difference to an enterprise. At TD that was fundamentally about legendary customer experience, offering the next best action to really delight the customer. At SunLife it was about making sure they had an understanding, from a customer support perspective, of their consumers. At one of our healthcare customers it was about faster discovery of drugs.
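Stepping back to the algorithmic decay point from earlier in this exchange, the idea can be sketched as a simple drift check: compare recent scoring accuracy against the accuracy measured at training time and flag the model for a refresh when the gap grows. The threshold and numbers below are made up for illustration.

```python
# Hypothetical decay check: flag a model for retraining when recent accuracy
# drifts below training-time accuracy by more than a tolerance.
def is_decaying(training_accuracy: float, recent_accuracy: float, tolerance: float = 0.05) -> bool:
    """True when recent performance has fallen more than `tolerance` below training performance."""
    return (training_accuracy - recent_accuracy) > tolerance


print(is_decaying(training_accuracy=0.91, recent_accuracy=0.90))  # False: still fresh
print(is_decaying(training_accuracy=0.91, recent_accuracy=0.80))  # True: replace it with something fresher
```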
If you understand what those are, you then work back from the analytical outcome to the data that supports it, and that's how you get started: how can I get the data sets that I'm pretty sure are going to move the needle, and then build from there so I'm able to answer more and more complex questions.
>> Well, great, those are some pretty powerful use cases. I remember back in the early Hadoop days it was, let's not have the best minds of our time figuring out how to get better ad clicks, right?
>> That's right, it's much easier these days. Effectively, what Hadoop, what big data, really allows you to do is answer questions more comprehensively. There was a time when cost would prevent you from being able to look at ten years' worth of history; those cost impediments are gone. So your analytics can be much better as a result, you're looking at a much broader section of data, and you can do much richer what-if analysis. I think the real secret of any good analytics is encouraging the what-if kind of questions. You want, in a data-democratized world, to encourage people to say, I wonder if this is true, I wonder if this happened, and to have the data to support that question. People talk a lot about failing fast, glibly, but what does that mean? Well, I wonder if right now women in Montana buy more sunglasses in summertime. Where's the data that can answer that question? I want it quickly, and I want to be able to say in five minutes, boy Joe, that was really stupid. I failed and I failed fast, but it wasn't because I spent the next six weeks looking for the data assets, it's because I had the data, got the analysis really quickly, and then moved on to something else. The people who can churn through those questions fastest will be the ones that win.
>> Very cool. I'm one of those people who loves swimming in data, always seeing what you can learn. For customers that want to get started, what do you recommend, what are the first steps?
>> So the first thing is really critical use case identification. Again, no one wants to stockpile data, so we need to think about how the data is going to affect an outcome, and think about that user outcome. Is it someone asking a question of an application in natural language to drive a certain behavior? Is it a real-time decision? What is the thing that you want to get good at? I mentioned that TD wanted to be good at customer experience and offer development. If you think about what Target did, there's a notorious story about them being able to predict pregnancy, because they recognized that there was an important moment, a behavioral change in consumers that would change how they buy overall. What's important to you, and what data might be relevant for that? Anchor it there, start small, start to operationalize the pipes that get you the data that you need, and encourage a lot of experimentation with the data assets that you've got. You don't need to create petabytes of data. Create the data sets that matter and then grow from use case to use case. One of our customers, SunLife, did a wonderful job of articulating seven or eight key use cases that would matter and built their lake accordingly. First it was about customer behavior, then it was employee behavior. If you start to think about your customers and what they care about, there's a person out there that cares about customer attrition.
There's a person out there that cares about employee attrition, there's a person out there that cares about the cost of delivering goods. Let's figure out what they need and how to use analytics to drive that, and then we can start to get smart about the data assets that can really make those analytics take off.
>> All right, well Joe, I really appreciate all the updates on the catalog there. Data is at the center of digital transformation for so many customers, and you've illuminated some key points there.
>> Happy to be here.
>> All right, thank you so much for watching theCUBE, I'm Stu Miniman. (upbeat music)