Aaron Kalb, Alation | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. (dramatic music)
>> Welcome back to Cambridge, Massachusetts, everybody. This is theCUBE, the leader in live tech coverage. We go out to the events, and we extract the signal from the noise. And, we're here at the MIT CDOIQ, the Chief Data Officer conference. I'm Dave Vellante with my cohost Paul Gillin. Day two of our wall-to-wall coverage. Aaron Kalb is here. He's the cofounder and chief data officer of Alation. Aaron, thanks for making the time to come on.
>> Thanks so much, Dave and Paul, for having me.
>> You're welcome. So, words matter, you know, and we've been talking about data, and big data, and the three Vs, and data is the new oil, and all this stuff. You gave a talk this week about, you know, "We're maybe not talking the right language when it comes to data." What did you mean by all that?
>> Absolutely. So, I get a little bit frustrated by some of these clichés we hear at conference after conference, and the one I, sort of, took aim at in this talk is "data is the new oil." I think what people want to invoke with that is to say that in the same way oil powered the industrial age, data is powering the information age. It's just saying data's really cool and trendy and important. That's true, but there are a lot of other associations and contexts that people have with oil, and some of them don't really apply as well to data.
>> So, is data more valuable than oil?
>> Well, I think they're each valuable in different ways, but I think there are a couple of issues with the metaphor. One is that oil is scarce and dwindling, and part of its value comes from the fact that it's so rare. Whereas, the experience with data is that it's so plentiful and abundant, we're almost drowning in it.
And so, what I contend is, instead of comparing data to oil, we should compare data to water. And, the idea is, you know, water is very plentiful on the planet, but sometimes, you know, if you have saltwater or contaminated water, you can't drink it. Water is good for different purposes depending on its form, and so, like water, it's all about getting the right data for the right purpose.
>> Well, we've certainly, at least in my opinion, fought wars, Paul, over oil.
>> And, over water.
>> And, certainly, conflicts over water. Do you think we'll be fighting wars over data? Or, are we already?
>> No, we might be. One of my favorite talks from the sessions here was a keynote by the CDO for the Department of Defense, who was talking about, you know, the civic duty of transparency, but was observing that, actually, more IP addresses from China and Russia are looking at our public datasets than from within the country. So, you know, it's definitely a resource that can be very powerful.
>> So, what was the reaction to your premise from the audience? What kind of questions did you get?
>> You know, people actually responded very favorably, including some folks from the oil and gas industry, which I was pleased to find. We have a lot of customers in energy, so that was cool. But, it was nice being here at MIT and just really geeking out about language and linguistics and data with a bunch of CDOs and other people who are, kind of, data intellectuals.
>> Right, so if data is not the new oil...
>> And, water isn't really a good analogy either, because the supply of water is finite.
>> That's true.
>> So, what is data?
>> Yeah.
>> Space?
>> Yeah, it's a good point.
>> Matter?
>> Maybe it is like the universe in that it's always expanding, right, somehow. Right, because any physical thing on the planet probably won't be growing at that exponential speed.
>> So, give us the punchline.
>> Well, so I contend that water, while imperfect, is actually a really good metaphor that helps with a lot of things. It has properties like the fact that if there's a data quality issue, it flows downstream like pollution in a river. There's the fact that it can come in different forms, useful for different purposes. You might have gray water, right, which is good enough for, you know, irrigation or industrial purposes, but not safe to drink. And so, you rely on metadata to get the data that's in the right form. And, you know, the talk is more fun because you have a lot of visual examples that make this clear.
>> Yeah, of course, yeah.
>> I actually had one person in the audience say that he used a similar analogy in his own company, so it was fun to trade notes.
>> So, chief data officer is a relatively new title for you, is it not? In terms of your role at Alation.
>> Yeah, that's right, and the most fun thing about my job is being able to interact with all of the other CDOs and CDAOs at a conference like this. And, it was cool to see. I believe this conference doubled since last year. Is that right?
>> No.
>> No, it's up about a hundred, though.
>> Right.
>> Well.
>> And, it's about double from three years ago.
>> And, when we first started, in 2013, yeah.
>> 130 people, yeah.
>> Yeah, it was a very small and intimate event.
>> Yeah, here we're outgrowing this building, it seems.
>> Yeah, they're kicking us out.
>> I think what's interesting is, you know, if we do a little bit of analysis, this is small data, within our own company, you know, when our biggest and most visionary customers bought Alation, the buyer champion typically either was a CDO or wasn't a CDO when they bought the software and has since been promoted to be a CDO. And so, seeing this trend of more and more CDOs cropping up is really exciting for us. And also, there are two trends we're hearing from all of the people at the conference.
A move from, sort of, infrastructure and technology to driving business value, and a move from defense and governance to, sort of, playing offense and doing revenue generation with data. Both of those trends are really exciting for us.
>> So, don't hate me for asking this question, because what a lot of companies will do is give somebody a CDO title, and it's, kind of, a little bit of a gimmick, right, to go to market. And, I'm sure they drag you into sales as a cofounder. But, as well, I know CDOs at tech companies who are actually trying to apply new techniques, figure out how data contributes to their business, how they can cut costs, raise revenue. Do you have an internal role, as well?
>> Absolutely, yeah.
>> Explain that.
>> So, Alation, you know, we're about 250 people, so we're not at the same scale as many of the attendees here. But, we want to learn, you know, from the best, and always apply everything that we learn internally as well. So, obviously, analytics and data science play a huge role in our internal operations.
>> And so, what kinds of initiatives are you driving internally? Is it, sort of, cost initiatives, efficiency, innovation?
>> Yeah, I think it's all of the above, right. Every single division, on both the operational efficiency and cost-cutting side and the side of figuring out the next big bet to make, can be informed by data. And, our goal is to empower a curious and rational world, where every decision is based not on the highest paid person's opinion, but on the best evidence possible. And so, you know, the goal of my function is largely to enable that, both centrally and within each business unit.
>> I want to talk to you about data catalogs a bit, because it's a topic close to my heart. I've talked to a lot of data catalog companies over the last couple of years, and it seems like, for one thing, the market's very crowded right now. It seems to me. Would you agree there are a lot of options out there?
>> Yeah, you know, it's been interesting because when we started, we were basically the first company to make this technology and to, kind of, use this term, data catalog, in this way. And, it's been validating to see, you know, a lot of big players and even other startups, kind of, coming to that terminology. But, yeah, it has gotten more crowded, and I think our customers, or our prospects, who used to ask us, you know, "What is it that you do? Explain this catalog metaphor to me," are now saying, "Yeah, catalogs, heard about that."
>> It doesn't need to be defined anymore.
>> "Which one should I pick? Why you?" Yeah.
>> What distinguishes one product from another, you know? What are the major differentiation points?
>> Yeah, I think one thing that's interesting is, you know, my talk was about how the metaphors we use shape the way we think. And, I think there's a sense in which, kind of, the history of each company shapes its philosophy and its approach. We've always been a data catalog company. That's our one product. Some of the other catalog vendors come from an ETL background, so they're a lot more focused on technical metadata and infrastructure. Some of the catalog products grew out of governance, and so it's, sort of, governance first, no sorry, defense first and then offense secondary. So, I think one of the things we encourage our prospects to look at is, kind of, the soul of the company and how that affects its decisions. The other thing is, of course, technology. And, what we at Alation are really excited about, and it's been validating to hear Gartner and others and a lot of the people here, like the GSK keynote speaker yesterday, talking about the importance of comprehensiveness and of taking a behavioral approach, right. We have our Behavioral IO technology that really says, "Let's not look at all the bits and the bytes, but how are people using the data to drive results?" That's our core differentiator.
>> Do your customers generally standardize on one data catalog, or might they have multiple catalogs for multiple purposes?
>> Yeah, you know, we heard the term "catalog of catalogs" more last season, you know. And, people here can get arbitrarily, you know, meta, meta, metadata, and we like to go there. I think the customers we see being most successful tend to have one catalog that serves as the single source of reference. Many of our customers will say, you know, that their catalog serves as, sort of, their internal Google for data, or the one-stop shop where you can find everything. Even though they may have many different sources, typically you don't want to have siloed catalogs. It makes it harder to find what you're looking for.
>> Let's play a little word association with some metaphors. Data lake. (laughter)
>> Data lake's another one that I sort of hate. If you think about it, people had data warehouses and didn't love them, but at least, when you put something into a warehouse, you can get it out, right. If you throw something into a lake, you know, there's really no hope you're ever going to find it. It's probably not going to be in great shape, and we're not surprised to find that many folks who invested heavily in data lakes are now having to invest in a layer over them, to make them comprehensible and searchable.
>> So, yeah, the lake is where we hide the stolen cars. Data swamp.
>> Yeah, I mean, I think if your point is that it's worse than a lake, it works. But, I think we can do better than a lake, right.
>> How about data ocean? (laughter)
>> You know, out of respect for John Furrier, I'll say it's fantastic. But, to us, we think, you know, it isn't really about the size. People think the more data the better. It's actually the more data the worse, unless you have a mechanism for finding the little bit of data that is relevant and useful for your task and putting it to use.
>> And, what a setup: enter the catalog.
So, technically, how does the catalog solve that problem?
>> Totally. So, if we think about, maybe let's go to the warehouse, for example. But, it works just as well on a data lake in practice.
>> Yeah, cool.
>> The catalog starts with the inventory, you know, what's on every single shelf. But, if you think about what Amazon has done, they have the inventory warehouse in the back, but what you see as a consumer is a simple search interface, where you type in the name of the product you're looking for. And then, you see ranked suggestions for different items, you know, toasters, lamps, whatever books I want to buy. Same thing for data. I can type in, you know, information about aircraft if I'm at the DOD, or information about drug discovery if I'm at GSK. And, I should then be able to see all of the different datasets that I have. And, that's true in almost any catalog: you can do some search over the curated datasets there. With Alation in particular, what I can see is who's using it, how they're using it, what they're joining it with, and what results they find in that process. And, that can really accelerate the pace of discovery.
>> Go ahead.
>> I'm sorry, Dave. To what degree can you automate some of that detail, like who's using it and what it's being used for? I mean, doesn't that rely on people curating the catalog? Or, to what degree can you automate that?
>> Yeah, so it's a great question. I think, sometimes, there's a sense with AI or ML that the computer is making the decisions or making things up, which is, obviously, very scary. Usually, though, the training data comes from humans. So, our goal is to learn from humans in two ways. There's learning from humans where humans explicitly teach you. Somebody goes and says, "This is gold standard data versus this is, you know, low quality data." And, they do that manually. But, there's also learning implicitly from people.
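As an illustration of the search behavior Aaron describes (type a term, get ranked datasets, informed by who is using them), here is a minimal Python sketch. It is not Alation's implementation: the `Dataset` class, the `search` function, the example table names, and the assumption that usage is summarized as a per-dataset query count mined from logs are all hypothetical. Keyword relevance comes first, and usage only reorders datasets that match equally well.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical catalog entry. query_count stands in for a usage
    # signal assumed to be mined from query logs.
    name: str
    description: str
    query_count: int


def search(catalog: list[Dataset], query: str) -> list[str]:
    """Return dataset names that match the query, most relevant first.

    Relevance is the number of query terms found in the name or
    description; ties are broken by usage (query_count).
    """
    terms = set(query.lower().split())
    hits = []
    for ds in catalog:
        text = f"{ds.name} {ds.description}".lower().replace("_", " ")
        matches = len(terms & set(text.split()))
        if matches:  # skip datasets that match no query term
            hits.append((matches, ds.query_count, ds.name))
    # Sort by keyword matches, then by usage, both descending.
    hits.sort(key=lambda hit: (hit[0], hit[1]), reverse=True)
    return [name for _, _, name in hits]


catalog = [
    Dataset("aircraft_maintenance", "maintenance records for aircraft", 420),
    Dataset("aircraft_maintenance_old", "deprecated copy of aircraft maintenance records", 3),
    Dataset("drug_trials", "clinical trial results", 250),
]

print(search(catalog, "aircraft maintenance"))
# ['aircraft_maintenance', 'aircraft_maintenance_old']
```

Sorting on the `(matches, query_count)` pair keeps a heavily used but irrelevant table from outranking a better keyword match; a real catalog would use a proper text index and richer behavioral signals.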
So, in the same way on amazon.com, if I buy one item and then buy another, I'm doing that for my own purposes, but Amazon can do collaborative filtering over all of these trends and say, "You might want to buy this item." We can do a similar thing, where we parse the query logs, parse the usage logs of BI tools, and basically watch what people are doing for their own purposes. They're not, you know, doing extra work on top of their job to help us. We can learn from that and make everybody more effective.
>> Aaron, is data classification a part of all this? Again, when we started in the industry, data classification was a manual exercise. It's always been a challenge. Certainly, people have applied math to it. You've seen support vector machines and probabilistic latent semantic indexing being used to classify data. Have we solved that problem, as an industry? Can you automate the classification of data on creation or use at this point in time?
>> Well, one thing that came up in a few talks about AI and ML here is, regardless of the algorithm you're using, whether it's, you know, IFH or SVM, or something really modern and exciting that keeps learning...
>> Stuff that's been around forever or, like you say, some new stuff, right.
>> Yeah, you know, actually, I think it was said best by Michael Collins at the DOD: data is more important than the algorithm, because even the best algorithm is useless without really good training data. Plus, the algorithms, kind of, everyone's got them. So, really often, training data is the limiting reactant in getting really good classification. One thing we try to do at Alation is create an upward spiral, where maybe some data is curated manually, and then we can use that as a seed to make some suggestions about how to label other data. And then, it's easier to just confirm or deny a guess than to actually manually label everything.
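The implicit learning Aaron mentions, parsing query logs to see which tables people use together much as Amazon looks at items bought together, can be illustrated with a simple item-to-item co-occurrence count, one basic flavor of collaborative filtering. This is a hedged sketch, not Alation's method: the regex-based table extraction is a deliberate simplification (real SQL needs a real parser), and `tables_in`, `co_usage`, `suggest`, and the sample queries are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Crude, assumption-laden extraction: the identifier after FROM or JOIN.
# Real SQL (subqueries, CTEs, quoted names) needs a proper parser.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_.]*)", re.IGNORECASE)


def tables_in(query: str) -> set[str]:
    """Return the lowercase table names referenced in one query."""
    return {name.lower() for name in TABLE_RE.findall(query)}


def co_usage(query_log: list[str]) -> Counter:
    """Count how often each pair of tables appears in the same query."""
    pairs: Counter = Counter()
    for query in query_log:
        for a, b in combinations(sorted(tables_in(query)), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest(table: str, pairs: Counter, k: int = 3) -> list[str]:
    """Suggest the tables most often used together with `table`."""
    scores: Counter = Counter()
    for (a, b), count in pairs.items():
        if a == table:
            scores[b] += count
        elif b == table:
            scores[a] += count
    return [name for name, _ in scores.most_common(k)]


log = [
    "SELECT * FROM orders o JOIN customers c ON o.cid = c.id",
    "SELECT * FROM orders JOIN shipments ON orders.id = shipments.order_id",
    "SELECT count(*) FROM orders o JOIN customers c ON o.cid = c.id WHERE c.region = 'EU'",
    "SELECT * FROM customers",
]

pairs = co_usage(log)
print(suggest("orders", pairs))  # ['customers', 'shipments']
```

Counting pairs within a single query is the crudest possible signal; weighting by recency or by how many distinct users ran each query would be natural refinements.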
So, then you get more training data, you get it faster, and it kind of accelerates that way instead of being a big burden.
>> So, that's really the advancement in the last, what, five, six years, where you're able to use machine intelligence to, sort of, solve that problem as opposed to brute-forcing it with some algorithm. Is that fair?
>> Yeah, I think that's right, and I think what gets me very excited is when you can have these interactive loops where the human helps the computer, which helps the human. You get, again, this upward spiral, instead of saying, "We have to have all of this, you know, manual step done before we even do the first step," or trying to have an algorithm brute-force it without any human intervention.
>> It's kind of like no schema on write, except it actually works. Just kidding, to all my Hadoop friends. All right, Aaron, hey. Thanks very much for coming on theCUBE, but give us your last word on the event. I think, is this your first one or no?
>> This is our first time here.
>> Yeah, okay. So, what are your thoughts?
>> I think we'll be back. It's just so exciting to get people who are thinking really big about data but are also practitioners who are solving real business problems. And, just the exchange of ideas and best practices has been really inspiring for me.
>> Yeah, that's great.
>> Yeah.
>> Well, thank you for the support of the event, and thanks for coming on theCUBE. It was great to see you again.
>> Thanks Dave, thanks Paul.
>> All right, you're welcome.
>> Thank you, sir.
>> All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. Be right back. (upbeat music)
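The "upward spiral" Aaron describes (seed some labels by hand, let the system guess labels for the rest, and have people confirm or reject each guess) can be sketched as a human-in-the-loop labeling round. This is a toy illustration under stated assumptions, not how Alation works: a nearest-neighbor guess by token overlap (Jaccard similarity on column-name tokens) stands in for a real classifier, the 0.3 threshold is arbitrary, and the column names, labels, and the `confirm` callback simulating a human reviewer are hypothetical.

```python
from typing import Callable, Optional


def tokens(column: str) -> set[str]:
    """Split a snake_case column name into lowercase tokens."""
    return set(column.lower().split("_"))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def guess_label(column: str, labeled: dict[str, str], threshold: float = 0.3) -> Optional[str]:
    """Guess a label from the most similar already-labeled column, if similar enough."""
    best_label, best_score = None, 0.0
    for name, label in labeled.items():
        score = jaccard(tokens(column), tokens(name))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def labeling_round(
    unlabeled: list[str],
    labeled: dict[str, str],
    confirm: Callable[[str, str], bool],
) -> list[str]:
    """Propose labels, ask a reviewer to confirm, and grow the labeled set.

    Confirmed guesses are added to `labeled` immediately, so later guesses in
    the same round can build on them. Returns the columns still unlabeled.
    """
    remaining = []
    for column in unlabeled:
        guess = guess_label(column, labeled)
        if guess is not None and confirm(column, guess):
            labeled[column] = guess  # a confirmed guess becomes new training data
        else:
            remaining.append(column)
    return remaining


# Seed labels curated by hand.
labeled = {"customer_email": "PII", "order_total": "financial"}

# A stand-in for the human reviewer: rejects the guess for customer_id.
remaining = labeling_round(
    ["billing_email", "customer_id", "order_tax_total", "warehouse_id"],
    labeled,
    confirm=lambda column, label: column != "customer_id",
)
print(remaining)  # ['customer_id', 'warehouse_id']
```

Each round grows the labeled set, so the next round has more to compare against; that is the sense in which confirming or rejecting a guess is cheaper than labeling everything by hand.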

Published Date: Aug 1, 2019
