Stephanie McReynolds, Alation | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Stephanie McReynolds. She is the Vice President of Marketing at Alation. Thanks so much for, for returning to theCUBE, Stephanie. >> Thank you for having me again. >> So, before the cameras were rolling, we were talking about Kevin Slavin's talk on the main stage this morning, and talking about, well really, a background to sort of this concern about AI and automation coming to take people's jobs, but really, his overarching point was that we really, we shouldn't, we shouldn't let the algorithms take over, and that humans actually are an integral piece of this loop. So, riff on that a little bit. >> Yeah, what I found fascinating about what he presented were actual examples where having a human in the loop of AI decision-making had a more positive impact than just letting the algorithms decide for you, and turning it into kind of a black, a black box. And the issue is not so much that, you know, there's very few cases where the algorithms make the wrong decision. What happens the majority of the time is that the algorithms actually can't be understood by a human. So if you have to roll back >> They're opaque, yeah. >> in your decision-making, or uncover it, >> I mean, who can crack what a convolutional neural network does, layer by layer, nobody can. >> Right, right. And so, his point was, if we want to avoid not just poor outcomes, but also make sure that the robots don't take over the world, right, which is where every, like, media person goes first, right? (Rebecca and James laugh) That you really need a human in the loop of this process. 
And a really interesting example he gave was what happened with the 2015 storm, and he talked about 16 different algorithms that do weather predictions, and only one algorithm predicted, mis-predicted that there would be a huge weather storm on the east coast. So if there had been a human in the loop, we wouldn't have, you know, caused all this crisis, right? The human could've >> And this is the storm >> Easily seen. >> That shut down the subway system, >> That's right. That's right. >> And really canceled New York City for a few days there, yeah. >> That's right. So I find this pretty meaningful, because Alation is in the data cataloging space, and we have a lot of opportunity to take technical metadata and automate the collection of technical and business metadata and do all this stuff behind the scenes. >> And you make the discovery of it, and the analysis of it. >> We do the discovery of this, and leading to actual recommendations to users of data, that you could turn into automated analyses or automated recommendations. >> Algorithmic, algorithmically augmented human judgment is what it's all about, the way I see it. What do you think? >> Yeah, but I think there's a deeper insight that he was sharing, is it's not just human judgment that is required, but for humans to actually be in the loop of the analysis as it moves from stage to stage, that we can try to influence or at least understand what's happening with that algorithm. And I think that's a really interesting point. You know, there's a number of data cataloging vendors, you know, some analysts will say there's anywhere from 10 to 30 different vendors in the data cataloging space, and as vendors, we kind of have this debate. Some vendors have more advanced AI and machine learning capabilities, and other vendors haven't automated at all. 
And I think that the answer, if you really want humans to adopt analytics, and to be comfortable with the decision-making of those algorithms, you need to have a human in the loop, in the middle of that process, of not only making the decision, but actually managing the data that flows through these systems. >> Well, algorithmic transparency and accountability is an increasing requirement. It's a requirement for GDPR compliance, for example. >> That's right. >> That I don't see yet with Wiki, but we don't see a lot of solution providers offering solutions to enable more of an automated roll-up of a narrative of an algorithmic decision path. But that clearly is a capability as it comes along, and it will. That will absolutely depend on a big data catalog managing the data, the metadata, but also helping to manage the tracking of what models were used to drive what decision, >> That's right. >> And what scenario. So that, that plays into what Alation >> So we talk, >> And others in your space do. >> We call that data catalog, almost as if the data's the only thing that we're tracking, but in addition to that, that metadata or the data itself, you also need to track the business semantics, how the business is using or applying that data and that algorithmic logic, so that might be logic that's just being used to transform that data, or it might be logic to actually make and automate decision, like what they're talking about GDPR. >> It's a data artifact catalog. These are all artifacts that, they are derived in many ways, or supplement and complement the data. >> That's right. >> They're all, it's all the logic, like you said. >> And what we talk about is, how do you create transparency into all those artifacts, right? So, a catalog starts with this inventory that creates a foundation for transparency, but if you don't make those artifacts accessible to a business person, who might not understand what is metadata, what is a transformation script. 
If you can't make that, those artifacts accessible to a, what I consider a real, or normal human being, right, (James laughs) I love to geek out, but, (all laugh) at some point, not everyone is going to understand. >> She's the normal human being in this team. >> I'm normal. I'm normal. >> I'm the abnormal human being among the questioners here. >> So, yeah, most people in the business are just getting our arms around how do we trust the output of analytics, how do we understand enough statistics and know what to apply to solve a business problem or not, and then we give them this like, hairball of technical artifacts and say, oh, go at it. You know, here's your transparency. >> Well, I want to ask about that, that human that we're talking about, that needs to be in the loop at every stage. What, that, surely, we can make the data more accessible, and, but it also requires a specialized skill set, and I want to ask you about the talent, because I noticed on your LinkedIn, you said, hey, we're hiring, so let me know. >> That's right, we're always hiring. We're a startup, growing well. >> So I want to know from you, I mean, are you having difficulty with filling roles? I mean, what is at the pipeline here? Are people getting the skills that they need? >> Yeah, I mean, there's a wide, what I think is a misnomer is there's actually a wide variety of skills, and I think we're adding new positions to this pool of skills. So I think what we're starting to see is an expectation that true business people, if you are in a finance organization, or you're in a marketing organization, or you're in a sales organization, you're going to see a higher level of data literacy be expected of that, that business person, and that's, that doesn't mean that they have to go take a Python course and learn how to be a data scientist. It means that they have to understand statistics enough to realize what the output of an algorithm is, and how they should be able to apply that. 
So, we have some great customers, who have formally kicked off internal training programs that are data literacy programs. Munich Re Insurance is a good example. They spoke with James a couple of months ago in Berlin. >> Yeah, this conference in Berlin, yeah. >> That's right, that's right, and their chief data officer has kicked off a formal data literacy training program for their employees, so that they can get business people comfortable enough and trusting the data, and-- >> It's a business culture transformation initiative that's very impressive. >> Yeah. >> How serious they are, and how comprehensive they are. >> But I think we're going to see that become much more common. Pfizer has taken, who's another customer of ours, has taken on a similar initiative, and how do they make all of their employees be able to have access to data, but then also know when to apply it to particular decision-making use cases. And so, we're seeing this need for business people to get a little bit of training, and then for new roles, like information stewards, or data stewards, to come online, folks who can curate the data and the data assets, and help be kind of translators in the organization. >> Stephanie, will there be a need for a algorithm curator, or a model curator, to, you know, like a model whisperer, to explain how these AI, convolutional, recurrent, >> Yeah. >> Whatever, all these neural, how, what they actually do, you know. Would there be a need for that going forward? Another as a normal human being, who can somehow be bilingual in neural net and in standard language? >> I think, I think so. I mean, I think we've put this pressure on data scientists to be that person. >> Oh my gosh, they're so busy doing their job. How can we expect them to explain, and I mean, >> Right. >> And to spend 100% of their time explaining it to the rest of us? >> And this is the challenge with some of the regulations like GDPR. 
We aren't set up yet, as organizations, to accommodate this complexity of understanding, and I think that this part of the market is going to move very quickly, so as vendors, one of the things that we can do is continue to help by building out applications that make it easy for information stewardship. How do you lower the barrier for these specialist roles and make it easy for them to do their job by using AI and machine learning, where appropriate, to help scale the manual work, but keeping a human in the loop to certify that data asset, or to add additional explanation and then taking their work and using AI, machine learning, and automation to propagate that work out throughout the organization, so that everyone then has access to those explanations. So you're no longer requiring the data scientists to hold like, I know other organizations that hold office hours, and the data scientist like sits at a desk, like you did in college, and people can come in and ask them questions about neural nets. That's just not going to scale at today's pace of business. >> Right, right. >> You know, the term that I used just now, the algorithm or model whisperer, you know, the recommend-er function that is built into your environment, in similar data catalog, is a key piece of infrastructure to rank the relevance rank, you know, the outputs of the catalog or responses to queries that human beings might make. You know, the recommendation ranking is critically important to help human beings assess the, you know, what's going on in the system, and give them some advice about how to, what avenues to explore, I think, so. >> Yeah, yeah. And that's part of our definition of data catalog. It's not just this inventory of technical metadata. >> That would be boring, and dry, and useless. >> But that's where, >> For most human beings. >> That's where a lot of vendor solutions start, right? >> Yeah. >> And that's an important foundation. 
>> Yeah, for people who don't live 100% of their work day inside the big data catalog. I hear what you're saying, you know. >> Yeah, so people who want a data catalog, how you make that relevant to the business is you connect those technical assets, that technical metadata with how is the business actually using this in practice, and how can we have proactive recommendation or the recommendation engines, and certifications, and this information steward then communicating through this platform to others in the organization about how do you interpret this data and how do you use it to actually make business decisions. And I think that's how we're going to close the gap between technology adoption and actual data-driven decision-making, which we're not quite seeing yet. We're only seeing about 30, when they survey, only about 36% of companies are actually confident they're making data-driven decisions, even though there have been, you know, millions, if not billions of dollars that have gone into the data analytics market and investments, and it's because as a manager, I don't quite have the data literacy yet, and I don't quite have the transparency across the rest of the organization to close that trust gap on analytics. >> Here's my feeling, in terms of cultural transformations across businesses in general. I think the legal staff of every company is going to need to get real savvy on using those kinds of tools, like your catalog, with recommendation engines, to support e-discovery, or discovery of the algorithmic decision paths that were taken by their company's products, 'cause they're going to be called by judges and juries, under a subpoena and so forth, and so on, to explain all this, and they're human beings who've got law degrees, but who don't know data, and they need the data environment to help them frame up a case for what we did, and you know, so, we being the company that's involved. >> Yeah, and our politicians. 
I mean, anyone who's read Cathy's book, Weapons of Math Destruction, there are some great use cases of where, >> Math, M-A-T-H, yeah. >> Yes, M-A-T-H. But there are some great examples of where algorithms can go wrong, and many of our politicians and our representatives in government aren't quite ready to have that conversation. I think anyone who watched the Zuckerberg hearings, you know, in Congress saw the gap of knowledge that exists between >> Oh my gosh. >> The legal community, and you know, and the tech community today. So there's a lot of work to be done to get ready for this new future. >> But just getting back to the cultural transformation needed to be, to make data-driven decisions, one of the things you were talking about is getting the managers to trust the data, and we're hearing about what are the best practices to have that happen, in the sense of starting small, be willing to experiment, get out of the lab, try to get to insight right away. What are, what would your best advice be, to gain trust in the data? >> Yeah, I think the biggest gap is this issue of transparency. How do you make sure that everyone understands each step of the process and has access to be able to dig into that. If you have a foundation of transparency, it's a lot easier to trust, rather than, you know, right now, we have kind of like the high priesthood of analytics going on, right? (Rebecca laughs) And some believers will believe, but a lot of folks won't, and, you know, the origin story of Alation is really about taking these concepts of the scientific revolution and scientific process and how can we support, for data analysis, those same steps of scientific evaluation of a finding. That means that you need to publish your data set, you need to allow others to rework that data, and come up with their own findings, and you have to be open and foster conversations around data in your organization. 
One other customer of ours, Meijer, who's a grocery store in the Midwest, and if you're west coast or east coast-based, you might not have heard of them-- >> Oh, Meijer's Thrifty Acres. I'm from Michigan, and I know them, yeah. >> Gigantic. >> Yeah, there you go. Gigantic grocery chain in the Midwest, and, Joe Oppenheimer there actually introduced a program that he calls the social contract for analytics, and before anyone gets their license to use Tableau, or MicroStrategy, or SAS, or any of the tools internally, he asks those individuals to sign a social contract, which basically says that I'll make my work transparent, I will document what I'm doing so that it's shareable, I'll use certain standards on how I format the data, so that if I come up with a, with a really insightful finding, it can be easily put into production throughout the rest of the organization. So this is a really simple example. His inspiration for that social contract was his high school freshman. He was entering high school and had to sign a social contract, that he wouldn't make fun of the teachers, or the students, you know, >> I love it. >> Very simple basics. >> Yeah, right, right, right. >> I wouldn't make fun of the teacher. >> We all need a social contract. >> Oh my gosh, you have to make fun of the teacher. >> I think it was a little more formal than that, in the language, but that was the concept. >> That's violating your civil rights as a student. I'm sorry. (Stephanie laughs) >> Stephanie, always so much fun to have you here. Thank you so much for coming on. >> Thank you. It's a pleasure to be here. >> I'm Rebecca Knight, for James Kobielus. We'll have more of theCUBE's live coverage of DataWorks just after this.

Published Date : Jun 20 2018

Aaron Kalb, Alation | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan for BigData NYC, our event we've been doing for five years in conjunction with Strata Data, which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is theCUBE. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to theCUBE. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder, head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also as head of product you get to have the 20-mile stare on what the future looks like, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally, so the thing we've observed is a lot of people working in the data space are concerned about the data itself. How can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people, how do we help people find the data, understand the data, trust in the data, and that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer, to get that all organized. >> John Furrier: It's interesting, you have a symbolic systems background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexa is going to have voice-activated, you've got Siri. AI is taking a really big part of this. 
Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems almost brings on this neural network kind of vibe, where objects, data, plays a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was you know very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh do you mean fiscal quarter or calendar quarter. Do you mean UK including Ireland, or whatever it is. That would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay let's take a step back. You guys are doing well as a start up, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are, obviously this is BigData NYC a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event which is the classic Hadoop industry that's morphed into data. Which really is overlapping with cloud, IoTs application developments all kind of coming together. How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality, oh everything I've ever had I want to keep in the attic, because I might need it one day. 
Great opportunity to evolve these new streams of data, with IoT and whatnot, but just 'cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack, and to distinguish from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog to kind of organize that information and get what you want all the more valuable. >> This has come up a lot, I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up with, has been the buzz. Foundationally if you will saying catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo. Before you can visualize it in a tool like Tableau, or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future, with a data science engine, before any of that. These are all garbage in, garbage out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good and then you can do whatever comes next. >> And governance has become a key thing here, we've heard of the regulations, GDPR outside of the United States, but also that's going to have an arm's-length reach over into the United States impact. So these little decisions, and there's going to be an Equifax someday out there. Another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. 
How do you balance the trade off between good catalog design and flexibility on the algorithm side? >> Totally yes it's a complicated thing with governance and consumption right. There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog as a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say let's follow in the footsteps of the best analyst and make more insights generated or if you want to say, hey this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people exactly >> Equifax they were fishing that pond dry months, months before it actually happened. With good tools like this they might have seen this right? Am I getting it right? >> That's exactly right, how can you observe what's going on to make sure it's compliant and that the answers are correct and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that into understanding what to do with the data modeling? >> That's exactly right. We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be if you see something that's developing you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. 
So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around. Good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic head room, is integration. You guys have some partnerships that you announced with integration, what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place, have one warehouse or one lake, have one BI tool. And those generally fail, for different reasons, different teams pick different stacks that work for them. What we think is important is the single source of reference. One hub with spokes out to all those different points. If you think about it it's like Google, it's one index of the whole web even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator, they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced and higher for places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right, we've a lot of different connectors, this is one new one that we have. Another, go ahead. >> I was going to go ahead continue. 
>> I was just going to say another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand in Alation and then begin to manipulate in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a NASCAR race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now, they just want to do the data science, it's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field. It isn't just preparing the data to be used, getting it cleaned and shaped, it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here is, I mean it's kind of a subtext in this show, it's not really front and center but we've been seeing it kind of emerge as a concept, we see in the cloud world, on premise vs cloud. On premise a lot of people bring in the dev ops model, and saying I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to do, resetting. They are kind of getting their house in order to do on prem cloud ops, meaning a business model of cloud operations on site. 
A lot of people doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud and then from wherever the Alation server is it will point to usually a mix of sources, some that are in the cloud like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and we have people that are migrating from HDFS to S3, is one thing we hear a lot about. Strata with sort of Hadoop interest. But I think what's happening is people are realizing as each Equifax in turn happens, that this old wild west model of oh you surround your bank with people on horseback and it's physically in one place. With data it isn't like that, most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security, than the people I can get over in the Midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the Data Lake governance, because I've never loved the term Data Lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right, so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, that you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? 
>> We can do that, yeah. I think the metaphor for people who haven't seen it really is Google. If you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes, that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world. They also have, I believe, two huge data lakes, they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive, and then direct Presto query. It gets very complicated, and they are a very data-driven organization, so they have people who are product owners, who are in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine, what was doing what, and then it became, what's the car, and now how's it drive. We're seeing that same evolution now, where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data scientists are the guys really driving things, and now end users potentially, and even applications, bots or whatnot. It seems to evolve, that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things.
I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all, if it's just machine learning as a baby step into AI. We all know what AI could be, but it's really just machine learning now. How do you guys use, quote, AI, and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me, all I have to know is where I want to go. That's sort of the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like, today, how do we get those turn-by-turn directions where we say turn left at the light if you want to get there. And eventually, you know, maybe the computer can do it for you. The thing that is also interesting is, to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. Historical data, in essence; the more historical data you have, you need that to train the data. >> Exactly right, and we call this behavior IO: how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL, behavioral IO. >> We should do it before it's too late. (Both laugh) >> We're live right now, go register that, Patrick. >> Yeah, so the goal is we don't want to have a bunch of unpaid interns trying to manually attack things, that's error prone and that's slow.
I look at things like Luis von Ahn over at CMU, he does a thing where as you're writing in a CAPTCHA to get an email account, you're also helping Google recognize a hard-to-read address or a piece of text from books. >> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about, and ultimately VR. Are you seeing some of the use cases for virtual reality be very enterprise oriented, or even end consumer? I mean Tom Brady, the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user. In pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn-by-turn direction analogy. >> It's exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people, and then we do extra work watching their behavior to learn what's right, and to figure out if there are recommendations. But then you serve those recommendations, either it's Google glasses, it appears right there in your field of view. We just have to figure out how do we make sure that in the moment when you're making a dashboard, or you're making a choice, you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, I'll ask you a tough question, 'cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies and network issues as well? So you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something, there could be any points of data in the world that could be in milliseconds potentially on my phone or in my device, my Internet of Things wearable. How do you make that happen?
Because that's the struggle, at the same time to keep all the compliance and all the overhead involved. Is it more compute, is it an architectural challenge? How do you view that, because this is the big challenge of our time. >> Yeah, again I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again, if you think about Google, billions of web pages, I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart. How do we say in that moment, hey, you're using an out-of-date definition of profit. Or hey, the database you chose to use, the one thing you chose out of the millions, actually is broken and stale. And we have interventions to do that with our partners and through our own first-party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd need access to the data, and then write software that is contextually aware to then run, compute, in context to the user interaction. >> It's exactly right. Back to the turn-by-turn directions concept, you have to know both where you're trying to go and where you are. And so for us that can be: from where I'm writing a SQL statement, after JOIN we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward or data curator. So that's the moment that we can change the behavior from bad to good. >> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there, they might be empowered or they might be just a CFO assistant that's managing compliance. Either way, someone's going to be empowered in an organization to drive data science and data value forward, because there is so much proof that data science works.
From military to play, you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's like, we call it the tool shed out there, there's like a zillion tools out there, there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who they select, so what's the vision of Alation to that chief data officer, or that person in charge of analytics, to scale operational analytics? >> Absolutely, so we say to the CDO, we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants months too late. And the way we get there, the reason Alation adds value, is we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool that had everything organized, so you just said hey robot, I need a hammer and this size nail and this textbook on this set of information, and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put in this infrastructure. That's especially true on the lake. >> And also tools describe the way the work's done, so in that model tools can be in the tool shed, no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well congratulations, and just how far along are you guys in terms of number of employees, how many customers do you have?
If you can share that, I don't know if that's confidential or whatnot. >> Absolutely, so we're small but growing very fast, planning to double in the next year, and in terms of customers, we've got 85 customers, including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys, why are they buying, why are they happy? >> They share that same vision of a more data-driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, ethos, for you guys, that's the guiding principle? >> Yeah, the ultimate goal is to empower humans with information. >> Alright Aaron, thanks for coming on theCUBE. Aaron Kalb, co-founder and head of product for Alation, here in New York City for BigData NYC and also Strata Data. I'm John Furrier, thanks for watching. We'll be right back with more after this short break.
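
Editor's sketch: the join suggestion Aaron describes (after a JOIN, suggest the table most commonly joined with the current one, and overlay a steward's deprecation flag) might look roughly like the Python below. This is only an illustration under stated assumptions, not Alation's implementation. The function name `suggest_joins`, the log format (pairs of table names that appeared together in past joins), and the table names are all hypothetical.

```python
from collections import Counter


def suggest_joins(join_log, current_table, deprecated, top_n=3):
    """Rank tables by how often they were joined with current_table.

    join_log: iterable of (left_table, right_table) pairs from past queries
              (hypothetical format, assumed for this sketch).
    deprecated: set of table names a data steward has flagged.
    Returns a list of (table, join_count, is_deprecated) tuples.
    """
    counts = Counter()
    for left, right in join_log:
        if left == current_table:
            other = right
        elif right == current_table:
            other = left
        else:
            continue  # this join does not involve the table being queried
        if other != current_table:  # ignore self-joins
            counts[other] += 1
    # Overlay the deprecation flag on top of the popularity ranking.
    return [(table, n, table in deprecated)
            for table, n in counts.most_common(top_n)]


# Hypothetical query history and steward flags.
history = [
    ("orders", "customers_v1"),
    ("customers_v1", "orders"),
    ("orders", "customers_v1"),
    ("orders", "customers_v2"),
    ("users", "events"),
]
for table, n, is_deprecated in suggest_joins(history, "orders", {"customers_v1"}):
    note = " (deprecated by a data steward)" if is_deprecated else ""
    print(f"{table}: joined {n} times{note}")
```

The point of returning the flag alongside the count, rather than filtering deprecated tables out, is that the most popular choice stays visible with a warning attached, which matches the "change the behavior from bad to good" moment described in the interview.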

Published Date : Sep 28 2017


