Search Results for Munich Re:

Andreas Kohlmaier, Munich Re | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's The Cube. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.

>> Well, hello. Welcome to The Cube. I'm James Kobielus. I'm the Lead Analyst for Big Data Analytics on the Wikibon team of SiliconANGLE Media. We are here at DataWorks Summit 2018 in Berlin, which of course is hosted by Hortonworks. We are in day one of two days of interviews with executives, with developers, with customers. In this morning's opening keynote, one of the speakers was a customer of Hortonworks from Munich Re, the reinsurance company based, of course, in Munich, Germany: Andreas Kohlmaier, the head of Data Engineering, who gave an excellent discussion of how you've built out a data lake. The first thing I'd like to ask you, Andreas: right now it's five weeks until GDPR, the General Data Protection Regulation, goes into full force on May 25th. It applies to the EU, and anybody who does business in the EU, including companies based elsewhere, such as in the US, needs to start complying with GDPR in terms of protecting personal data. Give us a sense for how Munich Re is approaching the deadline, your level of readiness to comply with GDPR, and how your investment in your data lake serves as a foundation for that compliance.

>> Absolutely. So thanks for the question. GDPR, of course, is the hot topic across all European organizations. And we are actually pretty well prepared. We compiled all the processes and the necessary regulations, and in fact we are now selling this also as a service product to our customers. This has been an interesting side effect, because we have lots of other insurance companies as customers, and we started to think: why not offer this as a service to other insurance companies to help them prepare for GDPR? This is actually proving to be one of the exciting, interesting things that can come out of GDPR.

>> Maybe that will be your new line of business. You might make more money doing that.

>> I'm not sure! (crosstalk)

>> Well, that's excellent! So you've learned a lot of lessons. So you're ready for May 25th? You are, okay, that's great. You're probably far ahead of a lot of U.S.-based firms. In our country and in other countries, we're still getting our heads around all the steps that are needed, so many companies outside the EU may call on you for some consulting support. That's great! So give us a sense for your data lake. You discussed it this morning, but can you give us a sense for the business justification for building it out? How have you rolled it out? What stage is it in? Who's using it, and for what?

>> Absolutely. One of the key things for us at Munich Re is the issue of complexity, or data diversity as it was also called this morning. We do business in so many different areas, and we have lots of experts in those areas. Those people are very knowledgeable in their fields, and now they also get access to new sources of information. To give you a sense, we have people, for example, who are really familiar with weather and climate change, also with satellites. We have captains for ships and pilots for aircraft. So we have lots of expertise in all the different areas. Why? Because we are taking those risks onto our books.

>> Those are big risks too. You're a reinsurance company, so yeah.

>> And these are actually complex risks where we really have people who are experts in their field. We sometimes have people with 20-plus years of experience in an area who then move to the insurer to bring their field expertise to the risk management side. And all those people now get an additional source of input, which is the data that is now more or less readily available everywhere. First of all, we are getting new data with the submissions and the risks that we are taking, and there are also interesting open data sources to connect to, so those experts can bring their knowledge and their analytics to a new level by adding the layer of data and analytics to their existing knowledge. This allows us, first of all, to understand the risks even better, to put a better price tag on them, and also to take on new risks that have not been possible to cover before. One of the things that has also been in the media, I think, is that we are now covering the Hyperloop once it's built. Those kinds of new things are only possible with data analytics.

>> So you're a Hortonworks customer. Give us a sense for how you're using or deploying Hortonworks Data Platform or DataPlane Service and whatnot inside of your data lake. It sounds like it includes a big data catalog, is that a correct characterization?

>> So one of the things that is key for us is actually finding the right information and connecting those different experts to each other. This is why the data catalog plays a central role. Here we have selected Alation as the catalog tool to connect the different experts in the group. The data lake at the moment is an on-prem installation. We are thinking about moving parts of that workload to the cloud to save operating costs.

>> On top of HDP.

>> Yeah, so Alation, as far as I know, is technically a separate server that indexes the Hive tables on HDP.

>> So essentially the catalog provides visualization and correlation across the disparate data sources that you're managing in Hadoop.

>> Yeah, the catalog is a great way of connecting the experts together. If we have people in one part of the group who are very knowledgeable about weather and have great weather data, then we'd like to connect them, for example, to the people doing crop insurance for India, so that they can use the weather data to improve the models, for example, for crop insurance in Asia. There the data catalog helps us connect those experts, because you can first of all find the data sources, and you can also see who is the expert on the data. You can then call them up or ask them a question in the tool. So it's essentially a great way to share knowledge and to connect the different experts across the group.

>> Okay, so it's also surfacing human expertise. Is it also serving as a way to find training datasets, possibly to use to build machine learning models for more complex analyses? Is that something that you're doing now or plan to do in the future?

>> Yes, we are of course doing some machine learning and also deep learning projects. We have also just started a Center of Excellence for artificial intelligence to see how we can use deep learning and machine learning to find different ways of pricing insurance risks, for example, and for all those cases data is key, and we really need people to get access to the right data.

>> I have to ask you. One of the things I'm seeing, you mentioned the Center of Excellence for AI. I'm seeing more companies consider, maybe not do it, but consider establishing an office of the chief AI officer, reporting to the CEO. I'm not sure that's a great idea for a lot of businesses, but since an insurance company lives and dies by data and calculations, is that something Munich Re is doing or considering: a C-suite-level officer of that sort responsible for this AI competency, or no?

>> Could be in the future.

>> Okay.

>> We've only just started with the AI Center of Excellence. It now reports to our Chief Data Officer, so it's not yet at the C-suite level.

>> Is the Center of Excellence for AI simply a training institute to provide some basic skill building, or is there something more there? Do you do development?

>> Actually, they are trying out and developing ways we can use AI and deep learning for insurance. One of the core things, of course, is understanding natural language to structure the information that we are getting in PDFs and other documents, but also using deep learning as a new way to build tariffs for the insurance industry. So that's one of the core things, to find and create new tariffs. We are also experimenting, although we haven't found the product yet, with whether we can use deep learning to create better tariffs. That could then also be one of the services we provide to our customers, the insurance companies, and they build it into their products. Something like: the algorithms are powered by Munich Re.

>> Now, the users of your data lake, these are expert quantitative analysts for the most part, right? You mentioned using natural language understanding capabilities. Is that something you need to do in high volume as a reinsurance company? Take lots of source documents and be able to, as it were, identify the content at high volume, not just OCR, but actually build a semantic graph of what's going on inside the document?

>> I'm going to give you an example of the things we are doing with natural language processing. This one is about the energy business in the US. We are actually taking on, or seeing, most of the risks related to oil and gas in the U.S.: all the refineries, all the larger stations, and the petroleum tanks. They are all in our books, and for each and every one of them we get a detailed risk report of a couple of hundred pages. Inside these reports there is also a paragraph describing where the refinery or plant gets its supplies from and where it ships its products to. And we are seeing all those documents. That's on the scale of a couple of thousand documents, so it's not really huge, but all together a couple of hundred thousand pages. We use NLP and AI on those documents to extract the supply chain information, so that we can stitch together a more or less complete picture of the supply chain for oil and gas in the U.S., which again helps us better understand that risk, because supply chain breakdown is one of the major risks in the world nowadays.

>> Andreas, this has been great! We could keep going. I'm totally fascinated by your use of AI, but also your use of a data lake, and I'm impressed by your ability, as a company, to get your GDPR ducks in a row, as we say in the U.S., and that's great. It's been great to have you on The Cube. We are here at DataWorks Summit in Berlin. (techno music)
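The supply-chain use case Kohlmaier describes, extracting supplier and shipping relationships from risk reports and stitching them into a network, can be illustrated with a short sketch. This is only a minimal illustration, not Munich Re's actual pipeline: it assumes spaCy for entity recognition and networkx for the graph, and the cue phrases and sample text are invented.

```python
# Minimal sketch of supply-chain extraction from risk-report text.
# Assumptions: spaCy's small English model is installed; the cue
# phrases and the sample report are illustrative, not Munich Re's.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

SUPPLY_CUES = ("supplied by", "receives crude from", "sources feedstock from")
SHIP_CUES = ("ships to", "delivers product to", "supplies")

def extract_links(report_text, facility_name):
    """Return (supplier -> facility) and (facility -> customer) edges
    found in sentences that contain one of the cue phrases."""
    edges = []
    doc = nlp(report_text)
    for sent in doc.sents:
        text = sent.text.lower()
        orgs = [ent.text for ent in sent.ents if ent.label_ in ("ORG", "FAC", "GPE")]
        if any(cue in text for cue in SUPPLY_CUES):
            edges += [(org, facility_name) for org in orgs]
        elif any(cue in text for cue in SHIP_CUES):
            edges += [(facility_name, org) for org in orgs]
    return edges

def build_supply_chain(reports):
    """reports: dict mapping facility name -> risk report text."""
    graph = nx.DiGraph()
    for facility, text in reports.items():
        graph.add_edges_from(extract_links(text, facility))
    return graph

if __name__ == "__main__":
    sample = {"Gulf Coast Refinery":
              "The plant receives crude from Permian Basin Midstream. "
              "It ships to Southeast Fuel Distributors."}
    g = build_supply_chain(sample)
    print(list(g.edges()))
```

Repeating this over thousands of reports yields a directed graph whose connectivity can then be analyzed for concentration risk, which is the kind of supply-chain picture described above.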

Published Date : Apr 18 2018

SUMMARY :

James Kobielus of Wikibon/SiliconANGLE Media interviews Andreas Kohlmaier, head of Data Engineering at Munich Re, at DataWorks Summit 2018 in Berlin. They discuss Munich Re's readiness for GDPR, which the company is now also packaging as a service for its insurance customers; the business case for its data lake; the use of Alation as a data catalog on top of HDP to connect experts across the group; a newly formed AI Center of Excellence reporting to the Chief Data Officer; and the use of NLP to extract supply chain information from oil and gas risk reports in the U.S.


Stephanie McReynolds, Alation | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Stephanie McReynolds. She is the Vice President of Marketing at Alation. Thanks so much for, for returning to theCUBE, Stephanie. >> Thank you for having me again. >> So, before the cameras were rolling, we were talking about Kevin Slavin's talk on the main stage this morning, and talking about, well really, a background to sort of this concern about AI and automation coming to take people's jobs, but really, his overarching point was that we really, we shouldn't, we shouldn't let the algorithms take over, and that humans actually are an integral piece of this loop. So, riff on that a little bit. >> Yeah, what I found fascinating about what he presented were actual examples where having a human in the loop of AI decision-making had a more positive impact than just letting the algorithms decide for you, and turning it into kind of a black, a black box. And the issue is not so much that, you know, there's very few cases where the algorithms make the wrong decision. What happens the majority of the time is that the algorithms actually can't be understood by human. So if you have to roll back >> They're opaque, yeah. >> in your decision-making, or uncover it, >> I mean, who can crack what a convolutional neural network does, at a layer by layer, nobody can. >> Right, right. And so, his point was, if we want to avoid not just poor outcomes, but also make sure that the robots don't take over the world, right, which is where every like, media person goes first, right? (Rebecca and James laugh) That you really need a human in the loop of this process. And a really interesting example he gave was what happened with the 2015 storm, and he talked about 16 different algorithms that do weather predictions, and only one algorithm predicted, mis-predicted that there would be a huge weather storm on the east coast. So if there had been a human in the loop, we wouldn't have, you know, caused all this crisis, right? The human could've >> And this is the storm >> Easily seen. >> That shut down the subway system, >> That's right. That's right. >> And really canceled New York City for a few days there, yeah. >> That's right. So I find this pretty meaningful, because Alation is in the data cataloging space, and we have a lot of opportunity to take technical metadata and automate the collection of technical and business metadata and do all this stuff behind the scenes. >> And you make the discovery of it, and the analysis of it. >> We do the discovery of this, and leading to actual recommendations to users of data, that you could turn into automated analyses or automated recommendations. >> Algorithmic, algorithmically augmented human judgment is what it's all about, the way I see it. What do you think? >> Yeah, but I think there's a deeper insight that he was sharing, is it's not just human judgment that is required, but for humans to actually be in the loop of the analysis as it moves from stage to stage, that we can try to influence or at least understand what's happening with that algorithm. And I think that's a really interesting point. 
You know, there's a number of data cataloging vendors, you know, some analysts will say there's anywhere from 10 to 30 different vendors in the data cataloging space, and as vendors, we kind of have this debate. Some vendors have more advanced AI and machine learning capabilities, and other vendors haven't automated at all. And I think that the answer, if you really want humans to adopt analytics, and to be comfortable with the decision-making of those algorithms, you need to have a human in the loop, in the middle of that process, of not only making the decision, but actually managing the data that flows through these systems. >> Well, algorithmic transparency and accountability is an increasing requirement. It's a requirement for GDPR compliance, for example. >> That's right. >> That I don't see yet with Wiki, but we don't see a lot of solution providers offering solutions to enable more of an automated roll-up of a narrative of an algorithmic decision path. But that clearly is a capability as it comes along, and it will. That will absolutely depend on a big data catalog managing the data, the metadata, but also helping to manage the tracking of what models were used to drive what decision, >> That's right. >> And what scenario. So that, that plays into what Alation >> So we talk, >> And others in your space do. >> We call that data catalog, almost as if the data's the only thing that we're tracking, but in addition to that, that metadata or the data itself, you also need to track the business semantics, how the business is using or applying that data and that algorithmic logic, so that might be logic that's just being used to transform that data, or it might be logic to actually make and automate decision, like what they're talking about GDPR. >> It's a data artifact catalog. These are all artifacts that, they are derived in many ways, or supplement and complement the data. >> That's right. >> They're all, it's all the logic, like you said. >> And what we talk about is, how do you create transparency into all those artifacts, right? So, a catalog starts with this inventory that creates a foundation for transparency, but if you don't make those artifacts accessible to a business person, who might not understand what is metadata, what is a transformation script. If you can't make that, those artifacts accessible to a, what I consider a real, or normal human being, right, (James laughs) I love to geek out, but, (all laugh) at some point, not everyone is going to understand. >> She's the normal human being in this team. >> I'm normal. I'm normal. >> I'm the abnormal human being among the questioners here. >> So, yeah, most people in the business are just getting our arms around how do we trust the output of analytics, how do we understand enough statistics and know what to apply to solve a business problem or not, and then we give them this like, hairball of technical artifacts and say, oh, go at it. You know, here's your transparency. >> Well, I want to ask about that, that human that we're talking about, that needs to be in the loop at every stage. What, that, surely, we can make the data more accessible, and, but it also requires a specialized skill set, and I want to ask you about the talent, because I noticed on your LinkedIn, you said, hey, we're hiring, so let me know. >> That's right, we're always hiring. We're a startup, growing well. >> So I want to know from you, I mean, are you having difficulty with filling roles? I mean, what is at the pipeline here? 
Are people getting the skills that they need? >> Yeah, I mean, there's a wide, what I think is a misnomer is there's actually a wide variety of skills, and I think we're adding new positions to this pool of skills. So I think what we're starting to see is an expectation that true business people, if you are in a finance organization, or you're in a marketing organization, or you're in a sales organization, you're going to see a higher level of data literacy be expected of that, that business person, and that's, that doesn't mean that they have to go take a Python course and learn how to be a data scientist. It means that they have to understand statistics enough to realize what the output of an algorithm is, and how they should be able to apply that. So, we have some great customers, who have formally kicked off internal training programs that are data literacy programs. Munich Re Insurance is a good example. They spoke with James a couple of months ago in Berlin. >> Yeah, this conference in Berlin, yeah. >> That's right, that's right, and their chief data officer has kicked off a formal data literacy training program for their employees, so that they can get business people comfortable enough and trusting the data, and-- >> It's a business culture transformation initiative that's very impressive. >> Yeah. >> How serious they are, and how comprehensive they are. >> But I think we're going to see that become much more common. Pfizer has taken, who's another customer of ours, has taken on a similar initiative, and how do they make all of their employees be able to have access to data, but then also know when to apply it to particular decision-making use cases. And so, we're seeing this need for business people to get a little bit of training, and then for new roles, like information stewards, or data stewards, to come online, folks who can curate the data and the data assets, and help be kind of translators in the organization. >> Stephanie, will there be a need for a algorithm curator, or a model curator, to, you know, like a model whisperer, to explain how these AI, convolutional, recurrent, >> Yeah. >> Whatever, all these neural, how, what they actually do, you know. Would there be a need for that going forward? Another as a normal human being, who can somehow be bilingual in neural net and in standard language? >> I think, I think so. I mean, I think we've put this pressure on data scientists to be that person. >> Oh my gosh, they're so busy doing their job. How can we expect them to explain, and I mean, >> Right. >> And to spend 100% of their time explaining it to the rest of us? >> And this is the challenge with some of the regulations like GDPR. We aren't set up yet, as organizations, to accommodate this complexity of understanding, and I think that this part of the market is going to move very quickly, so as vendors, one of the things that we can do is continue to help by building out applications that make it easy for information stewardship. How do you lower the barrier for these specialist roles and make it easy for them to do their job by using AI and machine learning, where appropriate, to help scale the manual work, but keeping a human in the loop to certify that data asset, or to add additional explanation and then taking their work and using AI, machine learning, and automation to propagate that work out throughout the organization, so that everyone then has access to those explanations. 
So you're no longer requiring the data scientists to hold like, I know other organizations that hold office hours, and the data scientist like sits at a desk, like you did in college, and people can come in and ask them questions about neural nets. That's just not going to scale at today's pace of business. >> Right, right. >> You know, the term that I used just now, the algorithm or model whisperer, you know, the recommend-er function that is built into your environment, in similar data catalog, is a key piece of infrastructure to rank the relevance rank, you know, the outputs of the catalog or responses to queries that human beings might make. You know, the recommendation ranking is critically important to help human beings assess the, you know, what's going on in the system, and give them some advice about how to, what avenues to explore, I think, so. >> Yeah, yeah. And that's part of our definition of data catalog. It's not just this inventory of technical metadata. >> That would be boring, and dry, and useless. >> But that's where, >> For most human beings. >> That's where a lot of vendor solutions start, right? >> Yeah. >> And that's an important foundation. >> Yeah, for people who don't live 100% of their work day inside the big data catalog. I hear what you're saying, you know. >> Yeah, so people who want a data catalog, how you make that relevant to the business is you connect those technical assets, that technical metadata with how is the business actually using this in practice, and how can we have proactive recommendation or the recommendation engines, and certifications, and this information steward then communicating through this platform to others in the organization about how do you interpret this data and how do you use it to actually make business decisions. And I think that's how we're going to close the gap between technology adoption and actual data-driven decision-making, which we're not quite seeing yet. We're only seeing about 30, when they survey, only about 36% of companies are actually confident they're making data-driven decisions, even though there have been, you know, millions, if not billions of dollars that have gone into the data analytics market and investments, and it's because as a manager, I don't quite have the data literacy yet, and I don't quite have the transparency across the rest of the organization to close that trust gap on analytics. >> Here's my feeling, in terms of cultural transformations across businesses in general. I think the legal staff of every company is going to need to get real savvy on using those kinds of tools, like your catalog, with recommendation engines, to support e-discovery, or discovery of the algorithmic decision paths that were taken by their company's products, 'cause they're going to be called by judges and juries, under a subpoena and so forth, and so on, to explain all this, and they're human beings who've got law degrees, but who don't know data, and they need the data environment to help them frame up a case for what we did, and you know, so, we being the company that's involved. >> Yeah, and our politicians. I mean, anyone who's read Cathy's book, Weapons of Math Destruction, there are some great use cases of where, >> Math, M-A-T-H, yeah. >> Yes, M-A-T-H. But there are some great examples of where algorithms can go wrong, and many of our politicians and our representatives in government aren't quite ready to have that conversation. 
I think anyone who watched the Zuckerberg hearings you know, in congress saw the gap of knowledge that exists between >> Oh my gosh. >> The legal community, and you know, and the tech community today. So there's a lot of work to be done to get ready for this new future. >> But just getting back to the cultural transformation needed to be, to make data-driven decisions, one of the things you were talking about is getting the managers to trust the data, and we're hearing about what are the best practices to have that happen in the sense, of starting small, be willing to experiment, get out of the lab, try to get to insight right away. What are, what would your best advice be, to gain trust in the data? >> Yeah, I think the biggest gap is this issue of transparency. How do you make sure that everyone understands each step of the process and has access to be able to dig into that. If you have a foundation of transparency, it's a lot easier to trust, rather than, you know, right now, we have kind of like the high priesthood of analytics going on, right? (Rebecca laughs) And some believers will believe, but a lot of folks won't, and, you know, the origin story of Alation is really about taking these concepts of the scientific revolution and scientific process and how can we support, for data analysis, those same steps of scientific evaluation of a finding. That means that you need to publish your data set, you need to allow others to rework that data, and come up with their own findings, and you have to be open and foster conversations around data in your organization. One other customer of ours, Meijer, who's a grocery store in the mid-west, and if you're west coast or east coast-based, you might not have heard of them-- >> Oh, Meijers, thrifty acres. I'm from Michigan, and I know them, yeah. >> Gigantic. >> Yeah, there you go. Gigantic grocery chain in the mid-west, and, Joe Oppenheimer there actually introduced a program that he calls the social contract for analytics, and before anyone gets their license to use Tableau, or MicroStrategy, or SaaS, or any of the tools internally, he asks those individuals to sign a social contract, which basically says that I'll make my work transparent, I will document what I'm doing so that it's shareable, I'll use certain standards on how I format the data, so that if I come up with a, with a really insightful finding, it can be easily put into production throughout the rest of the organization. So this is a really simple example. His inspiration for that social contract was his high school freshman. He was entering high school and had to sign a social contract, that he wouldn't make fun of the teachers, or the students, you know, >> I love it. >> Very simple basics. >> Yeah, right, right, right. >> I wouldn't make fun of the teacher. >> We all need social contract. >> Oh my gosh, you have to make fun of the teacher. >> I think it was a little more formal than that, in the language, but that was the concept. >> That's violating your civil rights as a student. I'm sorry. (Stephanie laughs) >> Stephanie, always so much fun to have you here. Thank you so much for coming on. >> Thank you. It's a pleasure to be here. >> I'm Rebecca Knight, for James Kobielus. We'll have more of theCUBE's live coverage of DataWorks just after this.

Published Date : Jun 20 2018

SUMMARY :

Rebecca Knight and James Kobielus interview Stephanie McReynolds, Vice President of Marketing at Alation, at DataWorks Summit 2018 in San Jose. They discuss keeping humans in the loop of AI-driven decisions, how data catalogs create transparency by combining technical metadata with how the business actually uses data, the rise of data literacy programs and information steward roles at customers such as Munich Re and Pfizer, the demands GDPR and algorithmic accountability place on organizations, and Meijer's "social contract" for analytics as a way to build trust in data-driven decision-making.


Joe Morrissey, Hortonworks | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks.

>> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE, and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks, and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin, Germany. And so, Joe, it's great to have you.

>> Great to be here!

>> We had a number of conversations today with Scott Gnau and others from Hortonworks and also from your customers and partners. Now, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. You've been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide. Looking at your financials in terms of customer acquisition, it just keeps going up and up, so you're clearly doing a great job of bringing the business in throughout the world. Now, you told me before the camera went live that you focus on both Europe and Asia-Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into.

>> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business, and you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million, and now also the fastest to get to $200 million, but the majority of that revenue contribution was coming from the United States. When I joined, international contribution was about 15%. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in overall contribution from our international customer base, even though the company was growing globally at a very fast rate.

>> And that's fast by any stretch of the imagination in terms of growth. Some have said, "Oh well, maybe Hortonworks, just like Cloudera, maybe they're going to plateau off because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment or as a platform, and you guys have diversified well beyond that. So give us a sense for going forward. What are your customers doing? What kind of projects are you positioning and selling Hortonworks solutions into now? Is it different, well, you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? 'Cause this is worldwide.

>> Yeah. That's a great question. This company was founded on the premise that data volumes and the diversity of data are continuing to explode, and we believed that it was necessary for us to bring enterprise-grade security, management, and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about. A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was that we believed companies now wanted to go out to the point of origin, of creation of data, and manage data throughout its entire life cycle, and derive pre-event as well as post-event analytical insight into their data. So what we've seen is our customers moving beyond just unifying data in the data lake and deriving post-transaction insight from their data. They're now going all the way out to the edge. They're deriving insight from their data in real time, all the way from the point of creation, and getting pre-transaction insight into data as well, so--

>> Pre-transaction data, can you define what you mean by pre-transaction data?

>> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right?

>> Oh, yes.

>> A specific example would be a customer who walks into the store and has interacted with the store, maybe on social or in some other fashion, before they've actually made the purchase.

>> Engagement data, interaction data, yes.

>> Engagement, exactly. Right. So that's one example, but that also extends out to use cases in IoT as well, so data in motion and streaming data, as you mentioned earlier, has since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company.

>> How do your Asian customers' requirements differ, or do they differ, from your European customers? Because European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Do many of your Asian customers, and I'm sure a fair number sell into Europe, are they putting a full court press, as we'd say in the U.S., on complying with GDPR, or do they have equivalent privacy mandates in various countries in Asia, or a bit of both?

>> I think that one of the primary drivers I see in Asia is that a lot of companies there don't have the years of legacy architecture that European companies need to contend with. In some cases, that means that they can move towards next generation data-oriented architectures much quicker than European companies have. They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India; they've got a subsidiary called GO, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network.

>> Operational support system.

>> Exactly. They were able to do that from the ground up because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR, amongst other regulations, is a big driver of that. The other factor I think is influencing that is Cloud, and Cloud strategy in general. What we've found is that customers are drawn to the Cloud for a number of reasons. The economics can sometimes be attractive, and the ability to leverage the Cloud vendors' skills in implementing complex technology is attractive, but most importantly, the elasticity and scalability that the Cloud provides is hugely important. Now, the key concern for customers as they move to the Cloud is how they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage, but do so in a regulatory compliant manner, whether that's data in motion or data at rest, whether it's on-prem, in the Cloud, or data across multiple Clouds. That's very much a top of mind concern for European companies.

>> For your customers around the globe, and specifically your area of Europe and Asia, what percentage are deploying Hortonworks into a purely public Cloud environment, like HDInsight on Microsoft Azure or HDP inside of AWS, versus a private on-premises deployment, versus a hybrid public-private multi-Cloud? Is it mostly on-prem?

>> Most of our business is still on-prem, to be very candid. I think almost all of our customers are looking at migrating some workloads closer to the Cloud. Even those that had intended to have a Cloud-first strategy have now realized that not all workloads belong in the Cloud. Some are actually more economically viable on-prem, and some just won't ever be able to move to the Cloud because of regulation. In addition to that, most of our customers are telling us that they actually want Cloud optionality. They don't want to be locked in to a single vendor, so we very much view the future as hybrid Cloud, as multi-Cloud, and we hear our customers telling us that rather than just have a Cloud strategy, they need a data strategy. They need a strategy to be able to manage data no matter where it lives, on which tier, to ensure that they are regulatory compliant with that data, and then to be able to secure, govern, and manage those data assets at any tier.

>> What percentage of your deals involve a partner? Like IBM, a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners, or are they mostly Hortonworks-led?

>> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA grew in the past year; every region grew by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution, and the growth rate is very high. IBM are a big part of that, as are many other partners. We've got a very significant reseller channel, and we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships, and that helps us accelerate time to value with our customers.

>> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great. We have to wrap it up, we're at the end of our time slot. This has been Joe Morrissey, the VP for International at Hortonworks. We're on theCUBE here at Dataworks Summit 2018 in Berlin. We want to thank you all for watching this segment. Tune in tomorrow; we'll have a full slate of further discussions with Hortonworks, with IBM, and others on theCUBE. Have a good one. (upbeat music)
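Morrissey's distinction between post-event insight on data at rest and pre-event insight on data in motion can be made concrete with a small sketch: the same signal examined once per event as it arrives, versus once over the accumulated history. This is a generic illustration, not Hortonworks' NiFi/HDF tooling; the event fields and the threshold are invented for the example.

```python
# Generic sketch contrasting "data in motion" (score each event as it
# arrives, before anything is committed to storage) with "data at rest"
# (aggregate analysis after the fact). Field names and the threshold
# are illustrative assumptions, not a real Hortonworks pipeline.
from statistics import mean

def score_event(event):
    """Pre-event: flag a sensor reading the moment it crosses a limit."""
    return "ALERT" if event["vibration"] > 0.8 else "OK"

def stream_scores(event_stream):
    """Data in motion: yield a decision per event, with no waiting."""
    for event in event_stream:
        yield event["sensor_id"], score_event(event)

def batch_report(events_at_rest):
    """Data at rest: post-event insight over the accumulated history."""
    by_sensor = {}
    for e in events_at_rest:
        by_sensor.setdefault(e["sensor_id"], []).append(e["vibration"])
    return {sid: mean(vals) for sid, vals in by_sensor.items()}

if __name__ == "__main__":
    events = [
        {"sensor_id": "pump-1", "vibration": 0.4},
        {"sensor_id": "pump-1", "vibration": 0.9},
        {"sensor_id": "pump-2", "vibration": 0.2},
    ]
    for decision in stream_scores(iter(events)):   # in motion: immediate
        print(decision)
    print(batch_report(events))                    # at rest: after the fact
```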

Published Date : Apr 18 2018

SUMMARY :

James Kobielus interviews Joe Morrissey, VP for International at Hortonworks, at Dataworks Summit 2018 in Berlin. They cover Hortonworks' international growth, with 31% of revenue coming from outside the United States by the end of 2017; the shift from data at rest to data in motion and edge analytics following the Onyara acquisition; data science as an essential enterprise role; differences between European and Asian customers on GDPR and legacy architecture; the preference for hybrid and multi-cloud data strategies over cloud-only ones; and the importance of partners such as IBM to international expansion.


Keynote Analysis | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering DataWorks Summit Europe 2018. (upbeat music) Brought to you by Hortonworks. (upbeat music)

>> Hello, and welcome to theCUBE. I'm James Kobielus. I'm the lead analyst for Big Data analytics on the Wikibon team of SiliconANGLE Media, and we're here at DataWorks Summit 2018 in Berlin, Germany. It's an excellent event, and we are here for two days of hard-hitting interviews with industry experts focused on the hot issues facing customers and enterprises, in Europe and the world over, related to the management of data and analytics. What's super hot this year, and will remain hot as an issue, is data privacy and privacy protection. Five weeks from now, a new regulation of the European Union called the General Data Protection Regulation takes effect, and it's a mandate affecting not only businesses based in the EU but any business that does business in the EU. It's coming fairly quickly, and enterprises on both sides of the Atlantic, and really throughout the world, are focused on GDPR compliance. That's a hot issue that was discussed this morning in the keynote, so over the next two days we're going to be having experts from Hortonworks, the show's host, as well as IBM, one of their lead partners, and a customer, Munich Re, appear on theCUBE, and I'll be interviewing them about not just GDPR but the broader trends facing the Big Data industry. Hortonworks, of course, got started about seven years ago as one of the solution providers focused on commercializing the open source Hadoop code base, and they've come quite a ways. Their recent financials were very good. They continue to rock 'n' roll on the growth side, in customer acquisitions and deal sizes. So we'll be talking a little bit later to Scott Gnau, their chief technology officer, who did the core keynote this morning. He'll be talking not only about how the business is doing but about a new product announcement, the Data Steward Studio that Hortonworks announced overnight. This new solution is directly useful for GDPR compliance, and we'll ask Scott to bring us more insight there. But what we'll be doing over the next two days is extracting signal from noise. The Big Data space continues to grow and develop. Hadoop has been around for a number of years now, but in many ways it has been superseded on the agenda of enterprises building applications from data by newer, primarily open source technologies such as Apache Spark and TensorFlow for building deep learning and so forth. We'll be discussing the trend towards the deepening of the open source data analytics stack with our guests. We'll be talking with a European-based reinsurance company, Munich Re, about the data lake they have built for their internal operations, and we'll be asking Andreas Kohlmaier, their lead of data engineering, to discuss how they're using it, how they're managing their data lake, and possibly to give us some insight into how it will serve them in achieving GDPR compliance and sustaining it going forward. So what we will be doing is looking at trends, not just in compliance, not just in the underlying technologies, but in the applications that Hadoop and Spark and these other technologies are being used for, and those initiatives in Europe are really the same worldwide in terms of what enterprises are doing. They're moving away from Big Data environments built primarily on data at rest, which is where Hadoop's sweet spot has been, towards more streaming architectures. And so Hortonworks, as I said the show's host, has been going more deeply towards streaming architectures with its investments in NiFi and so forth. We'll be asking them to give us some insight about where they're going with that. We'll also be looking at the growth of multi-cloud Big Data environments. What we're seeing is a trend in the marketplace away from predominantly premises-based Big Data platforms towards public cloud-based Big Data platforms. Hortonworks is partnered with a number of the public cloud providers, IBM, which I mentioned, as well as Microsoft Azure, Amazon Web Services, Google, and so forth. We'll be asking our guests to give us some insight into where they're going in terms of their support for multi-clouds, for edge computing and analytics, and for the internet of things. Big Data increasingly is evolving towards more of a focus on serving applications at the edge, like mobile devices that have autonomous smarts, for example for self-driving vehicles. Big Data is critically important for feeding, modeling, and building the AI needed to power the intelligence in endpoints: not just self-driving cars but intelligent appliances and conversational user interfaces for mobile devices and consumer appliances, like, you know, Amazon's Alexa, Apple's Siri, and so forth. So we'll be looking at those trends as well, towards pushing more of that intelligence towards the edge, and the power and the role of Big Data and data-driven algorithms, like machine learning, in driving those kinds of applications. At Wikibon, the team that I'm embedded within, we have just recently published our updated forecast for the Big Data analytics market, and we've identified key trends that are revolutionizing, disrupting, and changing the market for Big Data analytics. Among the core trends, I mentioned the move towards multi-clouds and towards more public cloud-based Big Data environments in the enterprise. I'll be asking Hortonworks, who of course built their business and their revenue stream primarily on on-premises deployments, to give us a sense for how they plan to evolve as a business as their customers move towards more public cloud-facing deployments. And IBM, of course, will be here in force. Tomorrow, which is a Thursday, we have several representatives from IBM to talk about their initiatives and partnerships with Hortonworks and others in the areas of metadata management, machine learning, and AI development tools and collaboration platforms. We'll also be discussing the push by IBM and Hortonworks to enable greater depths of governance applied to enterprise deployments of Big Data: both data governance, an area where Hortonworks and IBM as partners have achieved a lot of traction and recognition among the pace setters in data governance for multi-cloud, unstructured, Big Data environments, and also model governance, the governing and version control of machine learning and AI models. Model governance is a huge push by enterprises who increasingly are doing data science, which is what machine learning is all about. They are taking that competency, that practice, and turning it into more of an industrialized pipeline for building, training, and deploying a steady stream of machine-learning models into operational environments and multiple applications: edge applications, conversational UIs, search engines, eCommerce environments that are driven increasingly by machine learning able to process Big Data in real time and deliver next best actions and so forth, bringing more intelligence into all applications. So we'll be asking Hortonworks and IBM to net out where they're going with their partnership in terms of enabling a multi-layered governance environment that allows this machine-learning pipeline, this data science pipeline, to be deployed as an operational capability in more organizations. Also, one of the areas where I'll be probing our guests is automation in the machine learning pipeline. That's been a hot theme that Wikibon has seen in our research. A lot of vendors in the data science arena are adding automation capabilities to their machine-learning tools. Automation is critically important for productivity. Data scientists as a discipline are in limited supply; experienced, trained, seasoned data scientists fetch a high price. There aren't that many of them, so more of the work they do needs to be automated, and it can be automated by increasingly mature tools on the market from a growing range of vendors. I'll be asking IBM and Hortonworks to net out where they're going with automation inside of their Big Data and machine learning tools and partnerships going forward. So really, what we're going to be doing over the next few days is looking at these trends, but it's going to come back down to GDPR as a core envelope that many companies attending this event, DataWorks Summit Berlin, are facing. I'm James Kobielus with theCUBE. Thank you very much for joining us, and we look forward to starting our interviews in just a little while. First up will be Scott Gnau from Hortonworks. Thank you very much. (upbeat music)
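One theme previewed here, model governance, comes down to recording which model version, trained on which data, produced which decision. The sketch below shows that bookkeeping in its simplest form; it is a generic illustration, not the Hortonworks or IBM tooling discussed at the event, and all field names are assumptions.

```python
# Minimal sketch of a model registry for governance: every deployed
# model version is recorded with a hash of its training data and its
# evaluation metrics, so a decision can later be traced back to the
# exact model and data that produced it. Fields are illustrative.
import hashlib
import json
from datetime import datetime, timezone

class ModelRegistry:
    def __init__(self):
        self._entries = []

    def register(self, name, version, training_data, metrics, params):
        entry = {
            "name": name,
            "version": version,
            "registered_at": datetime.now(timezone.utc).isoformat(),
            "training_data_sha256": hashlib.sha256(training_data).hexdigest(),
            "metrics": metrics,
            "params": params,
        }
        self._entries.append(entry)
        return entry

    def lineage(self, name, version):
        """Return the audit record for a given model version, if any."""
        for e in self._entries:
            if e["name"] == name and e["version"] == version:
                return e
        return None

if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register(
        name="churn-scorer", version="1.3.0",
        training_data=b"...raw training extract bytes...",
        metrics={"auc": 0.87}, params={"max_depth": 6},
    )
    print(json.dumps(registry.lineage("churn-scorer", "1.3.0"), indent=2))
```

Production model registries add far more (approval workflows, lineage back to catalog entries, deployment history), but the core audit record looks much like this.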

Published Date : Apr 18 2018

SUMMARY :

James Kobielus previews two days of theCUBE coverage at DataWorks Summit 2018 in Berlin: GDPR readiness five weeks before the regulation takes effect, Hortonworks' Data Steward Studio announcement, the shift toward streaming architectures and multi-cloud and edge deployments, data and model governance with partners such as IBM, automation in the machine learning pipeline, and an upcoming customer conversation with Munich Re about its data lake.


Gianthomas Volpe & Bertrand Cariou | DataWorks Summit Europe 2017


 

(upbeat music) >> Announcer: Live from Munich, Germany, it's the Cube covering DataWorks Summit Europe, 2017. Brought to you by Hortonworks. >> Hey, welcome back everyone. We're here live in Munich, Germany, at the DataWorks 2017 Summit. I'm John Furrier, my co-host Dave Vellante with the Cube, and our next two guests are Gianthomas Volpe, head of customer development e-media for Alation. Welcome to the Cube. And we have Bertrand Cariou, who's the director of solution marketing at Trifecta with partners. Guys, welcome to the Cube. >> Thank you. >> Thank you for having us. >> Big fans of both your start-ups and growing. You guys are doing great. We had your CEO on our big data SV, Joe Hellerstein, he talked about the rang, all the cool stuff that's going on, and Alation, we know Stephanie has been on many times, but you guys are start ups that are doing very well and growing in this ecosystem, and, you know, everyone's going public. Cloud Air has filed their S1, great news for those guys, so the data world has changed beyond Hadoop. You're seeing it, obviously Hadoop is not dead, but it's still going to be a critical component of a larger ecosystem that's developing. You guys are part of that. So I want to get your thoughts of why you're here in Europe, okay? And how you guys are working together to take data to the next level, because, you know, we're hearing more and more data is a foundational conversation starter, because now there's other things happening, IOT, business analysts, you guys are in the heart of it. Your thoughts? >> You know, going to be you. >> All in, yeah, sure. So definitely at Alation what we're seeing is more and more people across the organization want to get access to the data, and we're kind of breaking out of the traditional roles around IP managing both metadata, data preparation, like Trifecta's focused on. So we're pretty squarely focused on how do we bring that access to a wider range of people? How do we enable that social and collaborative approach to working with that data, whether it's in a data lake so, or here at DataWorks. So clearly that's one of the main topics. But also other data sources within the organization. >> So you're freeing the data up and the whole collaboration thing is more of, okay, don't just look at IT as this black box of give me some data and now spit out some data at me. Maybe that's the old way. The new way is okay, all of the data's out there, they're doing their thing, but the collaboration is for the user to get into that data you know, ingestion. Playing with the data, using the data, shaping the data. Developing with the data. Whatever they're doing, right? >> It's just bringing transparency to not only what IT is doing and making that accessible to users, but also helping users collaborate across different silos within an organization, so. We look at things like logs to understand who is doing what with the data, so if I'm working in one group, I can find out that somebody in a completely different group in the organization is working with similar data, bringing new techniques to their analysis, and can start leveraging that and have a conversation that others can learn from, too. >> So basically it's like a discovery platform for saying hey, you know, Mary in department X has got these models. I can leverage that. Is that kind of what you guys are all about? >> Yeah, definitely. 
And breaking through that, enabling communication across the different levels of the organization, and teaching other people at all different levels of maturity within the company, how they can start interacting with data and giving them the tools to up skill throughout that process. >> Bertrand, how about Trifacta? 'Cause one of the things that I find exciting about your value proposition and talking to Joe, the founder, besides the fact that they all have GitHub on their about page, which is the coolest thing ever, 'cause they're all developers. But the reality is that a business person or person dealing with data in some part of a geography, could be whether it's in Europe or in the US, might have a completely different view and interest in data than someone in another area. It could be sales data, could be retail data, it doesn't matter but it's never going to be the same schema. So the issue is, got to take that away from the user complexity. That is really fundamental change. >> Yeah. You're totally correct. So information is there, it is available. Alation helps identify what is the right information that can be used, so if I'm in marketing, I could reuse sales information, associating maybe with web logs information. Alation will give me the opportunity to know what information is available and if I can trust it. If someone in finance is using that information, I can trust that data. So now as a user, I want to take that data, maybe combine the data, and the data is always a different format, structure, level of quality, and the work of data wrangling is really for the end user, you can be an analyst. Someone in the line of business most of the time, these could be like some of the customers we have here in Germany like Munich Re would be actuaries. Building risk models and/or claims forecasting, payment forecasting. So they are not technologists at all, but they need to combine these data sets by themselves, and at scale, and the work they're doing, they are producing new information and this information is used directly for their own business, but as soon as they share this information back to the data lake, Alation will index this information, see how it is used, and give this visibility to the other users for reuse as well. >> So you guys have a partnership, or is this more of a standard API kind of thing? >> So we do have a partnership, we have planned development on the road map. It's currently happening. So I think by the end of the quarter, we're going to be delivering a new integration where whether I'm in Alation and looking for data and finding something that I want to work with, that I know needs to be prepared, I can quickly jump into Trifacta to do that. Or the other way around in Trifacta, if I'm looking for data to prepare, I can open the catalog, quickly find out what exists and how to work with it better.
So the first one is I need to understand how data contributes to the monetization of my company, if I'm a public company or a for profit company. That's, I guess, my challenge. But then, there are two other things: I need to give people access to that data, and I need quality. So I presume Alation can help me understand what data's available. I can actually, it kind of helps with number one as well because like you said, okay, this is the type of data, this is how the business process works. Feed it. And then the access piece and quality. I guess the quality is really where Trifacta comes in. >> GianThomas: Yes. >> What about that sequential flow that I just described? Is that common? >> Yeah >> In your business, your customer base. >> It's definitely very common. So, kind of going back to the Munich Re examples, since we're here in Munich, they're very focused on providing better services around risk reduction for their customers. Data that can impact that risk can be of all kinds from all different places. You kind of have to think five, ten years ahead of where we are now to see where it might be coming from. So you're going to have a ton of data going in to the data lake. Just because you have a lot of data, that does not mean that people will know how to work with it; they won't know that it exists. And especially since the volumes are so high. It doesn't mean that it's all coming in at a greatly usable format. So Alation comes in to play in helping you find not only what exists, by automating that process of extraction, but also looking at what data people are actually using. So going back to your point of how do I know what data's driving value for the organization, we can tell you in this schema, this is what's actually being used the most. That's a pretty good starting point to focus in on what is driving value and when you do find something, then you can move over to Trifacta to prepare it and get it ready for analysis. >> So keying on that for a second, so in the example of Munich Re, the value there is my reduction in expected loss. I'm going to reduce my risk, that puts money in my bottom line. Okay, so you can help me with number one, and then take that Munich Re example into Trifacta. >> Yes, so the user will be the same user using Alation and Trifacta. So it's an actuary. So as soon as the actuary identifies the data that is the most relevant for what they'll be planning, so the actuaries are working with terms like development triangles over 20 years. And usually it's column by column. So they have to pivot the data row by row. They have to associate that with the paid claims, the new claims coming in, so all this information is in different formats. Then they have to look at maybe weather information, or additional third party information where the level of quality is not well known, so they are bringing data in the lake that is not yet known. And they're combining all this data. The outcome of that work, that helps in the risk modeling, so that could be used by, they could use SAS or other technology for the risk modeling. But when they've done that modeling and built these new data sets, they're, again, available to the community because Alation would index that information and explain how it is used. The other thing that we've seen with our users is there's also a very strong, if you think about insurances, banks, pharma companies, there is a lot of regulation. So, as the user, as you are creating new data, you need to say where the data is coming from.
Where the data is going, how is it used in the company? So we're capturing all that information. Trifacta would have the rules to transform the data, Alation will see the overall high-level picture from table back to the source system where the data comes from. So super important as well for the team. >> And just one follow up. In that example, the actuary, I know hard core data scientists hate this term, but the actuaries, the citizen data scientist. Is that right? >> The actuaries would know, I would say, statistics, usually. But you get multiple levels of actuaries. You get many actuaries, they're Excel users. They have to prepare data. They have to pivot, structure the data to give it to the next actuary that will be doing the pricing model or the next actuary that will do risk modeling. >> You guys are hitting on a great formula which is cutting edge, which is why you guys are on the startups. But, Bertrand I want to talk to you about your experience at Informatica. You were the founder of Informatica France. And you're also involved in some product development in the old, I'd say old days, but like. Back in the days when structured data and enterprise data, which was once a hard problem, deal with metadata, deal with search, you had schemas, all kinds of stuff to deal with. It was very difficult. You have expertise. I want you to talk about what's different now in this environment. Because it's still challenging. But now the world has got so much fast data, we got so much new IOT data, especially here in Europe. >> Oh yes. >> Where you have an industrialized focus, certainly Germany, like case in point, but it's pretty smart mobility going on in Europe. You've always had that mobile environment. You've got smart cities. A lot of focus on data. What's the new world like now? How are people dealing with this? What's your perspective? >> Yes, so there's, and we all know about the big data, and with all this volume, additional volume and new structures of data. And I would say legacy technology can deal, as you mentioned, with well structured information. Also you want to give that information to the masses. Because the people who know the data best are the business people. They know what to do with the data, but the access to this data is pretty complicated. So where Trifacta is really differentiating and has been thinking through that is to say whatever the structure of the data, IoT, web logs, JSON, XML, that should be, for an end user, just metrics. So that's the way you understand the data. The next thing, when you play with data, usually you don't know what the schema would be at the end. Because you don't know what the outcome is. So, you are, as an end user, you are exploring the data, combining data sets, and the structure is iterating as you discover the data. So that is also something new compared to the old model where an end user would go to the data engineer to say I need that information, can you give me that information? And engineers would look at that and say okay. We can access here, what is the schema? There was all this back and forth. >> There was so much friction in the old way, because the creativity of the user is independent now of all that scaffolding and all the wrangling, pre-processing. So I get that piece of the citizen journalist, citizen analyst. But the key thing here is you were wrestling with the complexity to get the job done.
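(A minimal pandas sketch of the development-triangle pivot Bertrand describes a moment earlier: claims recorded row by row, reshaped into one row per accident year and one column per development lag. The column names and figures are invented for illustration, not Munich Re data.)

```python
import pandas as pd

# Hypothetical claim records in the long, row-by-row shape the speaker describes.
claims = pd.DataFrame({
    "accident_year":   [2014, 2014, 2014, 2015, 2015, 2016],
    "development_lag": [1, 2, 3, 1, 2, 1],
    "paid_to_date":    [100.0, 150.0, 170.0, 120.0, 180.0, 130.0],
})

# Pivot into a development triangle: one row per accident year,
# one column per development lag.
triangle = claims.pivot(index="accident_year",
                        columns="development_lag",
                        values="paid_to_date")
print(triangle)

# Later lags are missing (NaN) for recent accident years -- that is the
# "triangle" shape the downstream risk models are fit against.
```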
So the question then comes in, because it's interesting, all the theme here at DataWorks Summit in Europe and in the US is all the big transformative conversations are starting with business people. So this a business unit so the front lines if you will, not IT. Although IT now's got to support that. If that's the case, the world's shifting to the business owners. Hence your start up. Is that kind of getting that right? >> I think so. And I think that's also where we're positioning ourselves is you have a data lake, you can put tons of data in it, but if you don't find an easy way to make that accessible to a business user, you're not going to get a value out of it. It's just going to become a storage place. So really, what we've focused on is how do you make that layer easily accessible? How do you share around and bring some of the common business practices to that? And make sure that you're communicating with IT. So IT shouldn't be cast aside, but they should have an ongoing relationship with the business user. >> By the way, I'll point out that Dave knows I'm not really a big fan of the data lake concept mainly because they've turned it into data swamps because IT deploys it, we're done! You know, check the box. But, data's getting stale because it's not being leveraged. You're not impacting the data or making it addressable, or discoverable or even wrangleable. If that's a word. But my point is that's all complexities. >> Yes, so we call it sort of frozen data lake. You build a lake, and then it's frozen and nobody can go fishing. >> You play hockey on it. (laughs) >> You dig and you're fishing. >> And you need to have this collaboration ongoing with the IT people, because they own the infrastructure. They can feed the lake with data with the business. If there is no collaboration, and we've seen that multiple times. Data lake initiatives, and then we come back one year after there is no one using the lake, like one, two person of the processing power, or the data is used. Nobody is going to the lake. So you need to index the data, catalog the data to know what is available. >> And the psychology for IT is important here, and I was talking yesterday with IBM folks, Nevacarti here, but this is important because IT is not necessarily in a position of doing it because doing the frozen lake or data swamp because they want to screw over the business people, they just do their job, but here you're empowering them because you guys are got some tech that's enabling the IT to do a data lake or data environment that allows them to free up the hassles, but more importantly, satisfy the business customer. >> GeanThomas: Exactly. >> There's a lot of tech involved. And certainly we've talked to you guys about that. Talk about that dynamic of the psychology because that's what IT wants. So what's that dev ops mindset for data, data ops if you will or you know, data as code if you will, constantly what we've been calling it but that's now the cloud ethos hits the date ethos. Kind of coming together. >> Yes, I think data catalogs are subtly different in that traditionally they are more of an IT function, but to some extent on the metadata side, where as on the business side, they tended to be a siloed organization of information that business itself kept to maintain very manually. So we've tried to bring that together. 
All the different parties within this process, from the IT side to the governance and stewardship all the way down to the analysts and data scientists, can get value out of a data catalog that can help each other out throughout that process. So if it's communicating to end users what kind of impact any change IT will make, that makes their life easier, and have one way to communicate that out and see what's going to happen. But also understand what the business is doing for governance or stewardship. You can't really govern or curate if you don't know what exists and what matters to the business itself. So bringing those different stages together, helping them help each other, is really what Alation does. >> Tell us about the prospects that you guys are engaging in from a customer standpoint. What are some of the conversations with those customers you haven't gotten yet? And also give an example of a customer that you guys have, and use cases where they've been successful. >> Absolutely. So typically what we see is that an organization is starting up a data lake or they already have legacy data warehouses. Often it's both, together. And they just need a unified way of making information about those environments available to end users. And they want to have that better relationship. So we're often seeing IT engaged in trying to develop that relationship along with the business. So that's typically how we start and then, in the process of deploying, work into that conversation of now that you know what exists, what you might want to work with, you're often going to have to do some level of preparation or transformation. And that's what makes Trifacta a great fit for us, as a partner, is coming to that next step. >> Yeah, MarketShare is one of our common customers, we have DNSS, also a common customer, eBay, a common customer. So we've got already multiple customers, and to give some information about the use case at MarketShare, they have to deal with their customers' information. So the first thing they receive is data, digital information about ads, and so it's really marketing type of data. They have to assess the quality of the data. They have to understand what has value and combine that value with their existing data to provide back analytics to their customers. And in that use case, we were talking to the business users, the people selling MarketShare to their customers, because the faster they can onboard their data and qualify the quality of the data, the easier it is to deliver the right level of quality analytics. And also to engage more customers. So it really was about fast onboarding of customer data and delivering analytics. And what Alation explained is that they can then analyze all the SQL statements that the customers, maybe I'll let you talk about the use case, but there's also, it was the same users looking at the same information, so we engage with the business users. >> I wonder if we can talk about the different roles. You hear about the data scientists obviously, the data engineer, there might be a data quality professional involved, there's certainly the application developer. These guys may or may not even be in IT. And then you got a DBA. Then you may have somebody who's a statistician. They might sit in the line of business. Am I overcomplicating it? Do larger organizations have these different roles? And how do you help bring them together?
I think there's a lot of movement happening it's not a consistent definition of those different roles. So I think it comes down to different functions. Sometimes you find those functions happening within different places in the company. So stewardship and governance may happen on the IT side, it might happen on the business side, and it's almost a maturity scale of how involved the two sides are within that. So we play with all of those different groups so it's sometimes hard to narrow down exactly who it is. But generally it's on the consumptions side whether it's the analyst or data scientists, and there's definitely a crossover between the two groups, moving up towards the governance and stewardship that wants to enable those users or document curing the data for them all the way to the IT data engineers that operationalize a lot of the work that the data scientists and analysts might be hypothesizing and working with in their research. >> And you sell to all of those roles? Who's your primary user constituency, or advocate? >> We sell both to the analytics groups as well as governance and they often merge together. But we tend to talk to all of those constituencies throughout a sales cycle. >> And how prominent in your customer base do you see that the role of the Chief Data Officer? Is it only reconfined within regulated industries? Does he seep into non-regulated industries? >> I'd say for us, it seeps with non-regulated industries. >> What percent of the customers, for instance have, just anecdotally, not even customers, just people that you talk to, have a Chief Data Officer? Formal Chief Data Officer? >> I'd say probably about 60 to 70 percent. >> That high? >> Yeah, same for us. In regulated industries (mumbles). I think they play a role. The real advantage a Chief Data and Analytical Officer, it's data and analytics, and they have to look at governance. Governance could be for regulation, because you have to, you've got governance policy, which data can be combined with which data, there is a lot. And you need to add that. But then, even if you are less regulated, you need to know what data is available, and what data is (mumbles). So you have this requirement as well. We see them a lot. We are more and more powerful, I would say in the enterprise where they are able to collaborate with the business to enable the business. >> Thanks so much for coming on the Cube, I really appreciate it. Congratulations on your partnership. Final word I'll give you guys before we end the segment. Share a story, obviously you guys have a unique partnership, you've been in the business for awhile, breaking into the business with Alation. Hot startups. What observations out there that people should know about that might not be known in this data world. Obviously there's a lot of false premises out there on what the industry may or may not be, but there's a lot of certainly a sea change happening. You see AI, it gives a mental model for people, Eugene Learning, Autonomous Vehicles, Smart Cities, some amazing, kind of magical things going on. But for the basic business out there, they're struggling. And there's a lot of opportunities if they get it right, what thing, observation, data, pattern you're seeing that people should know about that may not be known? It could be something anecdotal or something specific. >> You go first. (laughs) >> So maybe there will be surprising, but like Kaiser is a big customer of us. And you know Kaiser in California in the US. 
They have hundreds or thousands of hospitals. And surprisingly, some of the supply chain people there have been working for years trying to analyze and optimize the relationship with their suppliers. Typically they would buy a staple gun without staples. Stupid. But they see that happening over and over with many products. They were never able to solve this, because why? For one product they would have to go to IT, IT would have to do the work, it would take two months, and then there's another supplier, new products. So how to know- >> John: They're chasing their tail! >> Yeah. It's not super exciting, but they are now able to do that in a couple of hours. So for them, they are able, by going to the data lake, to see what data there is, see how each hospital is buying, which they were not able to do before. So there is nothing magical here, it's just giving access to the data to the people who know the data best, the analysts. >> So your point is don't underestimate the innovation, as small as it may seem, or inconsequential, could have huge impacts. >> The innovation goes with the process to be more efficient with the data, not so much building new products, just basically being good at what you do, so then you can focus on the value you bring to the company. >> GianThomas what's your thoughts? >> So it's sort of related. I would actually say something we've seen pretty often is companies, all sizes, are all struggling with very similar problems in the data space specifically, so it's not that big companies have it all figured out, small companies are behind trying to catch up, and small companies aren't necessarily super agile and aren't able to change at the drop of a hat. So it's a journey. It's a journey and it's understanding what your problems are with the data in the company and it's about figuring out what works best for your solution, or for your problems. And understanding how that impacts everyone in the business. So it's really a learning process to understand what's going- >> What do your friends who aren't in the tech business say to you? Hey, what's this data thing? How do you explain it? The fundamental shift, how do you explain it? What do you say to them? >> I'm more and more getting people that already have an idea of what this data thing is. Which five years ago was not the case. Five years ago, it was oh, what's data? Tell me more about that? Why do you need to know about what's in these databases? Now, they actually get why that's important. So it's becoming a concept that everyone understands. Now it's just a matter of moving it into practice and how that actually works. >> Operationalizing it, all the things you're talking about. Guys, thanks so much for bringing the insights. We wrangled it here on the Cube. Live. Congratulations to Trifacta and Alation. Great startups, you guys are doing great. Good to see you guys successful again and rising tide floats all boats in this open source world we're living in and we're bringing you more coverage here at DataWorks 2017, I'm John Furrier with Dave Vellante. Stay with us, more great content coming after this short break. (upbeat music)

Published Date : Apr 6 2017



Adam Wilson & Joe Hellerstein, Trifacta - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Commentator: Live from San Jose, California. It's theCUBE covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data SV (mumbles) event in conjunction with Strata + Hadoop. Our companion event, the Big Data NYC and we're here breaking down the Big Data world as it evolves and goes to the next level up on the step function, AI machine learning, IOT really forcing people to really focus on a clear line of the side of the data. I'm John Furrier with our announcer from Wikibon, George Gilbert and our next guest, our two executives from Trifacta. The founder and Chief Strategy Officer, Joe Hellerstein and Adam Wilson, the CEO. Guys, welcome to theCUBE. Welcome back. >> Great to be here. >> Good to be here. >> Founder, co-founder? >> Co-founder. >> Co-founder. He's a multiple co-founders. I remember it 'cause you guys were one of the first sites that have the (mumbles) in the about section on all the management team. Just to show you how technical you guys are. Welcome back. >> And if you're Trifacta, you have to have three founders, right? So that's part of the tri, right? >> The triple threat, so to speak. Okay, so a big year for you guys. Give us the update. I mean, also we had Alation announce this partnering going on and some product movement. >> Yup. >> But there's a turbulent time right now. You have a lot of things happening in multiple theaters to technical theater to business theater. And also within the customer base. It's a land grand, it seems to be on the metadata and who's going to control what. What's happening? What's going on in the market place and what's the update from you guys? >> Yeah, yeah. Last year was an absolutely spectacular year for Trifacta. It was four times growth in bookings, three times growth in customers. You know, it's been really exciting for us to see the technology get in the hands of some of the largest companies on the planet and to see what they're able to do with it. From the very beginning, we really believed in this idea of self service and democratization. We recognize that the wrangling of the data is often where a lot of the time and the effort goes. In fact, up to 80% of the time and effort goes in a lot of these analytic projects and to the extent that we can help take the data from (mumbles) in a more productive way and to allow more people in an organization to do that. That's going to create information agility that that we feel really good about and there are customers and they are telling us is having an impact on their use of Big Data and Hadoop. And I think you're seeing that transition where, you know, in the very beginning there was a lot of offloading, a lot of like, hey we're going to grab some cost savings but then in some point, people scratch their heads and said, well, wait a minute. What about the strategic asset that we were building? That was going to change the way people work with the data. Where is that piece of it? And I think as people started figuring out in order to get our (mumbles), we got to have users and use cases on these clusters and the data like itself is not a used case. Tools like Trifacta have been absolutely instrumental and really fueling that maturity in the market and we feel great about what's happening there. >> I want to get some more drilled out before we get to some of these questions for Joe too because I think you mentioned, you got some quotes. I just want to double up a click on that. 
It always comes up in the business model question for people. What's your business model? >> Sure. >> And doing democratization is really hard. Sometimes democratization doesn't appear until years later so it's one of those elusive things. You see it and you believe it but then making it happen are two different things. >> Yeah, sure. >> So. And appreciate that the vision they-- (mumbles) But ultimately, at the end of the day, that business model comes down to how you organized. Prove points. >> Yup. >> Customers, partnerships. >> Yeah. >> We had Alation on Stephanie (mumbles). Can you share just and connect the dots on the business model? >> Sure. >> With respect to the product, customers, partners. How was that specifically evolving? >> Adam: Sure. >> Give some examples. >> Sure, yeah. And I would say kind of-- we felt from the beginning that, you know, we wanted to turn what was traditionally a very complex messy problem dealing with data, you know, in the user experience problem that was powered by machine learning and so, a lot of it was down to, you know, how we were going to build and architect the technology needed (mumbles) for really getting the power in the hands of the people who know the data best. But it's important, and I think this is often lost in Silicon Valley where the focus on innovation is all around technology to recognize that the business model also has to support democritization so one of the first things we did coming in was to release a free version of the product. So Trifacta Wrangler that is now being used by over 4500 companies, ten of thousands of users and the power of that in terms of getting people something of value that they could start using right away on spreadsheets and files and small data and allowing them to get value but then also for us, the exchange is that we're actually getting a chance to curate at scale usage data across all of these-- >> Is this a (mumbles) product? >> It's a hybrid product. >> Okay. >> So the data stays local. It never leaves their local laptop. The metadata is hashed and put into the cloud and now we're-- >> (mumbles) to that. >> Absolutely. And so now we can use that as training data that actually has more people wrangle, the product itself gets smarter based on that. >> That's good. >> So that's creating real tangible value for customers and for us is a source of very strategic advantage and so we think that combination of the technology innovation but also making sure that we can get this in the hands of users and they can get going and as their problem grows up to be bigger and more complicated, not just spreadsheets and files on the desktop but something more complicated, then we're right there along with them for products that would have been modified. >> How about partnerships with Alation? How they (mumbles)? What are all the deals you got going on there? >> So Alation has been a great partner for us for a while and we've really deepened the integration with the announcements today. We think that cataloging and data wrangling are very complimentary and they're a natural fit. We've got customers like Munich Re, like eBay as well as MarketShare that are using both solutions in concert with one another and so, we really felt that it was natural to tighten that coupling and to help people go from inventorying what's going on in their data legs and their clusters to then cleansing, standardizing. Essentially making it fit for purpose and then ensuring that metadata can roundtrip back into the catalog. 
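(A rough illustration of the hybrid model Adam describes above: the data stays on the laptop, and only hashed metadata plus the transformation type are sent up as usage data. The field names and hashing scheme here are assumptions for the sketch, not Trifacta's actual telemetry format.)

```python
import hashlib
import json

def fingerprint(value: str) -> str:
    """One-way hash so the real name or value never leaves the laptop."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

def usage_event(dataset_name, columns, transform):
    """Build the anonymized usage record that would be sent to the cloud."""
    return {
        "dataset": fingerprint(dataset_name),          # hashed, not the raw file name
        "columns": [fingerprint(c) for c in columns],  # hashed column names
        "transform": transform,                        # e.g. "pivot", "split", "join"
    }

event = usage_event("q3_claims.xlsx", ["policy_id", "paid_amount"], "pivot")
print(json.dumps(event, indent=2))  # only hashes and the transform type leave the machine
```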
And so that's really been an extension of what we're doing also at the technical level with technologies like Cloudera Navigator with Atlas and with the project that Joe's involved with at Berkeley called Ground. So I don't know if you want to talk-- >> Yeah, tell him about Ground. >> Sure. So part of our outlook on this and this speaks to the kind of way that the landscape in the industry's shaping out is that we're not going to see customers buying until it's sort of lock in on the key components of the area for (mumbles). So for example, storage, HD (mumbles). This is open and that's key, I think, for all the players in this base at HTFS. It's not a product from a storage vendor. It's an open platform and you can change vendors along the way and you could role your own and so on. So metadata, to my mind, is going to move in the same direction. That the storage of metadata, the basic component tree that keeps the metadata, that's got to be open to give people the confidence that they're going to pour the basic descriptions of what's in their business and what their people are doing into a place that they know they can count on and it will be vendor neutral. So the catalog vendors are, in my mind, providing a functionality above that basic storage that relates to how do you search the catalog, what does the catalog do for you to suggest things, to suggest data sets that you should be looking at. So that's a value we have on top but below that what we're seeing is, we're seeing Horton and Cloudera coming out with either products re opensource and it's sort of the metadata space and what would be a shame is if the two vendors ended up kind of pointing guns inward and kind of killing the metadata storage. So one of the things that I got interested in as my dual role as a professor at Berkeley and also as a founder of a company in this space was we want to ensure that there's a free open vendor neutral metadata solution. So we began building out a project called Ground which is both a platform for metadata storage that can be sitting underneath catalog vendors and other metadata value adds. And it's also a platform for research much as we did with Spark previously at Berkeley. So Ground is a project in our new lab at Berkeley. The RISELab which is the successor to the AMPLab that gave us Spark. And Ground has now got, you know, collaboratives from Cloudera, from LinkedIn. Capital One has significantly invested in Ground and is putting engineers behind it and contributors are coming also from some startups to build out an open-sourced platform for metadata. >> How old has Ground been around? >> Joe: Ground's been around for about 12 months. It's very-- >> So it's brand new. How do people get involved? >> Brand new. >> Just standard similar to the way the AMPLab was? Just jump in and-- >> Yeah, you know-- >> Go away and-- >> It comes up on GitHub. There's (mumbles) to go download and play with. It's in alpha. And you know, we hope we (mumbles) and the usual opensource still. >> This is interesting. I like this idea because one thing you've been riffing on the cue ball of time is how do you make data addressable? Because ultimately, you know, real time you need to have access to data really really low (mumbles) to see the inside to make it work. Hence the data swamp problem right? So, how do you guys see that? 'Cause now I can just pop in. I can hear the objections. Oh, security! You know. How do you guys see the protections? 
I'd love to help get my data in there and get something back in return in a community model. Security? Is it the hashing? What's the-- How do you get any security (mumbles)? Or what are the issues? >> Yeah, so I mean the straightforward issues are the traditional issues of authorization and encryption and those are issues that are reasonably well plumbed out in the industry and you can go out and you can take the solutions from people like Cloudera or from Horton and those solutions plug in quite nicely actually to a variety of platforms. And I feel like that level of enterprise security is understood. It's work for vendors to work with that technology so when we went out, we made sure we were Kerberized in all the right ways at Trifacta to work with these vendors and that we integrated well with Navigator, we integrated with Atlas. That was, you know, there was some labor there but it's understood. There's also-- >> It's solvable basically. >> It's solvable basically and pluggable. There are research questions there which, you know, on another day we could talk about but for instance if you don't trust your cloud hosting service what do you do? And that's like an open area that we're working on at Berkeley. Intel SGX is a really interesting technology and that's probably a topic for another day. >> But you know, I think it's important-- >> As soon as we get you in the studio in Palo Alto, we'd love to drill on that. >> I think it's important though that, you know, when we talk about self service, the first question that comes up is I'm only going to let you self service as far as I can govern what's going on, right? And so I think those things-- >> Restrictions, guard rails-- >> Really go hand in hand here. >> About handcuffs. >> Yeah so, right. Because that's always the first thing that kind of comes out where people say, okay wait a minute now, is this-- if I've now got, you know-- you've got an increasing number of knowledge workers who think that is their-- and believe that it is their inalienable right to have access to data. >> Well that's the (mumbles) democratization. That's the top down, you know, governance control point. >> So how do you balance that? And I think you can't solve for one side of that equation without the other, right? And that's really really critical. >> Democratization is anarchization, right? >> Right, exactly. >> Yes, exactly. But it's hard though. I mean, and you look at all the big trends where there was, you know, web one data, web (mumbles), all had those democratization trends but they took six years to play out and I think it might be more accelerated with cloud, to your point about this new stuff. Okay George, go ahead. You might get in there. >> I wanted to ask you about, you know, what we were talking about earlier and what customers are faced with which is, you know, a lot of choice and specialization because building something end to end and having it fully functional is really difficult. So... What are the functional points where you start driving the guard rails that IT cares about and then what are the user experience points where you have critical mass so that the end users then draw other compliant tools in. You with me? On sort of the IT side and the user side and then which tools start pulling those standards?
>> Well, I would say at the highest level, to me what's been very interesting especially would be with that's happened in opensource is that people have now gotten accustomed to the idea that like I don't have to go buy a big monolithic stacks where the innovation moves only as fast as the slowest product in the stack or the portfolio. I can grab onto things and I can download them today and be using them tomorrow. And that has, I think, changed the entire approach that companies like Trifacta are taking to how we how we build and release product to market, how we inter operate with partners like Alation and Waterline and how we integrate with the platform vendors like Cloudera, MapR, and Horton because we recognize that we are going to have to be meniacal focused on one piece of this puzzle and to go very very deep but then play incredibly well both, you know, with all the rest of the ecosystem and so I think that is really colored our entire product strategy and how we go to market and I think customers, you know, they want the flexibility to change their minds and the subscription model is all about that, right? You got to earn it every single year. >> So what's the future of (mumbles)? 'Cause that brings up a good point we were kind of critical of Google and you mentioned you guys had-- I saw in some news that you guys were involved with Google. >> Yup. >> Being enterprise ready is not just, hey we have the great tech and you buy from us, damn it we're Google. >> Right. >> I mean, you have to have sales people. You have to have automation mechanism to create great product. Will the future of wrangling and data prep go into-- where does it end up? Because enterprises want, they want certain things. They're finicky of things. >> Right, right. >> As you guys know. So how does the future of data prep deal with the, I won't say the slowness of the enterprise, but they're more conservative, more SLA driven than they are price performance. >> But they're also more fragmented than ever before and you know, while that may not be a great thing for the customers for a company that's all about harmonizing data that's actually a phenomenal opportunity, right? Because we want to be the decision that customers make that guarantee that all their other decisions are changeable, right? And I go and-- >> Well they have legacy systems of record. This is the challenge, right? So I got the old oracle monolithic-- >> That's fine. And that's good-- >> So how do you-- >> The more the merrier, right? >> Does that impact you guys at all? How did you guys handle that situation? >> To me, to us that is more fragmentation which creates more need for wrangling because that introduces more complexity, right? >> You guys do well in that environment. >> Absolutely. And that, you know, is only getting bigger, worse, and more complicated. And especially as people go from (mumbles) to cloud as people start thinking about moving from just looking at transactions to interactions to now looking at behavior data and the IOT-- >> You're welcome in that environment. >> So we welcome that. In fact, that's where-- we went to solve this problem for Hadoop and Big Data first because we wanted to solve the problems at scale that were the most complicated and over time we can always move downstream to sort of more structured and smaller data and that's kind of what's happened with our business. 
>> I guess I want to circle back to this issue of which part of this value chain of refining data is-- if I'm understanding you right, the data wrangling is the anchor and once a company has made that choice then all the other tool choices have to revolve around it? Is that a-- >> Well think about this way, I mean, the bulk of the time when you talk to the analysts and also the bulk of the labor cost and these things isn't getting the data from its raw form into usage. That whole process of wrangling which is not really just data prep. It's all the things you do all day long to kind of massage these data sets and get 'em from here to there and make 'em work. That space is where the labor cost is. That also means that's spaces were the value add is because that's where your people power or your business context is really getting poured in to understand what do I have, what am I doing with it and what do I want to get out of it. As we move from bottom line IT to top line value generation with data, it becomes all the more so, right? Because now it's not just the matter of getting the reports out every month. It's also what did that brilliant in sales do to that dataset to get that much left? I need to learn from her and do a similar thing. Alright? So, that whole space is where the value is. What that means is that, you know, you don't want that space to be tied to a particular BI tool or a particular execution edge. So when we say that we want to make a decision in the middle of that enables all the other decisions, what you really want to make sure is that that work process in there is not tightly bound to the rest of the stack. Okay? And so you want to particularly pick technologies in that space that will play nicely with different storage, that play nicely with different execution environments. Today it's a dupe, tomorrow it's Amazon, the next day it's Google and they have different engines back there potentially. And you want it certainly makes your place with all the analytic and visualizations-- >> So decouple from all that? >> You want to decouple that and you want to not lock yourself in 'cause that's where the creativity's happening on the consumption side and that's where the mess that you talked about is just growing on the production side so data production is just getting more complicated. Data consumption's getting more interesting. >> That's actually a really really cool good point. >> Elaborating on that, does that mean that you have to open up interfaces with either the UI layer or at the sort of data definition layer? Or does that just mean other companies have to do the work to tie in to the styles? The styles and structures that you have already written? >> In fact it's sort of the opposite. We do the work to tie in to a lot of this, these other decisions in this infrastructure, you know. We don't pretend for a minute that people are going to sort of pick a solution like Trifacta and then build their organization around it. As your point, there's tons of legacy, technology out there. There is all kinds of things moving. Absolutely. So we, a big part of being the decoder ring for data for Trifacta and saying it's like listen, we are going to inter operate with your existing investments and we're going to make sure that you can always get at your data, you can always take it from whatever state its in to whatever state you need to be in, you can change your mind along the way. 
And that puts a lot of owners on us and that's the reason why we have to be so focused on this space and not jump into visualization and analytics and not jump in to its storage and processing and not try to do the other things to the right or left. Right? >> So final question. I'd like you guys both to take a stab at it. You know, just going to pivot off at what Joe was saying. Some of the most interesting things are happening in the data exploration kind of discovery area from creativity to insights to game changing stuff. >> Yup. >> Ventures potentially. >> Joe: Yup. >> The problem of the complexity, that's conflict. >> Yeah. >> So how does we resolve this? I mean, besides the Trifacta solution which you guys are taming, creating a platform for that, how do people in industry work together to solve that problem? What's the approach? >> So I think actually there's a couple sort of heartening trends on this front that make me pretty optimistic. One of these is that the inside of structures are in the enterprises we work with becoming quite aligned between IT and the line of business. It's no longer the case that the line of business that are these annoying people that they're distracting IT from their bottom line function. IT's bottom line function is being translated into a what's your value for the business question? And the answer for a savvy IT management person is, I will try to empower the people around me to be rabid fans and I will also try to make sure that they do their own works so I don't have to learn how to do it for them. Right? And so, that I think is happening-- >> Guys to this (mumbles) business guys, a bunch of annoying guys who don't get what I need, right? So it works both ways, right? >> It does, it does. And I see that that's improving sort of in the industry as the corporate missions around data change, right? So it's no longer that the IT guys really only need to take care of executives and everyone else doesn't matter. Their function really is to serve the business and I see that alignment. The other thing that I think is a huge opportunity and the part of who I-- we're excited to be so tightly coupled with Google and also have our stuff running in Amazon and at Microsoft. It's as people read platform to the cloud, a lot of legacy becomes a shed or at least become deprecated. And so there is a real-- >> Or containerized or some sort of microservice. >> Yeah. >> Right, right. >> And so, people are peeling off business function and as part of that cost savings to migrate it to the cloud, they're also simplified. And you know, things will get complicated again. >> What's (mumbles) solution architects out there that kind of re-boot their careers because the old way was, hey I got networks, I got apps and stacks and so that gives the guys who could be the new heroes coming in. >> Right. >> And thinking differently about enabling that creativity. >> In the midst of all that, everything you said is true. IT is a massive place and it always will be. And tools that can come in and help are absolutely going to be (mumbles). >> This is obvious now. The tension's obviously eased a bit in the sense that there's clear line of sight that top line and bottom line are working together now on. You mentioned that earlier. Okay. Adam, take a stab at it. (mumbling) >> I was just going to-- hey, I know it's great. I was just going to give an example, I think, that illustrates that point so you know, one of our customers is Pepsi. 
And Pepsi came to us and they said, listen we work with retailers all over the world and their reality is that, when they place orders with us, they often get it wrong. And sometimes they order too much and then they return it, it spoils and that's bad for us. Or they order too little and they stock out and we miss revenue opportunities. So they said, we actually have to be better at demand planning and forecasting than the orders that are literally coming in the door. So how do we do that? Well, we're getting all of the customers to give us their point of sale data. We're combining that with geospatial data, with weather data. We're like looking at historical data and industry averages but as you can see, they were like-- we're stitching together data across a whole variety of sources and they said the best people to do this are actually the category managers and the people responsible for the brands 'cause they literally live inside those businesses and they understand it. And so what happened was they-- the IT organization was saying, look listen, we don't want to be the people doing the janitorial work on the data. We're going to give that work over to people who understand it and they're going to be more productive and get to better outcomes with that information and that brings us up to go find new and interesting sources and I think that collaborative model that you're starting to see emerge where they can now be the data heroes in a different way by not being the ones beating the bottleneck on provisioning but rather can go out and figure out how do we share the best stuff across the organization? How do we find new sources of information to bring in that people can leverage to make better decisions? That's in incredibly powerful place to be and you know, I think that that model is really what's going to be driving a lot of the thinking at Trifacta and in the industry over the next couple of years. >> Great. Adam Wilson, CEO of Trifacta. Joe Hellestein, CTO-- Chief Strategy Officer of Trifacta and also a professor at Berkeley. Great story. Getting the (mumbles) right is hard but under the hood stuff's complicated and again, congratulations about sharing the Ground project. Ground open source. Open source lab kind of thing at-- in Berkeley. Exciting new stuff. Thanks so much for coming on theCUBE. I appreciate great conversation. I'm John Furrier, George Gilbert. You're watching theCUBE here at Big Data SV in conjunction with Strata and Hadoop. Thanks for watching. >> Great. >> Thanks guys.

Published Date : Mar 16 2017

