
John Kreisa, Hortonworks | DataWorks Summit 2018


 

>> Live from San José, in the heart of Silicon Valley, it's theCUBE! Covering DataWorks Summit 2018. Brought to you by Hortonworks. (electro music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San José, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by John Kreisa. He is the VP of marketing here at Hortonworks. Thanks so much for coming on the show. >> Thank you for having me. >> We've enjoyed watching you on the main stage, it's been a lot of fun. >> Thank you, it's been great. There have been great general sessions, some great talks. Talking about the technology, we've heard from some customers, some third parties, and most recently from Kevin Slavin from The Shed, which is really amazing. >> So I really want to get into this event. You have 2,100 attendees from 23 different countries, 32 different industries. >> Yep. This started as a small, >> That's right. tiny little thing! >> Didn't Yahoo start it in 2008? >> It did, yeah. >> You changed names a few years ago, but it's still the same event, looming larger and larger. >> Yeah! >> It's been great, it's gone international as you've said. It's actually the 17th total event that we've done. >> Yeah. >> If you count the ones we've done in Europe and Asia. It's a global community around data, so it's no surprise. The growth has been phenomenal, the energy is great, and the innovations that the community and the ecosystem are talking about are really great. It just continues to evolve as an event, it continues to bring new ideas and share those ideas. >> What are you hearing from customers? What are they buzzing about? Every morning on the main stage, you do different polls that say, "how much are you using machine learning? What portion of your data are you moving to the cloud?" What are you learning? >> So it's interesting because we've done similar polls in our show in Berlin, and the results are very similar. 
We did the cloud poll and there's a lot of buzz around cloud. What we're hearing is there are a lot of companies that are thinking about, or are somewhere along, their cloud journey, figuring out exactly what their overall plans are. There's a lot of news about how maybe cloud will eat everything, but if you look at the poll results, something like 75% of the attendees said they have cloud in their plans, and only about 12% said they're going to move everything to the cloud, so a lot of hybrid with cloud. It's how to figure out which workloads to run where, how to think about that strategy in terms of where to deploy the data, where to deploy the workloads and what that should look like, and that's one of the main things that we're hearing and talking a lot about. >> We've been seeing that at Wikibon; our recent update to the market forecast showed that public cloud will dominate increasingly in the coming decade, but hybrid cloud will be a long transition period for many or most enterprises who are still firmly rooted in on-premises deployment, so forth and so on. Clearly, the bulk of your customers, the bulk of your customer deployments, are on premises. >> They are. >> So you're working from a good starting point which means you've got what, 1,400 customers? >> That's right, thereabouts. >> Predominantly on premises, but many of them here at this show want to sustain their investment in a vendor that provides them with flexibility, so that as they decide they want to use Google or Microsoft or AWS or IBM for a particular workload, their existing investment in Hortonworks doesn't prevent them from facilitating that. It moves that data and those workloads. >> That's right, and we want to help them do that. A lot of our customers have, I'll call it, a multi-cloud strategy. 
They want to be able to work with an Amazon or a Google or any of the other vendors in the space equally well, and have the ability to move workloads around, and that's one of the things that we can help them with. >> One of the things you also did yesterday on the main stage was talk about this conference in the greater context of the world and what's going on right now. This is happening against the backdrop of the World Cup, and you said that this is really emblematic of data because this is a game, a tournament that generates tons of data. >> A tremendous amount of data. >> It's showing how data can launch new business models, disrupt old ones. Where do you think we're at right now? For someone who's been in this industry for a long time, just lay the scene. >> I think we're still very much at the beginning. Even though the conference has been around for a while, and the technology has been too, it's emerging so fast and just evolving so fast that we're still at the beginning of all the transformations. I've been listening to the customer presentations here and all of them are at some point along the journey. Many are really still starting. Even some of the polls that we had today talked about the fact that they're very much at the beginning of their journey with things like streaming or some of the A.I. and machine learning technologies. They're at various stages, so I believe we're really at the beginning of the transformation that we'll see. >> That reminds me of another detail of your product portfolio, or your architecture: streaming and edge deployments are also in the future for many of your customers who still primarily do analytics on data at rest. You made an investment in a number of technologies, NiFi for streaming. There's something called MiNiFi that has been discussed here at this show as an enabler for streaming all the way out to edge devices. 
What I'm getting at is that's indicative of the investments Arun Murthy, one of your co-founders, has made- it was a very good discussion for us analysts and also here at the show. That is one of many investments you're making to prepare for a future that will see those workloads become more predominant in the coming decade. One of the new things I've heard this week, that I'd not heard in terms of emphasis from you guys, is more of an emphasis on data warehousing as an important use case for HDP in your portfolios, specifically with Hive. Hive 3.0, now in HDP 3.0. >> Yes. >> With the enhancements to Hive to support more real-time and low-latency queries, but also ACID capabilities there. What you guys are doing is consistent with one of your competitors, Cloudera. They're going deeper into data warehousing too, because they recognize they've got to go there, like you do, to be able to absorb more of your customers' workloads. I think it's important that you guys are making that investment. You're not just big data, you're all data and all data applications, potentially, if your customers want to go there and engage you. >> Yes. >> I think that was a significant, subtle emphasis that me as an analyst noticed. >> Thank you. There were so many enhancements in 3.0 that were brought from the community that it was hard to talk about everything in depth, but you're right. The enhancements to Hive in terms of performance have really enabled it to take on a greater set of workloads and the interactivity that we know our customers want. The advantage is that you have a common data layer in the back end and you can run all this different work on it. It might be data warehousing, high-speed query workloads, but you can do it on that same data with Spark and data-science-related workloads. Again, it's that common pool back end of the data lake, and having the ability to do it with common security and governance. 
It's one of the benefits our customers are telling us they really appreciate. >> One of the things we've also heard this morning was talk about data analytics in terms of brand value and, importantly, brand protection. FedEx, exactly. Talking about, as the speaker said, we've all seen these apology commercials. What do you think- is it damage control? What is the customer motivation here? >> Well, a company can have billions of dollars of market cap wiped out by breaches in security, and we've seen it. This is not theoretical, these are actual occurrences that we've seen. Really, they're trying to protect the brand and the business and continue to be viable. They can get knocked back so far that it can take years to recover from the impact. They're looking at the security aspects of it, the governance of their data, regulations like GDPR. These things you've mentioned have real financial impact on the businesses, and I think it's the brand and the actual operations and finances of the businesses that can be impacted negatively. >> When you're thinking about Hortonworks's marketing messages going forward, how do you want to be described now, and then how do you want customers to think of you five or 10 years from now? >> I want them to think of us as a partner to help them with their data journey, on all aspects of their data journey, whether they're collecting data from the edge, you mentioned NiFi and things like that. Bringing that data back, processing it in motion, as well as processing it at rest, regardless of where that data lands. On premises, in the cloud, somewhere in between, the hybrid, multi-cloud strategy. We really want to be thought of as their partner in their data journey. That's really what we're doing. >> Even going forward, one of the things you were talking about earlier is the company's sort of saying, "we want to be boring. We want to help you do all the stuff-" >> There's a lot of money in boring. >> There's a lot of money, right! Exactly! 
As you said, a partner in their data journey. Is it "we'll do anything and everything"? Are you going to do niche stuff? >> That's a good question. Not everything. We are focused on the data layer: the movement of data, the processing and storage, and truly the analytic applications that can be built on top of the platform. Right now we've stuck to our strategy. It's been very consistent since the beginning of the company in terms of taking these open source technologies, making them enterprise viable, developing an ecosystem around them and fostering a community around them. That's been our strategy since before the company even started. We want to continue to do that and we will continue to do that. There's so much innovation happening in the community, and we quickly bring that into the products and make sure it's available in a trusted, enterprise-tested platform. That's really one of the things we see with our customers- over and over again they select us because we bring innovation to them quickly, in a safe and consumable way. >> Before we came on camera, I was telling Rebecca that Hortonworks has done a sensational job of continuing to align your product roadmaps with those of your leading partners: IBM, AWS, Microsoft. In many ways, your primary partners are not them, but the entire open source community: 26 open source projects that Hortonworks has incorporated into its product portfolio and in which you are a primary player and committer. You're a primary ingester of innovation from all the communities in which you operate. >> We do. >> That is your core business model. >> That's right. We both foster the innovation and we help drive the innovation ourselves with our engineers and architects. You're absolutely right, Jim. It's the ability to get that innovation, which is happening so fast in the community, into the product, and companies need to innovate. Things are happening so fast. 
Moore's Law was mentioned multiple times on the main stage, you know, and how it's impacting different parts of the organization. It's not just the technology, but business models are evolving quickly. We heard a little bit about Trimble, and if you've seen Tim Leonard's talk that he gave around what they're doing in terms of logistics, and the ability to go all the way out to the farmer and impact what's happening at the farm, tracking things down to the level of a tomato or an egg all the way back and just understanding that. It's evolving business models. It's not just the tech but the evolution of business models. Rob talked about it yesterday. I think those are some of the things that are kind of key. >> Let me stay on that point really quick. The industrial internet, like precision agriculture and everything it relates to, is increasingly relying on visual analysis, of parts and eggs and whatever it might be. That is convolutional neural networks, that is A.I., and it has to be trained, increasingly in the cloud where the data lives. The data lives in HDP clusters and whatnot. In many ways, no matter where the world goes in terms of industrial IoT, there will be massive clusters of HDFS and object storage driving it, and also embedded A.I. models that have to follow a specific DevOps life cycle. You guys have a strong orientation in your portfolio towards that degree of real-time streaming, as it were, of tasks that go through the entire life cycle: from preparing the data, to modeling, to training, to deploying it out, to Google or IBM or wherever else they want to go. So I'm thinking that you guys are in a good position for that as well. >> Yeah. >> I just wanted to ask you finally, what is the takeaway? We're talking about the attendees, talking about the community that you're cultivating here, theme, ideas, innovation, insight. What do you hope an attendee leaves with? 
>> I hope that the attendee leaves educated, understanding the technology and the impacts that it can have so that they will go back and change their business and continue to drive their data projects. The whole intent is really, and we even changed the format of the conference for more educational opportunities. For me, I want attendees to- a satisfied attendee would be one that learned about the things they came to learn so that they could go back to achieve the goals that they have when they get back. Whether it's business transformation, technology transformation, some combination of the two. To me, that's what I hope that everyone is taking away and that they want to come back next year when we're in Washington, D.C. and- >> My stomping ground. >> His hometown. >> Easy trip for you. They'll probably send you out here- (laughs) >> Yeah, that's right. >> Well John, it's always fun talking to you. Thank you so much. >> Thank you very much. >> We will have more from theCUBE's live coverage of DataWorks right after this. I'm Rebecca Knight for James Kobielus. (upbeat electro music)

Published Date : Jun 20 2018



Pandit Prasad, IBM | DataWorks Summit 2018


 

>> From San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Pandit Prasad. He handles analytics projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do in terms of the Hortonworks relationship and the other parts of your job. >> Sure, as you said, I am in Offering Management, which is also known as Product Management at IBM, and I manage the big data portfolio from an IBM perspective. I was also working with Hortonworks on developing this relationship and nurturing that relationship, so it's been a year since the Northsys partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey of aligning the two portfolios together. Hortonworks had HDP and HDF. IBM also had similar products, so we have, for example, Big SQL, and Hortonworks has Hive, so how do Hive and Big SQL align together? IBM has Data Science Experience; where does that come into the picture on top of HDP? Before this partnership, if you look into the market, it had been: you sell Hadoop, you sell a SQL engine, you sell data science. What this year has given us is more of a solution sell. Now with this partnership we go to the customers and say, here is an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big SQL for complex queries and federation and visualization stories, and then finally you put data science on top of it, so it gives you a complete end-to-end solution, the end-to-end experience for getting the value out of the data. 
>> Now IBM a few years back released Watson Data Platform for team data science, with DSX, Data Science Experience, as one of the tools for data scientists. Is Watson Data Platform still the core, I call it DevOps for data science and maybe that's the wrong term, that IBM provides to market, or is there sort of a broader DevOps framework within which IBM goes to market with these tools? >> Sure, Watson Data Platform one year ago was more of a cloud platform, and it had many components to it, and now we are getting a lot of those components on to the (mumbles), and Data Science Experience is one part of it, so Data Science Experience... >> So Watson Analytics as well, for subject matter experts and so forth. >> Yes. And again Watson has a whole suite of business offerings; Data Science Experience is more of a particular aspect of the focus, specifically on data science, and that's now available on-prem, and now we are building this on-prem stack, so we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio. >> Well you have a broader reference architecture and a stack of solutions, AI on Power and so forth, for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement and extend what they already resell, DSX, or is that outside the scope of the relationship? >> No, it is all part of the relationship. These three have been the core of what we announced last year, and then there are other solutions. We have the whole governance solution, right, so again it goes back to the partnership: HDP brings with it Atlas. IBM has a whole governance portfolio, including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data lake story, and now we are taking that to the cloud; that's what Truata is all about. 
Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. That's one way to look at it, and it caters to the GDPR requirements as well. >> For GDPR, for the IBM-Hortonworks partnership, what is the lead solution for GDPR compliance? Is it Hortonworks Data Steward Studio, or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that in terms of what you, as partners, propose to customers for soup-to-nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those, so it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has the IBM governance catalog, it has IBM data quality, and it has a bunch of security products, like Guardium, and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high-level information, but I'm not allowed to query deep into the personal information, so how do you block those queries, how do you understand those? These are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Formworks got into it a little bit in his presentation, is the notion that if you are using an EU citizen's PII to drive algorithmic outcomes, they have the right to full transparency into the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort. 
Is that something that IBM still offers? It was called Watson Curator a few years back. I'm getting a sense right now that Hortonworks has a specific solution, not to say that they may not be working on it, that addresses that side of GDPR; do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the... >> Data lineage as opposed to model lineage. It's a subtle distinction. >> It can identify some of the personal information and maybe provide a way to tag it and, hence, mask it. But the Truata offering is the one that is bringing some new research assets; after the GDPR guidelines became clear, they got into the full scope of how to cater to those requirements. These are relatively new proprietary components; they are not even being productized, that's why I am calling them proprietary components, that are going into this hosted service. >> IBM's got a big portfolio, so I'll understand if you guys are still working out what position. Rebecca, go ahead. >> I just wanted to ask you about this new era of GDPR. The last Hortonworks conference was sort of before it came into effect, and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense of really understanding the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements and interpret them, coming to terms with what that really means. For example, I met with a customer, a multi-national company. 
They have data centers across different geos, and they asked me: I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia. The query processing should all happen in Europe, and only the output of the query should be sent back to Asia. You wouldn't have been able to think in these terms before the GDPR era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about access to data; now you are getting into where the processing happens and where the results are getting displayed, so we are getting... >> Severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show, at DataWorks 2018, of an IBM-Hortonworks solution, IBM Hosted Analytics with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided. It's a subscription service? Could you tell us what subset of IBM's analytics portfolio is hosted for Hortonworks' customers? >> Sure, as you said, it is a hosted offering. Initially we are starting off with a base offering with three products: it will have HDP, Big SQL (IBM Db2 Big SQL), and DSX, Data Science Experience. Those are the three solutions. As I said, it is hosted on IBM Cloud, so customers have a choice of different configurations, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers a bare metal configuration in the cloud. >> It's geared to data scientists and developers who will build machine-learning models and train them in IBM Cloud, in a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today, and we can think about them, as you said, as geared for ad-hoc or ephemeral workloads, also geared towards low cost. 
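The geo-pinned query flow described in this exchange (a query submitted from Asia, processed entirely in Europe, with only the result set crossing regions) can be sketched very loosely. Every name below, from the region catalog to the dispatch function, is a hypothetical illustration, not an actual IBM, Truata, or Hortonworks API:

```python
# Hypothetical sketch of region-pinned query processing for GDPR-style
# data-residency rules; no real vendor API is used here.
DATA_REGION = {"customer_events": "eu-west"}  # catalog: table -> home region

def execute_in_region(region, query):
    # Stand-in for dispatching the query to a compute cluster in `region`.
    # Raw records never leave that region; only aggregates come back.
    return [f"aggregate result of {query!r} computed in {region}"]

def run_query(table, query, caller_region):
    """Process the query where the data lives; ship only the result set
    back to the caller's region."""
    processing_region = DATA_REGION[table]
    rows = execute_in_region(processing_region, query)
    return {"processed_in": processing_region,
            "returned_to": caller_region,
            "rows": rows}

# A caller in Asia gets results, but the processing stays in Europe:
result = run_query("customer_events", "SELECT count(*)", "ap-south")
```

Decoupling storage from processing, as noted in the conversation, is what makes this routing possible: compute is addressed by the data's home region rather than co-located with the caller.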
You can think about this offering as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software are all configured and optimized for providing high performance, not necessarily for ad-hoc or ephemeral workloads. They are capable of handling massive, sticky workloads: not meant for "I turn on this massive computing power for a couple of hours and then switch it off," but rather, "I'm going to run these massive workloads as if they were located in my data center." That's number one. It comes with the complete set of HDP. If you think about what is currently in the cloud, you have Hive and HBase with the SQL engines and the storage separate; security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in. It provides the option to use Big SQL, because once you get on Hadoop, the next experience is: I want to run complex workloads, I want to run federated queries across Hadoop as well as other data storage. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together. As a part of this partnership, I mentioned earlier that we have made progress towards providing this story of an end-to-end solution. The next step of that is: yes, I can say that it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and I have featured some of those integrations. For example Big SQL, an IBM product: we have been working on integrating it very closely with HDP. It can be deployed through Ambari, and it is integrated with Atlas and Ranger for security. We are improving the integrations with Atlas for governance. 
>> Say you're building a Spark machine learning model inside DSX on HDP, within IH (mumbles) IBM hosting with Hortonworks on HDP 3.0. Can you then containerize that machine learning Spark model and deploy it into an edge scenario? >> Sure, first was Big SQL, the next one was DSX. DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but what we have done now is this: if I want to run a DSX workload, say a Python workload, I need to have the Python libraries on all the nodes that I want to deploy to. Suppose you are running a big cluster, a 500-node cluster. I need to have the Python libraries on all 500 nodes and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version will you be able to build a Spark model and a TensorFlow model and containerize them and deploy them? >> Yes. >> Across a multi-cloud, and orchestrate them with Kubernetes to do all that meshing? Is that a capability now or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated here today, so that is a new integration. We can run what we call a virtual Python environment. DSX can containerize it and run it against data that's held in the HDP cluster. Now we are making use of both the data in the cluster, as well as the infrastructure of the cluster itself, for running the workloads. >> In terms of the layers stacked, is it also incorporating the IBM distributed deep-learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly becoming a set of capabilities spread across a distributed mesh, playing together as if they're one unified application. Is that a capability now in this solution, or will it be in the near future? DDL, distributed deep learning? >> No, we have not yet. >> I know that's on the AI Power platform currently, gotcha. 
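The per-node library problem Pandit describes (500 nodes, each needing matching Python library versions) and the virtual-environment fix can be illustrated with a toy skew check. This is an illustrative sketch of the idea, not DSX's actual packaging mechanism:

```python
def version_skew(nodes):
    """Return the libraries whose versions differ across worker nodes,
    i.e. the upkeep problem with per-node installs."""
    seen = {}
    for libs in nodes.values():
        for lib, ver in libs.items():
            seen.setdefault(lib, set()).add(ver)
    return {lib for lib, vers in seen.items() if len(vers) > 1}

cluster = {
    "node1": {"numpy": "1.14", "pandas": "0.23"},
    "node2": {"numpy": "1.13", "pandas": "0.23"},  # drifted after an upgrade
}

# Shipping one containerized environment with the job means every node
# runs the same versions, so the skew disappears:
packaged_env = {"numpy": "1.14", "pandas": "0.23"}
uniform = {node: dict(packaged_env) for node in cluster}
```

The design point is that the environment travels with the workload instead of living on the nodes, so upgrading a library means rebuilding one archive rather than touching 500 machines.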
It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base bare metal and VM configurations; the next one, depending on how customers react to it, is definitely bare metal with GPUs, optimized for TensorFlow workloads. >> Exciting. We'll be tuned in; in the coming months and years I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks just after this.

Published Date : Jun 19 2018



Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter. He is the VP of Product Management at Attunity and also Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company based in Boston, Massachusetts, what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move those changes where they're needed when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks and it's all about taking the metadata that we capture in that integration process. So we're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way.
We use that metadata internally and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher-level data governance initiatives, so Data Steward Studio, into the DataPlane Services, so it's really important to be able to take the metadata that we have and to add to it the metadata that's from the other sources of information. >> Sure, for more of the transactional semantics of what Hortonworks has been describing, they've baked that into HDP in your overall portfolios. Is that true? I mean, that supports those kinds of requirements. >> With HDP, what we're seeing is you know the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity and then on the Atlas side with the integration on the governance side with GDPR and other sorts of regulations coming into play now, you know, those sorts of things are becoming more and more important, you know, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're actually showcasing the integration as well. >> So can you talk a little bit more about that for those who aren't going to be there for Thursday. GDPR was really a big theme at the DataWorks Berlin event and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses who have operations in the EU, but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere. So GDPR is an EU regulation, but really in many ways it's having ripple effects across the world in terms of practices. >> Absolutely and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in.
So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage from that, then I'm able to apply the protections of the masking on top of that data. So really, the GDPR effect has, you know, created a broad-scale need for organizations to really get a handle on metadata, so the timing of our announcement just works real well. >> And one nice thing about this integration is that you know it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well, so you can tag data as PCI, PII, personal data, and that can get enforced through Ranger to say, hey, only certain admins can access certain types of data, and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, so how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive was a really important one, so our change data capture capabilities are able to feed directly into that and be able to support those extensions. >> Yeah, we have a lot of you know really key customers together with Attunity and you know maybe as a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that.
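Ali's point about Atlas classifications being enforced through Ranger comes down to tag-based access control: data carries classification tags, and policies map tags to the roles allowed to see them. Here is a minimal sketch of that decision logic in Python; the tag names, roles, and policy shape are hypothetical, not the actual Atlas/Ranger model:

```python
# Toy tag-based access check in the spirit of Atlas + Ranger integration:
# columns carry classification tags; a policy maps each tag to allowed roles.
# All names and the policy shape here are hypothetical, for illustration only.

TAG_POLICY = {
    "PII": {"data_steward", "compliance_admin"},
    "PCI": {"payments_admin"},
}

def can_read(user_roles, column_tags):
    """Allow access only if, for every restricted tag, the user holds an allowed role."""
    for tag in column_tags:
        allowed = TAG_POLICY.get(tag)
        if allowed is not None and not (set(user_roles) & allowed):
            return False
    return True

print(can_read({"analyst"}, {"PII"}))         # False: PII is restricted
print(can_read({"data_steward"}, {"PII"}))    # True
print(can_read({"analyst"}, set()))           # True: untagged data is open
```

In the real integration, Ranger evaluates tag-based policies against classifications that Atlas propagates across the cluster; this toy check only illustrates the shape of the decision.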
Yeah, no, it's a nice honor for us to get that distinction from Hortonworks and it's also a proof point to the collaboration that we have commercially. You know our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations and they're looking for solutions, not technologies, so the fact that we can come in, we can show the proof points from other customers that are successfully using our joint solution, that's really, it's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see, you know, that as well as that's where we're putting in the investment and that's where these guys are also investing, so it really, you know, helps the story together. So with Hive, we're doing a lot of investment of making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your, you know, real-time insights, with the new ACID merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with, with Atlas. We're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was, as I'm moving data into my data lake, I need to be able to put some structure around this, I need to be able to handle continuously updating data from multiple sources and that's what we introduce with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier.
The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data because I mean, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it and yet there's also so many high-profile cases of breaches and security concerns, so what would you say, is it that customers, are they more excited or are they more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance has become top of mind, right, so more and more in the surveys that we've done with our customers, right, you know, more and more customers are more concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is sort of one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger, now it's a question of you know how do we get our you know hooks into the Atlas and all that kind of stuff, so yeah, definitely, as you mentioned, because of GDPR, because of all these kinds of issues that have happened, it's definitely become top of mind.
>> But as you mentioned, if you look at Europe, some of the European companies that are more hit by GDPR, they're actually excited that now they can, you know, really get to understand their data more and do better things with it as a result of you know the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity in a Hortonworks context to find patterns in that data in real time? >> So we enable data scientists to build those models. So we're not only bringing the data together but again, part of the announcement last year is the way we structure that data in Hive, we provide a complete historic data store so every single transaction that has happened and we send those transactions as they happen, it's at a big append, so if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models, so building those out in Hive and making those analytics ready in Hive, that's what we do, so we're a key enabler to machine learning. >> Making analytics ready rather than do the analytics in the spring, yeah. >> Absolutely. 
>> Yeah, the other side to that is that because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio so the idea there is around multi-everything, so more and more customers have multiple clusters whether it's on-prem, in the cloud, so now more and more customers are looking at how do I get a single pane of glass view across all my data whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest, right, so that's where DataPlane comes in and with the Data Steward Studio, which is our second offering on top of DataPlane, they can kind of get that view across all their clusters, so as soon as you know the data lands from Attunity into Atlas, you can get a view into that as a part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data and now the end user, the end administrator, has the option of you know saying that okay, yeah, this is a credit card, I'll accept that tag, or they can reject that and pick one of their own. >> Will any of this Attunity CDC change data capture capability, going forward, be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, I mean for internet of things, edge analytics and so forth, change data capture, is it absolutely necessary to make the entire, some call it the fog computing, cloud or whatever, to make it a completely transactional environment for all applications from micro endpoint to micro endpoint? Are there any plans to do that going forward?
>> Yeah, so I think with HDP 3.0, as you mentioned, right, one of the key factors that was coming into play was around time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus, we will have more from DataWorks in San Jose just after this. (techno music)
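Earlier in the conversation, Ali described Data Steward Studio profiling sample values and suggesting a "credit card" tag for an administrator to accept or reject. As a rough illustration of that kind of profiling — the product reportedly uses machine learning models; this toy uses a simple Luhn-checksum rule instead, and all names here are made up:

```python
# Toy profiler in the spirit of the tag suggestion Ali describes: scan a
# column's sample values and suggest a "credit card" tag when most values
# look like card numbers (13-19 digits passing the Luhn checksum).
# Illustrative only -- not Data Steward Studio's actual ML-based profiling.

def luhn_ok(digits):
    total, alt = 0, False
    for d in map(int, reversed(digits)):
        d = d * 2 if alt else d
        total += d - 9 if d > 9 else d
        alt = not alt
    return total % 10 == 0

def suggest_card_tag(values, threshold=0.8):
    def looks_like_card(v):
        s = v.replace("-", "").replace(" ", "")
        return s.isdigit() and 13 <= len(s) <= 19 and luhn_ok(s)
    hits = sum(looks_like_card(v) for v in values)
    return hits / max(len(values), 1) >= threshold

# Standard test card numbers (all pass the Luhn check).
sample = ["4111 1111 1111 1111", "5500-0000-0000-0004", "4012888888881881"]
print(suggest_card_tag(sample))                  # True -> suggest a sensitive-data tag
print(suggest_card_tag(["Boston", "San Jose"]))  # False
```

As in the product flow Ali outlines, the suggestion would then go to the administrator, who accepts the tag or replaces it with their own.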

Published Date : Jun 19 2018



Eric Herzog, IBM | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have with us Eric Herzog. He is the Chief Marketing Officer and VP of Global Channels at the IBM Storage Division. Thanks so much for coming on theCUBE once again, Eric. >> Well, thank you. We always love to be on theCUBE and talk to all of theCUBE's analysts about various topics, data, storage, multi-cloud, the works. >> And before the cameras were rolling, we were talking about how you might be the biggest CUBE alum in the sense of you've been on theCUBE more times than anyone else. >> I know I'm in the top five, but I may be number one, I have to check with Dave Vellante and crew and see. >> Exactly and often wearing a Hawaiian shirt. >> Yes. >> Yes, I was on theCUBE last week from Cisco Live. I was not wearing a Hawaiian shirt. And Stu and John gave me a hard time about why I was not wearing a Hawaiian shirt. So I made sure I showed up to the DataWorks show- >> Stu, Dave, get a load. >> You're in California with a tan, so it fits, it's good. >> So we were talking a little bit before the cameras were rolling and you were saying one of the points that is sort of central to your professional life is it's not just about the storage, it's about the data. So riff on that a little bit. >> Sure, so at IBM we believe everything is data driven and in fact we would argue that data is more valuable than oil or diamonds or plutonium or platinum or silver or anything else. It is the most valuable asset, whether you be a global Fortune 500, whether you be a midsize company or whether you be Herzog's Bar and Grill. So data is what you use with your suppliers, with your customers, with your partners.
Literally everything around your company is really built around the data, so it's about most effectively managing it and making sure, A, it's always performant, because when it's not performant they go away. As you probably know, Google did a survey that after one or two seconds they go off your website, they click somewhere else, so it has to be performant. Obviously in today's 7 by 24, 365 company it needs to always be resilient and reliable and it always needs to be available, otherwise if the storage goes down, guess what? Your AI doesn't work, your Cloud doesn't work, whatever workload, if you're more traditional, your Oracle, SQL, you know, SAP, none of those workloads work if you don't have a solid storage foundation underneath your data driven enterprise. >> So with that ethos in mind, talk about the products that you are launching, that you newly launched and also your product roadmap going forward. >> Sure, so for us everything really is that storage is this critical foundation for the data driven, multi Cloud enterprise. And as I've said before on theCUBE, all of our storage software's now Cloud-ified, so if you need to automatically tier out to IBM Cloud or Amazon or Azure, we automatically will move the data placement around from on premises out to a Cloud, and for certain customers who may be multi Cloud, in this case using multiple private Cloud providers, which happens due to either legal reasons or procurement reasons or geographic reasons for the larger enterprises, we can handle that as well. That's part of it; the second thing is we just announced earlier today an artificial intelligence, an AI reference architecture, that incorporates a full stack from the very bottom, both servers and storage, all the way up through the top layer, then the applications on top, so we just launched that today. >> AI for storage management or AI for running a range of applications? >> Regular AI, artificial intelligence from an application perspective.
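The automatic tiering Eric mentions — moving data placement from on-premises out to a cloud as needs change — is, at its simplest, a policy mapping data age or activity to a storage tier. Products like IBM Spectrum Scale express this with declarative ILM policy rules rather than code; the Python below is only a toy restatement of the idea, with made-up tier names and thresholds:

```python
# Toy sketch of policy-driven storage tiering: files whose last access is
# older than a threshold are placed on progressively cheaper tiers.
# Tier names and the day thresholds are hypothetical, for illustration only.

def choose_tier(age_days, hot_limit=30, warm_limit=180):
    if age_days <= hot_limit:
        return "flash"        # on-premises flash for active data
    if age_days <= warm_limit:
        return "disk"         # on-premises capacity tier
    return "cloud-object"     # cold data tiered out to cloud object storage

# Days since last access for a few example files.
files = {"orders.db": 3, "q1_report.csv": 90, "logs_2016.tar": 400}
placement = {name: choose_tier(age) for name, age in files.items()}
print(placement)
```

A real implementation would also move data back when it heats up again and would run the policy continuously against filesystem metadata instead of a hand-built dict.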
So we announced that reference architecture today. Basically think of the reference architecture as your recipe, your blueprint, of how to put it all together. Some of the components are from IBM, such as Spectrum Scale and Spectrum Computing from my division, our servers from our Cloud division. Some are open source, TensorFlow, Caffe, things like that. Basically it gives you what the stack needs to be, and what you need to do in various AI workloads, applications and use cases. >> I believe you have distributed deep learning as an IBM capability, that's part of that stack, is that correct? >> That is part of the stack, it's like in the middle of the stack. >> Is it, correct me if I'm wrong, that's containerization of AI functionality? >> Right. >> For distributed deployment? >> Right. >> In an orchestrated Kubernetes fabric, is that correct? >> Yeah, so when you look at it from an IBM perspective, while we clearly support the virtualized world, the VMwares, the Hyper-Vs, the KVMs and the OVMs, and we will continue to do that, we're also heavily invested in the container environment. For example, one of our other divisions, the IBM Cloud Private division, has announced a solution that's all about private Clouds; you can either get it hosted at IBM or literally buy our stack- >> Rob Thomas in fact demoed it this morning, here. >> Right, exactly. And you could create- >> At DataWorks. >> Private Cloud initiative, and there are companies that, whether it be for security purposes or whether it be for legal reasons or other reasons, don't want to use public Cloud providers, be it IBM, Amazon, Azure, Google or any of the big public Cloud providers; they want a private Cloud and IBM either, A, will host it or, B, with IBM Cloud Private. All of that infrastructure is built around a containerized environment. We support the older world, the virtualized world, and the newer world, the container world.
In fact, our storage, allows you to have persistent storage in a container's environment, Dockers and Kubernetes, and that works on all of our block storage and that's a freebie, by the way, we don't charge for that. >> You've worked in the data storage industry for a long time, can you talk a little bit about how the marketing message has changed and evolved since you first began in this industry and in terms of what customers want to hear and what assuages their fears? >> Sure, so nobody cares about speeds and feeds, okay? Except me, because I've been doing storage for 32 years. >> And him, he might care. (laughs) >> But when you look at it, the decision makers today, the CIOs, in 32 years, including seven start ups, IBM and EMC, I've never, ever, ever, met a CIO who used to be a storage guy, ever. So, they don't care. They know that they need storage and the other infrastructure, including servers and networking, but think about it, when the app is slow, who do they blame? Usually they blame the storage guy first, secondarily they blame the server guy, thirdly they blame the networking guy. They never look to see that their code stack is improperly done. Really what you have to do is talk applications, workloads and use cases which is what the AI reference architecture does. What my team does in non AI workloads, it's all about, again, data driven, multi Cloud infrastructure. They want to know how you're going to make a new workload fast AI. How you're going to make their Cloud resilient whether it's private or hybrid. In fact, IBM storage sells a ton of technology to large public Cloud providers that do not have the initials IBM. We sell gobs of storage to other public Cloud providers, both big, medium and small. It's really all about the applications, workloads and use cases, and that's what gets people excited. You basically need a position, just like I talked about with the AI foundations, storage is the critical foundation. 
We happen to be, knocking on wood, let's hope there's no earthquake, since I've lived here my whole life, and I've been in earthquakes, I was in the '89 quake. Literally fell down a bunch of stairs in the '89 quake. If there's an earthquake as great as IBM storage is, or any other storage or servers, it's crushed. Boom, you're done! Okay, well you need to make sure that your infrastructure, really your data, is covered by the right infrastructure and that it's always resilient, it's always performing and is always available. And that's what IBM drives is about, that's the message, not about how many gigabytes per second in bandwidth or what's the- Not that we can't spew that stuff when we talk to the right person but in general people don't care about it. What they want to know is, "Oh that SAP workload took 30 hours and now it takes 30 minutes?" We have public references that will say that. "Oh, you mean I can use eight to ten times less storage for the same money?" Yes, and we have public references that will say that. So that's what it's really about, so storage is really more from really a speeds and feeds Nuremberger sort of thing, and now all the Nurembergers are doing AI and Caffe and TensorFlow and all of that, they're all hackers, right? It used to be storage guys who used to do that and to a lesser extent server guys and definitely networking guys. That's all shifted to the software side so you got to talk the languages. What can we do with Hortonworks? By the way we were named in Q1 of 2018 as the Hortonworks infrastructure partner of the year. We work with Hortonworks all time, at all levels, whether it be with our channel partners, whether it be with our direct end users, however the customer wants to consume, we work with Hortonworks very closely and other providers as well in that big data analytics and the AI infrastructure world, that's what we do. 
>> So the containerization side of the IBM AI stack, then the containerization capabilities in Hortonworks Data Platform 3.0, can you give us a sense for how you plan to, or do you plan at IBM, to work with Hortonworks to bring these capabilities, your reference architecture, into more, or bring their environment for that matter, into more of an alignment with what you're offering? >> So we haven't made an exact decision on how we're going to do it, but we interface with Hortonworks on a continual basis. >> Yeah. >> We're working to figure out what's the right solution, whether that be an integrated solution of some type, whether that be something that we do through an adjunct to our reference architecture or some reference architecture that they have, but we always make sure, again, we are their partner of the year for infrastructure, named in Q1, and that's because we work very tightly with Hortonworks and make sure that what we do ties out with them, hits the right applications, workloads and use cases, the big data world, the analytic world and the AI world, so that we're tied off, you know, together to make sure that we deliver the right solutions to the end user, because that's what matters most; it's what gets the end users fired up, not what gets Hortonworks or IBM fired up, it's what gets the end users fired up. >> When you're trying to get into the head space of the CIO, and get your message out there, I mean what is it, what would you say is it that keeps them up at night? What are their biggest pain points and then how do you come in and solve them? >> I'd say the number one pain point for most CIOs is application delivery, okay? Whether that be to the line of business, put it this way, let's take an old workload, okay? Let's take that SAP example, that CIO was under pressure because they were trying, in this case it was a giant retailer who was shipping stuff every night, all over the world. Well guess what?
The green undershirts in the wrong size went to Paducah, Kentucky, and then one of the other stores, in Singapore, which needed those green shirts, they ended up with shoes, and the reason is, they couldn't run that SAP workload in a couple hours. Now they run it in 30 minutes. It used to take 30 hours. So since they're shipping every night, you're basically missing a cycle, essentially, and you're not delivering the right thing from a retail infrastructure perspective to each of their nodes, if you will, to their retail locations. So they care about what do they need to do to deliver to the business the right applications, workloads and use cases on the right timeframe, and they can't go down; people get fired for that at the CIO level, right? If something goes down, the CIO is gone, and obviously for certain companies that are more in the modern mode, okay? People who are delivering stuff and their primary transactional vehicle is the internet, not retail, not through partners, not through people like IBM, but their primary transactional vehicle is a website, if that website is not resilient, performant and always reliable, then guess what? They are shut down and they're not selling anything to anybody, which isn't true if you're Nordstrom's, right? Someone can always go into the store and buy something, right, and figure it out? Almost all old retailers have not only a connection to core but they literally have a server and storage in every retail location, so if the core goes down, guess what, they can transact. In the era of the internet, you don't do that anymore. Right? If you're shipping only on the internet, you're shipping on the internet, so whether it be a new workload, okay? An old workload if you're doing the whole IoT thing. For example, I know a company that I was working with, it's a giant, private mining company. They have those giant, three-story dump trucks you see on the Discovery Channel.
Those things cost them a hundred million dollars, so they have five thousand sensors on every dump truck. It's a fricking dump truck, but guess what, they've got five thousand sensors on there so they can monitor and make sure they take proactive action, because if that goes down, whether these be diamond mines or these be uranium mines or whatever it is, it costs them hundreds of millions of dollars to have a thing go down. That's, if you will, trying to take it out of the traditional high tech area, which we all talk about, whether it be Apple or Google, or IBM, okay great, now let's put it to some other workload. In this case, this is the use of IoT, in a big data analytics environment with AI-based infrastructure, to manage dump trucks. >> I think you're talking about what's called "digital twins" in a networked environment for materials management, supply chain management and so forth. Are those requirements growing in terms of industrial IoT requirements of that sort, and how does that affect the amount of data that needs to be stored, the sophistication of the AI and the stream computing that needs to be provisioned? Can you talk to that?
And then on top of it, quite honestly, from an AI big data analytics perspective, the more data you have, the more valuable it is, the more you can mine it. It's as if the world ran on oil, forget the pollution side, let's assume oil didn't cause pollution. Okay, great, then guess what? You would be using oil everywhere and you wouldn't be using solar, you'd be using oil, and by the way you'd need more and more and more, and how much oil you have and how you control that would be the power. That right now is the power of data, and if anything it's getting more and more and more. So again, you always have to be able to be resilient with that data, you always have to interact with things, like we do with Hortonworks or other application workloads. Our AI reference architecture is another perfect example of the things you need to do to provide, you know, at the base infrastructure, the right foundation. If you have the wrong foundation to a building, it falls over. Whether it be your house, a hotel, this convention center, if it had the wrong foundation, it falls over. >> Actually, to follow the oil analogy just a little bit further, the more of this data you have, the more PII there is, and the more the workloads need to scale up, especially for things like data masking. >> Right. >> When you have compliance requirements like GDPR, so you want to process the data but you need to mask it first, therefore you need clusters that conceivably are optimized for high-volume, highly scalable masking in real time, to feed the downstream applications and to feed the data scientists, you know, data lakes, whatever, and so forth and so on? >> That's why you need things like incredible compute, which IBM offers with the Power Platform. And why you need storage that, again, can scale up. >> Yeah.
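The masking step Kobielus raises, process the data but mask the PII first, can be illustrated with a toy transform. This is a sketch only, with invented field names; production GDPR pipelines use vetted masking tools or format-preserving encryption, not a truncated hash:

```python
import hashlib

# Hypothetical set of fields treated as PII for this example
PII_FIELDS = {"name", "email", "card_number"}

def mask_record(record, salt="demo-salt"):
    """Replace PII fields with a salted one-way hash so downstream
    analytics can still join on the masked value without seeing it."""
    masked = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            masked[key] = digest[:12]  # truncated token, not the raw value
        else:
            masked[key] = value
    return masked

record = {"name": "Jane Doe", "email": "jane@example.com", "amount": 42.50}
print(mask_record(record))
```

Hashing with a salt keeps the masked values joinable across tables, since the same input always yields the same token, while the analyst never sees the raw PII.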
>> Can get as big as you need it to be. For example, in our reference architecture we use both what we call Spectrum Scale, which is a big data analytics workload performance engine, multi-threaded, multi-tasking. In fact, one of the largest banks in the world, if you happen to bank with them, your credit card fraud is being done on our stuff, okay? But at the same time we have what's called IBM Cloud Object Storage, which is an object store. You want to take every one of those searches for fraud, and when they find out that no one stole my MasterCard or the Visa, you still want to put it in there, because then you mine it later and see patterns of how people are trying to steal stuff, because it's all being done digitally anyway. You want to be able to do that. So you A, want to handle it very quickly and resiliently, but then you want to be able to mine it later, as you said, mining the data. >> Or do high-value anomaly detection in the moment, to be able to tag the more anomalous data that you can then sift through later, or maybe in the moment for real-time mitigation. >> Well that's highly compute intensive, it's AI intensive and it's highly storage intensive on the performance side, and then what happens is you store it all for, let's say, further analysis so you can tell people, "When you get your AmEx card, do this and they won't steal it." Well the only way to do that is you use AI on this ocean of data, where you're analyzing all this fraud that has happened, to look at patterns, and then you tell me, as a consumer, what to do. Whether it be in the financial business, in this case the credit card business, healthcare, government, manufacturing. One of our resellers actually developed an AI-based tool that can scan boxes and cans for faults on an assembly line, and has actually sold it to a beer company and to a soda company, so that instead of people looking at the cans, like you see on the Food Channel, to pull it off, guess what? It's all automatically done.
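The in-the-moment anomaly detection mentioned here, tag the suspicious transaction now and mine the rest later, is at its simplest an outlier test against the cardholder's own history. A toy z-score version with invented numbers, not how any bank actually scores fraud:

```python
from statistics import mean, stdev

def is_anomalous(history, new_amount, z_cutoff=3.0):
    """Tag a transaction whose amount is more than z_cutoff standard
    deviations away from the cardholder's historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > z_cutoff

# Hypothetical purchase history for one cardholder
history = [20.0, 25.0, 22.0, 30.0, 24.0, 27.0]
print(is_anomalous(history, 26.0))   # in line with history -> False
print(is_anomalous(history, 950.0))  # flag for review -> True
```

Everything scored, anomalous or not, would still land in the object store for the later pattern mining Herzog describes.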
There's no people pulling the can off, "Oh, that can is damaged" and they're looking at it and by the way, sometimes they slip through. Now, using cameras and this AI based infrastructure from IBM, with our storage underneath the hood, they're able to do this. >> Great. Well Eric thank you so much for coming on theCUBE. It's always been a lot of fun talking to you. >> Great, well thank you very much. We love being on theCUBE and appreciate it and hope everyone enjoys the DataWorks conference. >> We will have more from DataWorks just after this. (techno beat music)

Published Date : Jun 19 2018



Tendü Yogurtçu, Syncsort | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, along with my cohost, James Kobielus. We're joined by Tendu Yogurtcu, she is the CTO of Syncsort. Thanks so much for coming on theCUBE, for returning to theCUBE I should say. >> Thank you Rebecca and James. It's always a pleasure to be here. >> So you've been on theCUBE before, and the last time you were talking about Syncsort's growth. So can you give our viewers a company update? Where are you now? >> Absolutely. Syncsort has seen extraordinary growth within the last three years. We tripled our revenue, doubled our employees and expanded the product portfolio significantly. Because of this phenomenal growth that we have seen, we also embarked on a new initiative, refreshing our brand. We rebranded, and this was necessitated by the fact that we have such a broad portfolio of products, and we are actually showing our new brand here, articulating the value our products bring with optimizing existing infrastructure, assuring data security and availability, and advancing the data by integrating into next generation analytics platforms. So it's very exciting times in terms of Syncsort's growth. >> So the last time you were on the show it was pre-GDPR, but we were talking before the cameras were rolling, and you were explaining the kinds of adoption you're seeing and what, in this new era, you're seeing from customers and hearing from customers. Can you tell our viewers a little bit about it? >> When we were discussing last time, I talked about four mega trends we are seeing, and those mega trends were primarily driven by the advanced business and operation analytics: data governance, cloud, streaming and data science, artificial intelligence.
And we talked, we really made a lot of announcements and focused on the use cases around data governance, primarily helping our customers with the GDPR, the General Data Protection Regulation, initiatives, and how we can create that visibility in the enterprise through the data, by security and lineage, and delivering trusted data sets. Now we are talking about cloud primarily, and the keynotes at this event and our focus are around cloud, primarily driven by, again, the use cases, right? How the businesses are adapting to the new era. One of the challenges that we see with our enterprise customers, over 7000 customers by the way, is the ability to future-proof their applications, because this is a very rapidly changing stack. We have seen the keynotes talking about the importance of how you connect your existing infrastructure with the future modern, next generation platforms. How do you future-proof the platform, make it agnostic about whether it's Amazon, Microsoft or Google Cloud, whether it's on-premise in legacy platforms today? The data has to be available in the next generation platforms. So the challenge we are seeing is, how do we keep the data fresh? How do we create that abstraction so that applications are future-proofed? Because organizations, even financial services customers, banking, insurance, they now have at least one cluster running in the public cloud, and there are private implementations; hybrid becomes the new standard. So our focus and most recent announcements have been around really helping our customers with real-time, resilient change data capture, keeping the data fresh, feeding into the downstream applications with the streaming and messaging frameworks, for example Kafka, Amazon Kinesis, as well as keeping the persistent stores, the data lake on-premise and in the cloud, fresh.
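The change data capture pattern described here, keeping downstream stores fresh from a stream of source-system changes, reduces to applying an ordered event log to a replica. A stripped-down, in-memory sketch; in the deployments being discussed, these events would travel over Kafka or Kinesis rather than a Python list:

```python
def apply_changes(replica, change_stream):
    """Apply an ordered stream of CDC events to a keyed replica.
    Each event is (op, key, value) with op in {"insert", "update", "delete"}."""
    for op, key, value in change_stream:
        if op in ("insert", "update"):
            replica[key] = value
        elif op == "delete":
            replica.pop(key, None)
    return replica

# Hypothetical downstream copy of a source table
replica = {"acct-1": {"balance": 100}}
events = [
    ("update", "acct-1", {"balance": 80}),
    ("insert", "acct-2", {"balance": 500}),
    ("delete", "acct-1", None),
]
print(apply_changes(replica, events))
```

The ordering guarantee is the whole game: as long as changes arrive in commit order, the replica converges on the source system's latest state, which is what "keeping the data fresh" means in practice.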
>> That puts you into great alignment with your partner Hortonworks. So, Tendu, since we are here at DataWorks, Hortonworks' show, can you break out for our viewers the nature and levels of your relationship, your partnership with Hortonworks, and how the Syncsort portfolio plays with HDP 3.0, with Hortonworks DataFlow and the DataPlane services, at a high level? >> Absolutely. We have been a longtime partner with Hortonworks, and a couple of years back we strengthened our partnership. Hortonworks is reselling Syncsort, and we actually have a prescriptive solution for Hadoop and ETL onboarding in Hadoop jointly. And it's very complementary. Our strategy is very complementary, because what Hortonworks is achieving is creating that abstraction and future-proofing and interaction consistency across the platform, as was referred to this morning, whether it's on-premise or in the cloud or across multiple clouds. We are providing the data application layer consistency and future-proofing on top of the platform, leveraging the tools in the platform for orchestration, integrating with HDP, certifying with Ranger on HDP, all of the tools, DataFlow, and Atlas of course for lineage. >> The theme of this conference is ideas, insights and innovation, and as a partner of Hortonworks, can you describe what it means for you to be at this conference? What kinds of community, deepening existing relationships, forming new ones? Can you talk about what happens here? >> This is one of the major events around data, and it's DataWorks as opposed to being more specific to Hadoop itself, right? Because the stack is evolving and data challenges are evolving. For us, it means really the interactions with the customers, the organizations and the partners here, because the dynamics of the use cases are also evolving. For example, data lake implementations started in the U.S.
And we started to see European organizations moving to streaming, to data streaming applications, faster than the U.S. >> Why is that? >> Yeah. >> Why are Europeans moving faster to streaming than we are in North America? >> I think a couple of different things might contribute. Open source is really enabling organizations to move fast. When the data lake initiative started, we saw a little bit of a slow start in Europe, but more experimentation with the open source stack, and with that, the more transformative use cases started really evolving. Like, how do I manage interactions of the users with the remote controls as they are watching live TV: that type of transformative use case became important. And as we move to the transformative use cases, streaming is also very critical, because lots of data is available, and being able to keep the cloud data stores as well as on-premise data stores and downstream applications fed with fresh data becomes important. We in fact announced in early June that Syncsort is now a part of the Microsoft One Commercial Partner program. With that, our integrated solutions for data integration and data quality are Azure gold certified and Azure ready. We are in a co-sell agreement, and we are jointly helping a lot of customers move data and workloads to Azure and keep those data stores in sync with the cloud platforms. >> Right. >> So lots of exciting things. I mean, there's a lot happening in the application space. There's also lots still happening connected to the governance cases that we have seen. Feeding security and IT operations data into, again, modern, next generation analytics platforms is key, whether it's Splunk, whether it's Elastic, as part of the Hadoop stack. So we are still focused on governance as part of these multi-cloud, on-premise and cloud implementations as well.
We in fact launched our Ironstream for IBM i product to help customers make this data available not just from mainframes but also from IBM i into Splunk, Elastic and other security information and event management platforms. And today we announced workflow optimization across on-premise, multi-cloud and cloud platforms. So lots of focus across the optimize, assure and integrate portfolio of products, helping customers with the business use cases. That's really our focus as we innovate organically and also acquire technologies and solutions: what are the problems we are solving, and how can we help our customers with the business and operation analytics, targeting those mega trends around data governance, cloud, streaming and also data science. >> What is the biggest trend, do you think, that is sort of driving all of these changes? As you said, the data is evolving. The use cases are evolving. What is it that is keeping your customers up at night? >> Right now it's still governance keeping them up at night, because this evolving architecture is also making governance more complex, right? If we are looking at financial services, banking, insurance, healthcare, there are lots of existing infrastructures, mission critical data stores on mainframe and IBM i, in addition to this gravity of data changing and lots of data being generated in the cloud by the online businesses. So how to govern that, while also optimizing and making those data stores available for next generation analytics, makes the governance quite complex. So that really creates a lot of opportunity for the community, right? All of us here to address those challenges. >> Because it sounds to me, I'm hearing Splunk, advanced machine data, I think of the internet of things and sensor grids. I'm hearing IBM mainframes, that's transactional data, that's your customer data and so forth.
It seems like much of this data that you're describing, that customers are trying to cleanse and consolidate and provide strict governance on, is absolutely essential for them to drive more artificial intelligence into end applications and mobile devices that are being used to drive the customer experience. Do you see more of your customers using your tools to massage the data sets, as it were, that data scientists then use to build and train their models for deployment into edge applications? Is that an emerging area where your customers are deploying Syncsort? >> Thank you for asking that question. >> It's a complex question. (laughing) But thanks for unpacking it... >> It is a complex question, but it's a very important question. Yes, and in the previous discussions we have seen, and this morning also, Rob Thomas from IBM mentioned it as well, that machine learning and artificial intelligence, data science, really rely on high-quality data, right? As the anonymous computer scientist of the 1950s said: garbage in, garbage out. >> Yeah. >> When we are using artificial intelligence and machine learning, the implications, the impact of bad data multiplies. It multiplies with the training on historical data. It multiplies with the insights that we are getting out of that. So data scientists today are still spending significant time on preparing the data for the AI pipeline, the data science pipeline, and that's where we shine, because our integrate portfolio accesses the data from all enterprise data stores and cleanses and matches and prepares it in a trusted manner for use in advanced analytics with machine learning, artificial intelligence. >> Yeah, 'cause the magic of machine learning for predictive analytics is that you build a statistical model based on the most valid data set for the domain of interest. If the data is junk, then you're going to be building a junk model that will not be able to do its job. So, for want of a nail, the kingdom was lost.
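The cleanse-match-prepare step described above can be shown in miniature. A sketch with invented records, not Syncsort's product: standardize each record, then collapse duplicates, so the training set the data scientist receives is not "garbage in."

```python
def standardize(record):
    """Normalize casing and whitespace so duplicate customers match."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

def dedupe(records):
    """Keep the first occurrence of each standardized email address."""
    seen, clean = set(), []
    for rec in map(standardize, records):
        if rec["email"] not in seen:
            seen.add(rec["email"])
            clean.append(rec)
    return clean

raw = [
    {"name": "  jane   DOE ", "email": "Jane@Example.com "},
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "bob smith", "email": "bob@example.com"},
]
print(dedupe(raw))  # two clean records instead of three
```

Real entity matching uses fuzzy comparison and survivorship rules rather than an exact email key, but the shape is the same: every downstream model trains on the output of steps like these.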
For want of a Syncsort (laughing) data cleansing and, you know, governance tool, the whole AI superstructure will fall down. >> Yes, yes, absolutely. >> Yeah, good. >> Well thank you so much, Tendu, for coming on theCUBE and for giving us a lot of background and information. >> Thank you for having me, thank you. >> Good to have you. >> Always a pleasure. >> I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks 2018 just after this. (upbeat music)

Published Date : Jun 19 2018



Arun Murthy, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences to their customers, and that's the business we are in. Now with that, obviously, we look at cloud as a really key part of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At that point, we've kind of got to a point where enterprises understand that they'll manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads, you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud provider. You know, go to parts of Asia, that's a great example. You might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. The Hortonworks DataPlane Services we launched in October of last year. Actually I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud providers' products. For example, Google, with whom we announced a partnership, is really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud, but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this and secure and govern it, and help the enterprise leverage the best parts of the cloud to put a best fit architecture together, which also happens to be a best of breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which Hortonworks is a primary committer and contributor, are able to carry schemas and policies and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay over the open source ones you're familiar with. And again, like you said, containerization is what is actually driving the fundamentals of this. The details matter: at the scale at which we operate, we're talking about thousands of nodes, terabytes of data, the details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details, are being fueled and driven by the community, which is kind of what we delivered with HDP 3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means if two data scientists want to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. They can now actually go from the test and dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage them across the stack, with things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi is, is actually now a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. >> Right? And it's actually not just data flow; what is really cool about NiFi is that it's actually command and control. So you can do bidirectional command and control, so you can change in real time the flows you want, the processing you do, and so on.
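The MiNiFi idea just defined, a thin library on the phone or sensor itself, exists so that filtering and summarizing can happen before data crosses the network. A tiny sketch of that edge-side filtering, purely illustrative and unrelated to the actual NiFi/MiNiFi API:

```python
def edge_filter(readings, low=10.0, high=90.0):
    """Run at the edge: drop in-range readings, forward only the
    out-of-range ones plus a one-line summary, cutting bandwidth."""
    outliers = [r for r in readings if r < low or r > high]
    summary = {"count": len(readings), "forwarded": len(outliers)}
    return outliers, summary

# Hypothetical temperature readings collected on-device
readings = [50.1, 49.8, 95.2, 50.3, 8.7, 50.0]
outliers, summary = edge_filter(readings)
print(outliers)  # only these travel over the network
print(summary)
```

Six readings in, two forwarded: that reduction, applied across millions of devices, is the argument for pushing processing to the edge rather than landing everything at rest first.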
So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can throw in and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of the data not actually landing at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, when we started in 2011, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of that product. It's not just one vendor throwing some code out there trying to shove it down the customers' throats. And we've seen this over and over again, right? Three years ago, a lot of the DataPlane stuff we talk about, coming from Atlas and Ranger and so on, none of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right.
And the Apache community as a whole has so many different projects. For example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, doing streaming with stateless transactions or stateless semantics and so forth. It seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break down a little bit what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see it going. We live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community trend is driving it. We talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, think about at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning. >> Yeah, it's huge. Deep learning, TensorFlow and so on really, really need custom, sort of, GPU hardware, if you will. So that's coming. That's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure.
Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We spend a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept. You know, HDFS has always had three replicas, for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80 petabytes of data, those three replicas add up, because you've now gone from 80 petabytes of effective data to actually a quarter of an exabyte in terms of raw storage. So now what we can do with erasure coding is, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your cold data, >> Yeah >> the ones you're not accessing every day, it results in a massive savings in terms of your infrastructure costs. And that's kind of the business we're in, helping customers do better with the data they have, whether it's on-prem or in the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other sort of big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL-on-Hadoop space.
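The replication-versus-erasure-coding arithmetic can be checked with a quick sketch. Using round numbers (say 80 PB of effective data), three-way replication costs 3x raw storage, while an erasure-coding scheme such as Reed-Solomon with 6 data blocks and 3 parity blocks costs 1.5x (the specific RS(6,3) scheme here is illustrative; HDFS supports several policies):

```python
def raw_storage(effective_tb, data_blocks, parity_blocks):
    """Raw storage needed for a given effective capacity under a
    block-coding scheme: overhead = (data + parity) / data."""
    overhead = (data_blocks + parity_blocks) / data_blocks
    return effective_tb * overhead

effective = 80_000  # 80 PB of effective data, expressed in TB

# Classic HDFS replication: each block stored three times.
replicated = raw_storage(effective, 1, 2)     # 3.0x overhead
# Reed-Solomon RS(6,3): 6 data blocks + 3 parity blocks.
erasure_coded = raw_storage(effective, 6, 3)  # 1.5x overhead

print(replicated / 1000, "PB")     # 240.0 PB, roughly a quarter of an exabyte
print(erasure_coded / 1000, "PB")  # 120.0 PB, half the footprint
```

The savings Arun describes fall out directly: going from 3x to 1.5x halves the raw storage bill for the data you choose to encode this way.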
>> And Hive has changed a lot in the last several years; this is very different from what it was five years ago. >> The only thing that's the same from five years ago is the name. (laughing) So again, the community has done a phenomenal job of taking what we used to call a SQL engine on HDFS and driving it forward. With Hive 3, which is part of HDP 3, it's now a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work to enhance the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect, and more, from an enterprise data warehouse. We talk to a lot of our customers; they're doing literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm. And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off.
>> And it's absolutely essential for GDPR compliance and HIPAA compliance and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced >> Yes. >> You know, things like consent management, having--- >> A consent portal >> A consent portal >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over their management of their PII, possibly to be forgotten and so forth. >> Yeah, it's the right to be forgotten, it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So now, the really cool part of what we've done with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting them. It's a massive, massive, massive deal, which I cannot tell you how excited customers are about, because they now understand. They were sort of freaking out that, I have to go to 30, 40, 50 thousand enterprise apps and change them to actually provide consent and the right to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learn something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
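The "change a security policy instead of rewriting the app" idea maps to Ranger's policy model, where row filtering and masking are declared against a table rather than coded into each application. A rough sketch of what such a policy might look like (the field names approximate Ranger's public policy JSON, and the service, database, and column names are invented for illustration):

```python
import json

# Hypothetical row-filter policy: users in the "analysts" group only
# see rows where the customer has consented to analytics.
consent_policy = {
    "service": "prod_hive",          # invented service name
    "name": "gdpr-consent-filter",
    "resources": {
        "database": {"values": ["crm"]},
        "table": {"values": ["customers"]},
    },
    "rowFilterPolicyItems": [
        {
            "groups": ["analysts"],
            "rowFilterInfo": {"filterExpr": "analytics_consent = true"},
        }
    ],
}

# Existing queries need no change; the filter is applied at read time.
print(json.dumps(consent_policy, indent=2))
```

The point of the sketch is the shape of the change: one declarative policy per data set, rather than edits to tens of thousands of enterprise apps.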

Published Date : Jun 19 2018


Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event. Hortonworks is the host; they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments. So Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine, let's say, in two, three days' time. This is because of the explosion of open source in that space.
You have thousands of libraries, from Python, from R, from Scala; you have access to Spark. All these various open-source offerings are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial when you have to expose this in a uniform fashion to various business units. Now all this has to actually work in private cloud, public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when a data scientist is going to deploy a model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot.
Yes, you've mentioned all the right things. Now, given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number-one, industry-leading big data platform from Hortonworks. Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. It's the merger between the two. The ability to integrate them tightly together gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you can pull it in, but truly, when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your actual HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such an announcement.
This is not true in the case of DSX and HDP. Just recently we have had a release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings on the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark; some of them like their RStudio, they're statisticians, they like R; others like Python, with Zeppelin, say, or Jupyter notebooks. Now, how about TensorFlow? What are you going to do when you actually have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, scale the platform horizontally and vertically, train your deep learning workloads, and actually remove the sidecar out. So you can put it towards the cluster and remove it at will. Now, DSX also not only satisfies the needs of your programmer data scientists, who actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, we have embedded, integrated, SPSS Modeler, redesigned, rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely.
(laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can, in this case, do machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and are streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With 1.2, again, coming back to DSX 1.2, with the most recent release we have actually added Decision Optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis.
So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams, as well as the machine learning models from Spark, from Python, with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months: DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable, collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure, on Google Cloud; definitely we can deploy on SoftLayer, and we're very good at that. However, in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace Docker and its equivalents as they became industry standards?
Bring in RStudio, Jupyter, the Zeppelin notebooks; bring in the ability for a data scientist to choose the environments they want to work with, and actually extend them and make deployments of web services, applications, and models. And those are actually full releases: I'm not only talking about the model, I'm talking about the scripts that go with it, the ability to actually pull the data in and allow the models to be re-trained, evaluated, and actually re-deployed without taking them down. Now that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going, in 30 seconds or less, as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis, for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientists but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a program manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks, for having us here.
We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning and business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)

Published Date : Apr 19 2018


Mandy Chessell, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Well hello, welcome to theCUBE, I'm James Kobielus. I'm the lead analyst for big data analytics within the Wikibon team of SiliconANGLE Media. I'm hosting theCUBE this week at Dataworks Summit 2018 in Berlin, Germany. It's been an excellent event. Hortonworks, the host, had... We've completed two days of keynotes. They made an announcement of the Data Steward Studio as the latest of their offerings and demonstrated it this morning, to address GDPR compliance, which of course is hot and heavy, coming down on enterprises both in the EU and around the world, including in the U.S., and the May 25th deadline is fast approaching. One of Hortonworks' prime partners is IBM. And today on this Cube segment we have Mandy Chessell. Mandy is a distinguished engineer at IBM who did an excellent keynote yesterday all about metadata and metadata management. Mandy, great to have you. >> Hi and thank you. >> So I wonder if you can just reprise or summarize the main takeaways from your keynote yesterday on metadata and its role in GDPR compliance and so forth, and the broader strategies that enterprise customers have regarding managing their data in this new multi-cloud world, where Hadoop and open source platforms are critically important for storing and processing data. So Mandy, go ahead. >> So, metadata's not new. I mean, it's basically information about data. And a lot of companies are trying to build a data catalog, which is not a catalog, you know, actually containing their data; it's a catalog that describes their data. >> James: Is it different from an index or a glossary? How's the catalog different from-- >> Yeah, so the catalog actually includes both. So it is a list of all the data sets, plus links to glossary definitions of what those data items mean within the data sets, plus information about the lineage of the data.
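A catalog entry of the kind Mandy is starting to describe, a description of the data set rather than the data itself, with glossary links, lineage, governance classification, and user feedback, might look something like this (a hand-rolled sketch for illustration; Atlas has its own, much richer type system, and the names here are invented):

```python
catalog_entry = {
    "name": "customer_transactions",
    "description": "Daily card transactions, EU region",
    "glossary_terms": ["Transaction", "Customer ID"],  # what the fields mean
    "lineage": {
        "derived_from": ["raw_card_feed"],
        "used_by": ["fraud_model_v2"],
    },
    "governance": {
        "classification": "personal-data",  # drives automated actions
        "allowed_uses": ["fraud-detection"],
    },
    "feedback": [
        {"user": "data_scientist_1", "note": "reliable, low error rate"},
    ],
}

def contains_personal_data(entry):
    """The kind of question a regulator asks: where is personal data?"""
    return entry["governance"]["classification"] == "personal-data"

print(contains_personal_data(catalog_entry))  # True
```

Because the entry describes rather than contains the data, a query like this can answer "show me that you're managing personal data" without touching the data sets themselves.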
It includes information about who's using it, what they're using it for, how it should be governed. >> James: It's like a governance repository. >> So governance is part of it. The governance part is really saying, "This is how you're allowed to use it," "this is how the data's classified," "these are the automated actions that are going to happen on the data as it's used within the operational environment." >> James: Yeah. >> So there's that aspect to it, but there is also the collaboration side. Hey, I've been using this data set, it's great. Or, actually, this data set is full of errors, we can't use it. So you've got feedback to data set owners, as well as exchange and collaboration between data scientists working with the data. So it really is a central resource for an organization that has a strong data strategy and is interested in becoming a data-driven organization. As such, you know, this becomes their major catalog of their data assets and how they're using them. So when a regulator comes in and says, "Can you show me that you're managing personal data?", the data catalog will have the information about where personal data's located, what type of infrastructure it's sitting on, how it's being used by different services. So they can really show that they know what they're doing, and then from that they can show how the processes described in the metadata are used in order to use the data appropriately day to day. >> So Apache Atlas, so it's basically a catalog, if I understand correctly. At least for IBM and Hortonworks, it's Hadoop, it's Apache Atlas, and Apache Atlas is essentially a metadata open source code base. >> Mandy: Yes, yes. >> So explain what Atlas is in this context. >> So yes, Atlas is a collection of code, but it supports a server, a graph-based metadata server. It also supports-- >> James: A graph-based >> Both: Metadata server >> Yes >> James: I'm sorry, so explain what you mean by graph-based in this context.
Okay, so it runs using the JanusGraph graph repository. And this is very good for metadata, 'cause if you think about what it is, it's connecting dots. It's basically saying this data set means this value and needs to be classified in this way, and this-- >> James: Like a semantic knowledge graph >> It is, yes, actually. And on top of it we impose a type system that describes the different types of things you need to control and manage in a data catalog, but the graph, the Atlas component, gives you that graph-based repository underneath. On top we've built what we call the open metadata and governance libraries. They run inside Atlas, so when you run Atlas you will have all the open metadata interfaces, but you can also take those libraries and connect them and load them into another vendor's product. And what they're doing is allowing metadata to be exchanged between repositories of different types. And this becomes incredibly important as an organization increases their maturity and their use of data, because you can't just have knowledge about data in a single server, it just doesn't scale. You need to get that knowledge into every runtime environment, into the data tools that people are using across the organization. And so it needs to be distributed. >> Mandy, I'm wondering, the whole notion of what you catalog in that repository, does it include, or does Apache Atlas support, adding metadata relevant to data-derivative assets like machine learning models-- >> Mandy: Absolutely. >> So forth. >> Mandy: Absolutely, so we have base types in the open metadata layer, but it's also a very flexible and extensible type system. So, if you've got a specialist machine learning model that needs additional information stored about it, that can easily be added to the runtime environment. And then it will be managed through the open metadata protocols as if it was part of the native type system.
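The graph shape Mandy describes, "connecting dots" between data sets, glossary terms, and classifications, can be illustrated with a minimal adjacency-list graph (JanusGraph stores this far more richly; the node and relationship names below are invented for the sketch):

```python
# Edges as (subject, relationship, object) triples: a tiny metadata graph.
edges = [
    ("sales_2018.csv", "means", "Revenue"),        # data set -> glossary term
    ("Revenue", "classified_as", "confidential"),  # term -> classification
    ("sales_2018.csv", "used_by", "q2_dashboard"),
]

def neighbours(node, relationship):
    """Follow one relationship type out of a node."""
    return [o for s, r, o in edges if s == node and r == relationship]

def classification_of(dataset):
    """Walk data set -> glossary term -> classification."""
    results = []
    for term in neighbours(dataset, "means"):
        results.extend(neighbours(term, "classified_as"))
    return results

print(classification_of("sales_2018.csv"))  # ['confidential']
```

The useful property is exactly what the graph gives you: a question like "how is this data set classified?" is a short walk across the connected dots, even though the classification was declared on the glossary term, not on the file.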
>> Because, of course, as an analyst, one of my core areas is artificial intelligence, and one of the hot themes in artificial intelligence, well, there's a broad umbrella called AI safety. >> Mandy: Yeah. >> And one of the core subsets of that is something called explicable AI: being able to identify the lineage of a given algorithmic decision back to what machine learning models, fed from what data, drove what action, like when, let's say, a self-driving vehicle hits a human being, for legal, you know, discovery, whatever. So what I'm getting at, what I'm working through to, is the extent to which the Hortonworks, IBM big data catalog running Atlas can be a foundation for explicable AI, either now or in the future. We see, me as an analyst at least sees, lots of enterprises that are exploring this topic, but it's not to the point where it's in production, explicable AI, but clearly companies like IBM are exploring building a stack or an architecture for doing this kind of thing in a standardized way. What are your thoughts there? Is IBM working on bringing, say, Atlas and the overall big data catalog into that kind of a use case? >> Yes, yeah, so if you think about what's required, you need to understand the data that was used to train the AI, what data's been fed to it since it was deployed, because that's going to change its behavior, and then also a view of how that data's going to change in the future, so you can start to anticipate issues that might arise from the model's changing behavior. And this is where the data catalog can actually associate and maintain information about the data that's being used with the algorithm. You can also associate the checking mechanism that's constantly monitoring the profile of the data, so you can see where the data is changing over time, which will obviously affect the behavior of the machine learning model.
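Mandy's point about constantly monitoring the profile of the data feeding a model is, at its simplest, drift detection: profile the training data once, keep profiling the incoming data, and flag when they diverge (the tolerance and the numbers below are invented for illustration):

```python
from statistics import mean, stdev

def profile(values):
    """A minimal data profile: just mean and standard deviation."""
    return {"mean": mean(values), "stdev": stdev(values)}

def has_drifted(train_profile, live_values, tolerance=3.0):
    """Flag drift when the live mean moves more than `tolerance`
    training standard deviations away from the training mean."""
    shift = abs(mean(live_values) - train_profile["mean"])
    return shift > tolerance * train_profile["stdev"]

training = [10.0, 11.0, 9.0, 10.5, 9.5]
baseline = profile(training)

print(has_drifted(baseline, [10.2, 9.8, 10.1]))   # False, looks like training
print(has_drifted(baseline, [25.0, 26.0, 24.5]))  # True, model needs review
```

A catalog that stores the baseline profile alongside the model is what makes this check possible long after training, which is the association Mandy describes.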
So it's really about providing not just information about the model itself, but also the data that's feeding it, how those characteristics are changing over time, so that you know the model is continuing to work into the future. >> So tell us about the IBM-Hortonworks partnership on metadata and so forth. >> Mandy: Okay. >> How is that evolving? So, you know, your partnership is fairly tight. You clearly, you've got ODPi, you've got the work that you're doing related to the big data catalog. What can we expect to see in the near future in terms of initiatives building on all of that for governance of big data in the multi-cloud environment? >> Yeah, so Hortonworks started the Apache Atlas project a couple of years ago with a number of their customers. And they built a base repository and a set of APIs that allow it to work in the Hadoop environment. We came along last year and formed our partnership. That partnership includes this open metadata and governance layer. So since then we've worked with ING as well, and ING bring the, sort of, user perspective, this is the organization's use of the data. And so between the three of us we are basically transforming Apache Atlas from a Hadoop-focused metadata repository to an enterprise-focused metadata repository, plus enabling other vendors to connect into the open metadata ecosystem. So we're standardizing types, standardizing the format of metadata, there's a protocol for exchanging metadata between repositories. And this is all coming from that three-way partnership where you've got a consuming organization, you've got a company who's used to building enterprise middleware, and you've got Hortonworks with their knowledge of open source development and their Hadoop environment. >> Quick one out of left field: as you develop this architecture, clearly you're leveraging Hadoop HDFS for storage.
Are you looking to, at least evaluating, maybe using blockchain for more distributed management of the metadata in these heterogeneous environments in the multi-cloud, or not? >> So Atlas itself does run on HDFS, but doesn't need to run on HDFS; it's got other storage environments so that we can run it outside of Hadoop. When it comes to blockchain, so blockchain is for sharing data between partners, small amounts of data that basically express agreements, so it's like a ledger. There are some aspects that we could use for metadata management. It's more that we actually need to put metadata management into blockchain. So the agreements and contracts that are stored in blockchain are only meaningful if we understand the data that's there, what its quality is, where it came from, what it means. And so actually there's a very interesting distributed metadata question that comes with the blockchain technology. And I think that's an important area of research. >> Well, Mandy, we're at the end of our time. Thank you very much. We could go on and on. You're a true expert and it's great to have you on theCUBE. >> Thank you for inviting me. >> So this is James Kobielus with Mandy Chessell of IBM. We are here this week in Berlin at DataWorks Summit 2018. It's a great event and we have some more interviews coming up, so thank you very much for tuning in. (electronic music)

Published Date : Apr 19 2018


Dave McDonnell, IBM | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE (relaxing music) covering DataWorks Summit Europe 2018. (relaxing music) Brought to you by Hortonworks. (quieting music) >> Well, hello and welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany, and it's been a great show. Who we have now is we have IBM. Specifically we have Dave McDonnell of IBM, and we're going to be talkin' with him for the next 10 minutes or so about... Dave, you explain. You are in storage for IBM, and IBM of course is a partner of Hortonworks, who are of course the host of this show. So Dave, now that you've been introduced, give us your capacity or role at IBM. Discuss the partnership with Hortonworks, and really, what's your perspective on the market for storage systems for Big Data right now and going forward? And what kind of workloads and what kind of requirements are customers coming to you with for storage systems now? >> Okay, sure, so I lead alliances for the storage business unit, and Hortonworks, we actually partner with Hortonworks not just in our storage business unit but also with our analytics counterparts, our Power counterparts, and we're in discussions with many others, right? Our partner organization, services, and so forth. So the nature of our relationship is quite broad compared to many of our others. We're working with them in the analytics space, so these are a lot of these Big Data Data Lakes, BDDNA, a lot of people will use as an acronym. These are the types of workloads that customers are using us both for. >> Mm-hmm. >> And it's not new anymore, you know, by now they're well past their first half dozen applications. We've got customers running hundreds of applications. These are production applications now, so it's all about, "How can I be more efficient? How can I grow this? How can I get the best performance and scalability and ease of management to deploy these in a way that's manageable?"
'Cause if I have 400 production applications, that's not off in any corner anymore. So that's how I'd describe it in a nutshell. >> One of the trends that we're seeing at Wikibon, of course I'm the lead analyst for Big Data Analytics at Wikibon under SiliconANGLE Media, we're seeing a trend in the marketplace towards, I wouldn't call them appliances, but what I would call workload-optimized hardware/software platforms that combine storage with compute and are optimized for AI and machine learning and so forth. Is that something that you're hearing from customers, that they require those built-out, AI-optimized storage systems, or is that far in the future, or? Give me a sense for whether IBM is doing anything in that area and whether that's on your horizon. >> If you were to define all of IBM in five words or less, you would say "artificial intelligence and cloud computing," so this is something >> Yeah. that gets a lot of thought and mindshare. So absolutely we hear about it a lot. It's a very broad market with a lot of diverse requirements. So we hear people asking for converged infrastructure, for appliance solutions. There's of course hyperconverged. We actually have, either directly or with partners, answers to all of those. Now we do think one of the things that customers want to do as they scale and grow in these environments is to take a software-defined strategy so they're not limited, they're not limited by hardware blocks. You know, they don't want to have to buy processing power and spend all that money on it when really all they need is more data. >> Yeah. >> There's pros and cons to the different (mumbles). >> You have PowerAI systems, I know that, so that's where they're probably heading, yeah. >> Yes, yes, yes. So of course, we have packages that we've modeled in AI. They feed off of some of the Hortonworks data lakes that we're building.
Of course we see a lot of people putting these on new pieces of infrastructure because they don't want to put this on their production applications, so they're extracting data from maybe a Hortonworks data lake number one, Hortonworks data lake number two, some of the EDWs, some external data, and putting that into the AI infrastructure. >> As customers move their cloud infrastructures towards more edge-facing environments, or edge applications, how are storage requirements changing or evolving in the move to edge computing? Can you give us a sense for any sort of trends you're seeing in that area? >> Well, if we're going to the world of AI and cognitive applications, all that data that I might've thrown in the cloud five years ago, I'm now educated enough, 'cause I've been paying bills for a few years, on just how expensive it is, and if I'm going to be bringing that data back, some of which I don't even know I'm going to be bringing back, it gets extremely expensive. So we see a pendulum shift coming back where now a lot of data is going to be on premise, but it's not going to stay there. They need the flexibility to move it here, there, or everywhere. So if it's going to come back, how can we bring customers some of that flexibility that they liked about the cloud, the speed, the ease of deployment, even a consumption-based model? These are very big changes for a traditional storage manufacturer like ourselves, right? So that's requiring a lot of development in software, it's requiring a lot of development in our business model, and one of the biggest things you hear us talk about this year is IBM Cloud Private, which does exactly that, >> Right. and it gives them somethin' they can work with that's flexible, it's agile, and allows you to take containerized applications and move them back and forth as you please. >> Yeah. So containerized applications. So if you can define it for our audience, what is a containerized application?
You talk about Docker and orchestrating it through Kubernetes and so forth. So you mentioned Cloud Private. Can you bring us up to speed on what exactly Cloud Private is, in terms of the storage requirements or storage architecture within that portfolio? >> Oh yes, absolutely. So this is a set of infrastructure that's optimized for on-premise deployment that gives you multi-cloud access, not just IBM Cloud, Amazon Web Services, Microsoft Azure, et cetera, and then it also gives you multiple architectural choices, basically wrapped by software to allow you to move those containers around and put them where you want them, at the right time and the right place, given the business requirement at that hour. >> Now, is the data store persisted in the container itself? I know that's fairly difficult to do in a Docker environment. How do ya handle persistence of data for containerized applications within your architecture? >> Okay, some of those are going to be application-specific. It's the question of designing the right data management layer depending on the application. So we have software intelligence, some of it from open source, some of which we add on top of open source to bring some of the enterprise resilience and performance needed. And of course, you have to be very careful if the biggest trend in the world is unstructured data. Well, okay fine, it's a lot of sensor data. That's still fairly easy to move around. But once we get into things like medical images, lots of video, you know, HD video, 4K video, those are the things which you have to give a lot of thought to how to do that. And that's why we have lots of new partners that we work with to help us with edge cloud, which gives that on-premise-like performance in really a cloud-like setup. >> Here's a question out of left field, and you may not have the answer, but I would like to hear your thoughts on this.
How has blockchain, and IBM's been making significant investments in blockchain and database technology, how is blockchain changing the face of the storage industry in terms of customers' requirements for storage systems to manage data in distributed blockchains? Is that something you're hearing coming from customers as a requirement? I'm just tryin' to get a sense for whether that's, you know, is it moving customers towards more flash, towards more distributed, edge-oriented or edge-deployed storage systems? >> Okay, so yes, yes, and yes. >> Okay. >> So all of a sudden, if you're doing things like a blockchain application, things become even more important than they are today. >> Yeah. >> Okay, so you can't lose a transaction. You can't have storage going down. So there's a lot more care and thought into the resiliency of the infrastructure. If I'm, you know, buying a diamond from you, I can't accept the excuse that my $100,000 diamond, maybe that's a little optimistic, my $10,000 diamond or yours, you know, the transaction's corrupted because the data's not proper. >> Right. >> Or if I want my privacy, I need to be assured that there's good data governance around that transaction, and that that will be protected for a good 10, 20, and 30 years. So it's elevating the importance of all the infrastructure to a whole different level. >> Switching our focus slightly, so we're here at DataWorks Summit in Berlin. Where are the largest growth markets right now for cloud storage systems? Is it APAC, is it North America, or where are the growth markets in terms of regions, in terms of vertical industries, right now in the marketplace for enterprise-grade storage systems for big data in the cloud? >> That's a great question, 'cause we certainly have these conversations globally. I'd say the place where we're seeing the most activity would be the Americas, we see it in China. We have a lot of interesting engagements and people reaching out to us.
I would say by market, you can also point to financial services in more than those two regions. Financial services, healthcare, retail, these are probably the top verticals. I think it's probably safe to assume, and we can say the federal governments also have a lot of stringent requirements and, you know, new applications around the space as well. >> Right. GDPR, how is that impacting your customers' storage requirements? The requirement for GDPR compliance, is that moving the needle in terms of their requirement for consolidated storage of the data that they need to maintain? I mean obviously there's security, but is it leading to consolidation or centralization of storage of customer data, which would seem to make it easier to control and monitor usage of the data? Is it making a difference at all? >> It's making a big difference. Not many people encrypt data today, so there's a whole new level of interest in encryption at many different levels, data at rest, data in motion. There's new levels of focus and attention on performance, on the ability for customers to get their arms around disparate islands of data, because now GDPR is not only a legal requirement that requires you to be able to have it, but you've also got timelines in which you're expected to act on a request from a customer to have their data removed. And most of those will have a baseline of 30 days. So you can't fool around now. It's not just a nice-to-have. It's an actual core part of a business requirement that if you don't have a good strategy for it, you could be spending tens of millions of dollars in liability if you're not ready for it. >> Well Dave, thank you very much. We're at the end of our time. This has been Dave McDonnell of IBM talking about system storage and of course a big Hortonworks partner. We are here on day two of the DataWorks Summit, and I'm James Kobielus of Wikibon SiliconANGLE Media, and have a good day. (upbeat music)
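As a trivial illustration of the timeline point Dave makes, a compliance workflow might track each erasure request against a hard deadline. The 30-day window below mirrors the conversation's "baseline of 30 days" and is an operational assumption, not legal guidance:

```python
# A minimal sketch of deadline tracking for data-subject erasure requests,
# assuming the 30-day baseline mentioned in the interview.

from datetime import date, timedelta

def erasure_deadline(received: date, window_days: int = 30) -> date:
    """Date by which an erasure request must be actioned."""
    return received + timedelta(days=window_days)

# A request received on GDPR's effective date would be due 30 days later.
print(erasure_deadline(date(2018, 5, 25)))  # -> 2018-06-24
```

Anything like this would sit on top of the catalog and encryption controls discussed above, since you can only delete on schedule what you can first find.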

Published Date : Apr 19 2018


John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics within the Wikibon team of SiliconANGLE Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of DataWorks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the DataWorks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet.
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, the U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range. Not the final numbers, but a great-sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started, and it remains that way today. >> Right. So first of all, what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera and MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant, or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop.
Then as the company emerged and the market kind of continued to evolve, we moved to, and saw the opportunity in, really capturing data from the edge. As IoT and kind of edge-use cases emerged, it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi. >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then the market continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there were really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconANGLE that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity or as a worrisome trend? >> No, we see it very much as an opportunity.
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise? And, to your point, multi-cloud. And so you get some complexity in there around that deployment, and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. >> We've gotten, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to customers, and our large customers that we've been talking about, who have been starting to use DataPlane, have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well, in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right.
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services as the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company and your go-to-market evolving over the coming years in terms of geographies, in terms of your focus? In terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks' strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product.
And we also have other partnerships to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in, I think, 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implementations. >> Where's the fastest-growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest-growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe, and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward.
>> So going forward, how is Hortonworks evolving as a company? In terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more of a business-analyst level of focus? Do you see Hortonworks shifting in that direction in terms of your focus, go-to-market, your message and everything? >> I would say it's not a shift as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really another way, a natural evolution, of how we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for Marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconANGLE Media here at DataWorks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018



Pankaj Sodhi, Accenture | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what Dataworks Summit is all about. We are at Dataworks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be there. >> Great, great, so what are you seeing in terms of customers' adoption of Hadoop and so forth, big data platforms, for what kind of use cases are you seeing? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, in enterprise period, because it sounds like not everybody in this audience, in fact a sizeable portion, is entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation? >> So Jim, I'll answer the question in two ways. One was, just in terms of, you know, the adoption of Hadoop, and then, you know, get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use case driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach. Deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of, on premise.
Hybrid being, for example, with an EDW and product type AA. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open source technology, while getting the investment protection that they have from the on-premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots looking at either optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop. Either again from a cost optimization play or potentially advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end to end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely >> And analytics, so that's what you're saying. >> So all the way from ingestion to curation to consuming the data, through multiple different access points, so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of let's ingest all data in a data lake, it's been driven by a use case mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets.
For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge architecture and delivery approach. I think this is where the key ingredient for success comes in, which is, for me, that the modern sort of Hadoop pipeline needs to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how you also deploy into production. And I think that's been one of the key differences between organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world, are they challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' architecture for Data Steward Studio. Are you seeing the data governors, the data stewards of the world, sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now one of the reasons a lot of the data lake implementations haven't been as successful is the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries, and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace.
So in the data marketplace, what you need to have is well-curated data sets, with the proper definitions as well, through a business glossary or a data catalog, underpinned by the right user access model, and available for example through search or APIs. So, GDPR actually is. >> So it's not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through API, and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access to data scientists who might want to use a data set and then put that into a data lab. It can also be extended, from an API perspective, to a third party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how can we make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes of that? And actually have that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator, their job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses, and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally in the (mumbles). Am I describing that correctly? >> Pankaj: I think you are, yes. >> Jim: Okay.
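The data-marketplace idea Sodhi outlines, curated data sets with glossary definitions, underpinned by a user access model and exposed through search or APIs, can be reduced to a short sketch. All data set names, fields, and roles below are hypothetical, invented purely for illustration:

```python
# Illustrative only: a tiny in-memory "data marketplace" catalog combining
# business-glossary definitions, a user access model, and keyword search.

CATALOG = [
    {"name": "customer_transactions",
     "definition": "Curated card transactions, one row per purchase.",
     "curation_level": "gold",
     "allowed_roles": {"data_scientist", "analyst"}},
    {"name": "raw_clickstream",
     "definition": "Uncurated web events, landed as-is.",
     "curation_level": "bronze",
     "allowed_roles": {"data_engineer"}},
]

def search(keyword):
    # Discovery: search over names and glossary definitions,
    # the way an analyst or data scientist would browse the marketplace.
    kw = keyword.lower()
    return [d["name"] for d in CATALOG
            if kw in d["name"].lower() or kw in d["definition"].lower()]

def can_access(dataset_name, role):
    # Governance: the access model gates who may pull a data set into a lab.
    for d in CATALOG:
        if d["name"] == dataset_name:
            return role in d["allowed_roles"]
    return False
```

The point of the design is that discovery (search over definitions) and governance (the access model) live on the same catalog entry, which is what makes the same curated data sets reusable internally and, potentially, through a third party marketplace.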
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy. >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than an imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, but actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models, while actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover and inventory and essentially tag all of your data, on a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value to your business of the data. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what the data asset actually means. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authorization around privacy and ability to use the data. So I think GDPR is going to be a very good enabler, so again, the small minority of organizations that have been successful have done this. They've had business glossaries and data catalogs, but now with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome.
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step, in terms of there's data assets but then there's data-derived assets, like machine learning models and so forth? Data scientists build and train and deploy these models and algorithms, that's the core of their job. >> Man: Mhmm. >> And model governance is a hot hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers so that they'll also increasingly be required to, want to, essentially catalog the models and curate them for re-usability. Possibly monetization opportunities. Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, initially, exactly, it's an extension of the marketplace. So while one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and prescribe data. >> Jim: Yeah, like pre-trained models. >> Correct. >> Can be golden if they're pre-trained and the core domain for which they're trained doesn't change all that often, they can have a great aftermarket value conceivably if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models. And to be able to share that with other folks in the team who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop, to actually then deploying the (mumbles).
How can you do that very quickly, and reduce the time from an idea or hypothesis to production? >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at Dataworks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.

Published Date : Apr 19 2018



Alan Gates, Hortonworks | Dataworks Summit 2018


 

(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates, who's one of the founders of Hortonworks, and Hortonworks of course is the host of DataWorks Summit, and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time, but really how the company has evolved specifically with its focus on the community, the Hadoop community, the Open Source community. You have a deepening open source stack which you build upon, with Atlas and Ranger and so forth. Give us a sense for all of that, Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks, and actually they were a great partner in that. Helped us get that spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks, and brought along a number of the other engineering, a bunch of the other engineers to help get started. And really at the beginning, we were. It was Hadoop, Pig, Hive, you know, a few of the very, Hbase, the kind of, the beginning projects.
So pretty small toolkit. And we were, our early customers were very engineering-heavy people, or companies who knew how to take those tools and build something directly on those tools, right? >> Well, you started off with, the Hadoop community as a whole started off with, a focus on the data engineers of the world. >> Yes. >> And I think it's shifted, and confirm for me, over time, so that your focus is increasingly, with your solutions, on the data scientists who are doing the development of the applications, and the data stewards, from what I can see at this show. >> I think it's really just a part of the adoption curve, right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us, but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code, or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter, but some may want to use SQL or even Tableau or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. So that does drive us to, one, put more things into the toolkit, so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say, is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS, and as the kind of data scene has evolved, there's a lot more focus now on a couple things.
One is data, what we call data-in-motion, for our HDF product, where you've got a stream manager like Kafka or something like that. >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay, I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else, and I might have it in several clouds as well. >> OK, your focus has shifted, like the industry in general, towards streaming data in multi-clouds, where it's more stateful interactions and so forth? I think you've made investments in Apache NiFi, so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy, or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data, and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places, but my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavyweight technology in terms of memory and disk space and all that, so it's not going to be run on some sensor somewhere. But it is that data-in-motion, right?
I've got millions of events streaming through a set of Kafka topics, watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects? Is that, we don't often, at least in the industry of popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things, but that certainly has to be part of our strategy 'cause it has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM, which obviously is well-established, huge and global. Give us a sense for, as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus, or has it? >> I don't know that it's shifted the focus. I mean, IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us.
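Earlier in the segment, Gates describes the edge-processing pattern NiFi targets: forward only a sample of sensor readings in normal operation, but turn up the volume once a warning event fires. A minimal sketch of that keep-or-forward decision logic, with invented field names and thresholds; this is illustrative pseudologic, not NiFi itself:

```python
# Illustrative only: edge-side logic that forwards a 1-in-N sample of
# sensor readings normally, but ships everything once a reading runs hot.

NORMAL_SAMPLE_EVERY = 10   # hypothetical: forward 1 reading in 10
WARN_TEMP_C = 150.0        # hypothetical warning threshold

def select_for_upstream(readings):
    forwarded, alerting = [], False
    for i, reading in enumerate(readings):
        if reading["temp_c"] >= WARN_TEMP_C:
            alerting = True            # engine heating up: turn up the data
        if alerting or i % NORMAL_SAMPLE_EVERY == 0:
            forwarded.append(reading)
    return forwarded

# Eleven normal readings, then a spike on the twelfth.
readings = [{"temp_c": 90.0 + i} for i in range(12)]
readings[11]["temp_c"] = 155.0
sent = select_for_upstream(readings)
```

In a real deployment, processors at the edge would make this kind of decision about what moves upstream, with Kafka topics in the data center receiving whatever is forwarded.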
>> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main requirements for compliance, which is to discover and inventory your data, and really to set up a consent portal, as I like to refer to it. So the data subject can then go and make a request to have my data forgotten and so forth. Give us a sense, going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that, going forward, your ecosystem of partners can build add-on tools on something common, like the framework that was laid out today, which looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI, which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote, and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that, but we know, one, that some people are going to want to bring their own tools to it.
They're not necessarily going to want to use that one platform, so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others, and we want to build the standards on top of that of how do you properly implement these features that GDPR requires, like right to be forgotten, like, you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership, or will it be a separate group, or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up of how do we engage, not the programmers, 'cause programmers can engage really well at the Apache level, but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code, and frankly, if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks, so. They understand each other at the wonk level. >> That's a good way to put it. And so that's where ODPI is really coming in, is that group of compliance people that speak a completely different language. But we still need to get them all talking to each other, as you said, so that there's specifications around, how do we do this? And what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you, and Hortonworks has been evolving very rapidly, and it seems to me that, going forward, I think you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level, really in terms of an Open Source framework. In many ways though, you're not entirely, 100%, like nobody is, purely Open Source.
You're still very much focused on open frameworks for building fairly scalable, very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks, here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest, and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018



Day Two Keynote Analysis | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE, covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Hello and welcome to theCUBE on day two of Dataworks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote, and in just a moment I'll bring ya up to speed on the major points and the presentations from that. It's been a great conference. Fairly well attended here. The hallway chatter, discussion's been great. The breakouts have been stimulating. For me the takeaway is the fact that Hortonworks, the show host, announced yesterday at the keynote, Scott Gnau, the CTO of Hortonworks, announced Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane Services portfolio, and Data Steward Studio could not be more timely, because we are now five weeks away from GDPR, that's the General Data Protection Regulation, becoming the law of the land. When I say the land, the EU, but really any company that operates in the EU, and that includes many U.S.-based and APAC-based and other companies, will need to comply with the GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things. Data Steward Studio, announced yesterday, was demo'd today by Hortonworks, and it was a really excellent demo, and showed that it's a powerful solution for a number of things that are at the core of GDPR compliance. The demo covered the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment, number one. Number two, the ability of the solution to centralize consent, provide a consent portal essentially, that data subjects can then use to review the data that's kept on them, to make fine-grained consents or withdraw consents for use in profiling of their data that they own.
And then number three, they demonstrated the capability of the solution to execute data subjects' requests in terms of the handling of their personal data. Those are the three main points in terms of adding the teeth to enforce GDPR in an operational setting in any company that needs to comply with it. So, what we're going to see going forward, I believe, really in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there's many others, are going to need to roll out similar capabilities in their portfolios, 'cause their customers are absolutely going to demand it. In fact the deadline is fast approaching, it's only five weeks away. One of the interesting takeaways from the keynote this morning was that John Kreisa, the VP for marketing at Hortonworks, took a quick poll of those in the audience today, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, who said that they won't be ready to comply. I believe something between 20 and 30% said they will be able to comply. About 40%, don't quote me on that, but a fair plurality, said that they're preparing. That indicates that they're not entirely sure they will be able to comply 100% to the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures. I know there's a lot of companies and users racing for compliance by that date. So really GDPR is definitely the headline, banner, umbrella story around this event, and really around the big data community world-wide right now, in terms of enterprise investments in the compliance software, services, and capabilities needed to comply with GDPR. That was important. 
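The three capabilities described above — inventorying personal data, centralizing consent, and executing data subject requests — can be sketched in miniature. This is an illustrative toy, not Hortonworks or Data Steward Studio code; the class name, method names, and sample records below are all hypothetical.

```python
# Toy sketch of the three GDPR capabilities discussed above:
# (1) inventory personal data, (2) record/withdraw consent,
# (3) execute a data subject's access or erasure request.
# All names here are invented for illustration.

class PersonalDataInventory:
    def __init__(self):
        self.records = {}   # subject_id -> {attribute: value}
        self.consents = {}  # subject_id -> set of permitted uses

    def ingest(self, subject_id, attributes):
        # Capability 1: inventory personal data as it is discovered.
        self.records.setdefault(subject_id, {}).update(attributes)

    def grant_consent(self, subject_id, use):
        self.consents.setdefault(subject_id, set()).add(use)

    def withdraw_consent(self, subject_id, use):
        # Capability 2: fine-grained consent withdrawal, e.g. for profiling.
        self.consents.get(subject_id, set()).discard(use)

    def handle_request(self, subject_id, kind):
        # Capability 3: "access" returns everything held; "erasure" deletes it.
        if kind == "access":
            return {"data": self.records.get(subject_id, {}),
                    "consents": sorted(self.consents.get(subject_id, set()))}
        if kind == "erasure":
            self.records.pop(subject_id, None)
            self.consents.pop(subject_id, None)
            return {"erased": True}
        raise ValueError(kind)

inv = PersonalDataInventory()
inv.ingest("subject-42", {"email": "x@example.eu", "country": "DE"})
inv.grant_consent("subject-42", "profiling")
inv.withdraw_consent("subject-42", "profiling")
print(inv.handle_request("subject-42", "access"))
print(inv.handle_request("subject-42", "erasure"))
```

A real implementation would of course sit over a distributed data lake and an audit log rather than in-memory dicts; the point is only the shape of the three operations.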
That wasn't the only thing covered, not only in the keynotes, but in the sessions here so far. Clearly AI and machine learning are hot themes on the innovation side of big data. There's compliance, there's GDPR, but really innovation in terms of what enterprises are doing with their data and their analytics: they're building more and more AI and embedding that in conversational UIs and chatbots, and they're embedding AI in all manner of e-commerce applications and internal applications in terms of search, as well as things like face recognition, voice recognition, and so forth and so on. So, what we've seen here at the show, and what I've been seeing for quite some time, is that more of the actual developers who are working with big data are the data scientists of the world. And more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning, deep learning AI, and natural language processing into their applications. That said, Hortonworks has become a fairly substantial player in the machine learning space. In fact, really across their portfolio, many of the discussions I've seen here show that everybody's buzzing about getting up to speed on frameworks for building, deploying, iterating, and refining machine learning models in operational environments. So that's definitely a hot theme. And there was an AI presentation this morning from the first gentleman that came on, who laid out the broad parameters of what developers are doing and looking to do with the data they maintain in their lakes: training data to both build the models, train them, and deploy them. That was also something I expected, and it's good to see at Dataworks Summit that there is a substantial focus on that, in addition of course to GDPR and compliance. It's been about seven years now since Hortonworks was essentially spun off of Yahoo. 
It's been, I think, about three years or so since they went IPO. And what I can see is that they are making great progress in terms of their growth, not just the finances, but their customer acquisition, their deal size, and also customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue-chip vendor, and that customers are in many ways continuing to grow their footprint of Hortonworks products and services, as are most of their partners, such as IBM. From what I can see, everybody was rapt with attention around Data Steward Studio, and I sensed sort of a sigh of relief that it looks like a fairly good solution, so I have no doubt that a fair number of those in this hall right now are, as we say in the U.S., kicking the tires of DSS and probably going to expedite their adoption of it. So, with that said, we have day two here. What we're going to have is Alan Gates, one of the founders of Hortonworks, coming on in just a few minutes, and I'll be interviewing him, asking about the vibrancy and health of the community, the Hortonworks ecosystem, developers, partners, and so forth, as well as of course the open source communities for Hadoop and Ranger and Atlas and so forth, the growing stack of open source code upon which Hortonworks has built their substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on, really, the health of Hortonworks as a business, in terms of their reach out to the community, in terms of their messaging obviously, and have him really position Hortonworks in the community in terms of who he sees them competing with. What segments is Hortonworks in now? The whole Hadoop segment increasingly... Hadoop is there. It's the foundation. But the word is not invoked in discussions of Hortonworks as much now as it was in the past. 
And the same thing goes for, say, Cloudera, one of their closest traditional rivals, closest in the sense that people associate the two. I was at the Cloudera analyst event the other week in Santa Monica, California, and it was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think Hortonworks has definitely evolved into a far more diversified solution provider than people realize. And that's really one of the takeaways from Dataworks Summit. With that said, this is Jim Kobielus. I should've said at the outset: I'm the lead analyst at SiliconANGLE Media's Wikibon team, focused on big data analytics. I'm your host this week on the Cube at Dataworks Summit Berlin. I'll close out this segment and we'll get ready to talk to the Hortonworks and IBM personnel. I understand there's a gentleman from Accenture on as well today on the Cube here at Dataworks Summit Berlin. (electronic music)

Published Date : Apr 19 2018


Joe Morrissey, Hortonworks | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin! Berlin, Germany. And so, Joe, it's great to have you. >> Great to be here! >> We had a number of conversations today with Scott Gnau and others from Hortonworks and also from your customers and partners. Now, you're International, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. Clearly Hortonworks, you've been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide. I'm looking at your financials in terms of your customer acquisition, and it just keeps going up and up, so you're clearly doing a great job of bringing the business in throughout the world. Now, you've told me before the camera went live that you focus on both Europe and Asia-Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into. >> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business and you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million. Also, now the fastest to get to $200 million, but the majority of that revenue contribution was coming from the United States. 
When I joined, it was about 15% of international contribution. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in contribution overall from our international customer base even though the company was growing globally at a very fast rate. >> And that's also not only fast by any stretch of the imagination in terms of growth, some have said," Oh well, maybe Hortonworks, "just like Cloudera, maybe they're going to plateau off "because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment or as a platform but you guys have diversified well beyond that. So give us a sense for going forward. What are your customers? What kind of projects are you positioning and selling Hortonworks solutions into now? Is it a different, well you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? Coz this is worldwide. >> Yeah. That's a great question. This company was founded on the premise that data volumes and diversity of data is continuing to explode and we believe that it was necessary for us to come and bring enterprise-grade security and management and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about. A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was we believe companies now wanted to go out to the point of origin, of creation of data, and manage data throughout its entire life cycle and derive pre-event as well as post-event analytical insight into their data. So what we've seen as our customers are moving beyond just unifying data in the data lake and deriving post-transaction inside of their data. They're now going all the way out to the edge. 
They're deriving insight from their data in real time all the way from the point of creation and getting pre-transaction insight into data as well so-- >> Pre-transaction data, can you define what you mean by pre-transaction data? >> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right? >> Oh, yes. >> A specific example would be if a customer walks into the store and they've interacted with the store maybe on social before they come in, or in some other fashion, before they've actually made the purchase. >> Engagement data, interaction data, yes. >> Engagement, exactly. Exactly. Right. So that's one example, but that also extends out to use cases in IoT as well, so data in motion and streaming data, as you mentioned earlier, has since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company. >> How do your Asian customers' requirements differ, or do they differ, from your European customers'? Because European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Do many of your Asian customers, and I'm sure a fair number sell into Europe, are they putting a full court, as I was going to say in the U.S., a full court press on complying with GDPR, or do they have equivalent privacy mandates in various countries in Asia, or a bit of both? 
They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India, and they've got a subsidiary called Jio, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network. >> Operational support system. >> Exactly. They were able to do that from the ground up because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR, amongst other regulations, is a big driver of that. The other factor I think that's influencing that is Cloud, and Cloud strategy in general. What we've found is that customers are drawn to the Cloud for a number of reasons. The economics sometimes can be attractive, the ability to leverage the Cloud vendors' skills in terms of implementing complex technology is attractive, but most importantly, the elasticity and scalability that the Cloud provides is hugely important. Now, the key concern for customers as they move to the Cloud, though, is how do they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage, but do so in a regulatory compliant manner, whether that's data in motion or data at rest, whether it's on-prem or in the Cloud or across multiple Clouds. That's very much a top of mind concern for European companies. 
>> For your customers around the globe, specifically of course your area of Europe and Asia, what percentage of your customers are deploying Hortonworks into a purely public Cloud environment, like HDInsight on Microsoft Azure or HDP inside of AWS, versus a private on-premises deployment, versus a hybrid public-private multi-Cloud? Is it mostly on-prem? >> Most of our business is still on-prem, to be very candid. I think almost all of our customers are looking at migrating, some closer to the Cloud than others. Even those that had intended a Cloud-first strategy have now realized that not all workloads belong in the Cloud. Some are actually more economically viable to be on-prem, and some just won't ever be able to move to the Cloud because of regulation. In addition to that, most of our customers are telling us that they actually want Cloud optionality. They don't want to be locked into a single vendor, so we very much view the future as hybrid Cloud, as multi-Cloud, and we hear our customers telling us that rather than just have a Cloud strategy, they need a data strategy. They need a strategy to be able to manage data no matter where it lives, on which tier, to ensure that they are regulatory compliant with that data, but then to be able to understand that they can secure, govern, and manage those data assets at any tier. >> What percentage of your deals involve a partner? Like IBM is a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners, or are they mostly Hortonworks-led? >> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA grew in the past year, every region grew by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution. The growth rate is very high. IBM are a big part of that, as are many other partners. 
We've got a very significant reseller channel, and we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships, and that helps us accelerate time to value with our customers. >> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great, we have to wrap it up, we're at the end of our time slot. This has been Joe Morrissey, who is the VP for International at Hortonworks. We're on theCUBE here at Dataworks Summit 2018 in Berlin, and want to thank you all for watching this segment. Tune in tomorrow, we'll have a full slate of further discussions with Hortonworks, with IBM and others tomorrow on theCUBE. Have a good one. (upbeat music)

Published Date : Apr 18 2018


Fernando Lopez, Quanam | Dataworks 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE, covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to the Cube. I'm James Kobielus, I'm the lead analyst for the Wikibon team within SiliconANGLE Media. I'm your host today here at Dataworks Summit 2018 in Berlin, Germany. We have one of Hortonworks' customers in South America with us. This is Fernando Lopez of Quanam. He's based in Montevideo, Uruguay. And he has won, here at the conference, he and his company have won an award, a data science award so what I'd like to do is ask Fernando, Fernando Lopez to introduce himself, to give us his job description, to describe the project for which you won the award and take it from there, Fernando. >> Hello and thanks for the chance >> Great to have you. >> I work for Quanam, as you already explained. We are about 400 people in the whole company. And we are spread across Latin America. I come from the kind of headquarters, which is located in Montevideo, Uruguay. And there we have a business analytics business unit. Within that, we are about 70 people and we have a big data and artificial intelligence and cognitive computing group, which I lead. And yes, we also implement Hortonworks. We are actually partnering with Hortonworks. >> When you say you lead the group, are you a data scientist yourself, or do you manage a group of data scientists or a bit of both? >> Well a bit of both. You know, you have to do different stuff in this life. So yes, I lead implementation groups. Sometimes the project is more big data. Sometimes it's more data science, different flavors. But within this group, we try to cover different aspects that are related in some sense with big data. It could be artificial intelligence. It could be cognitive computing, you know. >> Yes, so describe how you're using Hortonworks and describe the project for which you won, I assume it's a one project, for which you won the award, here at this conference. 
All right, yes. We are running several projects, but this one, the one about the prize, is one that I like so much because I'm actually a bioinformatics student so I have a special interest in this one. >> James: Okay. >> It's good to clarify that this was a joint effort between Quanam and GeneLifes. >> James: Genelabs. >> GeneLifes. >> James: GeneLifes. >> Yes, it's a genetics and bioinformatics company. >> Right. >> That they specialize-- >> James: Is that a Montevideo-based company? >> Yes. In a line, they are a startup that was born from the Institut Pasteur, but in Montevideo, and they have a lot of people who are specialists in bioinformatics and genetics, with a long career in the subject. And we come from the other side, from big data. I was kind of in the middle because of my interest in bioinformatics. So something like one year and a half ago, we met both companies. Actually there is a research and innovation center, ICT4V. You can visit ICT4V.org, which is a non-profit organization after an agreement between Uruguay and France, >> Oh okay. >> Both governments. >> That makes it possible for different private or public organizations to collaborate. We have brainstorming sessions and so on. And from one of those brainstorming sessions, this project was born. So, after that we started to discuss ideas of how to bring tools to the medical geneticists in order to streamline their work, in order to put on top of their desktop different tools that could make their work easier and more productive. 
>> These are papers in Spanish that are published in South America? >> No, just talking about, >> Or Portuguese? >> PubMed from the NIH, it's papers published in English. >> Okay. >> PubMed or MEDLINE or-- >> Different languages, different countries, different sources. >> Yeah, but most of it, or everything in PubMed, is in English. There is another PubMed in Europe and we have SciELO in Latin America also. But just to give you an idea, there's, only from that source, 300 papers each day that could be related to genetics. So only speaking about literature, there's a huge amount of information. If I am the doctor, it's difficult to process that. Okay, so that's part of the issue. But on the core of the solution, what we want to give is, starting from the sequenced genome of one patient, what can we assert, what can we say about the different variations. It is believed that each one of us has about four million mutations. Mutation doesn't mean disease. Mutation actually leads to variation. And variation is not necessarily something negative. We can have different color of the eyes. We can have more or less hair. Or this could represent some disease, something that we need to pay attention to as doctors, okay? So this part of the solution tries to implement heuristics on what's coming from the sequencing process. And these heuristics, in short, tell you the score of each variant, each variation, of being more or less pathogenic. So if I am the doctor, part of the work is done there. Then I have to decide, okay, my diagnosis is there is this disease or not. This can be used in two senses. It can be used as prevention, in order to predict, this could happen, you have this genetic risk, or it could be used in order to explain some disease and find a treatment. So that's the more bioinformatics part. On the other hand we have the literature. What we do with the literature is, we ingest these 300 daily papers, well, abstracts, not papers. 
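The variant-scoring idea Fernando describes, where heuristics assign each variant a score for how likely it is to be pathogenic, can be caricatured in a few lines. This is a made-up toy scorer; the feature names, weights, and sample variants are invented for illustration and are not the Quanam/GeneLifes heuristics.

```python
# Toy pathogenicity scorer: combines a few illustrative evidence
# signals into a 0..1 score per variant. The features and weights
# are invented; real heuristics are far richer.

def score_variant(variant):
    points = 0
    if variant.get("effect") == "frameshift":
        points += 5   # protein-truncating changes weigh heavily
    elif variant.get("effect") == "missense":
        points += 2
    if variant.get("population_frequency", 1.0) < 0.001:
        points += 3   # very rare variants are more suspicious
    if variant.get("conserved_site"):
        points += 2   # changes at conserved sites matter more
    return min(points, 10) / 10.0

variants = [
    {"id": "v1", "effect": "frameshift", "population_frequency": 1e-5,
     "conserved_site": True},
    {"id": "v2", "effect": "synonymous", "population_frequency": 0.12,
     "conserved_site": False},
]

# Rank variants so the doctor reviews the most suspicious first.
for v in sorted(variants, key=score_variant, reverse=True):
    print(v["id"], score_variant(v))  # v1 1.0, then v2 0.0
```

The point is only the shape of the pipeline: sequencing output in, a ranked list of scored variants out, with the diagnosis itself left to the doctor.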
Actually we have about three million abstracts. >> You ingest text and graphics, all of it? >> No, only the abstract, which is about a few hundred words. >> James: So just text? >> Yes >> Okay. >> But from there we try to identify relevant entities, proteins, diseases, phenotypes, things like that. And then we try to infer valid relationships. This phenotype or this disease can be caused because of this protein, or because of the expression of that gene, which is another entity. So this builds up a kind of ontology, we call it the mini-ontology because it's specific to this domain. So we have a kind of mini semantic network with millions of nodes and edges, which is quite easy to interrogate. But the point is, there you have more than just text. You have something that is already enriched. You have a series of nodes and arrows, and you can query that in terms of reasoning. What leads to what, you know? >> So the analytical tools you're using, they come from, well, Hortonworks doesn't make those tools. Are they coming from another partner in South America? Or another partner of Hortonworks' like an IBM, or where does that come from? >> That's a nice question. Actually, we have an architecture. The core of the architecture is Hortonworks, because we have scalability topics >> James: Yeah, HDP? >> Yes, HDFS, Hive on Tez, Spark. We have a number of items that need to be easily scaled up, because when we talk about genome, it's easy to think about one terabyte per patient of work. So that's one thing regarding storage and computing. On the other hand, we use a graph database. We use Neo4j for that. >> James: Okay the Neo4j for graph. The Neo4j, you have Hortonworks. >> Yes, and we also use, for natural language processing, we use Nine, which is based here in Berlin, actually. So we do part of the machine learning with Nine. Then we have Neo4j for the graph, for building this semantic network. 
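The "what leads to what" style of reasoning over a semantic network of genes, proteins, and diseases can be illustrated with a tiny in-memory graph. In production Fernando's team uses Neo4j for this; the triples, entity names, and helper function below are fabricated examples, not their actual ontology.

```python
from collections import deque

# Tiny stand-in for the gene/protein/disease semantic network:
# edges are (subject, relation, object) triples inferred from abstracts.
triples = [
    ("GENE_A", "expresses", "PROTEIN_X"),
    ("PROTEIN_X", "disrupts", "PATHWAY_P"),
    ("PATHWAY_P", "associated_with", "DISEASE_D"),
    ("GENE_B", "expresses", "PROTEIN_Y"),
]

graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

def what_leads_to(start, target):
    """Breadth-first search: return one chain of relations from start to target."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None  # no chain of evidence connects the two entities

print(what_leads_to("GENE_A", "DISEASE_D"))
# ['GENE_A', 'expresses', 'PROTEIN_X', 'disrupts', 'PATHWAY_P',
#  'associated_with', 'DISEASE_D']
```

In Neo4j the same question would be a Cypher path query; the enrichment is the same either way, in that the answer is a chain of typed relations rather than a bag of matching documents.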
And for the whole processing we have Hortonworks, for running this analysis and heuristics, and scoring the variants. We also use Solr for enterprise search, on top of the documents, or the conclusions of the documents that come from the ontology. >> Wow, that's a very complex and intricate deployment. So, great, in terms of the takeaways from this event, we only just have a little bit more time, what of all the discussions, the breakouts and the keynotes did you find most interesting so far about this show? Data stewardship was a theme of Scott Gnau's, with that new solution. You know, in terms of what you're describing as an operational application, have you built out something that can be deployed, is being deployed, by your customers on an ongoing basis? It wasn't a one-time project, right? This is an ongoing application they can use internally. Is there a need in Uruguay or among your customers to provide privacy protections on this data? Will you be using solutions like the Data Steward Studio to enable a degree of privacy protection of data equivalent to what, say, GDPR requires in Europe? Is that something? >> Yes, actually we are running other projects in Uruguay. We are helping, with other companies, the National Telecommunications Company. So there are security and privacy topics over there. And we are also starting these days a new project, again with ICT4V, another French company. We are in charge of their big data part, for an education program, which is based on the one laptop per child initiative, from the times of Nicholas Negroponte. >> James: Oh, from MIT, yes. >> Yes, from MIT, right. That initiative is already 10 years old in Uruguay, and now it has evolved also to retired people. So it's kind of going towards the digital society. >> Excellent, I have to wrap it up Fernando, that's great, you have a lot of follow-on work. 
This is great, so clearly a lot of very advanced research is being done all over the world. I had the previous guest from South Africa, you from Uruguay, so really south of the Equator. There's far more activity in big data than we here in the northern hemisphere, in Europe and North America, realize, so I'm very impressed. And I look forward to hearing more from Quanam, and through your provider, Hortonworks. Well, thank you very much. >> Thank you and thanks for the chance. >> It was great to have you here on theCUBE. I'm James Kobielus, we're here at DataWorks Summit in Berlin, and we'll be talking to another guest fairly soon. (mood music)

Published Date : Apr 18 2018


Muggie van Staden, Obsidian | Dataworks Summit 2018


 

>> Voiceover: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Hi, hello, welcome to theCUBE, I'm James Kobielus. I'm the lead analyst for Big Data Analytics at Wikibon, which is the team inside of SiliconANGLE Media that focuses on emerging trends and technologies. We are here, on theCUBE at DataWorks Summit 2018 in Berlin, Germany. And I have a guest here. This is, Muggie, and if I get it wrong, Muggie Van Staden >> That's good enough, yep. >> Who is with Obsidian, which is a South Africa-based partner of Hortonworks. And I'm not familiar with Obsidian, so I'm going to ask Muggie to tell us a little bit about your company, what you do, your focus on open source, and really the opportunities you see for big data, for Hadoop, in South Africa, really the African continent as a whole. So, Muggie? >> Yeah, James, great to be here. Yes, Obsidian, we started it 23 years ago, focusing mostly on open source technologies, and as you can imagine that has changed a lot over the last 23 years. When we started, the concept of selling Linux was basically a box with a hat and maybe a T-shirt in it. Today that's changed. >> James: Hopefully there's a stuffed penguin in there, too. (laughing) I could use that right now.
Everybody's nervous of Cloud. We have the joys that we don't really have any Cloud players locally yet. The two big players, Microsoft and Amazon, are planning some data centers soon. So the guys have different challenges to Europe and to the States. But big data, the big banks are looking at it, starting to deploy nice Hadoop clusters, starting to ingest data, starting to get real business value out of it, and we're there to help, and hopefully those four are the start for us and we can help lots of customers on this journey. >> Are South African-based companies, because you are so distant in terms of miles on the planet from Europe, from the EU, is any company in South Africa, or many companies, concerned at all about the global, or say the general data protection regulation, GDPR? US-based companies certainly are 'cause they operate in Europe. So is that a growing focus for them? And we have five weeks until GDPR kicks in. So tell me about it. >> Yeah, so from a South African point of view, some of the banks and some of the companies would have subsidiaries in Europe. So for them it's a very real thing. But we have our own Act called PoPI, which is the Protection of Personal Information Act, so very similar. So everybody's keeping an eye on it. Everybody's worried. I think everybody's worried for the first company to be fined. And then they will all make sure that they get their things right. But I think not just because of legislation, I think it's something that everybody should worry about. How do we protect data? How do we make sure the right people have access to the correct data when they should, and nobody violates that? Because I mean, in this day and age, you know, Google and Amazon and those guys probably know more about me than my family does. So it's a challenge for everybody. And I think it's just the right thing for companies to do, to make sure that they really take good care of the data that they do have.
We trust them with our money and now we're trusting them with our data. So it's a real challenge for everybody. >> So how long has Obsidian been a partner of Hortonworks, and how has your role, or partnership I should say, evolved over that time, and how do you see it evolving going forward? >> We've been a partner about three or four years now. And started off as a value added reseller. We are also a training partner in South Africa for them. And as they as a company have evolved, we've had to evolve with them. You know, so they started with HDP as the Hadoop platform. Now they're doing NiFi and HDF, so we have to learn all of those technologies as well. But very, very excited where they're going with DataPlane Service, just managing a customer's data across multiple clusters, multiple clouds, because that's realistically where we see all the customers going, you know, on-premise clusters plus, typically, multiple Clouds, and how do you manage that? And we are very excited to walk this road together with Hortonworks and all the South African customers that we have. >> So you say your customers are deploying multiple Clouds. Public Clouds or hybrid private-public Clouds? Give us a sense, for South Africa, whether public Cloud is a major deployment option or choice for financial services firms that you work with. >> Not necessarily financial services, so most of them are kicking tires at this stage, nobody's really put major workloads in there. As I mentioned, both Amazon and Microsoft are planning to put data centers down in South Africa very soon, and I think that will spur a big movement towards Cloud, but we do have some customers, unfortunately not Hortonworks customers, that are actually mostly in the Cloud. And they are now starting to look at a multi-Cloud strategy. So to ideally be in the three or four major Cloud providers and spinning up the right workloads in the right Cloud, and we're there to help.
>> One of the most predominant workloads that your customers are running in the Cloud, is it backend in terms of data ingest and transformation? Is it a bit of maybe data warehousing with unstructured data? Is it a bit of things like queryable archiving? I want to get a sense for, what is predominant right now in workloads? >> Yeah, I think most of them start with (mumble) environments. (mumbles) one customer that's heavily into Cloud from a data point of view. Literally it's their data warehouse. They put everything in there. I think from the banking customers, most of them are considering DR (disaster recovery) of their existing Hadoop clusters, maybe a subset of their data and not necessarily everything. And I think some of them are also considering putting their unstructured data outside on the Cloud because that's where most of it's coming from. I mean, if you have Twitter, Facebook, LinkedIn data, it's a bit silly to pull all of that into your environment, why not just put it in the Cloud, that's where it's coming from, and analyze that and connect it back to your data where relevant. So I think a lot of the customers would love to get there, and now Hortonworks makes it so much easier to do that. I think a lot of them will start moving in that direction. >> Now, excuse me, so are any or many of your customers doing development and training of machine learning algorithms and models in their Clouds? And to the extent that they are, are they using tools like the IBM Data Science Experience that Hortonworks resells for that? >> I think it's definitely on the radar for a lot of them. I'm not aware of anybody using it yet, but lots of people are looking at it and excited about the partnership between IBM and Hortonworks. And IBM has been a longstanding player in the South African market, and it's exciting for us as well to bring them into the whole Hortonworks ecosystem, and together solve real world problems.
>> Give us a sense for how built out the big data infrastructure is in neighboring countries like Botswana or Angola or Mozambique and so forth. Is that an area that your company, are those regions that your company operates in? Sells into? >> We don't have offices, but we don't have a problem going in and helping customers there, so we've had projects in the past, not data related, that we've flown in and helped people. Most of the banks, from a South African point of view, have branches into Africa. So it's on the roadmap, some are a little bit ahead of others, but definitely on the roadmap to actually put down Hadoop clusters in some of the major countries all throughout Africa. There's a big debate, do you put it down there, do you leave the data in South Africa? So they're all going through their own legislation, but it's definitely on the roadmap for all of them to actually take their data, knowledge in data science, up into Africa. >> Now you say that in South Africa proper, there are privacy regulations, you know, maybe not the same as GDPR, but equivalent. Throughout Africa, at least throughout Southern Africa, how is privacy regulation lacking or is it emerging? >> I think it's emerging. A lot of the countries do have the basic rule that their data shouldn't leave the country. So everybody wants that data sovereignty, and that's why a lot of them will not go to Cloud, and that's part of the challenges for the banks, if they have banks up in Botswana, etc. And Botswana's rules say the data has to stay in country. They have to figure out a way, how do they connect that data to get the value for all of their customers. So real world challenges for everybody. >> When you're going into and selling into an emerging, or developing nation, do you need to provide upfront consulting to help the customer bootstrap their own understanding of the technology and make the business case and so forth? And how consultative is the selling process...
>> Absolutely, and what we see with the banks, most of them even have a consultative approach within their own environment, so you would have the South African team maybe flying into the team at (mumbles) Botswana, and sharing some of the learnings that they've had. And then help those guys get up to speed. The reality is the skills are not necessarily in country. So there's a lot of training, a lot of help to go and say, we've done this, let us upskill you. And be a part of that process. So we sometimes send in teams to come and do two, three day training, basics, etc., so that ultimately the guys can operationalize in each country by themselves. >> So, that's very interesting, so what do you want to take away from this event? What do you find most interesting in terms of the sessions you've been in around the community showcase that you can take back to Obsidian, back in your country and apply? Like the announcement this morning of the Data Steward Studio. Do you see a possibility that your customers might be eager to use that for curation of their data in their clusters? >> Definitely, and one of the key messages for me was Scott Gnau, the CTO's, message about your data strategy, your Cloud strategy, and your business strategy. It is effectively the same thing. And I think that's the biggest message that I would like to take back to the South African customers, to go and say, you need to start thinking about this. You know, as Cloud becomes a bigger reality for us, we have to align, we have to go and say, how do we get your data where it belongs? So you know, we like to say to our customers, we help the teams get the right code to the right computer and the right data, and I think it's absolutely critical for all of the customers to go and say, well, where is that data going to sit? Where is the right compute for that piece of data? And can we get it then, can we manage it, etc.? And align to business strategy.
Everybody's trying to do digital transformation, and those three things go very much hand-in-hand. >> Well, Muggie, thank you very much. We're at the end of our slot. This has been great. It's been excellent to learn more about Obsidian and the work you're doing in South Africa, providing big data solutions or working with customers to build the big data infrastructure in the financial industry down there. So this has been theCUBE. We've been speaking with Muggie Van Staden of Obsidian Systems, and here at DataWorks Summit 2018 in Berlin. Thank you very much.
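As an aside to the data-protection discussion above, GDPR in Europe and PoPI in South Africa, one standard building block for taking good care of personal data is pseudonymization: replacing identifiers with stable salted hash tokens, so datasets stay joinable without exposing who is in them. A minimal sketch follows; real deployments add proper secret management and key rotation, which this deliberately omits:

```python
import hashlib

def pseudonymize(identifier: str, salt: str) -> str:
    """Map a personal identifier to a stable 16-hex-character token.

    The same (identifier, salt) pair always yields the same token,
    so pseudonymized records remain joinable across datasets,
    while the raw identifier never appears in the data.
    """
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return digest[:16]
```

The salt stands in for a properly managed secret: with it, the mapping is repeatable for joins; without it, recovering an identifier from a token requires guessing the identifier itself.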

Published Date : Apr 18 2018

SUMMARY :

brought to you by Hortonworks. I'm the lead analyst for Big Data Analytics at the Wikibon, and really the opportunities you see for big data, and as you can imagine that has changed a lot I could use that right now. So the guys have different challenges to Europe or say the general data protection regulation, GDPR? And I think it's just the right thing for companies to do and how do you see it evolving going forward. And we are very excited to walk this road together So you say your customers are deploying multiple Clouds. And they are now starting to look at a multi-Cloud strategy. One of the most predominant workloads and now Hortonworks makes it so much easier to do that. and excited about the partnership the big data infrastructure is in neighboring countries but definitely on the roadmap to actually put down you know, maybe not the same as GDPR, and that's part of the challenges for the banks, And how consultative is the selling process... and share some of the learnings that they've had. around the community showcase that you can take back for all of the customers to go and say, and the work you're doing in South Africa,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
IBMORGANIZATION

0.99+

James KobielusPERSON

0.99+

AmazonORGANIZATION

0.99+

HortonworksORGANIZATION

0.99+

MicrosoftORGANIZATION

0.99+

EuropeLOCATION

0.99+

Muggie Van StadenPERSON

0.99+

AfricaLOCATION

0.99+

GoogleORGANIZATION

0.99+

Muggie van StadenPERSON

0.99+

BotswanaLOCATION

0.99+

MozambiqueLOCATION

0.99+

AngolaLOCATION

0.99+

MuggiePERSON

0.99+

ScottPERSON

0.99+

South AfricaLOCATION

0.99+

JamesPERSON

0.99+

Southern AfricaLOCATION

0.99+

twoQUANTITY

0.99+

LinkedInORGANIZATION

0.99+

BerlinLOCATION

0.99+

three dayQUANTITY

0.99+

threeQUANTITY

0.99+

GDPRTITLE

0.99+

FacebookORGANIZATION

0.99+

Berlin, GermanyLOCATION

0.99+

TwitterORGANIZATION

0.99+

Obsidian SystemsORGANIZATION

0.99+

first companyQUANTITY

0.99+

five weeksQUANTITY

0.99+

fourQUANTITY

0.99+

first partnershipsQUANTITY

0.99+

threeDATE

0.99+

TodayDATE

0.98+

LinuxTITLE

0.98+

23 years agoDATE

0.98+

DataWorks Summit 2018EVENT

0.98+

bothQUANTITY

0.97+

EULOCATION

0.97+

WikibonORGANIZATION

0.97+

oneQUANTITY

0.97+

PoPITITLE

0.97+

Data Steward StudioORGANIZATION

0.97+

each countryQUANTITY

0.97+

CloudTITLE

0.97+

USLOCATION

0.96+

last nightDATE

0.96+

SiliconANGLE MediaORGANIZATION

0.96+

four yearsQUANTITY

0.96+

DataWorks SummitEVENT

0.96+

HadooORGANIZATION

0.96+

OneQUANTITY

0.96+

Dataworks Summit 2018EVENT

0.95+

HadoopORGANIZATION

0.93+

about threeQUANTITY

0.93+

two big playersQUANTITY

0.93+

theCUBEORGANIZATION

0.93+

Andreas Kohlmaier, Munich Re | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's The Cube. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to The Cube. I'm James Kobielus. I'm the Lead Analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. We are here at DataWorks Summit 2018 in Berlin. Of course, it's hosted by Hortonworks. We are in day one of two days of interviews with executives, with developers, with customers. And this morning in the opening keynote, one of the speakers was a customer of Hortonworks from Munich Re, the reinsurance company based of course in Munich, Germany. Andreas Kohlmaier, who's the head of Data Engineering, I believe. It was an excellent discussion about how you've built out a data lake. And the first thing I'd like to ask you, Andreas, is right now it's five weeks until GDPR, the general data protection regulation, goes into full force on May 25th. And of course it applies to the EU, to anybody who does business in the EU, including companies based elsewhere, such as in the US, needs to start complying with GDPR in terms of protecting personal data. Give us a sense for how Munich Re is approaching the deadline, your level of readiness to comply with GDPR, and how your investment in your data lake serves as a foundation for that compliance. >> Absolutely. So thanks for the question. GDPR, of course, is the hot topic across all European organizations. And we are actually pretty well prepared. We compiled all the processes and the necessary regulations, and in fact we are now selling this also as a service product to our customers. This has been an interesting side effect, because we have lots of other insurance companies as customers, and we started to think about why not offer this as a service to other insurance companies to help them prepare for GDPR. This is actually proving to be one of the exciting, interesting things that can happen around GDPR. >> Maybe that would be your new line of business. You make more money doing that, then.
>> I'm not sure! (crosstalk) >> Well that's excellent! So you've learned a lot of lessons. So you're already ready for May 25th? You are, okay, that's great. You're probably far ahead of, I know, a lot of U.S.-based firms. We're, you know, in our country and in other countries, we're still getting our heads around all the steps that are needed, so you know, many companies outside the EU may call on you guys for some consulting support. That's great! So give us a sense for your data lake. You discussed it this morning, but can you give us a sense for the business justification for building it out? How you've rolled it out? What stage it's in? Who's using it for what? >> So absolutely. So one of the key things for us at Munich Re is the issue about complexity, or data diversity, as it was also called this morning. So we have so many different areas that we are doing business in, and we have lots of experts in the different areas. And those people, they are really very knowledgeable in their area, and now they also get access to new sources of information. So to give you a sense, we have people for example that are really familiar with weather and climate change, also with satellites. We have captains for ships and pilots for aircraft. So we have lots of expertise in all the different areas. Why? Because we are taking those risks in our books. >> Those are big risks too. You're a reinsurance company so yeah. >> And these are actually complex risks where we really have people that really are experts in their field. So we sometimes have people that have 20 years plus of experience in the area, and then they change to the insurer to actually bring their expertise in the field also to the risk management side. And all those people, they now get an additional source of input, which is the data that is now more or less readily available everywhere.
So first of all, we are getting new data with the submissions and the risks that we are taking, and there are also interesting open data sources to connect to, so that those experts can actually bring their knowledge and their analytics to a new level by adding the layer of data and analytics to their existing knowledge. And this allows us, first of all, to understand the risks even better, to put a better price tag on that, and also to take up new risks that have not been possible to cover before. So one of the things that is also in the media, I think, is that we are also now covering the Hyperloop once it's going to be built. So those kinds of new things are only possible with data analytics. >> So you're a Hortonworks customer. Give us a sense for how you're using or deploying Hortonworks Data Platform or DataPlane Service and whatnot inside of your data lake. It sounds like it's a big data catalog, is that a correct characterization? >> So one of the things that is key to us is actually finding the right information and connecting those different experts to each other. So this is why the data catalog plays a central role. Here we have selected Alation as a catalog tool to connect the different experts in the group. The data lake at the moment is an on-prem installation. We are thinking about moving parts of that workload to the cloud to actually save operation costs. >> On top of HDP. >> Yeah, so Alation, as far as I know, technically it's a separate server that indexes the Hive tables on HDP. >> So essentially the catalog itself provides visualization and correlation across disparate data sources that you're managing in your Hadoop environment. >> Yeah, so the catalog actually is a great way of connecting the experts together.
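As a concrete aside, the lookup at the heart of what Andreas describes, find a dataset by topic and see who its expert is, can be sketched as a toy catalog index. The dataset names, tags, and contact addresses below are invented for illustration:

```python
# A toy stand-in for a data catalog's index: dataset -> expert owner + topic tags.
CATALOG = {
    "weather_daily":     {"expert": "weather-team@munichre.example", "tags": {"weather", "climate"}},
    "crop_claims_india": {"expert": "agro-team@munichre.example",    "tags": {"crop", "insurance", "india"}},
    "ship_positions":    {"expert": "marine-team@munichre.example",  "tags": {"marine", "shipping"}},
}

def find_experts(tag: str):
    """Return sorted (dataset, expert) pairs for every entry carrying `tag`."""
    return sorted(
        (name, meta["expert"])
        for name, meta in CATALOG.items()
        if tag in meta["tags"]
    )
```

A real catalog like Alation does far more, indexing Hive tables, profiling, lineage, but the expert-routing idea, weather data leading you to the weather team, is essentially this lookup.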
So that's, you know, okay, if we have people in one part of the group that are very knowledgeable about weather and they have great data about weather, then we'd like to connect them for example to the guys doing crop insurance for India, so that they can use the weather data to improve the models, for example, for crop insurance in Asia. And there the data catalog helps us to connect those experts, because you can first of all find the data sources, and you can also see who is the expert on the data. You can then also call them up or ask them a question in the tool. So it's essentially a great way to share knowledge and to connect the different experts of the group. >> Okay, so it's also surfacing up human expertise. Okay, is it also serving as a way to find training datasets possibly to use to build machine learning models to do more complex analyses? Is that something that you're doing now or plan to do in the future? >> Yes, so we are of course doing some machine learning and also deep learning projects. We also just started a Center of Excellence for artificial intelligence to see, okay, how we can use deep learning and machine learning also to find different ways of pricing insurance risks, for example, and for all those cases, of course, data is key and we really need people to get access to the right data. >> I have to ask you. One of the things I'm seeing, you mentioned the Center of Excellence for AI. I'm seeing more companies consider, maybe not do it, but consider establishing an office of the chief AI officer, reporting to the CEO. I'm not sure that that's a great idea for a lot of businesses, but since an insurance company lives and dies by data and calculations and so forth, is that something that Munich Re is doing or considering, a C-Suite-level officer of that sort responsible for this AI competency, or no? >> Could be in the future. >> Okay. >> We sort of just started with the AI Center of Excellence.
That is now reporting to our Chief Data Officer, so it's not yet a C-Suite role. >> Is the Center of Excellence for AI, is it simply like a training institute to provide some basic skill building, or is there something more there? Do you do development? >> Actually, they are trying out and developing ways on how we can use AI and deep learning for insurance. One of the core things, of course, is also about understanding natural language, to structure the information that we are getting in PDFs and in documents, but really also using deep learning as a new way to build tariffs for the insurance industry. So that's one of the core things, to find and create new tariffs. And we are also experimenting, we haven't found the product yet there, with whether or not we can use deep learning to create better tariffs. That could also then be one of the services, again, that we are providing to our customers, the insurance companies, and they build that into their products. Something like, yeah, the algorithms are powered by Munich Re. >> Now your users of your data lake, these are expert quantitative analysts, right, for the most part? So you mentioned using natural language understanding AI capabilities. Is that something that you have a need to do in high volume as a reinsurance company? Take lots of source documents and be able to, as it were, identify the content in high volume, and importantly, you know, not OCR but rather actually build a semantic graph of what's going on inside the document? >> I'm going to give you an example of the things that we are doing with natural language processing. And this one is about the energy business in the US. So we are actually taking up or seeing most of the risks that are related to oil and gas in the U.S. So all the refineries, all the larger stations, and the petroleum tanks. They are all in our books, and for each and every one of them we get a nice report on the risks there, with a couple of hundred pages.
And inside these reports there's also some paragraph describing where the refinery or the plant gets its supplies from and where it ships its products to. And so we are seeing all those documents. That's on the scale of a couple of thousand, so it's not really huge, but all together a couple of hundred thousand pages. We use NLP and AI on those documents to extract the supply chain information out of them, so in that way we can stitch together a more or less complete picture of the supply chain for oil and gas in the U.S., which helps us again to better understand that risk, because supply chain breakdown is one of the major risks in the world nowadays. >> Andreas, this has been great! We could keep on going. I'm totally fascinated by your use of AI but also your use of a data lake, and I'm impressed by your ability as a company to get your, as we say in the U.S., GDPR ducks in a row, and that's great. So it's been great to have you on The Cube. We are here at DataWorks Summit in Berlin. (techno music)
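As an aside, a toy, rule-based stand-in for the supply-chain extraction Andreas describes could look like the sketch below. The real system presumably uses trained NLP models rather than hand-written patterns, and the phrasings matched here ("supplies from …", "ships products to …") are invented for illustration:

```python
import re

# Hypothetical report phrasings; a production system would use trained models.
SUPPLIER_RE = re.compile(r"supplies from ([A-Z][\w ]+?)(?=[,.]| and\b)")
DESTINATION_RE = re.compile(r"ships (?:its )?products to ([A-Z][\w ]+?)(?=[,.]| and\b)")

def extract_supply_chain(report_text: str):
    """Return (suppliers, destinations) mentioned in a risk-report paragraph."""
    return SUPPLIER_RE.findall(report_text), DESTINATION_RE.findall(report_text)
```

Run over every report, the extracted pairs become edges of a supply-chain graph, which is the "stitch together a complete picture" step described above.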

Published Date : Apr 18 2018

SUMMARY :

Brought to you by Hortonworks. And the first thing I'd like to ask you Andreas of the exciting interesting things Maybe that would be your new line of business. all the steps that are needed so you know So one of the key things for us at Munich Re You're a reinsurance company so yeah. on the field also to the risk management side. of your data lake. So one of the things that is key to us the hive tables on HTP. So essentially the catalog itself experts of the group. or plan to do in the future? for artificial intelligence to see okay how we One of the things I'm seeing, That is now reporting to our Chief Data Officer so to structure the information that we are getting on inside the document? of the risks that are related to oil and gas in the U.S. So it's been great to have you on The Cube.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
James KobielusPERSON

0.99+

Munich ReORGANIZATION

0.99+

Andreas KohlmaierPERSON

0.99+

May 25thDATE

0.99+

AndreasPERSON

0.99+

20 yearsQUANTITY

0.99+

USLOCATION

0.99+

HortonworksORGANIZATION

0.99+

BerlinLOCATION

0.99+

AsiaLOCATION

0.99+

GDPRTITLE

0.99+

two daysQUANTITY

0.99+

U.S.LOCATION

0.99+

five weeksQUANTITY

0.99+

Center of Excellence for AIORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

IndiaLOCATION

0.99+

Berlin, GermanyLOCATION

0.98+

OneQUANTITY

0.98+

Munich ReLOCATION

0.98+

oneQUANTITY

0.97+

DataWorks SummitEVENT

0.97+

one partQUANTITY

0.97+

DataWorks Summit 2018EVENT

0.97+

2018EVENT

0.96+

EUORGANIZATION

0.96+

Munich, GermanyLOCATION

0.96+

eachQUANTITY

0.96+

Dataworks Summit EU 2018EVENT

0.93+

first thingQUANTITY

0.88+

HyperloopTITLE

0.87+

this morningDATE

0.86+

Center of Excellence for artificial intelligenceORGANIZATION

0.85+

AlationTITLE

0.84+

EULOCATION

0.83+

hundred thousand pagesQUANTITY

0.82+

one ofQUANTITY

0.79+

AlationORGANIZATION

0.77+

WikibonORGANIZATION

0.74+

couple of hundred of pagesQUANTITY

0.73+

couple of thousandsQUANTITY

0.7+

CubeORGANIZATION

0.7+

C-SuiteTITLE

0.69+

firstQUANTITY

0.67+

EuropeanLOCATION

0.6+

DataPERSON

0.57+

servicesQUANTITY

0.56+

everyQUANTITY

0.53+

EuropeLOCATION

0.52+

AIORGANIZATION

0.52+

DataORGANIZATION

0.5+

coupleQUANTITY

0.43+

CubePERSON

0.42+

Bernard Marr | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Well, hello, and welcome to the Cube. I'm James Kobielus. I'm the lead analyst for Big Data Analytics with the Wikibon team within SiliconANGLE Media. We are here at the DataWorks Summit 2018 in Berlin, Germany. And I have a special guest, we have a special guest, Bernard Marr, one of the most influential thought leaders in the big data analytics arena. And it's not just me saying that. You look at anybody's rankings, Bernard's usually in the top two or three of influentials. He publishes a lot. He's a great consultant. He keynoted this morning on the main stage at Dataworks Summit. It was a very fascinating discussion, Bernard. And I'm a little bit star struck 'cause I assumed you were this mythical beast who just kept putting out these great books and articles and so forth. And I'm glad to have you. So, Bernard, I'd like for you to stand back, we are here in Berlin, in Europe. This is April of 2018, and in five weeks' time, the general data protection regulation, it feels global 'cause it sort of is. >> It is. >> The general data protection regulation will take full force, which means that companies that do business in Europe, in the EU, must under the law protect the personal data they collect on EU citizens, ensuring the right to privacy, the right to be forgotten, ensuring users', people's, ability to withhold consent to process and profile, and so forth. So that mandate is coming down very fast. What are your thoughts on GDPR? Is it a good thing, Bernard, is it high time? Is it a burden? Give us your thoughts on GDPR currently. >> Okay, first, let me return all the compliments. It's really great to be here. I think GDPR can be both. And for me it will come down very much to the way it gets implemented.
So, in principle for me, it is a good thing, because what I've always made companies do, and advised them to do, is to be completely transparent in the way they're collecting data and using data. I believe that the big data world can't thrive if we don't develop this trust and have this transparency. So in principle, it's a great thing. For me it will come down to the implementation of all of this. I had an interesting chat just minutes ago with the event photographer, saying that once GDPR kicks in he can't actually publish any photographs without getting written consent from everyone in the photograph. That's a massive challenge, and he was saying he can't afford to lose 4% of his global revenue. So I think it will be very interesting to see how this will-- >> How it'll be affecting face recognition, I'm sorry go ahead. >> Bernard: Yeah maybe. >> Well maybe that's a bad thing, maybe it's a good thing. >> Maybe it is, yeah, maybe. So for me, in principle it's a very good thing. In practice, I'm intrigued to see how this will get implemented. >> Of the clients you consult, what percentage in the EU, without giving away names, what percentage do you think are really ready right now, or at least will be by May 25th, to comply with the letter of the law? Is it more than 50%? Is it more than 80%? Or will there be a lot of catching up to do in a short period of time? >> My sense is that there's a lot of catching up to do. I think people are scrambling to get ready at the moment. But the thing is, nobody really knows what being ready really means. I think there are lots of different interpretations. I've been talking to a few lawyers recently. And everyone has a slightly different interpretation of how far they can push the boundaries, so, again, I'm intrigued to see what will actually happen. And I very much hope that common sense prevails and it will be seen as a good force, and something that is actually good for everyone in the field of big data.
>> So, slightly changing track: in your introduction this morning, I think it was John Kreisa of Hortonworks who said that you made a prediction about this year, that AI will be used to automate more things than people realize, and it'll come along fairly fast. Can you give us a sense for how AI is enabling greater automation, and whether, you know, this is the hot-button topic, AI will put lots of people out of work fairly quickly by automating everything that white-collar workers and so forth are doing? What are your thoughts there? Is it cause for concern?

>> Yes, and it's probably one of the questions I get asked the most, and I wish I had a very good answer for it. I believe that we are experiencing a new industrial revolution at the moment, and if you look at what the World Economic Forum's CEO and founder, Klaus Schwab, is preaching about, it is that we are experiencing this new industrial revolution that will truly transform the workplace and our lives. In history, all of the three previous industrial revolutions have somehow made our lives better. We have always found something to do for us, and they have changed the jobs. Again, there was a recent report that looked at some of the key AI trends, and what it found is that AI actually produces more new jobs than it destroys.

>> Will we all become data scientists as AI becomes predominant? Or what's going on here?

>> No, I don't think so, and I wish I had the answer to this. For me, the advice I give my own children now is to focus on the really human element of it, and probably the more strategic element. The problem is, five, six years ago this was a lot easier. I could talk about emotional intelligence and creativity; with advances in machine learning, this advice is no longer true. And lots of jobs, even some of the things I do... I write for Forbes on a regular basis. I also know that AIs write for Forbes.
A lot of the analyst reports are now machine generated.

>> Natural language generation, a huge use case for AI that people don't realize.

>> Bernard: Absolutely.

>> Yeah.

>> So, for me, as an optimist I see it positively. I also question whether we as human beings should be going to work eight hours a day doing lots of stuff we quite often don't enjoy. So for me, the challenge is adjusting our economic model to this new reality, and I see that there will be significant disruption over the next 20 years, with all the technology coming in and really challenging our jobs.

>> Will AI put you and me out of a job? In other words, will it put the analysts and the consultants out of work, and allow people to get expert advice on how to manage technology without having to go through somebody like a you or a me?

>> Absolutely, and for me, my favorite example is looking at medicine. If you look at doctors, traditionally you send a doctor to medical school for seven years. You then hope that they retain 10% of what they've learned, if you're lucky. Then they gain some experience. You then turn up in the practice with your conditions. Again, if you're super lucky, they might have skim-read some of your previous conditions, and then they diagnose you. And unless you have something that's very common, the chance that they get this right is very low. So compare this with your old stomping ground, IBM's Watson: they are able to feed all medical knowledge into that cognitive computing platform, they can update this continuously, and I could then talk to Watson eight hours a day if I wanted to about my symptoms.

>> But can you trust that advice? Why should you trust the advice that's coming from a bot? Yeah, that's one of the key issues.

>> Absolutely, and I think at the moment maybe not quite, because there's still a human element that a doctor can bring, because they can read your emotions, they can understand your tone of voice.
This is going to change with affective computing and the ability for machines to do more of this, too.

>> Well, science fiction authors run amok, of course, because they imagine the end state of perfection of all the capabilities like you're describing. So we perfect robotics, we perfect emotion analytics and so forth, we use machine learning to drive conversational UIs. Clearly a lot of people imagine that all those technologies are perfected or close to it. But clearly you and I know that there's a lot of work to do to get them--

>> And we both have been in the technology space long enough to know that there are promises, and there's lots of hype, and then there's a lot of disappointment, and it usually takes longer than most people predict. So what I'm seeing, and this is what my prediction is, is that automation is happening across every industry I work in. More things, even things I thought five years ago couldn't be automated. But to get to a state where it really transforms our world, I think we are still a few years away from that.

>> Bernard, in terms of the hype factor for AI, it's out of sight. What do you think is the most hyped technology or application under the big umbrella of AI right now, in terms of the hype far exceeding the utility? I don't want to put words in your mouth. I've got some ideas. Your thoughts?

>> Lots of them. I think the two areas I write a lot about and talk to companies a lot about are deep learning and machine learning, and blockchain technology.

>> James: Blockchain.

>> So for me, they have huge potential, some amazing use cases; at the same time the hype is far ahead of reality.

>> And there's sort of an intersection between AI and blockchain right now, but it's kind of tentative. Hey, Bernard, we are at the end of this segment. It's been so great. We could just keep going on and on and on.

>> I know, we could just be...
>> Yeah, there's a lot I've been wanting to ask you for a long time. I want to thank you for coming to theCUBE.

>> Pleasure.

>> This has been Bernard Marr. I'm James Kobielus on theCUBE from DataWorks Summit in Berlin, and we'll be back with another guest in just a little while. Thank you very much.

Published Date : Apr 18 2018



Abhas Ricky, Hortonworks | DataWorks Summit 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.

>> Welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin. I'm James Kobielus, the lead analyst for big data analytics on the Wikibon team of SiliconANGLE Media. On theCUBE, we extract the signal from the noise, and here at DataWorks Summit the signal is big data analytics, and increasingly the imperative for many enterprises is compliance with GDPR; the General Data Protection Regulation comes in five weeks, on May 25th. There are more things going on, so what I'm going to be doing today for the next 20 minutes or so is, from Hortonworks I have Abhas Ricky, who is the director of strategy and innovation. He'll explain what he does, but at a high level, he helps customers to identify the value of investments in big data, analytics, and big data platforms in their business. And Abhas, how do you justify the value of compliance with GDPR? I guess the value would be avoiding penalties for noncompliance, right? Can you do it as an upside as well? Is there an upside in terms of, if you make an investment, and you probably will need to make an investment to comply, can you turn this around as a strategic asset, possibly?

>> Yeah, so I'll take a step back first.

>> James: Like a big data catalog and so forth.

>> Yeah, so if you look at the value part which you said, it's interesting that you mentioned it. There's a study which was done by McKinsey which said that only 15% of executives can understand what the value of a digital initiative is, let alone a big data initiative.

>> James: Yeah.

>> Similarly, Gartner says that if you look at the various reports and the various issues, the fundamental thing which executives struggle with is identifying the value which they will get. So that is where I pitch in. That is where I come in from a data perspective.
Now if you look at GDPR specifically, one of the things that we believe, and I've done multiple blogs and webinars around that, is that GDPR should be treated as a business opportunity, because of the fact that--

>> James: A business opportunity?

>> A business opportunity. It shouldn't necessarily be seen as a compliance burden on costs or your balance sheets, because of the fact that it is the one single opportunity which allows you to clean up your data supply chain. It allows you to look at your data assets with a holistic view, to create a transparent data supply chain, and to have your IT systems talk to each other. So some of the provisions, as you know, in addition to the right to consent, the right to portability, etc.: there is also privacy by design, which says that you have to be proactive in defining your IT systems and architecture. It's not necessarily reactive. But guess what? If you're able to do that, you will see the benefits in other use cases like single view of customer, or fraud, or anti-money laundering, because at the end of the day, all GDPR is asking you to say is: where do you store your data, what's the lineage, what's the provenance? Can you identify what the personally identifiable information is for any particular customer? And can you use that to your effect as you go forward? So it's a great opportunity, because to be able to comply with the provisions, you've got to take steps before that, which is essentially streamlining your data operations, which obviously will have a domino effect on the efficiency of other use cases. So I believe it's a business opportunity.

>> Right. Now, part of that opportunity in terms of getting your arms around what data you have: where GDPR is concerned, the customer has a right to withhold consent for you, the enterprise that holds that data, to use that personal data of theirs, which they own, for various and sundry reasons.
Many enterprises, and many of Hortonworks' customers, are using their big data for things like AI and machine learning. Won't compliance with GDPR limit their ability to seize the opportunity to build deep learning and so forth? What are customers saying about that? Is that going to be kind of a downer, a chilling effect on their investments in AI and so forth?

>> So there are two elements around it. The first thing which you said: are there customers doing machine learning and AI? Yes, there are. But broadly speaking, before you're able to do machine learning and AI, you need to get your data sets onto a particular platform, in a particular fashion, as clean data; otherwise you can't do AI or machine learning on top of it.

>> James: Right.

>> So the reason why I say it's an opportunity is because you're being forced by compliance to get that data from every other place onto this platform. So obviously those capabilities will get enhanced. Having said that, I do agree, if I'm an organization which does targeting and retargeting of customers based on multiple segmentations, and one of the things is online advertisements, in that case, yes, your ability might get affected, but I don't think you'll get prohibited. And that affected time span will be only small, because you just adapt. The good thing about machine learning and AI is that you don't create manual rules. They pick up the rules based on the patterns and how the data sets have been performing. So once you have created those structures in place, initially, yes, you'll have to make an investment to alter your programs of work. However, going forward, it will be even better. Because guess what? You just cleaned your entire data supply chain.
So that's how I would see that. Yes, a lot of companies, in ecommerce, do targeting and retargeting based on the customer DNA, based on their shopping profiles, based on their shopping habits, and then based off that you give them the next best offer or whatever. So, yes, that might get affected initially, but that's not because GDPR is there or not. That's just because you're changing your program software. You're changing the fundamental way by which you're sourcing the data, the way the data are coming in, and which data you can use. But once you have tags against each of those attributes, once you have access controls, once you know exactly which customer attributes you can touch and which you cannot for which purposes, whether you have consent or not, your life's even better. The AI tools or the machine learning algorithms will learn from themselves.

>> Right. So essentially, once you have a tight ship in terms of managing your data in line with the GDPR strictures and so forth, it sounds like what you're saying is that it gives you as an enterprise the confidence and assurance that if you want to use that data and need to use that data, you know exactly how you've got the processes in place to gain the necessary consents from customers. So there won't be any nasty surprises later on of customers complaining, because you've got legal procedures for getting the consent, and that's great. You know, one of the things, Abhas, we're hearing right now in terms of compliance requirements that are coming along, maybe not a part of GDPR directly yet, but related to it, is the whole notion of algorithmic transparency.
As you build machine learning models and those models are driven into working applications, being able to transparently identify when a model takes, let's say, an autonomous action based on particular data and particular variables, and there are some nasty consequences like crashing an autonomous vehicle, there's the ability, they call it explainable AI, to roll that back and determine who's liable for that event. Does Hortonworks have any capability within your portfolio to enable more transparency into the algorithmic underpinnings of a given decision? Is that something that you enable in your solutions, or that your partner IBM enables through DSX and so forth? Give us a sense of whether that's a capability you guys currently offer, and, in terms of your understanding, are customers asking for that yet, or is that too futuristic?

>> So I would say that it's a two-part question.

>> James: Yeah.

>> The first one: yes, there are multiple regulations coming in, like Vilica Financial Markets, there's Mid Fair, the BCBS, etc., and organizations have to comply. You've got the IFRS, which spans brokers, insurance, etc., etc. So, yes, a lot of organizations across industries are getting affected by compliance use cases. Where Hortonworks comes into the picture is being able to be compliant from a data standpoint: A, you need to be able to identify which of those data sources you need to implement a particular use case; B, you need to get them to a certain point whereby you can do analytics on them. And then there's the whole storage and processing and all of that.
But also, as you might have heard at the keynote today, from a cloud perspective it's starting to get more and more complex, because everyone's moving to the cloud, which means, if you look at any large multinational organization, most of them have a hybrid cloud structure, because they work with two or three cloud vendors. That makes the process even more complex, because now you have multiple clusters, you have on-premise, and you have multiple different IT systems that need to talk to each other. Which is where the Hortonworks DataPlane services come into the picture, because they give you a unified view of your global data assets.

>> James: Yes.

>> Think of it like a single pane of glass whereby you can do security and governance across all data assets. So from those angles, yes, we definitely enable those use cases, which will help with compliance.

>> Making the case to the customer for a big data catalog along the lines of what you guys offer: there's a lot of upfront data architectural work that needs to be done to get all your data assets into shape within the context of the catalog. How do they justify making that expense in terms of hiring the people, the data architects and so forth, needed to put it all in shape? I mean, how long does it take before you can really stand up a working data catalog in most companies?

>> So again, you've asked two questions. First of all, how do they justify it? Which is where we say that the platform is a means to an end. It's enabling you to deliver use cases. So I look at it in terms of five key value drivers. Either it's a risk reduction, or it's a cost reduction, or it's a cost avoidance.

>> James: Okay.

>> Or it's a revenue optimization, or it's time to market. Against each one of these value drivers, or multiple of them, or a combination of them, each of the use cases that you're delivering on the platform will lead you to benefits around that.
My job, obviously, is to work with the customers and executives to understand what that will be, to quantify the potential impact, which will then form the basis and give my customer champions enough ammunition so that they can go back and justify those investments.

>> James: Abhas, we're going to have to cut it short. I'm going to let you finish your point here, but we have to end this segment, so go ahead.

>> That's fine.

>> Okay, well, anyway, we have had Abhas Ricky, who is the director of strategy and innovation at Hortonworks. We're here at DataWorks Summit Berlin. And thank you very much. Sorry to cut it short, but we have to move to the next guest.

>> No worries, pleasure, thank you very much.

>> Take care, have a good one.

>> Thanks a lot, yes. (upbeat music)
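Ricky's point about tagging customer attributes, applying attribute-level access controls, and checking per-customer consent for each purpose can be sketched in a few lines. This is an illustrative toy, not a Hortonworks or Apache Atlas API: the `ConsentRegistry` class, the `PII_TAGS` tag set, and the purpose strings are all assumed names. The idea is simply that attributes tagged as PII are released to a use case only when the customer has consented to that purpose.

```python
class ConsentRegistry:
    """Illustrative: tracks which processing purposes each customer has consented to."""

    def __init__(self):
        self._grants = {}  # customer_id -> set of consented purposes

    def grant(self, customer_id, purpose):
        self._grants.setdefault(customer_id, set()).add(purpose)

    def allows(self, customer_id, purpose):
        return purpose in self._grants.get(customer_id, set())


# Attributes tagged as personally identifiable (hypothetical tag set).
PII_TAGS = {"email", "name"}


def visible_attributes(record, registry, purpose):
    """Return the attributes a use case may see: non-PII always,
    PII only when the customer consented to this purpose."""
    if registry.allows(record["customer_id"], purpose):
        return dict(record)
    return {k: v for k, v in record.items() if k not in PII_TAGS}


registry = ConsentRegistry()
registry.grant("c1", "marketing")  # c1 consented to marketing; c2 did not

consented = {"customer_id": "c1", "email": "ann@example.com", "name": "Ann", "segment": "gold"}
unconsented = {"customer_id": "c2", "email": "max@example.com", "name": "Max", "segment": "silver"}

full_view = visible_attributes(consented, registry, "marketing")
masked_view = visible_attributes(unconsented, registry, "marketing")
```

With tags and consent recorded this way, the downstream machine learning tools Ricky mentions can keep operating on the non-PII attributes without manual rule changes.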

Published Date : Apr 18 2018



Scott Gnau, Hortonworks | Dataworks Summit EU 2018


 

(upbeat music)

>> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.

>> Hi, welcome to theCUBE. We're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich, now it's in Berlin. It's a great show. The host is Hortonworks, and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established themselves about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya. How are you doing?

>> Glad to be back and good to see you. It's been a while.

>> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times, but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks, and Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your latest financials look pretty good. You're growing, your deal sizes are growing, your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe.
So your keynote this morning: your core theme was that if you're an enterprise, your business strategy is equated with your cloud strategy now, which is really equated with your data strategy. And you got into a lot of that; it was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, the personal data of your customers, is absolutely important; in fact it's imperative, and in five weeks it will be mandatory, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning: you had an announcement overnight that Hortonworks has released a new solution in technical preview called Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers.

>> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you think about driving digitization of a business, driving new business models, connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of the data that we have access to, we didn't create.
And so it was created outside the firewall, by a device, by some application running with some customer, and capturing and interpreting and governing that data is very different from taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of data structure. So this is a need that's driven from many different perspectives. It's driven by the new architecture, the way IoT devices are connecting and just creating a data bomb; that's one thing. It's driven by business use cases, just saying: what are the assets that I have access to, and how can I try to determine patterns between those assets when I didn't even create some of them, so how do I adjudicate that?

>> Discovering and cataloging your data--

>> Discovering it, cataloging it, actually even... when I think about data, just think of the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of breadcrumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get, obviously, into the regulatory piece that says, sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is.

>> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists; that's the first and foremost thing for an enterprise, to be able to assess their degree of exposure to GDPR.

>> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project.

>> Interviewer: You and IBM have done some major work--

>> We work with IBM and the community on Apache Atlas.
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest-common-denominator, open source, open community kind of development to really create a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's "I want to discover data and create metadata about the data based on patterns that I see in the data," or "I've inherited data and I want to ensure that the metadata stay with that data through its life cycle," so that I can guarantee the lineage of the data and be compliant with GDPR--

>> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing.

>> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it: not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem.

>> Interviewer: That's the Data Steward Studio?

>> Well, DataPlane and Data Steward Studio really enable those things to come together.

>> So the Data Steward Studio is the second service...

>> Like an app.

>> ...under the Hortonworks DataPlane service.

>> Yeah, so the whole idea is to be able to tie those things together. And when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud.
All memory of any lineage is gone. Or I've got to go manually set up another set of lineage that may not be the same as the lineage the data came with. And so being able to provide that common service across footprints, whether it's multiple data centers, multiple clouds, or both, is a really huge value, because now you can sit back and, through that single pane, see all of your data assets and understand how they interact. That obviously has the ability, then, to provide value like with Data Steward Studio: to discover assets, maybe to discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say, yeah, I've got these assets here, here, and here; I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record, through the metadata, that I did it.

>> Yes, in fact that is very much at the heart of compliance; you've got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days.

>> Scott: Not Hortonworks, you're talking about Hadoop.

>> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base; that's where HDP comes from, and so forth: great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was, IBM standardized on HDP in lieu of their own distro, 'cause it's so well-established, so mature. But going forward, you guys at Hortonworks have positioned yourselves in many ways... Wikibon sees you as being the premier provider of big data governance solutions, specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on a capability you already have there. So going forward, can you give us a sense of your roadmap in terms of building out the DataPlane services?
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, when I start an employee, they have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely to how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time, in-motion data environment. Scott, this has been great. It's been great to have you on theCUBE. You're an alum of theCUBE. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at DataWorks Summit in Berlin. (upbeat music)
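The inverted governance model Gnau describes, masking PII while still letting joins across tables and stores line up, can be sketched with deterministic tokenization. This is an illustrative sketch only, not Hortonworks' implementation; the key and field names are hypothetical.

```python
import hmac
import hashlib

# Hypothetical secret held by a governance service; not any
# vendor's actual implementation. Whoever holds the key can
# map tokens back to values on a need-to-see basis.
TOKEN_KEY = b"governance-service-secret"

def tokenize(pii_value: str) -> str:
    """Deterministically pseudonymize a PII value.

    The same input always yields the same token, so joins across
    data stores still work (referential integrity is preserved),
    but the raw value is not recoverable without the key.
    """
    mac = hmac.new(TOKEN_KEY, pii_value.encode("utf-8"), hashlib.sha256)
    return "tok_" + mac.hexdigest()[:16]

# Two records in different stores referring to the same person
# still join on the token after masking.
customers = {"alice@example.com": {"segment": "gold"}}
orders = [("alice@example.com", 42.50), ("bob@example.com", 9.99)]

masked_customers = {tokenize(k): v for k, v in customers.items()}
masked_orders = [(tokenize(email), amt) for email, amt in orders]

for token, amount in masked_orders:
    if token in masked_customers:
        print(token, masked_customers[token]["segment"], amount)
```

Because the tokenization is keyed rather than a plain hash, an attacker without the key cannot precompute tokens for guessed emails, while analysts can still group, join, and count on the masked column.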

Published Date : Apr 18 2018


Keynote Analysis | DataWorks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering DataWorks Summit, Europe 2018. (upbeat music) Brought to you by Hortonworks. (upbeat music) >> Hello, and welcome to theCUBE. I'm James Kobielus. I'm the lead analyst for Big Data analytics in the Wikibon team of SiliconANGLE Media, and we're here at DataWorks Summit 2018 in Berlin, Germany. And it's an excellent event, and we are here for two days of hard-hitting interviews with industry experts focused on the hot issues facing customers, enterprises, in Europe and the world over, related to the management of data and analytics. And what's super hot this year, and it will remain hot as an issue, is data privacy and privacy protection. Five weeks from now, a new regulation of the European Union called the General Data Protection Regulation takes effect, and it's a mandate affecting any business that does business in the EU, not only those based there. It's coming fairly quickly, and enterprises on both sides of the Atlantic and really throughout the world are focused on GDPR compliance. So that's a hot issue that was discussed this morning in the keynote, and so what we're going to be doing over the next two days, we're going to be having experts from Hortonworks, the show's host, as well as IBM, one of Hortonworks' lead partners, as well as a customer, Munich Re, who will appear on theCUBE, and I'll be interviewing them about not just GDPR but really the trends facing the Big Data industry. Hadoop, of course: Hortonworks got started about seven years ago as one of the solution providers that was focused on commercializing the open source Hadoop code base, and they've come quite a ways. Their recent financials were very good. They continue to rock 'n' roll on the growth side and customer acquisitions and deal sizes. So we'll be talking a little bit later to Scott Gnau, their chief technology officer, who did the core keynote this morning.
He'll be talking not only about how the business is doing but about a new product announcement, the Data Steward Studio that Hortonworks announced overnight. This new solution is directly useful for GDPR compliance, and we'll ask Scott to bring us more insight there. But what we'll be doing over the next two days is extracting signal from noise. The Big Data space continues to grow and develop. Hadoop has been around for a number of years now, but in many ways it's been superseded on the agendas of enterprises that are building applications from data by some newer, primarily open source technologies such as Apache Spark, and TensorFlow for building deep learning and so forth. We'll be discussing the trends towards the deepening of the open source data analytics stack with our guests. We'll be talking with a European-based reinsurance company, Munich Re, about the data lake that they have built for their internal operations, and we'll be asking Andres Kohlmaier, their lead of data engineering, to discuss how they're using it, how they're managing their data lake, and possibly to give us some insight about how it will serve them in achieving GDPR compliance and sustaining it going forward. So what we will be doing is looking at trends, not just in compliance, not just in the underlying technologies, but in the applications that Hadoop and Spark and so forth, these technologies, are being used for, and those initiatives in Europe mirror what enterprises are doing world-wide. They're moving away from Big Data environments built primarily on data at rest, which has been Hadoop's sweet spot, towards more streaming architectures. And so Hortonworks, the show's host, as I said, has been going more deeply towards streaming architectures with its investments in NiFi and so forth. We'll be asking them to give us some insight about where they're going with that.
We'll also be looking at the growth of multi-cloud Big Data environments. What we're seeing is that there's a trend in the marketplace away from predominantly premises-based Big Data platforms towards public cloud-based Big Data platforms. And so Hortonworks, they are partners with a number of the public cloud providers, including IBM, which I mentioned. They've also got partnerships with Microsoft Azure, with Amazon Web Services, with Google and so forth. We'll be asking our guests to give us some insight about where they're going in terms of their support for multi-clouds, support for edge computing, analytics, and the internet of things. Big Data increasingly is evolving towards more of a focus on serving applications at the edge, like mobile devices that have autonomous smarts, as for self-driving vehicles. Big Data is critically important for feeding, for modeling and building the AI needed to power the intelligence in endpoints. Not just self-driving cars but intelligent appliances, conversational user interfaces for mobile devices and consumer appliances: you know, Amazon's got their Alexa, Apple's got their Siri, and so forth. So we'll be looking at those trends as well, towards pushing more of that intelligence towards the edge, and the power and the role of Big Data and data-driven algorithms, like machine learning, in driving those kinds of applications. So, in Wikibon, the team that I'm embedded within, we have published just recently our updated forecast for the Big Data analytics market, and we've identified key trends that are revolutionizing and disrupting and changing the market for Big Data analytics. So among the core trends, I mentioned the move towards multi-clouds.
Given the move towards more public cloud-based big data environments in the enterprise, I'll be asking Hortonworks, who of course built their business and their revenue stream primarily on on-premises deployments, to give us a sense for how they plan to evolve as a business as their customers move towards more public cloud-facing deployments. And IBM, of course, will be here in force. Tomorrow, which is a Thursday, we have several representatives from IBM to talk about their initiatives and partnerships with Hortonworks and others in the area of metadata management, in the area of machine learning and AI development tools and collaboration platforms. We'll also be discussing the push by IBM and Hortonworks to enable greater depths of governance applied to enterprise deployments of Big Data: both data governance, which is an area where Hortonworks and IBM as partners have achieved a lot of traction in terms of recognition among the pace setters in data governance in multi-cloud, unstructured, Big Data environments, but also model governance. The governing, the version controls and so forth, of machine learning and AI models. Model governance is a huge push by enterprises who increasingly are doing data science, which is what machine learning is all about. Taking that competency, that practice, and turning it into more of an industrialized pipeline of building, training, and deploying into an operational environment a steady stream of machine-learning models into multiple applications, you know, edge applications, conversational UIs, search engines, eCommerce environments that are driven increasingly by machine learning that's able to process Big Data in real time and deliver next best actions, bringing more intelligence into all applications.
So we'll be asking Hortonworks and IBM to net out where they're going with their partnership in terms of enabling a multi-layered governance environment that allows this pipeline, this machine-learning pipeline, this data science pipeline, to be deployed as an operational capability into more organizations. Also, one of the areas where I'll be probing our guests is automation in the machine learning pipeline. That's been a hot theme that Wikibon has seen in our research. A lot of vendors in the data science arena are adding automation capabilities to their machine-learning tools. Automation is critically important for productivity. Data scientists as a discipline are in limited supply. I mean experienced, trained, seasoned data scientists fetch a high price. There aren't that many of them, so more of the work they do needs to be automated. It can be automated by increasingly mature tools on the market from a growing range of vendors. I'll be asking IBM and Hortonworks to net out where they're going with automation inside their Big Data and machine learning tools and partnerships going forward. So really what we're going to be doing over the next few days is looking at these trends, but it's going to come back down to GDPR as a core envelope that many companies attending this event, DataWorks Summit, Berlin, are facing. So I'm James Kobielus with theCUBE. Thank you very much for joining us, and we look forward to starting our interviews in just a little while. Our first up will be Scott Gnau from Hortonworks. Thank you very much. (upbeat music)
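The model governance idea raised above, keeping version controls over which data and settings produced a deployed model, can be sketched in a few lines. This is a hedged illustration of the concept, not any vendor's product; the field names and hashing scheme are assumptions made for the example.

```python
import hashlib
import json

def model_version_record(params: dict, training_data_rows: list) -> dict:
    """Build a minimal governance record for a trained model.

    Hashing the hyperparameters and a fingerprint of the training
    data yields a reproducible version ID, so an auditor can later
    verify which data and settings produced a deployed model.
    """
    param_blob = json.dumps(params, sort_keys=True).encode()
    data_blob = json.dumps(training_data_rows, sort_keys=True).encode()
    return {
        "params_hash": hashlib.sha256(param_blob).hexdigest()[:12],
        "data_fingerprint": hashlib.sha256(data_blob).hexdigest()[:12],
        "version": hashlib.sha256(param_blob + data_blob).hexdigest()[:12],
    }

# Same params and data give the same version; changing either
# hyperparameters or training data changes the version ID.
rec_a = model_version_record({"lr": 0.1, "depth": 3}, [[1, 2], [3, 4]])
rec_b = model_version_record({"lr": 0.1, "depth": 3}, [[1, 2], [3, 4]])
rec_c = model_version_record({"lr": 0.2, "depth": 3}, [[1, 2], [3, 4]])
print(rec_a["version"], rec_c["version"])
```

In a real governance platform these records would be stored alongside lineage metadata (who trained the model, when, and from which datasets), which is the kind of capability the IBM and Hortonworks guests are expected to discuss.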

Published Date : Apr 18 2018


Rob Thomas, IBM Analytics | IBM Fast Track Your Data 2017


 

>> Announcer: Live from Munich, Germany, it's theCUBE. Covering IBM: Fast Track Your Data. Brought to you by IBM. >> Welcome, everybody, to Munich, Germany. This is Fast Track Your Data brought to you by IBM, and this is theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. Rob Thomas is here, he's the General Manager of IBM Analytics, and longtime CUBE guest, good to see you again, Rob. >> Hey, great to see you. Thanks for being here. >> Dave: You're welcome, thanks for having us. So we're talking about, we missed each other last week at the Hortonworks DataWorks Summit, but you came on theCUBE, you guys had the big announcement there. You're sort of getting out of doing a Hadoop distribution, right? TheCUBE gave up our Hadoop distributions several years ago so. It's good that you joined us. But, um, that's tongue-in-cheek. Talk about what's going on with Hortonworks. You guys are now going to be partnering with them essentially to replace BigInsights, you're going to continue to service those customers. But there's more than that. What's that announcement all about?
And, we also talked about extending that to things like Big SQL, where they're partnering with us on Big SQL, around modernizing data environments. And then third, which relates a little bit to what we're here in Munich talking about, is governance, where we're partnering closely with them around unified governance, Apache Atlas, advancing Atlas in the enterprise. And so, it's a lot of dimensions to the relationship, but I can tell you since I was on theCUBE a week ago with Rob Bearden, client response has been amazing. Rob and I have done a number of client visits together, and clients see the value of unlocking insights in their Hadoop data, and they love this, which is great. >> Now, I mean, the Hadoop distro, I mean early on you got into that business, just, you had to do it. You had to be relevant, you want to be part of the community, and a number of folks did that. But it's really sort of best left to a few guys who want to do that, and Apache open source is really, I think, the way to go there. Let's talk about Munich. You guys chose this venue. There's a lot of talk about GDPR, you've got some announcements around unified government, but why Munich? >> So, there's something interesting that I see happening in the market. So first of all, you look at the last five years. There's only 10 companies in the world that have outperformed the S&P 500, in each of those five years. And we started digging into who those companies are and what they do. They are all applying data science and machine learning at scale to drive their business. And so, something's happening in the market. That's what leaders are doing. And I look at what's happening in Europe, and I say, I don't see the European market being that aggressive yet around data science, machine learning, how you apply data for competitive advantage, so we wanted to come do this in Munich. And it's a bit of a wake-up call, almost, to say hey, this is what's happening. 
We want to encourage clients across Europe to think about how do they start to do something now. >> Yeah, of course, GDPR is also a hook. The European Union and you guys have made some talk about that, you've got some keynotes today, and some breakout sessions that are discussing that, but talk about the two announcements that you guys made. There's one on DB2, there's another one around unified governance, what do those mean for clients? >> Yeah, sure, so first of all on GDPR, it's interesting to me, it's kind of the inverse of Y2K, which is there's very little hype, but there's huge ramifications. And Y2K was kind of the opposite. So look, it's coming, May 2018, clients have to be GDPR-compliant. And there's a misconception in the market that that only impacts companies in Europe. It actually impacts any company that does any type of business in Europe. So, it impacts everybody. So we are announcing a platform for unified governance that makes sure clients are GDPR-compliant. We've integrated software technology across analytics, IBM security, some of the assets from the Promontory acquisition that IBM did last year, and we are delivering the only platform for unified governance. And that's what clients need to be GDPR-compliant. The second piece is data has to become a lot simpler. As you think about my comment, who's leading the market today? Data's hard, and so we're trying to make data dramatically simpler. And so for example, with DB2, what we're announcing is you can download and get started using DB2 in 15 minutes or less, and anybody can do it. Even you can do it, Dave, which is amazing. >> Dave: (laughs) >> For the first time ever, you can-- >> We'll test that, Rob. >> Let's go test that. I would love to see you do it, because I guarantee you can. Even my son can do it. I had my son do it this weekend before I came here, because I wanted to see how simple it was. 
So that announcement is really about bringing, or introducing a new era of simplicity to data and analytics. We call it Download And Go. We started with SPSS, we did that back in March. Now we're bringing Download And Go to DB2, and to our governance catalog. So the idea is make data really simple for enterprises. >> You had a community edition previous to this, correct? There was-- >> Rob: We did, but it wasn't this easy. >> Wasn't this simple, okay. >> Not anybody could do it, and I want to make it so anybody can do it. >> Is simplicity, the rate of simplicity, the only differentiator of the latest edition, or I believe you have Kubernetes support now with this new addition, can you describe what that involves? >> Yeah, sure, so there's two main things that are new functionally-wise, Jim, to your point. So one is, look, we're big supporters of Kubernetes. And as we are helping clients build out private clouds, the best answer for that in our mind is Kubernetes, and so when we released Data Science Experience for Private Cloud earlier this quarter, that was on Kubernetes, extending that now to other parts of the portfolio. The other thing we're doing with DB2 is we're extending JSON support for DB2. So think of it as, you're working in a relational environment, now just through SQL you can integrate with non-relational environments, JSON, documents, any type of no-SQL environment. So we're finally bringing to fruition this idea of a data fabric, which is I can access all my data from a single interface, and that's pretty powerful for clients. >> Yeah, more cloud data development. Rob, I wonder if you can, we can go back to the machine learning, one of the core focuses of this particular event and the announcements you're making. Back in the fall, IBM made an announcement of Watson machine learning, for IBM Cloud, and World of Watson. In February, you made an announcement of IBM machine learning for the z platform. 
What are the machine learning announcements at this particular event, and can you sort of connect the dots in terms of where you're going, in terms of what sort of innovations are you driving into your machine learning portfolio going forward? >> I have a fundamental belief that machine learning is best when it's brought to the data. So, we started with, like you said, Watson machine learning on IBM Cloud, and then we said well, what's the next big corpus of data in the world? That's an easy answer, it's the mainframe, that's where all the world's transactional data sits, so we did that. Last week with the Hortonworks announcement, we said we're bringing machine learning to Hadoop, so we've kind of covered all the landscape of where data is. Now, the next step is about how do we bring a community into this? And the way that you do that is we don't dictate a language, we don't dictate a framework. So if you want to work with IBM on machine learning, or in Data Science Experience, you choose your language. Python, great. Scala or Java, you pick whatever language you want. You pick whatever machine learning framework you want, we're not trying to dictate that because there's different preferences in the market, so what we're really talking about here this week in Munich is this idea of an open platform for data science and machine learning. And we think that is going to bring a lot of people to the table. >> And with open, one thing, with open platform in mind, one thing to me that is conspicuously missing from the announcement today, correct me if I'm wrong, is any indication that you're bringing support for the deep learning frameworks like TensorFlow into this overall machine learning environment. Am I wrong? I know you have Power AI. Is there a piece of Power AI in these announcements today? >> So, stay tuned on that. We are, it takes some time to do that right, and we are doing that. 
But we want to optimize so that you can do machine learning with GPU acceleration on Power AI, so stay tuned on that one. But we are supporting multiple frameworks, so if you want to use TensorFlow, that's great. If you want to use Caffe, that's great. If you want to use Theano, that's great. That is our approach here. We're going to allow you to decide what's the best framework for you. >> So as you look forward, maybe it's a question for you, Jim, but Rob I'd love you to chime in. What does that mean for businesses? I mean, is it just more automation, more capabilities as you evolve that timeline, without divulging any sort of secrets? What do you think, Jim? Or do you want me to ask-- >> What do I think, what do I think you're doing? >> No, you ask about deep learning, like, okay, that's, I don't see that, Rob says okay, stay tuned. What does it mean for a business, that, if like-- >> Yeah. >> If I'm planning my roadmap, what does that mean for me in terms of how I should think about the capabilities going forward? >> Yeah, well what it means for a business, first of all, is what they're going, they're using deep learning for, is doing things like video analytics, and speech analytics and more of the challenges involving convolution of neural networks to do pattern recognition on complex data objects for things like connected cars, and so forth. Those are the kind of things that can be done with deep learning. >> Okay. And so, Rob, you're talking about here in Europe how the uptick in some of the data orientation has been a little bit slower, so I presume from your standpoint you don't want to over-rotate, to some of these things. But what do you think, I mean, it sounds like there is difference between certainly Europe and those top 10 companies in the S&P, outperforming the S&P 500. What's the barrier, is it just an understanding of how to take advantage of data, is it cultural, what's your sense of this? 
>> So, to some extent, data science is easy, data culture is really hard. And so I do think that culture's a big piece of it. And the reason we're kind of starting with a focus on machine learning, simplistic view, machine learning is a general-purpose framework. And so it invites a lot of experimentation, a lot of engagement, we're trying to make it easier for people to on-board. As you get to things like deep learning as Jim's describing, that's where the market's going, there's no question. Those tend to be very domain-specific, vertical-type use cases and to some extent, what I see clients struggle with, they say well, I don't know what my use case is. So we're saying, look, okay, start with the basics. A general purpose framework, do some tests, do some iteration, do some experiments, and once you find out what's working and what's not, then you can go to a deep learning type of approach. And so I think you'll see an evolution towards that over time, it's not either-or. It's more of a question of sequencing. >> One of the things we've talked to you about on theCUBE in the past, you and others, is that IBM obviously is a big services business. This big data is complicated, but great for services, but one of the challenges that IBM and other companies have had is how do you take that service expertise, codify it to software and scale it at large volumes and make it adoptable? I thought the Watson data platform announcement last fall, I think at the time you called it Data Works, and then so the name evolved, was really a strong attempt to do that, to package a lot of expertise that you guys had developed over the years, maybe even some different software modules, but bring them together in a scalable software package. So is that the right interpretation, how's that going, what's the uptake been like? >> So, it's going incredibly well.
What's interesting to me is what everybody remembers from that announcement is the Watson Data Platform, which is a decomposable framework for doing these types of use cases on the IBM cloud. But there was another piece of that announcement that is just as critical, which is we introduced something called the Data First method. And that is the recipe book to say to a client, so given where you are, how do you get to this future on the cloud? And that's the part that people, clients, struggle with, is how do I get from step to step? So with Data First, we said, well look. There's different approaches to this. You can start with governance, you can start with data science, you can start with data management, you can start with visualization, there's different entry points. You figure out the right one for you, and then we help clients through that. And we've made Data First method available to all of our business partners so they can go do that. We work closely with our own consulting business on that, GBS. But that to me is actually the thing from that event that has had, I'd say, the biggest impact on the market, is just helping clients map out an approach, a methodology, to getting on this journey. >> So that was a catalyst, so this is not a sequential process, you can start, you can enter, like you said, wherever you want, and then pick up the other pieces from majority model standpoint? Exactly, because everybody is at a different place in their own life cycle, and so we want to make that flexible. >> I have a question about the clients, the customers' use of Watson Data Platform in a DevOps context. So, are more of your customers looking to use Watson Data Platform to automate more of the stages of the machine learning development and the training and deployment pipeline, and do you see, IBM, do you see yourself taking the platform and evolving it into a more full-fledged automated data science release pipelining tool? Or am I misunderstanding that? 
>> Rob: No, I think that-- >> Your strategy. >> Rob: You got it right, I would just, I would expand a little bit. So, one is it's a very flexible way to manage data. When you look at the Watson Data Platform, we've got relational stores, we've got column stores, we've got in-memory stores, we've got the whole suite of open-source databases under the Compose umbrella, we've got Cloudant. So we've delivered a very flexible data layer. Now, in terms of how you apply data science, we say, again, choose your model, choose your language, choose your framework, that's up to you, and we allow clients, many clients start by building models on their private cloud, then we say you can deploy those into the Watson Data Platform, so therefore then they're running on the data that you have as part of that data fabric. So, we're continuing to deliver a very fluid data layer which then you can apply data science, apply machine learning there, and there's a lot of data moving into the Watson Data Platform because clients see that flexibility.
>> Rob: Thanks, guys, great to see you. >> You're welcome; all right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Munich, Fast Track Your Data, right back. (upbeat electronic music)

Published Date : Jun 22 2017

Christoph Streubert, SAP - DataWorks Summit Europe 2017 - #DWS17 - #theCUBE


 

>> Announcer: Live from Munich, Germany, it's The CUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Okay, welcome back everyone, we are here live in Munich, Germany for DataWorks 2017, the DataWorks Summit, formerly Hadoop Summit. I'm John Furrier with SiliconANGLE's theCUBE, my co-host Dave Vellante, wrapping up day two of coverage here with Christoph Streubert, who's the Senior Director of SAP Big Data, handles all the go-to-market for SAP Big Data, @sapbigdata is the Twitter handle. You have a great shirt there, Go Live >> Go Live or go home. (Laughs) >> John: You guys are a part. Welcome to theCUBE. >> Christoph: Thank you, I appreciate it. >> Thanks for joining us on the wrap up. You and I have known each other, we've known each other for a long time. We've been at many Sapphires together, we've had many conversations around the role of data, the role of architecture, the role of how organizations are transforming at the speed of business, which is SAP, it's a lot of software that powers business, under transformation right now. You guys are no stranger to analytics, we have the HANA Cloud Platform now. >> Christoph: We know a thing or two about that, yeah. (laughs) >> You know a little bit about data and legacy as well. You guys power pretty much most of the Fortune 100, if not all of them. What's your thoughts on this? >> Yeah, good point. On the topic of some numbers, about 75% of the world GDP runs through SAP systems eventually. So yes, we know a thing or two about transactional and analytical systems, definitely. >> John: And you're a partner with Hortonworks >> With Hortonworks and other cloud providers, Hadoop providers, certainly, absolutely, but in this case, Hortonworks. We have, specifically, a solution that runs on Hadoop and Spark and that allows, actually, our customers to unify much, much larger data sets with a system of records, and we now do so many of them around the world for new and exciting use cases.
>> And you were born in Munich. This is your hometown. >> This is actually a home gig for me, exactly. So, yes, unfortunately I'll also be presenting in English but yeah, I want to talk German, Bavarian, all the time. (laughs) >> I'll see my parents tonight. >> I wish we could help you >> but we don't speak Bavarian. But we do like to drink the beer though. It's the fifth season but a lot of great stuff here in Germany. Dave, you guys, I want to get your thoughts on something. I wanted to get you, just 'cause you're both, you're like an analyst, Christoph as well. I know you're over at SAP but, you know, you have such great industry expertise and Dave obviously covers the stuff every day. I just think that the data world is so undervalued, in my mind. I think the ecosystem of startups that are coming out of the open source ecosystems, which are well-defined, by the way, and getting better. But now you have startups doing things like VIMTEC, we just had a bank on. Startups creating value and things like blockchain on the horizon. Other new paradigms are coming on that are going to change the landscape of how wealth is created and value is created and charged. So, you've got a whole new tsunami of change. What's your thoughts on how this expands and obviously, certainly, Hortonworks as a public company and Cloudera is going public, so you expect to see that level up in valuation. >> They're in the process, yes. >> But I still think they're both undervalued. Your thoughts. >> Well it's not just the platform, right? And that's, I think, where Hadoop also came from. The legacy of Hadoop is that you don't have to really think about how you want to use your data. You don't have to think ahead about what kind of schema you want to apply and how you want to correlate your data. You can create a large data lake, right?
That's the term that was created a long time ago, that allows customers to just collect all that data and think in the second stage about what to do with it and how to correlate it. And that's exactly what we're now also seeing in the third stage: not just creating analytics but also creating applications instead of analytics or on top of analytics, correlating with data that also drives the business, the core business, from an OLTP perspective or also from an OLAP perspective. >> I mean, Dave, you were the one who said Amazon's a trillion dollar TAM, will be the first trillion dollar company, and you were kind of, but you looked at the thousand points of light that the cloud enables, all these aggregated together, what's your thoughts on valuation of this industry? Because if Hortonworks continues on this pure play and they've got Cloudera coming in and they're doing well, you could argue that they're both undervalued companies if you count the ecosystem. >> Well, we always knew that big data was going to be a heavy lift, right? And I would agree with what Christoph was saying, that Hadoop is profound in that it was no schema on write and ship five megabytes of code to a petabyte of data. But it was hard to get that right. And I remember something you said, John, at one of our early SAP Sapphires, when the big data meme was just coming through. You said, "You know, SAP is not just big data, it's fast data". And you were talking about bringing transaction and analytic data together. >> John: Right. >> Again, something that has only recently been enabled. And you think about, you know, continuous streaming. I think that, now, big data has sort of entered the young-adulthood phase, we're going to start seeing returns from the steep part of that S-curve, and I think the hype will be realized. I think it is undervalued, much like the internet was. It was overvalued, then nobody wanted to touch it, and then it became.
Actually, if you think back to 1999, the internet was undervalued in terms of what it actually achieved. >> John: Yeah. >> I think the same or similar thing is going to happen with big data. And since we have an SAP guest on, I'll say as well, we all remember the early days of ERP. >> Mhm, oh yeah. >> It wasn't clear >> Nope. >> Who was going to emerge as the king. >> Right. >> There were a few solutions. You're right. >> That's right. And, as well, something else we said about big data, it was the practitioners of ERP that made the most money, that created the most value, and the same thing is happening here. >> Yeah. In fact, on that topic, I believe that 2017 and 2018 will be the big years for big data, so to speak. >> John: Uh huh. >> In fact, because of some statistics. >> John: In what way? >> Well, we just did >> Adoption, S-curve? >> Right, exactly. Utilizing the value of big data. You're talking about valuation here, right? 75% of CEOs of the top 1000 believe that the next three years are more important to their business than the last 50. And so that tells me that they're willing to invest. Not just the financial markets, which I believe really run the most sophisticated big data analytics and models today. They had real use cases with real results very quickly. And so, they showed many how it's done. They created sort of the new role of a data scientist. They have roles like an AML officer. It's a real job, they do nothing else but anti-money laundering, right? So, in that industry they've shown us how to do that and I think others will follow. >> Yeah, and I think that when you look at this whole thing about digital transformation, it's all about data. >> John: Yeah. >> I mean, if you're serious about digital transformation, you must become a data-driven company and you have to hop on that curve. Even if you're talking to the, you know, bank today that got on in 2014, which was relatively late, but the pace at which they're advancing is astronomical.
>> John: Yeah. >> I don't remember his name, a British mathematician, who coined the phrase about 11 years ago: "Data is the new oil". >> John: Mhm. >> And I think it's very true because crude oil, in its original form, you also can't use it. >> John: It has to be refined. >> Right, exactly. It has to be refined to actually use it and get the value of it. Same thing with data. You have to distill it, you have to correlate it, you have to align it, you have to relate it to business transactions so the business really can take advantage of it. >> And then we're seeing, you know, to your point, you've got, I don't know, the list of big data companies that are now public is growing. It's still small, not much profit. >> I mean, I just think, and this is while I'm getting your reaction, I mean, I'm just reading right now some news popping on my dashboard. Google just released some benchmarks on the TPU, the tensor processing unit, >> Dave: Right. >> Basically a chip dedicated to machine learning. >> Yep. >> You know, so, you're going to start to see some abstraction layers develop, whether it's hardened processor hardware, you guys have certainly done innovation on the analytic side, we've seen that with some of the specialty apps. Just to make things go faster. I mean, so, more and more action is coming, so I would agree that this S-curve is coming. But the game might shift. I mean, this is not an easy, clear path. There's bets being made in big data and there's potential for a huge shift of money, of value. >> See, one of the things I see, and we talked to Hortonworks about this, the new president, you know, betting all on open source. I happen to think a hybrid model is going to win. I think the rich get richer here.
SAP, IBM, even Oracle, you know, they can play the open source game and say, "Hey, we're going to contribute to open source, we're going to participate, we're going to utilize open source, but we're also going to put the imprimatur of our install base, our business model, our trusted brands behind so-called big data." We don't really use that term as much anymore. It's the confluence of not only the technology but the companies who, what'd you say, 75% of the world's transactions run through SAP at some point? >> Christoph: Yeah. >> With companies like SAP behind it, and others, that's when this thing, I think, really takes off. >> What I think a lot of people don't realize, and I've been a customer, also, for a long time before I joined the vendor side, and what is under-realized is the aspect of risk management. Once you have a system and once you have business processes digitized and they run your business, you can't introduce radical changes overnight as quickly anymore as you'd like or your business would like. So, risk management is really very important to companies. That's why you see innovation within organizations not necessarily come from the core digitization organization within their enterprise, it often happens on the outside, within different business units that are closer to the product or to the customer or something. >> Something else that's happening, too, that I wanted to address is this notion of digitization, which is all about data, allows companies to jump industries. You're seeing it everywhere, you're seeing Amazon getting into content, Apple getting into financial services. You know, there's this premise out there that Uber isn't about taxicabs, it's about logistics. >> John: Yeah. >> And so you're seeing these born-digital, born in the cloud companies now being able to have massive impacts across different industries. Huge disruption creates, you know, great opportunities, in my view. >> Christoph: Yeah. >> David: What do you think?
>> I mean, I just think that the disruption is going to be brutal, and I want to, I'm trying to synthesize what's happening in this show, and you know, you're going to squint through all the announcements and the products, really an upgrade to 2.6, a new data platform. But here in Europe the IOT thing just, to me, is a catalyst point because it's really a proof point to where the value is today. >> David: Mhm. >> That people can actually look at and say, "This is going to have an impact on our business, to your digitization point," and I think IOT is pulling the big data industry and cloud together. And I think machine learning and things that come over the top on it are only going to make it go faster. And so that intersection point, where the AI, augmented intelligence, is going to come in, I think that's where you're going to start to see real proof points on the value proposition of data. I mean, right now it's all kind of an inner circle game. "Oh yeah, got to get the insights, optimize this process here and there," and so there's some low hanging fruit, but the big shifting, mind blowing, CEO changing strategies will come from some bigger moves. >> To that point, actually, two things I want to mention that SAP does in that space, specifically, right? Startups, we have a program actually, SAP.io, that Bill McDermott also recently introduced again, where we invest in startups in this space to help foster innovation faster, right? And also connecting that with our customers. >> John: What is it called? >> SAP.io. Something to look out for. And on the topic of IOT, we made, also, an announcement at the beginning of the year, Project Leonardo. >> Yeah. >> It's a commitment, it's a solution set, and it's also an investment strategy, right? We're committed in this market to invest, to create solutions, we have solutions already in the cloud and also on premise. There are a few companies we also purchased in conjunction with Leonardo, RT specifically.
Some of our customers in the manufacturing space, very strong opportunity for IOT, sensor collection, creating SLAs for robotics on the manufacturing floor. For example, we have a complete solution set to make that possible and realize that for our customers and that's exactly a perfect example where these sensor applications in IOT, edge, compute rich environments come together also with a core where, then, a system of reference like machine points, for example, matters because if you manage the SLA for a machine, for example, you just not only monitor it, you want to also automatically trigger the replacement of a part, for example, and that's why you need an SAP component, as well. So, in that space, we're heavily investing, as well. >> The other thing I want to say about IOT is, I see it, I mean, cloud and big data have totally disrupted the IT business. You've seen Dell buying EMC, HP had to get out of the cloud business, Oracle pivoted to the cloud, SAP obviously, going hard after the cloud. Very, very disruptive, those two trends. I see IOT as not necessarily disruptive. I see those who have the install base as adopting IOT and doing very, very well. I think it's maybe disruptive to the economy at large, but I think existing companies like GE, like Siemens, like Daimler, are going to do very, very well as a result of IOT. I mean, to the extent they embrace digitization, which they would be crazy not to.
Which, I agree, it's going to be within the next two years, you're going to start to see massive returns. And looking back, I think people will realize this industry was undervalued in 2017. >> Remember how long it took to align on TCP/IP? (laughter) >> Walk away, I mean interoperability was key with TCP/IP. >> Christoph: Yeah. One of the things that made things happen. >> I remember talking about it. (laughter) >> Yeah, two megabits per second. Yeah, but I mean, bringing back that, what's your walkaway? Because is it a unification opportunity? Is it more of an ecosystem? >> A good friend of mine, also at SAP on the West Coast, Andreas Walter, he shared an observation that he saw in another presentation years ago. It was suits versus hoodies. Different kind of way to run your IT shop, right? Top-down structure, waterfall projects, that's the suits; open source, hack it, quickly done, you know, get in, walk away, make money, that's the hoodies. >> Whoa, whoa, whoa, the suits were the waterfall, hoodies was the agile. >> Christoph: That's correct. >> Alright, alright, okay. >> Christoph: Correct. So, I think that it's not just the technology that's coming together, it's mindsets that are coming together. And I think organizationally for companies, that's the bigger challenge, actually. Because one is very prescribed, change control oriented, risk management aware. The other is very progressive, innovative, fast adopters. Whether these two can bring those together, I think that's the real challenge in organizations. >> John: Mhm, yeah. >> Not the technology. And on that topic, we have a lot of very intelligent questions, very good conversations, deep conversations here with the audience at this event here in Munich. >> Dave, my walkaway was interesting because I had some preconceived notions coming in. Obviously, we were prepared to talk about it, and because we saw the S-1 filing by Cloudera, you're starting to see the level of transparency relative to the business model.
One's worth one billion dollars in private value, and then Hortonworks pushing only 2700 million in a public market, which I would agree with you is undervalued, vis-a-vis what's going on. So obviously, my observation coming in from here is that I think there's going to be a haircut for Cloudera. The question is how much value will be chopped off Cloudera, versus how much value of Hortonworks will go up. So the question is, does Cloudera plummet, or does Cloudera get a little bit of a haircut or stay and Hortonworks rises? Either way, the equilibrium in the industry will be established. The other option would be >> Dave: I think the former, and the numbers are ugly, let's not sugarcoat it. And so that's got to change in order for this prediction that we're making. >> John: Former being the haircut? >> Yeah, the haircut's going to happen, I think. But the numbers are really ugly. >> But I think the question is how far does it drop and how much of that is venture. >> Sure. >> Venture, arbitrage, or just how they are capitalized, but Hortonworks could roll up. >> But my point is that those numbers have to change and get better in order for our prediction to come true. Okay, so, but in your second talk, sorry to interrupt you but >> No, I like a debate and I want to know where that line is. We'll be watching. >> Dave: Yeah.
I think the cards are going to start hitting the table in ecosystem, and what I'm seeing is that happening now. So, I think just an overall healthy ecosystem. >> Without a doubt. >> Okay. >> Great. >> Any final comments? >> Let's have a beer. >> Great to see you in Munich. (laughter) >> We'll have a beer, we had a pig knuckle last night, Dave. We had some sauerkraut. >> Christoph: (speaks foreign word) >> Yeah, we had the (speaks foreign word). Dave, we'll grab the beer, thanks. Good to be with you again. Thanks to the crew, thanks to everyone watching. >> Thanks, John. >> The CUBE, signing off from Munich, Germany for DataWorks 2017. Thanks for watching, see ya next time. (soft techno music)

Published Date : Apr 7 2017
